WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Colin Ian King	032e091d3e	cifs: remove redundant initialization of variable rc The variable rc is being initialized with a value that is never read, the assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-06-20 21:28:16 -05:00
Rikard Falkeborn	57c8ce7ab3	cifs: Constify static struct genl_ops The only usage of cifs_genl_ops[] is to assign its address to the ops field in the genl_family struct, which is a pointer to const. Make it const to allow the compiler to put it in read-only memory. Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-06-20 21:28:16 -05:00
YueHaibing	a23a71abca	cifs: Remove unused inline function is_sysvol_or_netlogon() is_sysvol_or_netlogon() is never used, so can remove it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-06-20 21:28:16 -05:00
Steve French	f2756527d3	cifs: remove duplicated prototype smb2_find_smb_ses was defined twice in smb2proto.h Signed-off-by: Steve French <stfrench@microsoft.com>	2021-06-20 21:28:16 -05:00
Aurelien Aptel	5e538959f0	cifs: fix ipv6 formating in cifs_ses_add_channel Use %pI6 for IPv6 addresses Signed-off-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-06-20 21:28:16 -05:00
Linus Torvalds	6fab154a33	for-5.13-rc6-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmDNEFMACgkQxWXV+ddt WDuZQg/7BpGG3IDhxydM7fUrNT0xmW2/0VG8blXAgNTiaUO1zOrlrlDKm38+dtW6 yEv3ruf68tggrPNRCkyh51n45+ExqNwc7WwrxaKIRKmvYhYDsxnt8JLiNkv64isi R/CQVETX1cKsMuRhBuqmUq3Sy6VJZoi6coUHIC7ddBcLqnz8c9p7oGqzxBT8J9u3 1CkDSeLM4HKlISlVKhmT4lRG28cQTuy3mSABUt7N5ljJvrrpQAvEN1HCOE9XUQFQ wHH2DjNnBMvfB7mrGCBL7XGf8DF6ucgcyfofuOj6CQLFJ8bOnVKsk8dk/8XUQod+ rQoNIrVwW91LjmEO/I767JmjrRMtHbXvl3DEw3BvaD/O4efw78SN2VN+DRi4j7Xx aMtAWWfakfIyyJNZu9IEDa736iCdp+yl4bnq+hZpqmOYRqTq8n/zWuCMWZ5ohNay JyjxCm+xgo3vH9VEgzje6GDUki3I4Bwe7VlsaMr9F6F5GKzFp/4fb9lCewBrH6le +Y4gWxRT09plThsC2N3qmBQ9uVIJUyzmvcsYiMJ95tb24srdcPUTCG0C9zBvuMCC nm+1n5d3ENSEBaRNKQsC3MYcjKIh8VDEaKnntJrHAzHP41hrD+makrw3LVs6wLzu amGYz40XNq8zK2Xxv/N8O/U/PwQWKGj4bxq/2c1Wi9p9HACWfgk= =JbJO -----END PGP SIGNATURE----- Merge tag 'for-5.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "One more fix, for a space accounting bug in zoned mode. It happens when a block group is switched back rw->ro and unusable bytes (due to zoned constraints) are subtracted twice. It has user visible effects so I consider it important enough for late -rc inclusion and backport to stable" * tag 'for-5.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: fix negative space_info->bytes_readonly	2021-06-18 16:39:03 -07:00
Matthew Wilcox (Oracle)	9620ad86d0	afs: Re-enable freezing once a page fault is interrupted If a task is killed during a page fault, it does not currently call sb_end_pagefault(), which means that the filesystem cannot be frozen at any time thereafter. This may be reported by lockdep like this: ==================================== WARNING: fsstress/10757 still has locks held! 5.13.0-rc4-build4+ #91 Not tainted ------------------------------------ 1 lock held by fsstress/10757: #0: ffff888104eac530 ( sb_pagefaults as filesystem freezing is modelled as a lock. Fix this by removing all the direct returns from within the function, and using 'ret' to indicate whether we were interrupted or successful. Fixes: `1cf7a1518a` ("afs: Implement shared-writeable mmap") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/20210616154900.1958373-1-willy@infradead.org/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-06-18 13:49:07 -07:00
Zhihao Cheng	819f9ab430	ubifs: Remove ui_mutex in ubifs_xattr_get and change_xattr Since ubifs_xattr_get and ubifs_xattr_set cannot being executed parallelly after importing @host_ui->xattr_sem, now we can remove ui_mutex imported by commit `ab92a20bce` ("ubifs: make ubifs_[get\|set]xattr atomic"). @xattr_size, @xattr_names and @xattr_cnt can't be out of protection by @host_ui->mutex yet, they are sill accesed in other places, such as pack_inode() called by ubifs_write_inode() triggered by page-writeback. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-06-18 22:04:47 +02:00
Zhihao Cheng	f4e3634a3b	ubifs: Fix races between xattr_{set\|get} and listxattr operations UBIFS may occur some problems with concurrent xattr_{set\|get} and listxattr operations, such as assertion failure, memory corruption, stale xattr value[1]. Fix it by importing a new rw-lock in @ubifs_inode to serilize write operations on xattr, concurrent read operations are still effective, just like ext4. [1] https://lore.kernel.org/linux-mtd/20200630130438.141649-1-houtao1@huawei.com Fixes: `1e51764a3c` ("UBIFS: add new flash file system") Cc: stable@vger.kernel.org # v2.6+ Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-06-18 22:04:47 +02:00
Dan Carpenter	be076fdf83	ubifs: fix snprintf() checking The snprintf() function returns the number of characters (not counting the NUL terminator) that it would have printed if we had space. This buffer has UBIFS_DFS_DIR_LEN characters plus one extra for the terminator. Printing UBIFS_DFS_DIR_LEN is okay but anything higher will result in truncation. Thus the comparison needs to be change from == to >. These strings are compile time constants so this patch doesn't affect runtime. Fixes: `ae380ce047` ("UBIFS: lessen the size of debugging info data structure") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Alexander Dahl <ada@thorsis.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-06-18 22:04:47 +02:00
Zhen Lei	a2c2a622d4	ubifs: journal: Fix error return code in ubifs_jnl_write_inode() Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: `9ca2d73264` ("ubifs: Limit number of xattrs per inode") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>	2021-06-18 22:04:47 +02:00
Miklos Szeredi	b89ecd60d3	fuse: ignore PG_workingset after stealing Fix the "fuse: trying to steal weird page" warning. Description from Johannes Weiner: "Think of it as similar to PG_active. It's just another usage/heat indicator of file and anon pages on the reclaim LRU that, unlike PG_active, persists across deactivation and even reclaim (we store it in the page cache / swapper cache tree until the page refaults). So if fuse accepts pages that can legally have PG_active set, PG_workingset is fine too." Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com> Fixes: `1899ad18c6` ("mm: workingset: tell cache transitions from workingset thrashing") Cc: <stable@vger.kernel.org> # v4.20 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2021-06-18 21:16:42 +02:00
Dave Chinner	a79b28c284	xfs: separate CIL commit record IO To allow for iclog IO device cache flush behaviour to be optimised, we first need to separate out the commit record iclog IO from the rest of the checkpoint so we can wait for the checkpoint IO to complete before we issue the commit record. This separation is only necessary if the commit record is being written into a different iclog to the start of the checkpoint as the upcoming cache flushing changes requires completion ordering against the other iclogs submitted by the checkpoint. If the entire checkpoint and commit is in the one iclog, then they are both covered by the one set of cache flush primitives on the iclog and hence there is no need to separate them for ordering. Otherwise, we need to wait for all the previous iclogs to complete so they are ordered correctly and made stable by the REQ_PREFLUSH that the commit record iclog IO issues. This guarantees that if a reader sees the commit record in the journal, they will also see the entire checkpoint that commit record closes off. This also provides the guarantee that when the commit record IO completes, we can safely unpin all the log items in the checkpoint so they can be written back because the entire checkpoint is stable in the journal. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-06-18 08:24:23 -07:00
Geert Uytterhoeven	18842e0a4f	xfs: Fix 64-bit division on 32-bit in xlog_state_switch_iclogs() On 32-bit (e.g. m68k): ERROR: modpost: "__udivdi3" [fs/xfs/xfs.ko] undefined! Fix this by using a uint32_t intermediate, like before. Reported-by: noreply@ellerman.id.au Fixes: 7660a5b48fbef958 ("xfs: log stripe roundoff is a property of the log") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-06-18 08:24:19 -07:00
Pavel Begunkov	7a778f9dc3	io_uring: improve in tctx_task_work() resubmission If task_state is cleared, io_req_task_work_add() will go the slow path adding a task_work, setting the task_state, waking up the task and so on. Not to mention it's expensive. tctx_task_work() first clears the state and then executes all the work items queued, so if any of them resubmits or adds new task_work items, it would unnecessarily go through the slow path of io_req_task_work_add(). Let's clear the ->task_state at the end. We still have to check ->task_list for emptiness afterward to synchronise with io_req_task_work_add(), do that, and set the state back if we're going to retry, because clearing not-ours task_state on the next iteration would be buggy. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1ef72cdac7022adf0cd7ce4bfe3bb5c82a62eb93.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-18 09:22:02 -06:00
Pavel Begunkov	16f7207038	io_uring: don't resched with empty task_list Entering tctx_task_work() with empty task_list is a strange scenario, that can happen only on rare occasion during task exit, so let's not check for task_list emptiness in advance and do it do-while style. The code still correct for the empty case, just would do extra work about which we don't care. Do extra step and do the check before cond_resched(), so we don't resched if have nothing to execute. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c4173e288e69793d03c7d7ce826f9d28afba718a.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-18 09:22:02 -06:00
Pavel Begunkov	c6538be9e4	io_uring: refactor tctx task_work list splicing We don't need a full copy of tctx->task_list in tctx_task_work(), but only a first one, so just assign node directly. Taking into account that task_works are run in a context of a task, it's very unlikely to first see non-empty tctx->task_list and then splice it empty, can only happen with task_work cancellations that is not-normal slow path anyway. Hence, get rid of the check in the end, it's there not for validity but "performance" purposes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d076c83fedb8253baf43acb23b8fafd7c5da1714.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-18 09:22:02 -06:00
Pavel Begunkov	ebd0df2e63	io_uring: optimise task_work submit flushing tctx_task_work() tries to fetch a next batch of requests, but before it would flush completions from the previous batch that may be sub-optimal. E.g. io_req_task_queue() executes a head of the link where all the linked may be enqueued through the same io_req_task_queue(). And there are more cases for that. Do the flushing at the end, so it can cache completions of several waves of a single tctx_task_work(), and do the flush at the very end. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3cac83934e4fbce520ff8025c3524398b3ae0270.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-18 09:22:02 -06:00
Pavel Begunkov	3f18407dc6	io_uring: inline __tctx_task_work() Inline __tctx_task_work() into tctx_task_work() in preparation for further optimisations. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f9c05c4bc9763af7bd8e25ebc3c5f7b6f69148f8.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-18 09:22:02 -06:00
Pavel Begunkov	a3dbdf54da	io_uring: refactor io_get_sequence() Clean up io_get_sequence() and add a comment describing the magic around sequence correction. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f55dc409936b8afa4698d24b8677a34d31077ccb.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-18 09:22:02 -06:00
Pavel Begunkov	c854357bc1	io_uring: clean all flags in io_clean_op() at once Clean all flags in io_clean_op() in the end in one operation, will save us a couple of operation and binary size. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b8efe1f022a037f74e7fe497c69fb554d59bfeaf.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-18 09:22:02 -06:00
Pavel Begunkov	1dacb4df4e	io_uring: simplify iovec freeing in io_clean_op() We don't get REQ_F_NEED_CLEANUP for rw unless there is ->free_iovec set, so remove the optimisation of NULL checking it inline, kfree() will take care if that would ever be the case. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a233dc655d3d45bd4f69b73d55a61de46d914415.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-18 09:22:02 -06:00
Pavel Begunkov	b8e64b5300	io_uring: track request creds with a flag Currently, if req->creds is not NULL, then there are creds assigned. Track the invariant with a new flag in req->flags. No need to clear the field at init, and also cleanup can be efficiently moved into io_clean_op(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5f8baeb8d3b909487f555542350e2eac97005556.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-18 09:22:02 -06:00
Pavel Begunkov	c10d1f986b	io_uring: move creds from io-wq work to io_kiocb io-wq now doesn't have anything to do with creds now, so move ->creds from struct io_wq_work into request (aka struct io_kiocb). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8520c72ab8b8f4b96db12a228a2ab4c094ae64e1.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-18 09:22:02 -06:00
Pavel Begunkov	2a2758f26d	io_uring: refactor io_submit_flush_completions() struct io_comp_state is always contained in struct io_ring_ctx, don't pass them into io_submit_flush_completions() separately, it makes the interface cleaner and simplifies it for the compiler. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/44d6ca57003a82484338e95197024dbd65a1b376.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-18 09:22:02 -06:00
Pavel Begunkov	e6ab8991c5	io_uring: fix false WARN_ONCE WARNING: CPU: 1 PID: 11749 at fs/io-wq.c:244 io_wqe_wake_worker fs/io-wq.c:244 [inline] WARNING: CPU: 1 PID: 11749 at fs/io-wq.c:244 io_wqe_enqueue+0x7f6/0x910 fs/io-wq.c:751 A WARN_ON_ONCE() in io_wqe_wake_worker() can be triggered by a valid userspace setup. Replace it with pr_warn. Reported-by: syzbot+ea2f1484cffe5109dc10@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f7ede342c3342c4c26668f5168e2993e38bbd99c.1623949695.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-18 09:22:02 -06:00
Dave Chinner	a6a65fef5e	xfs: log stripe roundoff is a property of the log We don't need to look at the xfs_mount and superblock every time we need to do an iclog roundoff calculation. The property is fixed for the life of the log, so store the roundoff in the log at mount time and use that everywhere. On a debug build: $ size fs/xfs/xfs_log.o.* text data bss dec hex filename 27360 560 8 27928 6d18 fs/xfs/xfs_log.o.orig 27219 560 8 27787 6c8b fs/xfs/xfs_log.o.patched Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>	2021-06-18 08:21:48 -07:00
Shaokun Zhang	9bb38aa080	xfs: remove redundant initialization of variable error 'error' will be initialized, so clean up the redundant initialization. Cc: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-06-18 08:14:31 -07:00
Dave Chinner	90e2c1c20a	xfs: perag may be null in xfs_imap() Dan Carpenter's static checker reported: The patch 7b13c5155182: "xfs: use perag for ialloc btree cursors" from Jun 2, 2021, leads to the following Smatch complaint: fs/xfs/libxfs/xfs_ialloc.c:2403 xfs_imap() error: we previously assumed 'pag' could be null (see line 2294) And it's right. Fix it. Fixes: `7b13c51551` ("xfs: use perag for ialloc btree cursors") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>	2021-06-18 08:14:20 -07:00
Darrick J. Wong	d1015e2ebd	Merge tag 'xfs-delay-ready-attrs-v20.1' of https://github.com/allisonhenderson/xfs_work into xfs-5.14-merge4 xfs: Delay Ready Attributes Hi all, This set is a subset of a larger series for Dealyed Attributes. Which is a subset of a yet larger series for parent pointers. Delayed attributes allow attribute operations (set and remove) to be logged and committed in the same way that other delayed operations do. This allows more complex operations (like parent pointers) to be broken up into multiple smaller transactions. To do this, the existing attr operations must be modified to operate as a delayed operation. This means that they cannot roll, commit, or finish transactions. Instead, they return -EAGAIN to allow the calling function to handle the transaction. In this series, we focus on only the delayed attribute portion. We will introduce parent pointers in a later set. The set as a whole is a bit much to digest at once, so I usually send out the smaller sub series to reduce reviewer burn out. But the entire extended series is visible through the included github links. Updates since v19: Added Darricks fix for the remote block accounting as well as some minor nits about the default assert in xfs_attr_set_iter. Spent quite a bit of time testing this cycle to weed out any more unexpected bugs. No new test failures were observed with the addition of this set. xfs: Fix default ASSERT in xfs_attr_set_iter Replaced the assert with ASSERT(0); xfs: Add delay ready attr remove routines Added Darricks fix for remote block accounting This series can be viewed on github here: https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_v20 As well as the extended delayed attribute and parent pointer series: https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_v20_extended And the test cases: https://github.com/allisonhenderson/xfs_work/tree/pptr_xfstestsv3 In order to run the test cases, you will need have the corresponding xfsprogs changes as well. Which can be found here: https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_xfsprogs_v20 https://github.com/allisonhenderson/xfs_work/tree/delay_ready_attrs_xfsprogs_v20_extended To run the xfs attributes tests run: check -g attr To run as delayed attributes run: export MOUNT_OPTIONS="-o delattr" check -g attr To run parent pointer tests: check -g parent I've also made the corresponding updates to the user space side as well, and ported anything they need to seat correctly. Questions, comment and feedback appreciated! Thanks all! Allison * tag 'xfs-delay-ready-attrs-v20.1' of https://github.com/allisonhenderson/xfs_work: xfs: Make attr name schemes consistent xfs: Fix default ASSERT in xfs_attr_set_iter xfs: Clean up xfs_attr_node_addname_clear_incomplete xfs: Remove xfs_attr_rmtval_set xfs: Add delay ready attr set routines xfs: Add delay ready attr remove routines xfs: Hoist node transaction handling xfs: Hoist xfs_attr_leaf_addname xfs: Hoist xfs_attr_node_addname xfs: Add helper xfs_attr_node_addname_find_attr xfs: Separate xfs_attr_node_addname and xfs_attr_node_addname_clear_incomplete xfs: Refactor xfs_attr_set_shortform xfs: Add xfs_attr_node_remove_name xfs: Reverse apply `72b97ea40d`	2021-06-18 08:13:22 -07:00
Peter Zijlstra	2f064a59a1	sched: Change task_struct::state Change the type and name of task_struct::state. Drop the volatile and shrink it to an 'unsigned int'. Rename it in order to find all uses such that we can use READ_ONCE/WRITE_ONCE as appropriate. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20210611082838.550736351@infradead.org	2021-06-18 11:43:09 +02:00
Ingo Molnar	b2c0931a07	Merge branch 'sched/urgent' into sched/core, to resolve conflicts This commit in sched/urgent moved the cfs_rq_is_decayed() function: a7b359fc6a37: ("sched/fair: Correctly insert cfs_rq's to list on unthrottle") and this fresh commit in sched/core modified it in the old location: 9e077b52d86a: ("sched/pelt: Check that _avg are null when _sum are") Merge the two variants. Conflicts: kernel/sched/fair.c Signed-off-by: Ingo Molnar <mingo@kernel.org>	2021-06-18 11:31:25 +02:00
Linus Torvalds	39519f6a56	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmDLL74ACgkQnJ2qBz9k QNleSAf/XikH+tsM6K9yDEeU93GGSqKUB71n9clSQBIiGZ7/UliG0wotrUjec9Rg vBTZlh3JEdfboeBei+mG3hmOdAoYK4HMsJJikqRGPyWOTujh1eOZlT1LOXaY5zNM 631A9pWe8edlpr4Mq7Wb4nO4FToEZ91iXDLliFF371aV8kP/yuv5ZjHwIn5Pt5gI DPnWwaJ+meW9KZ4gVKAfvZLVkKFat2xJ9r2LDpqbIkH9SBcfjBmeHOy0gFyCKx6l yma5iANgtWLhesP6ZwSeaRb1+T9altSLCCFZrYdKH9PXTMFUqzrbiZ8tfVmllePZ GaUOWcHYiLmvqvXnaAREiHnMFT6prg== =kevs -----END PGP SIGNATURE----- Merge tag 'fixes_for_v5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota and fanotify fixes from Jan Kara: "A fixup finishing disabling of quotactl_path() syscall (I've missed archs using different way to declare syscalls) and a fix of an fd leak in error handling path of fanotify" * tag 'fixes_for_v5.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: finish disable quotactl_path syscall fanotify: fix copy_event_to_user() fid error clean up	2021-06-17 09:49:48 -07:00
Jens Axboe	fe76421d1d	io_uring: allow user configurable IO thread CPU affinity io-wq defaults to per-node masks for IO workers. This works fine by default, but isn't particularly handy for workloads that prefer more specific affinities, for either performance or isolation reasons. This adds IORING_REGISTER_IOWQ_AFF that allows the user to pass in a CPU mask that is then applied to IO thread workers, and an IORING_UNREGISTER_IOWQ_AFF that simply resets the masks back to the default of per-node. Note that no care is given to existing IO threads, they will need to go through a reschedule before the affinity is correct if they are already running or sleeping. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-17 10:25:50 -06:00
Jens Axboe	0e03496d19	io-wq: use private CPU mask In preparation for allowing user specific CPU masks for IO thread creation, switch to using a mask embedded in the per-node wqe structure. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-17 10:08:11 -06:00
Colin Ian King	e8d46b3841	isofs: remove redundant continue statement The continue statement in the while-loop has no effect, remove it. Addresses-Coverity: ("Continue has no effect") Link: https://lore.kernel.org/r/20210617120837.11994-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz>	2021-06-17 17:11:42 +02:00
Yang Yingliang	8f6840c4fd	ext4: return error code when ext4_fill_flex_info() fails After commit `c89128a008` ("ext4: handle errors on ext4_commit_super"), 'ret' may be set to 0 before calling ext4_fill_flex_info(), if ext4_fill_flex_info() fails ext4_mount() doesn't return error code, it makes 'root' is null which causes crash in legacy_get_tree(). Fixes: `c89128a008` ("ext4: handle errors on ext4_commit_super") Reported-by: Hulk Robot <hulkci@huawei.com> Cc: <stable@vger.kernel.org> # v4.18+ Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20210510111051.55650-1-yangyingliang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2021-06-17 10:53:20 -04:00
Zhang Yi	b9a037b7f3	ext4: cleanup in-core orphan list if ext4_truncate() failed to get a transaction handle In ext4_orphan_cleanup(), if ext4_truncate() failed to get a transaction handle, it didn't remove the inode from the in-core orphan list, which may probably trigger below error dump in ext4_destroy_inode() during the final iput() and could lead to memory corruption on the later orphan list changes. EXT4-fs (sda): Inode 6291467 (00000000b8247c67): orphan list check failed! 00000000b8247c67: 0001f30a 00000004 00000000 00000023 ............#... 00000000e24cde71: 00000006 014082a3 00000000 00000000 ......@......... 0000000072c6a5ee: 00000000 00000000 00000000 00000000 ................ ... This patch fix this by cleanup in-core orphan list manually if ext4_truncate() return error. Cc: stable@kernel.org Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210507071904.160808-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2021-06-17 10:53:19 -04:00
Anirudh Rayabharam	ce3aba4359	ext4: fix kernel infoleak via ext4_extent_header Initialize eh_generation of struct ext4_extent_header to prevent leaking info to userspace. Fixes KMSAN kernel-infoleak bug reported by syzbot at: http://syzkaller.appspot.com/bug?id=78e9ad0e6952a3ca16e8234724b2fa92d041b9b8 Cc: stable@kernel.org Reported-by: syzbot+2dcfeaf8cb49b05e8f1a@syzkaller.appspotmail.com Fixes: `a86c618126` ("[PATCH] ext3: add extent map support") Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com> Link: https://lore.kernel.org/r/20210506185655.7118-1-mail@anirudhrb.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2021-06-17 10:53:19 -04:00
Pavel Skripkin	618f003199	ext4: fix memory leak in ext4_fill_super static int kthread(void *_create) will return -ENOMEM or -EINTR in case of internal failure or kthread_stop() call happens before threadfn call. To prevent fancy error checking and make code more straightforward we moved all cleanup code out of kmmpd threadfn. Also, dropped struct mmpd_data at all. Now struct super_block is a threadfn data and struct buffer_head embedded into struct ext4_sb_info. Reported-by: syzbot+d9e482e303930fa4f6ff@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Link: https://lore.kernel.org/r/20210430185046.15742-1-paskripkin@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2021-06-17 10:53:19 -04:00
Jiapeng Chong	1fc57ca5a2	ext4: remove redundant assignment to error Variable error is set to zero but this value is never read as it's not used later on, hence it is a redundant assignment and can be removed. Cleans up the following clang-analyzer warning: fs/ext4/ioctl.c:657:3: warning: Value stored to 'error' is never read [clang-analyzer-deadcode.DeadStores]. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://lore.kernel.org/r/1619691409-83160-1-git-send-email-jiapeng.chong@linux.alibaba.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2021-06-17 10:53:19 -04:00
Joseph Qi	5c680150d7	ext4: remove redundant check buffer_uptodate() Now set_buffer_uptodate() will test first and then set, so we don't have to check buffer_uptodate() first, remove it to simplify code. Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Link: https://lore.kernel.org/r/1619418587-5580-1-git-send-email-joseph.qi@linux.alibaba.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2021-06-17 10:53:19 -04:00
Jan Kara	d0b040f5f2	ext4: fix overflow in ext4_iomap_alloc() A code in iomap alloc may overflow block number when converting it to byte offset. Luckily this is mostly harmless as we will just use more expensive method of writing using unwritten extents even though we are writing beyond i_size. Cc: stable@kernel.org Fixes: `378f32bab3` ("ext4: introduce direct I/O write using iomap infrastructure") Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210412102333.2676-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2021-06-17 10:53:19 -04:00
Naohiro Aota	f9f28e5bd0	btrfs: zoned: fix negative space_info->bytes_readonly Consider we have a using block group on zoned btrfs. \|<- ZU ->\|<- used ->\|<---free--->\| `- Alloc offset ZU: Zone unusable Marking the block group read-only will migrate the zone unusable bytes to the read-only bytes. So, we will have this. \|<- RO ->\|<- used ->\|<--- RO --->\| RO: Read only When marking it back to read-write, btrfs_dec_block_group_ro() subtracts the above "RO" bytes from the space_info->bytes_readonly. And, it moves the zone unusable bytes back and again subtracts those bytes from the space_info->bytes_readonly, leading to negative bytes_readonly. This can be observed in the output as eg.: Data, single: total=512.00MiB, used=165.21MiB, zone_unusable=16.00EiB Data, single: total=536870912, used=173256704, zone_unusable=18446744073603186688 This commit fixes the issue by reordering the operations. Link: https://github.com/naota/linux/issues/37 Reported-by: David Sterba <dsterba@suse.com> Fixes: `169e0da91a` ("btrfs: zoned: track unusable bytes for zones") CC: stable@vger.kernel.org # 5.12+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-17 11:12:14 +02:00
Kees Cook	1d1f6cc581	pstore/blk: Include zone in pstore_device_info Information was redundant between struct pstore_zone_info and struct pstore_device_info. Use struct pstore_zone_info, with member name "zone". Additionally untangle the logic for the "best effort" block device instance. Signed-off-by: Kees Cook <keescook@chromium.org> Fixed-by: Pu Lehui <pulehui@huawei.com> Link: https://lore.kernel.org/lkml/20210617005424.182305-1-pulehui@huawei.com	2021-06-16 21:09:31 -07:00
Kees Cook	c811659bb9	pstore/blk: Fix kerndoc and redundancy on blkdev param Remove redundant details of blkdev and fix up resulting kerndoc. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kees Cook <keescook@chromium.org>	2021-06-16 09:27:32 -07:00
Kees Cook	7bb9557b48	pstore/blk: Use the normal block device I/O path Stop poking into block layer internals and just open the block device file an use kernel_read and kernel_write on it. Note that this means the transformation from name_to_dev_t can't be used anymore when pstore_blk is loaded as a module: a full filesystem device path name must be used instead. Additionally removes ":internal:" kerndoc link, since no such documentation remains. Co-developed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kees Cook <keescook@chromium.org>	2021-06-16 09:26:56 -07:00
Mike Kravetz	846be08578	mm/hugetlb: expand restore_reserve_on_error functionality The routine restore_reserve_on_error is called to restore reservation information when an error occurs after page allocation. The routine alloc_huge_page modifies the mapping reserve map and potentially the reserve count during allocation. If code calling alloc_huge_page encounters an error after allocation and needs to free the page, the reservation information needs to be adjusted. Currently, restore_reserve_on_error only takes action on pages for which the reserve count was adjusted(HPageRestoreReserve flag). There is nothing wrong with these adjustments. However, alloc_huge_page ALWAYS modifies the reserve map during allocation even if the reserve count is not adjusted. This can cause issues as observed during development of this patch [1]. One specific series of operations causing an issue is: - Create a shared hugetlb mapping Reservations for all pages created by default - Fault in a page in the mapping Reservation exists so reservation count is decremented - Punch a hole in the file/mapping at index previously faulted Reservation and any associated pages will be removed - Allocate a page to fill the hole No reservation entry, so reserve count unmodified Reservation entry added to map by alloc_huge_page - Error after allocation and before instantiating the page Reservation entry remains in map - Allocate a page to fill the hole Reservation entry exists, so decrement reservation count This will cause a reservation count underflow as the reservation count was decremented twice for the same index. A user would observe a very large number for HugePages_Rsvd in /proc/meminfo. This would also likely cause subsequent allocations of hugetlb pages to fail as it would 'appear' that all pages are reserved. This sequence of operations is unlikely to happen, however they were easily reproduced and observed using hacked up code as described in [1]. Address the issue by having the routine restore_reserve_on_error take action on pages where HPageRestoreReserve is not set. In this case, we need to remove any reserve map entry created by alloc_huge_page. A new helper routine vma_del_reservation assists with this operation. There are three callers of alloc_huge_page which do not currently call restore_reserve_on error before freeing a page on error paths. Add those missing calls. [1] https://lore.kernel.org/linux-mm/20210528005029.88088-1-almasrymina@google.com/ Link: https://lkml.kernel.org/r/20210607204510.22617-1-mike.kravetz@oracle.com Fixes: `96b96a96dd` ("mm/hugetlb: fix huge page reservation leak in private mapping error paths" Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Mina Almasry <almasrymina@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-06-16 09:24:42 -07:00
Kees Cook	2a03ddbde1	pstore/blk: Move verify_size() macro out of function There's no good reason for the verify_size macro to live inside the function. Move it up with the check_size() macro and fix indenting. Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org>	2021-06-16 08:19:40 -07:00
Kees Cook	6eed261f48	pstore/blk: Improve failure reporting There was no feedback on bad registration attempts. Add details on the failure cause. Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Kees Cook <keescook@chromium.org>	2021-06-16 08:19:37 -07:00
Olivier Langlois	ec16d35b6c	io-wq: remove header files not needed anymore mm related header files are not needed for io-wq module. remove them for a small clean-up. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Olivier Langlois <olivier@trillion01.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-16 06:41:49 -06:00
Olivier Langlois	236daeae36	io_uring: Add to traces the req pointer when available The req pointer uniquely identify a specific request. Having it in traces can provide valuable insights that is not possible to have if the calling process is reusing the same user_data value. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Olivier Langlois <olivier@trillion01.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-16 06:41:46 -06:00
Pavel Begunkov	2335f6f5dd	io_uring: optimise io_commit_cqring() In most cases io_commit_cqring() is just an smp_store_release(), and it's hot enough, especially for IRQ rw, to want it to save on a function call. Mark it inline and extract a non-inlined slow path doing drain and timeout flushing. The inlined part is pretty slim to not cause binary bloating. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7350f8b6b92caa50a48a80be39909f0d83eddd93.1623772051.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-15 15:44:34 -06:00
Pavel Begunkov	3c19966d37	io_uring: shove more drain bits out of hot path Place all drain_next logic into io_drain_req(), so it's never executed if there was no drained requests before. The only thing we need is to set ->drain_active if we see a request with IOSQE_IO_DRAIN, do that in io_init_req() where flags are definitely in registers. Also, all drain-related code is encapsulated in io_drain_req(), makes it cleaner. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/68bf4f7395ddaafbf1a26bd97b57d57d45a9f900.1623772051.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-15 15:44:34 -06:00
Pavel Begunkov	10c669040e	io_uring: switch !DRAIN fast path when possible ->drain_used is one way, which is not optimal if users use DRAIN but very rarely. However, we can just clear it in io_drain_req() when all drained before requests are gone. Also rename the flag to reflect the change and be more clear about it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7f37a240857546a94df6348507edddacab150460.1623772051.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-15 15:44:33 -06:00
Pavel Begunkov	27f6b318de	io_uring: fix min types mismatch in table alloc fs/io_uring.c: In function 'io_alloc_page_table': include/linux/minmax.h:20:28: warning: comparison of distinct pointer types lacks a cast Cast everything to size_t using min_t. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Fixes: `9123c8ffce` ("io_uring: add helpers for 2 level table alloc") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/50f420a956bca070a43810d4a805293ed54f39d8.1623759527.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-15 15:40:17 -06:00
Fam Zheng	dd9ae8a0b2	io_uring: Fix comment of io_get_sqe The sqe_ptr argument has been gone since `709b302fad` (io_uring: simplify io_get_sqring, 2020-04-08), made the return value of the function. Update the comment accordingly. Signed-off-by: Fam Zheng <fam.zheng@bytedance.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20210604164256.12242-1-fam.zheng@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-15 15:39:16 -06:00
Pavel Begunkov	441b8a7803	io_uring: optimise non-drain path Replace drain checks with one-way flag set upon seeing the first IOSQE_IO_DRAIN request. There are several places where it cuts cycles well: 1) It's much faster than the fast check with two conditions in io_drain_req() including pretty complex list_empty_careful(). 2) We can mark io_queue_sqe() inline now, that's a huge win. 3) It replaces timeout and drain checks in io_commit_cqring() with a single flags test. Also great not touching ->defer_list there without a reason so limiting cache bouncing. It adds a small amount of overhead to drain path, but it's negligible. The main nuisance is that once it meets any DRAIN request in io_uring instance lifetime it will _always_ go through a slower path, so drain-less and offset-mode timeout less applications are preferable. The overhead in that case would be not big, but it's worth to bear in mind. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/98d2fff8c4da5144bb0d08499f591d4768128ea3.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-15 15:38:40 -06:00
Pavel Begunkov	76cc33d791	io_uring: refactor io_req_defer() Rename io_req_defer() into io_drain_req() and refactor it uncoupling it from io_queue_sqe() error handling and preparing for coming optimisations. Also, prioritise non IOSQE_ASYNC path. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4f17dd56e7fbe52d1866f8acd8efe3284d2bebcb.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-15 15:38:40 -06:00
Pavel Begunkov	0499e582aa	io_uring: move uring_lock location ->uring_lock is prevalently used for submission, even though it protects many other things like iopoll, registeration, selected bufs, and more. And it's placed together with ->cq_wait poked on completion and CQ waiting sides. Move them apart, ->uring_lock goes to the submission data, and cq_wait to completion related chunk. The last one requires some reshuffling so everything needed by io_cqring_ev_posted*() is in one cacheline. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/dea5e845caee4c98aa0922b46d713154d81f7bd8.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-15 15:38:40 -06:00
Pavel Begunkov	311997b3fc	io_uring: wait heads renaming We use several wait_queue_head's for different purposes, but namings are confusing. First rename ctx->cq_wait into ctx->poll_wait, because this one is used for polling an io_uring instance. Then rename ctx->wait into ctx->cq_wait, which is responsible for CQE waiting. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/47b97a097780c86c67b20b6ccc4e077523dce682.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-15 15:38:40 -06:00
Pavel Begunkov	5ed7a37d21	io_uring: clean up check_overflow flag There are no users of ->sq_check_overflow, only ->cq_check_overflow is used. Combine it and move out of completion related part of struct io_ring_ctx. A not so obvious benefit of it is fitting all completion side fields into a single cacheline. It was taking 2 lines before with 56B padding, and io_cqring_ev_posted*() were still touching both of them. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/25927394964df31d113e3c729416af573afff5f5.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-15 15:38:40 -06:00
Pavel Begunkov	5e159204d7	io_uring: small io_submit_sqe() optimisation submit_state.link is used only to assemble a link and not used for actual submission, so clear it before io_queue_sqe() in io_submit_sqe(), awhile it's hot and in caches and queueing doesn't spoil it. May also potentially help compiler with spilling or to do other optimisations. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1579939426f3ad6b55af3005b1389bbbed7d780d.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-15 15:38:40 -06:00
Pavel Begunkov	f18ee4cf0a	io_uring: optimise completion timeout flushing io_commit_cqring() might be very hot and we definitely don't want to touch ->timeout_list there, because 1) it's shared with the submission side so might lead to cache bouncing and 2) may need to load an extra cache line, especially for IRQ completions. We're interested in it at the completion side only when there are offset-mode timeouts, which are not so popular. Replace list_empty(->timeout_list) hot path check with a new one-way flag, which is set when we prepare the first offset-mode timeout. note: the flag sits in the same line as briefly used after ->rings Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e4892ec68b71a69f92ffbea4a1499be3ec0d463b.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-15 15:38:40 -06:00
Pavel Begunkov	15641e4270	io_uring: don't cache number of dropped SQEs Kill ->cached_sq_dropped and wire DRAIN sequence number correction via ->cq_extra, which is there exactly for that purpose. User visible dropped counter will be populated by incrementing it instead of keeping a copy, similarly as it was done not so long ago with cq_overflow. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/088aceb2707a534d531e2770267c4498e0507cc1.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-15 15:38:40 -06:00
Pavel Begunkov	17d3aeb33c	io_uring: refactor io_get_sqe() The line of io_get_sqe() evaluating @head consists of too many operations including READ_ONCE(), it's not convenient for probing. Refactor it also improving readability. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/866ad6e4ef4851c7c61f6b0e08dbd0a8d1abce84.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-15 15:38:39 -06:00
Pavel Begunkov	7f1129d227	io_uring: shuffle more fields into SQ ctx section Since moving locked_free_* out of struct io_submit_state ctx->submit_state is accessed on submission side only, so move it into the submission section. Same goes for rsrc table pointers/nodes/etc., they must be taken and checked during submission because sync'ed by uring_lock, so move them there as well. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8a5899a50afc6ccca63249e716f580b246f3dec6.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-15 15:38:39 -06:00
Pavel Begunkov	b52ecf8cb5	io_uring: move ctx->flags from SQ cacheline ctx->flags are heavily used by both, completion and submission sides, so move it out from the ctx fields related to submissions. Instead, place it together with ctx->refs, because it's already cacheline-aligned and so pads lots of space, and both almost never change. Also, in most occasions they are accessed together as refs are taken at submission time and put back during completion. Do same with ctx->rings, where the pointer itself is never modified apart from ring init/free. Note: in percpu mode, struct percpu_ref doesn't modify the struct itself but takes indirection with ref->percpu_count_ptr. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4c48c173e63d35591383ba2b87e8b8e8dfdbd23d.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-15 15:38:39 -06:00
Pavel Begunkov	c7af47cf0f	io_uring: keep SQ pointers in a single cacheline sq_array and sq_sqes are always used together, however they are in different cachelines, where the borderline is right before cq_overflow_list is rather rarely touched. Move the fields together so it loads only one cacheline. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/3ef2411a94874da06492506a8897eff679244f49.1623709150.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-15 15:38:39 -06:00
Colin Ian King	b1b2fc3574	io-wq: remove redundant initialization of variable ret The variable ret is being initialized with a value that is never read, the assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20210615143424.60449-1-colin.king@canonical.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-15 15:37:51 -06:00
Colin Ian King	fdd1dc316e	io_uring: Fix incorrect sizeof operator for copy_from_user call Static analysis is warning that the sizeof being used is should be of *data->tags[i] and not data->tags[i]. Although these are the same size on 64 bit systems it is not a portable assumption to assume this is true for all cases. Fix this by using a temporary pointer tag_slot to make the code a clearer. Addresses-Coverity: ("Sizeof not portable") Fixes: `d878c81610` ("io_uring: hide rsrc tag copy into generic helpers") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20210615130011.57387-1-colin.king@canonical.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-15 15:37:11 -06:00
Linus Torvalds	94f0b2d4a1	proc: only require mm_struct for writing Commit `591a22c14d` ("proc: Track /proc/$pid/attr/ opener mm_struct") we started using __mem_open() to track the mm_struct at open-time, so that we could then check it for writes. But that also ended up making the permission checks at open time much stricter - and not just for writes, but for reads too. And that in turn caused a regression for at least Fedora 29, where NIC interfaces fail to start when using NetworkManager. Since only the write side wanted the mm_struct test, ignore any failures by __mem_open() at open time, leaving reads unaffected. The write() time verification of the mm_struct pointer will then catch the failure case because a NULL pointer will not match a valid 'current->mm'. Link: https://lore.kernel.org/netdev/YMjTlp2FSJYvoyFa@unreal/ Fixes: `591a22c14d` ("proc: Track /proc/$pid/attr/ opener mm_struct") Reported-and-tested-by: Leon Romanovsky <leon@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-06-15 10:47:51 -07:00
Ian Kent	d826e03651	kernfs: move revalidate to be near lookup While the dentry operation kernfs_dop_revalidate() is grouped with dentry type functions it also has a strong affinity to the inode operation ->lookup(). It makes sense to locate this function near to kernfs_iop_lookup() because we will be adding VFS negative dentry caching to reduce path lookup overhead for non-existent paths. There's no functional change from this patch. Signed-off-by: Ian Kent <raven@themaw.net> Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/162375275365.232295.8995526416263659926.stgit@web.messagingengine.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-15 17:04:32 +02:00
Dan Carpenter	a33d62662d	afs: Fix an IS_ERR() vs NULL check The proc_symlink() function returns NULL on error, it doesn't return error pointers. Fixes: `5b86d4ff5d` ("afs: Implement network namespacing") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/YLjMRKX40pTrJvgf@mwanda/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-06-15 07:42:26 -07:00
Christoph Hellwig	d07f3b081e	mark pstore-blk as broken pstore-blk just pokes directly into the pagecache for the block device without going through the file operations for that by faking up it's own file operations that do not match the block device ones. As this breaks the control of the block layer of it's page cache, and even now just works by accident only the best thing is to just disable this driver. Fixes: `17639f67c1` ("pstore/blk: Introduce backend for block devices") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210608161327.1537919-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:26:03 -06:00
Pavel Begunkov	aeab9506ef	io_uring: inline io_iter_do_read() There are only two calls in source code of io_iter_do_read(), the function is small and pretty hot though is failed to get inlined. Makr it as inline. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/25a26dae7660da73fbc2244b361b397ef43d3caf.1623634182.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:13 -06:00
Pavel Begunkov	78cc687be9	io_uring: unify SQPOLL and user task cancellations Merge io_uring_cancel_sqpoll() and __io_uring_cancel() as it's easier to have a conditional ctx traverse inside than keeping them in sync. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/adfe24d6dad4a3883a40eee54352b8b65ac851bb.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:13 -06:00
Pavel Begunkov	09899b1915	io_uring: cache task struct refs tctx in submission part is always synchronised because is executed from the task's context, so we can batch allocate tctx/task references and store them across syscall boundaries. It avoids enough of operations, including an atomic for getting task ref and a percpu_counter_add() function call, which still fallback to spinlock for large batching cases (around >=32). Should be good for SQPOLL submitting in small portions and coming at some moment bpf submissions. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/14b327b973410a3eec1f702ecf650e100513aca9.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:13 -06:00
Pavel Begunkov	2d091d62b1	io_uring: don't vmalloc rsrc tags We don't really need vmalloc for keeping tags, it's not a hot path and is there out of convenience, so replace it with two level tables to not litter kernel virtual memory mappings. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/241a3422747113a8909e7e1030eb585d4a349e0d.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:13 -06:00
Pavel Begunkov	9123c8ffce	io_uring: add helpers for 2 level table alloc Some parts like fixed file table use 2 level tables, factor out helpers for allocating/deallocating them as more users are to come. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1709212359cd82eb416d395f86fc78431ccfc0aa.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:13 -06:00
Pavel Begunkov	157d257f99	io_uring: remove rsrc put work irq save/restore io_rsrc_put_work() is executed by workqueue in non-irq context, so no need for irqsave/restore variants of spinlocking. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/2a7f77220735f4ad404ac885b4d73bdf42d2f836.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:13 -06:00
Pavel Begunkov	d878c81610	io_uring: hide rsrc tag copy into generic helpers Make io_rsrc_data_alloc() taking care of rsrc tags loading on registration, so we don't need to repeat it for each new rsrc type. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5609680697bd09735de10561b75edb95283459da.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:13 -06:00
Pavel Begunkov	e587227b68	io-wq: simplify worker exiting io_worker_handle_work() already takes care of the empty list case and releases spinlock, so get rid of ugly conditional unlocking and unconditionally call handle_work() Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7521e485677f381036676943e876a0afecc23017.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:13 -06:00
Pavel Begunkov	769e683715	io-wq: don't repeat IO_WQ_BIT_EXIT check by worker io_wqe_worker()'s main loop does check IO_WQ_BIT_EXIT flag, so no need for a second test_bit at the end as it will immediately jump to the first check afterwards. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d6af4a51c86523a527fb5417c9fbc775c4b26497.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:13 -06:00
Pavel Begunkov	eef51daa72	io_uring: rename function task_file What at some moment was references to struct file used to control lifetimes of task/ctx is now just internal tctx structures/nodes, so rename outdated task_file() routines into something more sensible. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e2fbce42932154c2631ce58ffbffaa232afe18d5.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:12 -06:00
Pavel Begunkov	cb3d8972c7	io_uring: refactor io_iopoll_req_issued A simple refactoring of io_iopoll_req_issued(), move in_async inside so we don't pass it around and save on double checking it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1513bfde4f0c835be25ac69a82737ab0668d7665.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:12 -06:00
Pavel Begunkov	382cb03046	io-wq: remove unused io-wq refcounting iowq->refs is initialised to one and killed on exit, so it's not used and we can kill it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/401007393528ea7c102360e69a29b64498e15db2.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:12 -06:00
Pavel Begunkov	c7f405d6fa	io-wq: embed wqe ptr array into struct io_wq io-wq keeps an array of pointers to struct io_wqe, allocate this array as a part of struct io-wq, it's easier to code and saves an extra indirection for nearly each io-wq call. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1482c6a001923bbed662dc38a8a580fb08b1ed8c.1623634181.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:12 -06:00
Pavel Begunkov	976517f162	io_uring: fix blocking inline submission There is a complaint against sys_io_uring_enter() blocking if it submits stdin reads. The problem is in __io_file_supports_async(), which sees that it's a cdev and allows it to be processed inline. Punt char devices using generic rules of io_file_supports_async(), including checking for presence of *_iter() versions of rw callbacks. Apparently, it will affect most of cdevs with some exceptions like null and zero devices. Cc: stable@vger.kernel.org Reported-by: Birk Hirdman <lonjil@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d60270856b8a4560a639ef5f76e55eb563633599.1623236455.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:05 -06:00
Pavel Begunkov	40dad765c0	io_uring: enable shmem/memfd memory registration Relax buffer registration restictions, which filters out file backed memory, and allow shmem/memfd as they have normal anonymous pages underneath. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:05 -06:00
Pavel Begunkov	d0acdee296	io_uring: don't bounce submit_state cachelines struct io_submit_state contains struct io_comp_state and so locked_free_, that renders cachelines around ->locked_free being invalidated on most non-inline completions, that may terrorise caches if submissions and completions are done by different tasks. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/290cb5412b76892e8631978ee8ab9db0c6290dd5.1621201931.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:05 -06:00
Pavel Begunkov	d068b5068d	io_uring: rename io_get_cqring Rename io_get_cqring() into io_get_cqe() for consistency with SQ, and just because the old name is not as clear. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a46a53e3f781de372f5632c184e61546b86515ce.1621201931.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:05 -06:00
Pavel Begunkov	8f6ed49a44	io_uring: kill cached_cq_overflow There are two copies of cq_overflow, shared with userspace and internal cached one. It was needed for DRAIN accounting, but now we have yet another knob to tune the accounting, i.e. cq_extra, and we can throw away the internal counter and just increment the one in the shared ring. If user modifies it as so never gets the right overflow value ever again, it's its problem, even though before we would have restored it back by next overflow. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8427965f5175dd051febc63804909861109ce859.1621201931.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:05 -06:00
Pavel Begunkov	ea5ab3b579	io_uring: deduce cq_mask from cq_entries No need to cache cq_mask, it's exactly cq_entries - 1, so just deduce it to not carry it around. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d439efad0503c8398451dae075e68a04362fbc8d.1621201931.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:05 -06:00
Pavel Begunkov	a566c5562d	io_uring: remove dependency on ring->sq/cq_entries We have numbers of {sq,cq} entries cached in ctx, don't look up them in user-shared rings as 1) it may fetch additional cacheline 2) user may change it and so it's always error prone. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/745d31bc2da41283ddd0489ef784af5c8d6310e9.1621201931.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:05 -06:00
Pavel Begunkov	b13a8918d3	io_uring: better locality for rsrc fields ring has two types of resource-related fields: used for request submission, and field needed for update/registration. Reshuffle them into these two groups for better locality and readability. The second group is not in the hot path, so it's natural to place them somewhere in the end. Also update an outdated comment. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/05b34795bb4440f4ec4510f08abd5a31830f8ca0.1621201931.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:04 -06:00
Pavel Begunkov	b986af7e2d	io_uring: shuffle rarely used ctx fields There is a bunch of scattered around ctx fields that are almost never used, e.g. only on ring exit, plunge them to the end, better locality, better aesthetically. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/782ff94b00355923eae757d58b1a47821b5b46d4.1621201931.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:04 -06:00
Pavel Begunkov	93d2bcd2cb	io_uring: make fail flag not link specific The main difference is in req_set_fail_links() renamed into req_set_fail(), which now sets REQ_F_FAIL_LINK/REQ_F_FAIL flag unconditional on whether it has been a link or not. It only matters in io_disarm_next(), which already handles it well, and all calls to it have a fast path checking REQ_F_LINK/HARDLINK. It looks cleaner, and sheds binary size text data bss dec hex filename 84235 12390 8 96633 17979 ./fs/io_uring.o 84151 12414 8 96573 1793d ./fs/io_uring.o Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e2224154dd6e53b665ac835d29436b177872fa10.1621201931.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:04 -06:00
Pavel Begunkov	3dd0c97a9e	io_uring: get rid of files in exit cancel We don't match against files on cancellation anymore, so no need to drag around files_struct anymore, just pass a flag telling whether only inflight or all requests should be killed. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7bfc5409a78f8e2d6b27dec3293ec2d248677348.1621201931.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:04 -06:00
Pavel Begunkov	acfb381d9d	io_uring: simplify waking sqo_sq_wait Going through submission in __io_sq_thread() and still having a full SQ is rather unexpected, so remove a check for SQ fullness and just wake up whoever wait on sqo_sq_wait. Also skip if it doesn't do submission in the first place, likely may to happen for SQPOLL sharing and/or IOPOLL. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e2e91751e87b1a39f8d63ef884aaff578123f61e.1621201931.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:04 -06:00
Pavel Begunkov	21f2fc080f	io_uring: remove unused park_task_work As sqpoll cancel via task_work is killed, remove everything related to park_task_work as it's not used anymore. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/310d8b76a2fbbf3e139373500e04ad9af7ee3dbb.1621201931.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:04 -06:00
Pavel Begunkov	aaa9f0f481	io_uring: improve sq_thread waiting check If SQPOLL task finds a ring requesting it to continue running, no need to set wake flag to rest of the rings as it will be cleared in a moment anyway, so hide it in a single sqd->ctx_list loop. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1ee5a696d9fd08645994c58ee147d149a8957d94.1621201931.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:04 -06:00
Pavel Begunkov	e4b6d902a9	io_uring: improve sqpoll event/state handling As sqd->state changes rarely, don't check every event one by one but look them all at once. Add a helper function. Also don't go into event waiting sleeping with STOP flag set. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/645025f95c7eeec97f88ff497785f4f1d6f3966f.1621201931.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-14 08:23:04 -06:00
Matthew Bobrowski	f644bc449b	fanotify: fix copy_event_to_user() fid error clean up Ensure that clean up is performed on the allocated file descriptor and struct file object in the event that an error is encountered while copying fid info objects. Currently, we return directly to the caller when an error is experienced in the fid info copying helper, which isn't ideal given that the listener process could be left with a dangling file descriptor in their fdtable. Fixes: `5e469c830f` ("fanotify: copy event fid info to user") Fixes: `44d705b037` ("fanotify: report name info for FAN_DIR_MODIFY event") Link: https://lore.kernel.org/linux-fsdevel/YMKv1U7tNPK955ho@google.com/T/#m15361cd6399dad4396aad650de25dbf6b312288e Link: https://lore.kernel.org/r/1ef8ae9100101eb1a91763c516c2e9a3a3b112bd.1623376346.git.repnop@google.com Signed-off-by: Matthew Bobrowski <repnop@google.com> Signed-off-by: Jan Kara <jack@suse.cz>	2021-06-14 12:16:37 +02:00
Greg Kroah-Hartman	68afbd8459	Linux 5.13-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmDGe+4eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG/IUH/iyHVulAtAhL9bnR qL4M1kWfcG1sKS2TzGRZzo6YiUABf89vFP90r4sKxG3AKrb8YkTwmJr8B/sWwcsv PpKkXXTobbDfpSrsXGEapBkQOE7h2w739XeXyBLRPkoCR4UrEFn68TV2rLjMLBPS /EIZkonXLWzzWalgKDP4wSJ7GaQxi3LMx3dGAvbFArEGZ1mPHNlgWy2VokFY/yBf qh1EZ5rugysc78JCpTqfTf3fUPK2idQW5gtHSMbyESrWwJ/3XXL9o1ET3JWURYf1 b0FgVztzddwgULoIGWLxDH5WWts3l54sjBLj0yrLUlnGKA5FjrZb12g9PdhdywuY /8KfjeE= =JfJm -----END PGP SIGNATURE----- Merge tag 'v5.13-rc6' into driver-core-next We need the driver core fix in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-14 09:07:45 +02:00
Greg Kroah-Hartman	db4e54aefd	Linux 5.13-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmDGe+4eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG/IUH/iyHVulAtAhL9bnR qL4M1kWfcG1sKS2TzGRZzo6YiUABf89vFP90r4sKxG3AKrb8YkTwmJr8B/sWwcsv PpKkXXTobbDfpSrsXGEapBkQOE7h2w739XeXyBLRPkoCR4UrEFn68TV2rLjMLBPS /EIZkonXLWzzWalgKDP4wSJ7GaQxi3LMx3dGAvbFArEGZ1mPHNlgWy2VokFY/yBf qh1EZ5rugysc78JCpTqfTf3fUPK2idQW5gtHSMbyESrWwJ/3XXL9o1ET3JWURYf1 b0FgVztzddwgULoIGWLxDH5WWts3l54sjBLj0yrLUlnGKA5FjrZb12g9PdhdywuY /8KfjeE= =JfJm -----END PGP SIGNATURE----- Merge tag 'v5.13-rc6' into char-misc-next We need the fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-14 08:59:06 +02:00
Trond Myklebust	3731d44bba	NFSv4: Fix an Oops in pnfs_mark_request_commit() when doing O_DIRECT Fix an Oopsable condition in pnfs_mark_request_commit() when we're putting a set of writes on the commit list to reschedule them after a failed pNFS attempt. Fixes: `9c455a8c1e` ("NFS/pNFS: Clean up pNFS commit operations") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-06-13 19:36:49 -04:00
Trond Myklebust	dd99e9f98f	NFSv4: Initialise connection to the server in nfs4_alloc_client() Set up the connection to the NFSv4 server in nfs4_alloc_client(), before we've added the struct nfs_client to the net-namespace's nfs_client_list so that a downed server won't cause other mounts to hang in the trunking detection code. Reported-by: Michael Wakabayashi <mwakabayashi@vmware.com> Fixes: `5c6e5b60aa` ("NFS: Fix an Oops in the pNFS files and flexfiles connection setup to the DS") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-06-13 19:36:49 -04:00
Trond Myklebust	e93a5e9306	NFSv4: Add support for application leases underpinned by a delegation If the NFSv4 client already holds a delegation for a file, then we can support application leases (i.e. fcntl(fd, F_SETLEASE,...)) because the underlying delegation guarantees that the file is not being modified on the server by another client in a way that might conflict with the lease guarantees. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-06-13 19:36:28 -04:00
Trond Myklebust	6b4befc0a0	NFSv4: Add lease breakpoints in case of a delegation recall or return When we add support for application level leases and knfsd delegations to the NFS client, we we want to have them safely underpinned by a "real" delegation to provide the caching guarantees. If that real delegation is recalled, then we need to ensure that the application leases/delegations are recalled too. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-06-13 19:36:28 -04:00
Trond Myklebust	be20037725	NFSv4: Fix delegation return in cases where we have to retry If we're unable to immediately recover all locks because the server is unable to immediately service our reclaim calls, then we want to retry after we've finished servicing all the other asynchronous delegation returns on our queue. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-06-13 19:36:27 -04:00
Linus Torvalds	960f0716d8	NFS client bugfixes for Linux 5.13 Highlights include Stable fixes: - Fix use-after-free in nfs4_init_client() Bugfixes: - Fix deadlock between nfs4_evict_inode() and nfs4_opendata_get_inode() - Fix second deadlock in nfs4_evict_inode() - nfs4_proc_set_acl should not change the value of NFS_CAP_UIDGID_NOMAP - Fix setting of the NFS_CAP_SECURITY_LABEL capability -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmDGJPEACgkQZwvnipYK APLH8xAAsdoKVCW35P+FtlzQvq0iWoTvk15i4Jv8+SyFtqAZe6y6pEj9+RT47CAV kt/uNa6CQ9KjxxgwBf2XoGTuf4MrOUU34kQBF/tRLy9zDdXUsZH263vapopmel6L BVHEEsID6hz8+BUt1LFsr+8sWxG+12UiimEu0CVo4BE8SgYushWpJOQ9iL/zxi1O gXmlAfA9g38I9aUApke4hOPSHVTGaQaAKl5LbSoycQlJblzgA1yIXdU9sVTHDJY6 sco9O9M+NPY8gefS4d7iXSihZin5V9rNuSJ9SKiCPikTEjZYgZbw1umGj6VnF/5e QD47QGgOwXKeCOBv6Oe4VYxE2JISoUFZw8+pxjy4eDO+EcJv3IrHOM8UrsiddGAA DLHzbbrMUx6mGdgibw/ktkwx0Q/DvGrfrvKidk33cs16DPWgTZAG//n7spuqYTmT 8fQbJF6DDjsYM7v+WdImf7VBA8dreXb/QcHwxCtH7uG+hGyRiYoDSOmH3mGBKpLX idkjz6Hvj7V7Y1z4qd+nvh4Ch1V0b9BX+J/+6dKHRykpmSJTIMIlQw7/wA6a8Lp6 WJX4KbUzZHojvqM1BMzRL34+qidihUso0RIj0VjCB1JQyosRnIeTPorfHLQZTOM0 IjP8h48BB7E7cJeJP1dmhvm7Hb8SpFVDxDHoWRtscbQflO3Wdkw= =PABi -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.13-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable fixes: - Fix use-after-free in nfs4_init_client() Bugfixes: - Fix deadlock between nfs4_evict_inode() and nfs4_opendata_get_inode() - Fix second deadlock in nfs4_evict_inode() - nfs4_proc_set_acl should not change the value of NFS_CAP_UIDGID_NOMAP - Fix setting of the NFS_CAP_SECURITY_LABEL capability" * tag 'nfs-for-5.13-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4: Fix second deadlock in nfs4_evict_inode() NFSv4: Fix deadlock between nfs4_evict_inode() and nfs4_opendata_get_inode() NFS: FMODE_READ and friends are C macros, not enum types NFS: Fix a potential NULL dereference in nfs_get_client() NFS: Fix use-after-free in nfs4_init_client() NFS: Ensure the NFS_CAP_SECURITY_LABEL capability is set when appropriate NFSv4: nfs4_proc_set_acl needs to restore NFS_CAP_UIDGID_NOMAP on error.	2021-06-13 12:32:59 -07:00
Linus Torvalds	87a7f7368b	Driver core fix for 5.13-rc6 Here is a single debugfs fix for 5.13-rc6. It fixes a bug in debugfs_read_file_str() that showed up in 5.13-rc1. It has been in linux-next for a full week with no reported problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYMTWug8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+yl5MQCeMMEMCGsoQdeXI1t2WMAMTmWRTZYAn1GqGliM b3RkczkNgKnEfDB2+M1r =wWW8 -----END PGP SIGNATURE----- Merge tag 'driver-core-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fix from Greg KH: "A single debugfs fix for 5.13-rc6, fixing a bug in debugfs_read_file_str() that showed up in 5.13-rc1. It has been in linux-next for a full week with no reported problems" * tag 'driver-core-5.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: debugfs: Fix debugfs_read_file_str()	2021-06-12 12:18:49 -07:00
Linus Torvalds	b2568eeb96	io_uring-5.13-2021-06-12 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmDEwEEQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpu2uEACIZXc0e4Jz2tJmtlLzhm0T+YUXu88/n0Ki 3HsCfjyk0k2tvGjAmzLgBruR+0dxuoTlC8ZyLWkCgYFvRxCQMrjxB4+Q53WAAPud ictv/5C992eWfmkk5lKWYh/SVUZU0nN/HlcITggFzH+/Ek4RgqBJK6rYPpN4YM6W OifSZ22xwjZy9i8svzCPzGUbS5d5qbNeRSaacfADWFmzTqqzllWz/KkN633UFefR tkqWy610P0O8fz3xe5HcECIOc3aNRZuk5zrNqCJPvxcOdYlqlL/HfsWMACEiC/g1 N3ahNGrUzJqhB1QNAIKATKAlh8hzAws9t/alLJQzSHZWRu7vso0qctoVJT3i6xRp qD17EAQgrC0R0fQxdHmoMzRHEnKPCXQx36wb/mhZbG60/Q+scmSrFXp86XvbKZiI uzHTsUL/80bRXHuVrKXT+JWTRCzpv1yk9ufIVzSOheVCl/H6bxZ29cabBL2/XvvI d+OljDsy7oMH6rOBFi3XYmwZShEoUqeATeFoFf5isjkWfe7qdiMVu4apD8fBhIjX 8rNLjp0nIKN+5IjHwFkAXRwp8P1SJQ8c7Tl4I6xY82FsMQxUUgMhjSqrn58i2g9d Lem9YHKaXIbw1yfWcaf8erA6d0S4rujG+j3miG0y248kOTb9FeMbfbRgjj8v99m1 XB7F9SIQUw== =MbrN -----END PGP SIGNATURE----- Merge tag 'io_uring-5.13-2021-06-12' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "Just an API change for the registration changes that went into this release. Better to get it sorted out now than before it's too late" * tag 'io_uring-5.13-2021-06-12' of git://git.kernel.dk/linux-block: io_uring: add feature flag for rsrc tags io_uring: change registration/upd/rsrc tagging ABI	2021-06-12 11:53:20 -07:00
Alexander Aring	957adb68b3	fs: dlm: invalid buffer access in lookup error This patch will evaluate the message length if a dlm opts header can fit in before accessing it if a node lookup fails. The invalid sequence error means that the version detection failed and an unexpected message arrived. For debugging such situation the type of arrived message is important to know. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-06-11 12:44:47 -05:00
Alexander Aring	f5fe8d5107	fs: dlm: fix race in mhandle deletion This patch fixes a race between mhandle deletion in case of receiving an acknowledge and flush of all pending mhandle in cases of an timeout or resetting node states. Fixes: `489d8e559c` ("fs: dlm: add reliable connection if reconnect") Reported-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-06-11 12:44:47 -05:00
Pavel Begunkov	9690557e22	io_uring: add feature flag for rsrc tags Add IORING_FEAT_RSRC_TAGS indicating that io_uring supports a bunch of new IORING_REGISTER operations, in particular IORING_REGISTER_[FILES[,UPDATE]2,BUFFERS[2,UPDATE]] that support rsrc tagging, and also indicating implemented dynamic fixed buffer updates. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9b995d4045b6c6b4ab7510ca124fd25ac2203af7.1623339162.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-10 16:33:51 -06:00
Pavel Begunkov	992da01aa9	io_uring: change registration/upd/rsrc tagging ABI There are ABI moments about recently added rsrc registration/update and tagging that might become a nuisance in the future. First, IORING_REGISTER_RSRC[_UPD] hide different types of resources under it, so breaks fine control over them by restrictions. It works for now, but once those are wanted under restrictions it would require a rework. It was also inconvenient trying to fit a new resource not supporting all the features (e.g. dynamic update) into the interface, so better to return to IORING_REGISTER_* top level dispatching. Second, register/update were considered to accept a type of resource, however that's not a good idea because there might be several ways of registration of a single resource type, e.g. we may want to add non-contig buffers or anything more exquisite as dma mapped memory. So, remove IORING_RSRC_[FILE,BUFFER] out of the ABI, and place them internally for now to limit changes. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9b554897a7c17ad6e3becc48dfed2f7af9f423d5.1623339162.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-10 16:33:51 -06:00
Eric W. Biederman	06af867944	coredump: Limit what can interrupt coredumps Olivier Langlois has been struggling with coredumps being incompletely written in processes using io_uring. Olivier Langlois <olivier@trillion01.com> writes: > io_uring is a big user of task_work and any event that io_uring made a > task waiting for that occurs during the core dump generation will > generate a TIF_NOTIFY_SIGNAL. > > Here are the detailed steps of the problem: > 1. io_uring calls vfs_poll() to install a task to a file wait queue > with io_async_wake() as the wakeup function cb from io_arm_poll_handler() > 2. wakeup function ends up calling task_work_add() with TWA_SIGNAL > 3. task_work_add() sets the TIF_NOTIFY_SIGNAL bit by calling > set_notify_signal() The coredump code deliberately supports being interrupted by SIGKILL, and depends upon prepare_signal to filter out all other signals. Now that signal_pending includes wake ups for TIF_NOTIFY_SIGNAL this hack in dump_emitted by the coredump code no longer works. Make the coredump code more robust by explicitly testing for all of the wakeup conditions the coredump code supports. This prevents new wakeup conditions from breaking the coredump code, as well as fixing the current issue. The filesystem code that the coredump code uses already limits itself to only aborting on fatal_signal_pending. So it should not develop surprising wake-up reasons either. v2: Don't remove the now unnecessary code in prepare_signal. Cc: stable@vger.kernel.org Fixes: `12db8b6900` ("entry: Add support for TIF_NOTIFY_SIGNAL") Reported-by: Olivier Langlois <olivier@trillion01.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-06-10 14:02:29 -07:00
Masami Hiramatsu	ca24306d83	bootconfig: Change array value to use child node It is not possible to put an array value with subkeys under a key node, because both of subkeys and the array elements are using "next" field of the xbc_node. Thus this changes the array values to use "child" field in the array case. The reason why split this change is to test it easily. Link: https://lkml.kernel.org/r/162262193838.264090.16044473274501498656.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>	2021-06-10 13:38:25 -04:00
Al Viro	f0b65f39ac	iov_iter: replace iov_iter_copy_from_user_atomic() with iterator-advancing variant Replacement is called copy_page_from_iter_atomic(); unlike the old primitive the callers do not need to do iov_iter_advance() after it. In case when they end up consuming less than they'd been given they need to do iov_iter_revert() on everything they had not consumed. That, however, needs to be done only on slow paths. All in-tree callers converted. And that kills the last user of iterate_all_kinds() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-06-10 11:45:14 -04:00
Linus Torvalds	cc6cf827dd	for-5.13-rc5-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmDAtXUACgkQxWXV+ddt WDtbdA//ccQ8JL5yC/x/j0ZXLJ2INqXpxIUPjadwwEjtTgOllvx+f1nU0QazeYfM XvvzDDvpemWajC2Ii54s2HCQbG+dAzO1YBl1XCyve91T0GeNGhzytZwM0pVxZePQ A+aOyVH7IcfFcmBy9T0yctqiGgtD3lre208kU9kolidsIyomLHxBckBhMYDXvJCK BOdrjq3f6H5J0zqOqAnWdc/Wc5z5pw3CHxlIuoA3Tp0Gv9TIx366Z/IvmFfCyvCt kYv2qnUaw10OlFLiqhetlZyv49ibW4waj0RbyY/rZx+69sE/PM4961NYAjLoFJc2 6OoZZO4OHWrNZpBJfbyyX9KVLspix075FID7qVhE/AVW4CYZGOFu5wJyXQiYlysH 1qqkihK3gbKEsB2429UeLZktupmx79LBIgg346+DSQYiMXMTGR8iZY1onbBM2wlf bep65hsiHhxoC6Z/KhxrTGZM2jyYW2nICw3o0xikhWv7MZPWKfKHrH9NJQ9Lpuhy gxut0ef9HbPXWP9PgRmY0Z8PsUi8RT1bv0bHVw7EnhLbi62neJLyxY3Q++W+7vBG LYeaxKWLTTJu73wpBQHLI0pD0UifXLrTkiCI+4gN8zVfzxUl+90mGz2AdSRRFI+U kNdX/haEHi00WBqYxWt33ae/FuSHjPuYXjiPQA7Kiy/C3n9GAB0= =mGAq -----END PGP SIGNATURE----- Merge tag 'for-5.13-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more fixes that people hit during testing. Zoned mode fix: - fix 32bit value wrapping when calculating superblock offsets Error handling fixes: - properly check filesystema and device uuids - properly return errors when marking extents as written - do not write supers if we have an fs error" * tag 'for-5.13-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: promote debugging asserts to full-fledged checks in validate_super btrfs: return value from btrfs_mark_extent_written() in case of error btrfs: zoned: fix zone number to sector/physical calculation btrfs: do not write supers if we have an fs error	2021-06-09 13:34:48 -07:00
Allison Henderson	816c8e39b7	xfs: Make attr name schemes consistent This patch renames the following functions to make the nameing scheme more consistent: xfs_attr_shortform_remove -> xfs_attr_sf_removename xfs_attr_node_remove_name -> xfs_attr_node_removename xfs_attr_set_fmt -> xfs_attr_sf_addname Suggested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-09 09:34:05 -07:00
Allison Henderson	4a4957c16d	xfs: Fix default ASSERT in xfs_attr_set_iter This ASSERT checks for the state value of RM_SHRINK in the set path which should never happen. Change to ASSERT(0); Suggested-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-09 09:33:14 -07:00
Greg Kurz	e4a9ccdd1c	fuse: Fix infinite loop in sget_fc() We don't set the SB_BORN flag on submounts. This is wrong as these superblocks are then considered as partially constructed or dying in the rest of the code and can break some assumptions. One such case is when you have a virtiofs filesystem with submounts and you try to mount it again : virtio_fs_get_tree() tries to obtain a superblock with sget_fc(). The logic in sget_fc() is to loop until it has either found an existing matching superblock with SB_BORN set or to create a brand new one. It is assumed that a superblock without SB_BORN is transient and the loop is restarted. Forgetting to set SB_BORN on submounts hence causes sget_fc() to retry forever. Setting SB_BORN requires special care, i.e. a write barrier for super_cache_count() which can check SB_BORN without taking any lock. We should call vfs_get_tree() to deal with that but this requires to have a proper ->get_tree() implementation for submounts, which is a bigger piece of work. Go for a simple bug fix in the meatime. Fixes: `bf109c6404` ("fuse: implement crossmounts") Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2021-06-09 15:33:40 +02:00
Greg Kurz	e3a43f2a95	fuse: Fix crash if superblock of submount gets killed early As soon as fuse_dentry_automount() does up_write(&sb->s_umount), the superblock can theoretically be killed. If this happens before the submount was added to the &fc->mounts list, fuse_mount_remove() later crashes in list_del_init() because it assumes the submount to be already there. Add the submount before dropping sb->s_umount to fix the inconsistency. It is okay to nest fc->killsb under sb->s_umount, we already do this on the ->kill_sb() path. Signed-off-by: Greg Kurz <groug@kaod.org> Fixes: `bf109c6404` ("fuse: implement crossmounts") Cc: stable@vger.kernel.org # v5.10+ Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2021-06-09 15:33:40 +02:00
Greg Kurz	d92d88f056	fuse: Fix crash in fuse_dentry_automount() error path If fuse_fill_super_submount() returns an error, the error path triggers a crash: [ 26.206673] BUG: kernel NULL pointer dereference, address: 0000000000000000 [...] [ 26.226362] RIP: 0010:__list_del_entry_valid+0x25/0x90 [...] [ 26.247938] Call Trace: [ 26.248300] fuse_mount_remove+0x2c/0x70 [fuse] [ 26.248892] virtio_kill_sb+0x22/0x160 [virtiofs] [ 26.249487] deactivate_locked_super+0x36/0xa0 [ 26.250077] fuse_dentry_automount+0x178/0x1a0 [fuse] The crash happens because fuse_mount_remove() assumes that the FUSE mount was already added to list under the FUSE connection, but this only done after fuse_fill_super_submount() has returned success. This means that until fuse_fill_super_submount() has returned success, the FUSE mount isn't actually owned by the superblock. We should thus reclaim ownership by clearing sb->s_fs_info, which will skip the call to fuse_mount_remove(), and perform rollback, like virtio_fs_get_tree() already does for the root sb. Fixes: `bf109c6404` ("fuse: implement crossmounts") Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Greg Kurz <groug@kaod.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2021-06-09 15:33:40 +02:00
Kees Cook	591a22c14d	proc: Track /proc/$pid/attr/ opener mm_struct Commit `bfb819ea20` ("proc: Check /proc/$pid/attr/ writes against file opener") tried to make sure that there could not be a confusion between the opener of a /proc/$pid/attr/ file and the writer. It used struct cred to make sure the privileges didn't change. However, there were existing cases where a more privileged thread was passing the opened fd to a differently privileged thread (during container setup). Instead, use mm_struct to track whether the opener and writer are still the same process. (This is what several other proc files already do, though for different reasons.) Reported-by: Christian Brauner <christian.brauner@ubuntu.com> Reported-by: Andrea Righi <andrea.righi@canonical.com> Tested-by: Andrea Righi <andrea.righi@canonical.com> Fixes: `bfb819ea20` ("proc: Check /proc/$pid/attr/ writes against file opener") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-06-08 10:24:09 -07:00
Darrick J. Wong	b26b2bf14f	xfs: rename struct xfs_eofblocks to xfs_icwalk The xfs_eofblocks structure is no longer well-named -- nowadays it provides optional filtering criteria to any walk of the incore inode cache. Only one of the cache walk goals has anything to do with clearing of speculative post-EOF preallocations, so change the name to be more appropriate. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-06-08 09:30:20 -07:00
Darrick J. Wong	2d53f66baf	xfs: change the prefix of XFS_EOF_FLAGS_* to XFS_ICWALK_FLAG_ In preparation for renaming struct xfs_eofblocks to struct xfs_icwalk, change the prefix of the existing XFS_EOF_FLAGS_* flags to XFS_ICWALK_FLAG_ and convert all the existing users. This adds a degree of interface separation between the ioctl definitions and the incore parameters. Since FLAGS_UNION is only used in xfs_icache.c, move it there as a private flag. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>	2021-06-08 09:30:20 -07:00
Darrick J. Wong	9492750a8b	xfs: selectively keep sick inodes in memory It's important that the filesystem retain its memory of sick inodes for a little while after problems are found so that reports can be collected about what was wrong. Don't let inode reclamation free sick inodes unless we're unmounting or the fs already went down. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>	2021-06-08 09:30:20 -07:00
Darrick J. Wong	7975e465af	xfs: drop IDONTCACHE on inodes when we mark them sick When we decide to mark an inode sick, clear the DONTCACHE flag so that the incore inode will be kept around until memory pressure forces it out of memory. This increases the chances that the sick status will be caught by someone compiling a health report later on. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>	2021-06-08 09:30:20 -07:00
Darrick J. Wong	255794c7ed	xfs: only reset incore inode health state flags when reclaiming an inode While running some fuzz tests on inode metadata, I noticed that the filesystem health report (as provided by xfs_spaceman) failed to report the file corruption even when spaceman was run immediately after running xfs_scrub to detect the corruption. That isn't the intended behavior; one ought to be able to run scrub to detect errors in the ondisk metadata and be able to access to those reports for some time after the scrub. After running the same sequence through an instrumented kernel, I discovered the reason why -- scrub igets the file, scans it, marks it sick, and ireleases the inode. When the VFS lets go of the incore inode, it moves to RECLAIMABLE state. If spaceman igets the incore inode before it moves to RECLAIM state, iget reinitializes the VFS state, clears the sick and checked masks, and hands back the inode. At this point, the caller has the exact same incore inode, but with all the health state erased. In other words, we're erasing the incore inode's health state flags when we've decided NOT to sever the link between the incore inode and the ondisk inode. This is wrong, so we need to remove the lines that zero the fields from xfs_iget_cache_hit. As a precaution, we add the same lines into xfs_reclaim_inode just after we sever the link between incore and ondisk inode. Strictly speaking this isn't necessary because once an inode has gone through reclaim it must go through xfs_inode_alloc (which also zeroes the state) and xfs_iget is careful to check for mismatches between the inode it pulls out of the radix tree and the one it wants. Fixes: `6772c1f112` ("xfs: track metadata health status") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>	2021-06-08 09:30:20 -07:00
Darrick J. Wong	ffc18582ed	xfs: clean up incore inode walk functions This ambitious series aims to cleans up redundant inode walk code in xfs_icache.c, hide implementation details of the quotaoff dquot release code, and eliminates indirect function calls from incore inode walks. The first thing it does is to move all the code that quotaoff calls to release dquots from all incore inodes into xfs_icache.c. Next, it separates the goal of an inode walk from the actual radix tree tags that may or may not be involved and drops the kludgy XFS_ICI_NO_TAG thing. Finally, we split the speculative preallocation (blockgc) and quotaoff dquot release code paths into separate functions so that we can keep the implementations cohesive. Christoph suggested last cycle that we 'simply' change quotaoff not to allow deactivating quota entirely, but as these cleanups are to enable one major change in behavior (deferred inode inactivation) I do not want to add a second behavior change (quotaoff) as a dependency. To be blunt: Additional cleanups are not in scope for this series. Next, I made two observations about incore inode radix tree walks -- since there's a 1:1 mapping between the walk goal and the per-inode processing function passed in, we can use the goal to make a direct call to the processing function. Furthermore, the only caller to supply a nonzero iter_flags argument is quotaoff, and there's only one INEW flag. From that observation, I concluded that it's quite possible to remove two parameters from the xfs_inode_walk* function signatures -- the iter_flags, and the execute function pointer. The middle of the series moves the INEW functionality into the one piece (quotaoff) that wants it, and removes the indirect calls. The final observation is that the inode reclaim walk loop is now almost the same as xfs_inode_walk, so it's silly to maintain two copies. Merge the reclaim loop code into xfs_inode_walk. Lastly, refactor the per-ag radix tagging functions since there's duplicated code that can be consolidated. This series is a prerequisite for the next two patchsets, since deferred inode inactivation will add another inode radix tree tag and iterator function to xfs_inode_walk. v2: walk the vfs inode list when running quotaoff instead of the radix tree, then rework the (now completely internal) inode walk function to take the tag as the main parameter. v3: merge the reclaim loop into xfs_inode_walk, then consolidate the radix tree tagging functions v4: rebase to 5.13-rc4 v5: combine with the quotaoff patchset, reorder functions to minimize forward declarations, split inode walk goals from radix tree tags to reduce conceptual confusion v6: start moving the inode cache code towards the xfs_icwalk prefix -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmC5Yv0ACgkQ+H93GTRK tOv7Fg//Z7cKph0zSg6qsukMEMZxscuNcEBydCW1bu9gSx1NpszDpiGqAiO5ZB3X wP2XkCqjuatbNGGvkNLHS/M4sbLX3ELogvYmMRvUhDoaSFxT/KKgxvsyNffiCSS7 xRB/rvWRp9MGRpBWPF0ZUxFU6VBzhCrYdMsNhvW95AEup8S/j+NplwoIif0gzaZZ Q6Fl4Ca9VEBvJQPV+/zkLih19iFItmARJhPHUs4BO1nZv+CzZBFQHg7Ijw7nW92j eSY68W4LH/IQ5cqm+HrD/+Z6ns0P7J2viewzVymkNEGnuX4a0xrQrzQ8ydRsAxTi 9EDrpIe3MbSI5YjJfmRe8G3LX5p7vBpqc8TeyZdRDMGWkFjT33HPlQNb6WxKLQbA mjKdfr8AYZR/UQKW/7oZFrJnOoMpYRAQ4Sn/9BAYZQYm7tiLzuZsrEZ7JBwiUA56 XHmlsDDeLzJeKvjmUu8M3H4oh4Nwf5/I2vJwHjueTfhl83uJP04igIXC4rnq56bM AAAjH9uV11Fo3q0ywAnRtN2HYj8PEJlCMK5CNskILrGeMITsBPGht0SbaA6hDI2h GYmltKInHzuPhHC9NfyPVrVr3BrmPR5cBsVFESiz5A4E9rbuKmmna6Yk8MFlMyl8 FRIA3zVatJ2qQXtsAcdI8AZzMd7ciYhkAgCqFKxv8qK/qxITHh4= =Rxdn -----END PGP SIGNATURE----- Merge tag 'inode-walk-cleanups-5.14_2021-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-5.14-merge2 xfs: clean up incore inode walk functions This ambitious series aims to cleans up redundant inode walk code in xfs_icache.c, hide implementation details of the quotaoff dquot release code, and eliminates indirect function calls from incore inode walks. The first thing it does is to move all the code that quotaoff calls to release dquots from all incore inodes into xfs_icache.c. Next, it separates the goal of an inode walk from the actual radix tree tags that may or may not be involved and drops the kludgy XFS_ICI_NO_TAG thing. Finally, we split the speculative preallocation (blockgc) and quotaoff dquot release code paths into separate functions so that we can keep the implementations cohesive. Christoph suggested last cycle that we 'simply' change quotaoff not to allow deactivating quota entirely, but as these cleanups are to enable one major change in behavior (deferred inode inactivation) I do not want to add a second behavior change (quotaoff) as a dependency. To be blunt: Additional cleanups are not in scope for this series. Next, I made two observations about incore inode radix tree walks -- since there's a 1:1 mapping between the walk goal and the per-inode processing function passed in, we can use the goal to make a direct call to the processing function. Furthermore, the only caller to supply a nonzero iter_flags argument is quotaoff, and there's only one INEW flag. From that observation, I concluded that it's quite possible to remove two parameters from the xfs_inode_walk* function signatures -- the iter_flags, and the execute function pointer. The middle of the series moves the INEW functionality into the one piece (quotaoff) that wants it, and removes the indirect calls. The final observation is that the inode reclaim walk loop is now almost the same as xfs_inode_walk, so it's silly to maintain two copies. Merge the reclaim loop code into xfs_inode_walk. Lastly, refactor the per-ag radix tagging functions since there's duplicated code that can be consolidated. This series is a prerequisite for the next two patchsets, since deferred inode inactivation will add another inode radix tree tag and iterator function to xfs_inode_walk. v2: walk the vfs inode list when running quotaoff instead of the radix tree, then rework the (now completely internal) inode walk function to take the tag as the main parameter. v3: merge the reclaim loop into xfs_inode_walk, then consolidate the radix tree tagging functions v4: rebase to 5.13-rc4 v5: combine with the quotaoff patchset, reorder functions to minimize forward declarations, split inode walk goals from radix tree tags to reduce conceptual confusion v6: start moving the inode cache code towards the xfs_icwalk prefix * tag 'inode-walk-cleanups-5.14_2021-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: refactor per-AG inode tagging functions xfs: merge xfs_reclaim_inodes_ag into xfs_inode_walk_ag xfs: pass struct xfs_eofblocks to the inode scan callback xfs: fix radix tree tag signs xfs: make the icwalk processing functions clean up the grab state xfs: clean up inode state flag tests in xfs_blockgc_igrab xfs: remove indirect calls from xfs_inode_walk{,_ag} xfs: remove iter_flags parameter from xfs_inode_walk_* xfs: move xfs_inew_wait call into xfs_dqrele_inode xfs: separate the dqrele_all inode grab logic from xfs_inode_walk_ag_grab xfs: pass the goal of the incore inode walk to xfs_inode_walk() xfs: rename xfs_inode_walk functions to xfs_icwalk xfs: move the inode walk functions further down xfs: detach inode dquots at the end of inactivation xfs: move the quotaoff dqrele inode walk into xfs_icache.c [djwong: added variable names to function declarations while fixing merge conflicts] Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-06-08 09:26:44 -07:00
Darrick J. Wong	8b943d21d4	xfs: assorted fixes for 5.14, part 1 This branch contains the first round of various small fixes for 5.14. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmC5YvwACgkQ+H93GTRK tOuDTxAAmW+Mc4oPx/Fsa2xqvkDYCGaM1QtkIY9McrGsMdzF0JbB7bR5rrUET2+N /FsFTIQ31vZfMEJ69HM6NjQ4xXIKhdEj+PlGsjvg62BqhpPBRuFH7dLBWChiRcf2 M5SN1P9kcAjIoMR2tKzh8FMDmJTcJlJxxBi9kxb3YK1+UesUHsPS8/jSW3OuIov3 AJzAZ08SrA8ful5uCMy9Mf5uBgfRuAUjDzA5CM5kA1xqlHREQi+oHl62E81N33mu RR8tdHQzvPO5rHyX84GzV5cu2CsmDuOPF2nA5SUxRhIZFfMo5mEezA/nxqqACYti rnRGxZVwIG9YYBESVxXFQXIjn5lHoQ2jomk/CraszeVEJCteNRnpbVGzd1xczM3u 0iuDuHy+aVB/3QhfA6/0vjfttCzkMEld9U9c3WEjIaw5iUCxe531yfrvVEyF9blx NBjnQyHGbt+y26BzBjD33NJEdDoZqS3UIQ/rmb4f2mitGN5d9faAcJ754uRJt3o4 K9HXGjuR+iOH/tCZKDL1hBc4M/pFgNdeBWyFYdQhh8eSj9HSCTCG58zAyQ2WPOSr 6D/f4BMqivKzzz0HicZPJAoazrtcKGrWTHTxidHIkI4le367NOwv6YJquJ0pFMBs 8L7OdRqYT3yw6+qErCjEn03WkP9O7V8lHf8hFxt7+dNxDukIj40= =6eZB -----END PGP SIGNATURE----- Merge tag 'assorted-fixes-5.14-1_2021-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-5.14-merge2 xfs: assorted fixes for 5.14, part 1 This branch contains the first round of various small fixes for 5.14. * tag 'assorted-fixes-5.14-1_2021-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: don't take a spinlock unconditionally in the DIO fastpath xfs: mark xfs_bmap_set_attrforkoff static xfs: Remove redundant assignment to busy xfs: sort variable alphabetically to avoid repeated declaration	2021-06-08 09:22:34 -07:00
Darrick J. Wong	f52edf6c54	xfs: various unit conversions Crafting the realtime file extent size hint fixes revealed various opportunities to clean up unit conversions, so now that gets its own series. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmC5YvYACgkQ+H93GTRK tOtRihAAh5wKVV9Fwafkk3l2zTCw+K/SfciWvZqF8BClAjqzWUDaw4DJ2HS0lSmj ezOG+/HhJzZB4udC9WDM4TSFA/elR1pJBu5lSBpAt/K78Mpw/i1aFs1SyAfw6Qsp 1A8um80jaCyzvcSxU1Hx/35CN0MQwpjJ23LziEDUVBvj/qV7Qj8DmI61CfWWYzs9 jEJFzIvbKoRqs8Wqws7n4a7fw3VOyqC2EQV290xzumkYKOSAmpWQGkbiU2/YtqKY 7w/GvjS1zRE7o1cGeKBYRA/35zbw5mqoHD5vJmdlUIsWK22fnN8h5vqYo5dpFPiz nFyK+MHVrV6WxKBKECHtgm7Jya+jjH0TFu/9js4k31ehe4LJc/nhEqmSrFiH3eGS 4AXrhhZqtDZ1skjcPym83dCayW1uE2cWbHqYJ+ztQW5PrLafoWQ9Ld/vtrbL+b/U VIgh3LQmF371jGcE0twERNQPIb5F96w9mS6F+vq0JuvrGftxoa4tKVjE8jrNWJjo 750/KT0Nupti0AYKq9WMD47BiSf5BbdnhYTN15X0mc5TyHo/EyXIl/I/hFfEMFp9 AH/qWaQlt3LjMXvr1xF6R4RNSI+xxai7PHfW69Zi7tBea+7k9wTZjubgc2FFaMOQ Jf8n5L07IXVYDy0mGUGaokenQb2KPvfXVK7yYtaMj1qDh7Sjqjw= =HGL1 -----END PGP SIGNATURE----- Merge tag 'unit-conversion-cleanups-5.14_2021-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-5.14-merge2 xfs: various unit conversions Crafting the realtime file extent size hint fixes revealed various opportunities to clean up unit conversions, so now that gets its own series. * tag 'unit-conversion-cleanups-5.14_2021-06-03' of https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux: xfs: remove unnecessary shifts xfs: clean up open-coded fs block unit conversions	2021-06-08 09:21:24 -07:00
Dave Chinner	9ba0889e22	xfs: drop the AGI being passed to xfs_check_agi_freecount From: Dave Chinner <dchinner@redhat.com> Stephen Rothwell reported this compiler warning from linux-next: fs/xfs/libxfs/xfs_ialloc.c: In function 'xfs_difree_finobt': fs/xfs/libxfs/xfs_ialloc.c:2032:20: warning: unused variable 'agi' [-Wunused-variable] 2032 \| struct xfs_agi *agi = agbp->b_addr; Which is fallout from agno -> perag conversions that were done in this function. xfs_check_agi_freecount() is the only user of "agi" in xfs_difree_finobt() now, and it only uses the agi to get the current free inode count. We hold that in the perag structure, so there's not need to directly reference the raw AGI to get this information. The btree cursor being passed to xfs_check_agi_freecount() has a reference to the perag being operated on, so use that directly in xfs_check_agi_freecount() rather than passing an AGI. Fixes: `7b13c51551` ("xfs: use perag for ialloc btree cursors") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-06-08 09:19:22 -07:00
Darrick J. Wong	c3eabd3650	xfs: initial agnumber -> perag conversions for shrink If we want to use active references to the perag to be able to gate shrink removing AGs and hence perags safely, we've got a fair bit of work to do actually use perags in all the places we need to. There's a lot of code that iterates ag numbers and then looks up perags from that, often multiple times for the same perag in the one operation. If we want to use reference counted perags for access control, then we need to convert all these uses to perag iterators, not agno iterators. [Patches 1-4] The first step of this is consolidating all the perag management - init, free, get, put, etc into a common location. THis is spread all over the place right now, so move it all into libxfs/xfs_ag.[ch]. This does expose kernel only bits of the perag to libxfs and hence userspace, so the structures and code is rearranged to minimise the number of ifdefs that need to be added to the userspace codebase. The perag iterator in xfs_icache.c is promoted to a first class API and expanded to the needs of the code as required. [Patches 5-10] These are the first basic perag iterator conversions and changes to pass the perag down the stack from those iterators where appropriate. A lot of this is obvious, simple changes, though in some places we stop passing the perag down the stack because the code enters into an as yet unconverted subsystem that still uses raw AGs. [Patches 11-16] These replace the agno passed in the btree cursor for per-ag btree operations with a perag that is passed to the cursor init function. The cursor takes it's own reference to the perag, and the reference is dropped when the cursor is deleted. Hence we get reference coverage for the entire time the cursor is active, even if the code that initialised the cursor drops it's reference before the cursor or any of it's children (duplicates) have been deleted. The first patch adds the perag infrastructure for the cursor, the next four patches convert a btree cursor at a time, and the last removes the agno from the cursor once it is unused. [Patches 17-21] These patches are a demonstration of the simplifications and cleanups that come from plumbing the perag through interfaces that select and then operate on a specific AG. In this case the inode allocation algorithm does up to three walks across all AGs before it either allocates an inode or fails. Two of these walks are purely just to select the AG, and even then it doesn't guarantee inode allocation success so there's a third walk if the selected AG allocation fails. These patches collapse the selection and allocation into a single loop, simplifies the error handling because xfs_dir_ialloc() always returns ENOSPC if no AG was selected for inode allocation or we fail to allocate an inode in any AG, gets rid of xfs_dir_ialloc() wrapper, converts inode allocation to run entirely from a single perag instance, and then factors xfs_dialloc() into a much, much simpler loop which is easy to understand. Hence we end up with the same inode allocation logic, but it only needs two complete iterations at worst, makes AG selection and allocation atomic w.r.t. shrink and chops out out over 100 lines of code from this hot code path. [Patch 22] Converts the unlink path to pass perags through it. There's more conversion work to be done, but this patchset gets through a large chunk of it in one hit. Most of the iterators are converted, so once this is solidified we can move on to converting these to active references for being able to free perags while the fs is still active. -----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEEmJOoJ8GffZYWSjj/regpR/R1+h0FAmC3HUgUHGRhdmlkQGZy b21vcmJpdC5jb20ACgkQregpR/R1+h2yaw/+P0JzpI+6n06Ei00mjgE/Du/WhMLi 0JQ93Grlj+miuGGT9DgGCiRpoZnefhEk+BH6JqoEw1DQ3T5ilmAzrHLUUHSQC3+S dv85sJduheQ6yHuoO+4MzkaSq6JWKe7E9gZwAsVyBul5aSjdmaJaQdPwYMTXSXo0 5Uqq8ECFkMcaHVNjcBfasgR/fdyWy2Qe4PFTHTHdQpd+DNZ9UXgFKHW2og+1iry/ zDIvdIppJULA09TvVcZuFjd/1NzHQ/fLj5PAzz8GwagB4nz2x3s78Zevmo5yW/jK 3/+50vXa8ldhiHDYGTS3QXvS0xJRyqUyD47eyWOOiojZw735jEvAlCgjX6+0X1HC k3gCkQLv8l96fRkvUpgnLf/fjrUnlCuNBkm9d1Eq2Tied8dvLDtiEzoC6L05Nqob yd/nIUb1zwJFa9tsoheHhn0bblTGX1+zP0lbRJBje0LotpNO9DjGX5JoIK4GR7F8 y1VojcdgRI14HlxUnbF3p8wmQByN+M2tnp6GSdv9BA65bjqi05Rj/steFdZHBV6x wiRs8Yh6BTvMwKgufHhRQHfRahjNHQ/T/vOE+zNbWqemS9wtEUDop+KvPhC36R/k o/cmr23cF8ESX2eChk7XM4On3VEYpcvp2zSFgrFqZYl6RWOwEis3Htvce3KuSTPp 8Xq70te0gr2DVUU= =YNzW -----END PGP SIGNATURE----- Merge tag 'xfs-perag-conv-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs into xfs-5.14-merge2 xfs: initial agnumber -> perag conversions for shrink If we want to use active references to the perag to be able to gate shrink removing AGs and hence perags safely, we've got a fair bit of work to do actually use perags in all the places we need to. There's a lot of code that iterates ag numbers and then looks up perags from that, often multiple times for the same perag in the one operation. If we want to use reference counted perags for access control, then we need to convert all these uses to perag iterators, not agno iterators. [Patches 1-4] The first step of this is consolidating all the perag management - init, free, get, put, etc into a common location. THis is spread all over the place right now, so move it all into libxfs/xfs_ag.[ch]. This does expose kernel only bits of the perag to libxfs and hence userspace, so the structures and code is rearranged to minimise the number of ifdefs that need to be added to the userspace codebase. The perag iterator in xfs_icache.c is promoted to a first class API and expanded to the needs of the code as required. [Patches 5-10] These are the first basic perag iterator conversions and changes to pass the perag down the stack from those iterators where appropriate. A lot of this is obvious, simple changes, though in some places we stop passing the perag down the stack because the code enters into an as yet unconverted subsystem that still uses raw AGs. [Patches 11-16] These replace the agno passed in the btree cursor for per-ag btree operations with a perag that is passed to the cursor init function. The cursor takes it's own reference to the perag, and the reference is dropped when the cursor is deleted. Hence we get reference coverage for the entire time the cursor is active, even if the code that initialised the cursor drops it's reference before the cursor or any of it's children (duplicates) have been deleted. The first patch adds the perag infrastructure for the cursor, the next four patches convert a btree cursor at a time, and the last removes the agno from the cursor once it is unused. [Patches 17-21] These patches are a demonstration of the simplifications and cleanups that come from plumbing the perag through interfaces that select and then operate on a specific AG. In this case the inode allocation algorithm does up to three walks across all AGs before it either allocates an inode or fails. Two of these walks are purely just to select the AG, and even then it doesn't guarantee inode allocation success so there's a third walk if the selected AG allocation fails. These patches collapse the selection and allocation into a single loop, simplifies the error handling because xfs_dir_ialloc() always returns ENOSPC if no AG was selected for inode allocation or we fail to allocate an inode in any AG, gets rid of xfs_dir_ialloc() wrapper, converts inode allocation to run entirely from a single perag instance, and then factors xfs_dialloc() into a much, much simpler loop which is easy to understand. Hence we end up with the same inode allocation logic, but it only needs two complete iterations at worst, makes AG selection and allocation atomic w.r.t. shrink and chops out out over 100 lines of code from this hot code path. [Patch 22] Converts the unlink path to pass perags through it. There's more conversion work to be done, but this patchset gets through a large chunk of it in one hit. Most of the iterators are converted, so once this is solidified we can move on to converting these to active references for being able to free perags while the fs is still active. * tag 'xfs-perag-conv-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (23 commits) xfs: remove xfs_perag_t xfs: use perag through unlink processing xfs: clean up and simplify xfs_dialloc() xfs: inode allocation can use a single perag instance xfs: get rid of xfs_dir_ialloc() xfs: collapse AG selection for inode allocation xfs: simplify xfs_dialloc_select_ag() return values xfs: remove agno from btree cursor xfs: use perag for ialloc btree cursors xfs: convert allocbt cursors to use perags xfs: convert refcount btree cursor to use perags xfs: convert rmap btree cursor to using a perag xfs: add a perag to the btree cursor xfs: pass perags around in fsmap data dev functions xfs: push perags through the ag reservation callouts xfs: pass perags through to the busy extent code xfs: convert secondary superblock walk to use perags xfs: convert xfs_iwalk to use perag references xfs: convert raw ag walks to use for_each_perag xfs: make for_each_perag... a first class citizen ...	2021-06-08 09:13:13 -07:00
Darrick J. Wong	ebf2e33723	xfs: buffer cache bulk page allocation This patchset makes use of the new bulk page allocation interface to reduce the overhead of allocating large numbers of pages in a loop. The first two patches are refactoring buffer memory allocation and converting the uncached buffer path to use the same page allocation path, followed by converting the page allocation path to use bulk allocation. The rest of the patches are then consolidation of the page allocation and freeing code to simplify the code and remove a chunk of unnecessary abstraction. This is largely based on a series of changes made by Christoph Hellwig. -----BEGIN PGP SIGNATURE----- iQJIBAABCgAyFiEEmJOoJ8GffZYWSjj/regpR/R1+h0FAmC+6OwUHGRhdmlkQGZy b21vcmJpdC5jb20ACgkQregpR/R1+h21QQ/8C0f7wq1OKwNI2oRubf6J8jtttiRS SD2TA03AP2OIOKx4y2G0h0dJeX9tgnbLerIlpfT80nHDoBgKHbZCYSEHQT0DscYo fuwQTVR8RCklKspjUlCGR+Gbm6vI8HakK1lAppw168e4c6t8wX1KiSibwaVTQdaZ NaXUqTUzGiNq+iiLS6fW3mJ3PKWFJYyrDOSR2jIPbUGIJdejRCGe0xVnu+hIsz+y c2gSGCB+j3cYaazhlJTDYPGja3Wq3eR+Ya9i1GcA1tJiJLsu0ZjaVQ69Bl4dud2F c3OyhFK0El1VMSEVb3hY8gTpAO02jNWSnB2Zlidt0h4ZJVAxKus0xe2w3eS4uST2 hcMI3lwjdzRQuoBwOgXQ+CpYVv2wI8HPNLTSR+NYcC2IZaCNieFRWdTYwXrAJBB3 H09m04GT/7TkkrYHFD1zRtIedP4DZ6MZn/33bufNxEt1NRCFw5AFAEUFfjDA317A 4nByCmU6XjmmpI/XLixwu0BYCfKVB4UsrgOyzXBy7ZU0+pIser+ynP1V4d9Bb43Y xVQ8S0QirT7gqXjx75mD4B4qkXZ5nrz5Z7fSn6YU4TwqsYtZYlsBauLlWmmHp9MT CP4PA4j+CQORhfZzWXw2ViXYGoIssc1cw5i4JB6a4u/OaDi19dYkE6SO8P3b9GSm khHqWgcTC4VGpmc= =JsrV -----END PGP SIGNATURE----- Merge tag 'xfs-buf-bulk-alloc-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs into xfs-5.14-merge2 xfs: buffer cache bulk page allocation This patchset makes use of the new bulk page allocation interface to reduce the overhead of allocating large numbers of pages in a loop. The first two patches are refactoring buffer memory allocation and converting the uncached buffer path to use the same page allocation path, followed by converting the page allocation path to use bulk allocation. The rest of the patches are then consolidation of the page allocation and freeing code to simplify the code and remove a chunk of unnecessary abstraction. This is largely based on a series of changes made by Christoph Hellwig. * tag 'xfs-buf-bulk-alloc-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: xfs: merge xfs_buf_allocate_memory xfs: cleanup error handling in xfs_buf_get_map xfs: get rid of xb_to_gfp() xfs: simplify the b_page_count calculation xfs: remove ->b_offset handling for page backed buffers xfs: move page freeing into _xfs_buf_free_pages() xfs: merge _xfs_buf_get_pages() xfs: use alloc_pages_bulk_array() for buffers xfs: use xfs_buf_alloc_pages for uncached buffers xfs: split up xfs_buf_allocate_memory	2021-06-08 09:10:01 -07:00
Marc Dionne	dc2557308e	afs: Fix partial writeback of large files on fsync and close In commit `e87b03f583` ("afs: Prepare for use of THPs"), the return value for afs_write_back_from_locked_page was changed from a number of pages to a length in bytes. The loop in afs_writepages_region uses the return value to compute the index that will be used to find dirty pages in the next iteration, but treats it as a number of pages and wrongly multiplies it by PAGE_SIZE. This gives a very large index value, potentially skipping any dirty data that was not covered in the first pass, which is limited to 256M. This causes fsync(), and indirectly close(), to only do a partial writeback of a large file's dirty data. The rest is eventually written back by background threads after dirty_expire_centisecs. Fixes: `e87b03f583` ("afs: Prepare for use of THPs") Signed-off-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/20210604175504.4055-1-marc.c.dionne@gmail.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-06-07 12:56:05 -07:00
Gao Xiang	c5fcb51111	erofs: clean up file headers & footers - Remove my outdated misleading email address; - Get rid of all unnecessary trailing newline by accident. Link: https://lore.kernel.org/r/20210602160634.10757-1-xiang@kernel.org Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2021-06-08 00:41:24 +08:00
Yue Hu	7dea3de7d3	erofs: remove the occupied parameter from z_erofs_pagevec_enqueue() No any behavior to variable occupied in z_erofs_attach_page() which is only caller to z_erofs_pagevec_enqueue(). Link: https://lore.kernel.org/r/20210419102623.2015-1-zbestahu@gmail.com Signed-off-by: Yue Hu <huyue2@yulong.com> Reviewed-by: Gao Xiang <xiang@kernel.org> Signed-off-by: Gao Xiang <xiang@kernel.org>	2021-06-08 00:40:18 +08:00
Wei Yongjun	0508c1ad0f	erofs: fix error return code in erofs_read_superblock() 'ret' will be overwritten to 0 if erofs_sb_has_sb_chksum() return true, thus 0 will return in some error handling cases. Fix to return negative error code -EINVAL instead of 0. Link: https://lore.kernel.org/r/20210519141657.3062715-1-weiyongjun1@huawei.com Fixes: `b858a4844c` ("erofs: support superblock checksum") Cc: stable <stable@vger.kernel.org> # 5.5+ Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Gao Xiang <xiang@kernel.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <xiang@kernel.org>	2021-06-08 00:40:18 +08:00
Jan Kara	64c2c2c62f	quota: Change quotactl_path() systcall to an fd-based one Some users have pointed out that path-based syscalls are problematic in some environments and at least directory fd argument and possibly also resolve flags are desirable for such syscalls. Rather than reimplementing all details of pathname lookup and following where it may eventually evolve, let's go for full file descriptor based syscall similar to how ioctl(2) works since the beginning. Managing of quotas isn't performance sensitive so the extra overhead of open does not matter and we are able to consume O_PATH descriptors as well which makes open cheap anyway. Also for frequent operations (such as retrieving usage information for all users) we can reuse single fd and in fact get even better performance as well as avoiding races with possible remounts etc. Tested-by: Sascha Hauer <s.hauer@pengutronix.de> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>	2021-06-07 12:11:24 +02:00
Dave Chinner	8bcac7448a	xfs: merge xfs_buf_allocate_memory It only has one caller and is now a simple function, so merge it into the caller. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-07 11:50:48 +10:00
Christoph Hellwig	170041f715	xfs: cleanup error handling in xfs_buf_get_map Use a single goto label for freeing the buffer and returning an error. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com>	2021-06-07 11:50:47 +10:00
Dave Chinner	289ae7b48c	xfs: get rid of xb_to_gfp() Only used in one place, so just open code the logic in the macro. Based on a patch from Christoph Hellwig. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-07 11:50:17 +10:00
Christoph Hellwig	934d1076bb	xfs: simplify the b_page_count calculation Ever since we stopped using the Linux page cache to back XFS buffers there is no need to take the start sector into account for calculating the number of pages in a buffer, as the data always start from the beginning of the buffer. Signed-off-by: Christoph Hellwig <hch@lst.de> [dgc: modified to suit this series] Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-07 11:50:00 +10:00
Christoph Hellwig	54cd3aa6f8	xfs: remove ->b_offset handling for page backed buffers ->b_offset can only be non-zero for _XBF_KMEM backed buffers, so remove all code dealing with it for page backed buffers. Signed-off-by: Christoph Hellwig <hch@lst.de> [dgc: modified to fit this patchset] Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-07 11:49:50 +10:00
Linus Torvalds	20e41d9bc8	Miscellaneous ext4 bug fixes for v5.13 -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmC82AQACgkQ8vlZVpUN gaOkAgf+KH57P/P0sB6aVBHpAzqa9jTKJWMA5kpCqYUDkYlfF7n2hwsjMzWpJ5MY ZvFpKAflmRnve/ULUZQX6+zrcbieNs3e+6VFZrZ0PmxN0dupyISLY7jnvCRDleA7 BFO34AcH+QEst9zXJmgta9eoy3LA8sawhQ/d7ujVY+IRFk40m26fuAMiaGznlQJ5 dmrx7pHZWKFIDFIg2TdFlP+Voqbxs2VTT16gmWpGBdTyWYHKjbSOLKJFc9DwYeE9 aANf6iIzwXz7y9pZiOnTrGuKDEJcIZNESkbIqw62YgqsoObLbsbCZNmNcqxyHpYQ Mh3L59KtmjANW3iOxQfyxkNTugxchw== =BSnf -----END PGP SIGNATURE----- Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Miscellaneous ext4 bug fixes" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Only advertise encrypted_casefold when encryption and unicode are enabled ext4: fix no-key deletion for encrypt+casefold ext4: fix memory leak in ext4_fill_super ext4: fix fast commit alignment issues ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed ext4: fix accessing uninit percpu counter variable with fast_commit ext4: fix memory leak in ext4_mb_init_backend on error path.	2021-06-06 14:24:13 -07:00
Daniel Rosenberg	e71f99f2df	ext4: Only advertise encrypted_casefold when encryption and unicode are enabled Encrypted casefolding is only supported when both encryption and casefolding are both enabled in the config. Fixes: `471fbbea7f` ("ext4: handle casefolding with encryption") Cc: stable@vger.kernel.org # 5.13+ Signed-off-by: Daniel Rosenberg <drosen@google.com> Link: https://lore.kernel.org/r/20210603094849.314342-1-drosen@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2021-06-06 10:10:23 -04:00
Daniel Rosenberg	63e7f12893	ext4: fix no-key deletion for encrypt+casefold commit `471fbbea7f` ("ext4: handle casefolding with encryption") is missing a few checks for the encryption key which are needed to support deleting enrypted casefolded files when the key is not present. This bug made it impossible to delete encrypted+casefolded directories without the encryption key, due to errors like: W : EXT4-fs warning (device vdc): __ext4fs_dirhash:270: inode #49202: comm Binder:378_4: Siphash requires key Repro steps in kvm-xfstests test appliance: mkfs.ext4 -F -E encoding=utf8 -O encrypt /dev/vdc mount /vdc mkdir /vdc/dir chattr +F /vdc/dir keyid=$(head -c 64 /dev/zero \| xfs_io -c add_enckey /vdc \| awk '{print $NF}') xfs_io -c "set_encpolicy $keyid" /vdc/dir for i in `seq 1 100`; do mkdir /vdc/dir/$i done xfs_io -c "rm_enckey $keyid" /vdc rm -rf /vdc/dir # fails with the bug Fixes: `471fbbea7f` ("ext4: handle casefolding with encryption") Signed-off-by: Daniel Rosenberg <drosen@google.com> Link: https://lore.kernel.org/r/20210522004132.2142563-1-drosen@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2021-06-06 10:10:23 -04:00
Alexey Makhalov	afd09b617d	ext4: fix memory leak in ext4_fill_super Buffer head references must be released before calling kill_bdev(); otherwise the buffer head (and its page referenced by b_data) will not be freed by kill_bdev, and subsequently that bh will be leaked. If blocksizes differ, sb_set_blocksize() will kill current buffers and page cache by using kill_bdev(). And then super block will be reread again but using correct blocksize this time. sb_set_blocksize() didn't fully free superblock page and buffer head, and being busy, they were not freed and instead leaked. This can easily be reproduced by calling an infinite loop of: systemctl start <ext4_on_lvm>.mount, and systemctl stop <ext4_on_lvm>.mount ... since systemd creates a cgroup for each slice which it mounts, and the bh leak get amplified by a dying memory cgroup that also never gets freed, and memory consumption is much more easily noticed. Fixes: `ce40733ce9` ("ext4: Check for return value from sb_set_blocksize") Fixes: `ac27a0ec11` ("ext4: initial copy of files from ext3") Link: https://lore.kernel.org/r/20210521075533.95732-1-amakhalov@vmware.com Signed-off-by: Alexey Makhalov <amakhalov@vmware.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2021-06-06 10:10:23 -04:00
Harshad Shirwadkar	a7ba36bc94	ext4: fix fast commit alignment issues Fast commit recovery data on disk may not be aligned. So, when the recovery code reads it, this patch makes sure that fast commit info found on-disk is first memcpy-ed into an aligned variable before accessing it. As a consequence of it, we also remove some macros that could resulted in unaligned accesses. Cc: stable@kernel.org Fixes: `8016e29f43` ("ext4: fast commit recovery path") Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/20210519215920.2037527-1-harshads@google.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2021-06-06 10:10:23 -04:00
Ye Bin	082cd4ec24	ext4: fix bug on in ext4_es_cache_extent as ext4_split_extent_at failed We got follow bug_on when run fsstress with injecting IO fault: [130747.323114] kernel BUG at fs/ext4/extents_status.c:762! [130747.323117] Internal error: Oops - BUG: 0 [#1] SMP ...... [130747.334329] Call trace: [130747.334553] ext4_es_cache_extent+0x150/0x168 [ext4] [130747.334975] ext4_cache_extents+0x64/0xe8 [ext4] [130747.335368] ext4_find_extent+0x300/0x330 [ext4] [130747.335759] ext4_ext_map_blocks+0x74/0x1178 [ext4] [130747.336179] ext4_map_blocks+0x2f4/0x5f0 [ext4] [130747.336567] ext4_mpage_readpages+0x4a8/0x7a8 [ext4] [130747.336995] ext4_readpage+0x54/0x100 [ext4] [130747.337359] generic_file_buffered_read+0x410/0xae8 [130747.337767] generic_file_read_iter+0x114/0x190 [130747.338152] ext4_file_read_iter+0x5c/0x140 [ext4] [130747.338556] __vfs_read+0x11c/0x188 [130747.338851] vfs_read+0x94/0x150 [130747.339110] ksys_read+0x74/0xf0 This patch's modification is according to Jan Kara's suggestion in: https://patchwork.ozlabs.org/project/linux-ext4/patch/20210428085158.3728201-1-yebin10@huawei.com/ "I see. Now I understand your patch. Honestly, seeing how fragile is trying to fix extent tree after split has failed in the middle, I would probably go even further and make sure we fix the tree properly in case of ENOSPC and EDQUOT (those are easily user triggerable). Anything else indicates a HW problem or fs corruption so I'd rather leave the extent tree as is and don't try to fix it (which also means we will not create overlapping extents)." Cc: stable@kernel.org Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20210506141042.3298679-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2021-06-06 10:09:55 -04:00
Junxiao Bi	6bba4471f0	ocfs2: fix data corruption by fallocate When fallocate punches holes out of inode size, if original isize is in the middle of last cluster, then the part from isize to the end of the cluster will be zeroed with buffer write, at that time isize is not yet updated to match the new size, if writeback is kicked in, it will invoke ocfs2_writepage()->block_write_full_page() where the pages out of inode size will be dropped. That will cause file corruption. Fix this by zero out eof blocks when extending the inode size. Running the following command with qemu-image 4.2.1 can get a corrupted coverted image file easily. qemu-img convert -p -t none -T none -f qcow2 $qcow_image \ -O qcow2 -o compat=1.1 $qcow_image.conv The usage of fallocate in qemu is like this, it first punches holes out of inode size, then extend the inode size. fallocate(11, FALLOC_FL_KEEP_SIZE\|FALLOC_FL_PUNCH_HOLE, 2276196352, 65536) = 0 fallocate(11, 0, 2276196352, 65536) = 0 v1: https://www.spinics.net/lists/linux-fsdevel/msg193999.html v2: https://lore.kernel.org/linux-fsdevel/20210525093034.GB4112@quack2.suse.cz/T/ Link: https://lkml.kernel.org/r/20210528210648.9124-1-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Jan Kara <jack@suse.cz> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-06-05 08:58:12 -07:00
Eric Biggers	2fc2b430f5	fscrypt: fix derivation of SipHash keys on big endian CPUs Typically, the cryptographic APIs that fscrypt uses take keys as byte arrays, which avoids endianness issues. However, siphash_key_t is an exception. It is defined as 'u64 key[2];', i.e. the 128-bit key is expected to be given directly as two 64-bit words in CPU endianness. fscrypt_derive_dirhash_key() and fscrypt_setup_iv_ino_lblk_32_key() forgot to take this into account. Therefore, the SipHash keys used to index encrypted+casefolded directories differ on big endian vs. little endian platforms, as do the SipHash keys used to hash inode numbers for IV_INO_LBLK_32-encrypted directories. This makes such directories non-portable between these platforms. Fix this by always using the little endian order. This is a breaking change for big endian platforms, but this should be fine in practice since these features (encrypt+casefold support, and the IV_INO_LBLK_32 flag) aren't known to actually be used on any big endian platforms yet. Fixes: `aa408f835d` ("fscrypt: derive dirhash key for casefolded directories") Fixes: `e3b1078bed` ("fscrypt: add support for IV_INO_LBLK_32 policies") Cc: <stable@vger.kernel.org> # v5.6+ Link: https://lore.kernel.org/r/20210605075033.54424-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2021-06-05 00:52:52 -07:00
Eric Biggers	77f30bfcfc	fscrypt: don't ignore minor_hash when hash is 0 When initializing a no-key name, fscrypt_fname_disk_to_usr() sets the minor_hash to 0 if the (major) hash is 0. This doesn't make sense because 0 is a valid hash code, so we shouldn't ignore the filesystem-provided minor_hash in that case. Fix this by removing the special case for 'hash == 0'. This is an old bug that appears to have originated when the encryption code in ext4 and f2fs was moved into fs/crypto/. The original ext4 and f2fs code passed the hash by pointer instead of by value. So 'if (hash)' actually made sense then, as it was checking whether a pointer was NULL. But now the hashes are passed by value, and filesystems just pass 0 for any hashes they don't have. There is no need to handle this any differently from the hashes actually being 0. It is difficult to reproduce this bug, as it only made a difference in the case where a filename's 32-bit major hash happened to be 0. However, it probably had the largest chance of causing problems on ubifs, since ubifs uses minor_hash to do lookups of no-key names, in addition to using it as a readdir cookie. ext4 only uses minor_hash as a readdir cookie, and f2fs doesn't use minor_hash at all. Fixes: `0b81d07790` ("fs crypto: move per-file encryption from f2fs tree to fs/crypto") Cc: <stable@vger.kernel.org> # v4.6+ Link: https://lore.kernel.org/r/20210527235236.2376556-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2021-06-05 00:22:53 -07:00
Christoph Hellwig	603e4922f1	remove the raw driver The raw driver used to provide direct unbuffered access to block devices before O_DIRECT was invented. It has been obsolete for more than a decade. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/lkml/Pine.LNX.4.64.0703180754060.6605@CPE00045a9c397f-CM001225dbafb6/ Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210531072526.97052-1-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-04 15:35:03 +02:00
Dietmar Eggemann	f501b6a231	debugfs: Fix debugfs_read_file_str() Read the entire size of the buffer, including the trailing new line character. Discovered while reading the sched domain names of CPU0: before: cat /sys/kernel/debug/sched/domains/cpu0/domain/name SMTMCDIE after: cat /sys/kernel/debug/sched/domains/cpu0/domain/name SMT MC DIE Fixes: `9af0440ec8` ("debugfs: Implement debugfs_create_str()") Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lore.kernel.org/r/20210527091105.258457-1-dietmar.eggemann@arm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-06-04 15:01:08 +02:00
Nikolay Borisov	aefd7f7065	btrfs: promote debugging asserts to full-fledged checks in validate_super Syzbot managed to trigger this assert while performing its fuzzing. Turns out it's better to have those asserts turned into full-fledged checks so that in case buggy btrfs images are mounted the users gets an error and mounting is stopped. Alternatively with CONFIG_BTRFS_ASSERT disabled such image would have been erroneously allowed to be mounted. Reported-by: syzbot+a6bf271c02e4fe66b4e4@syzkaller.appspotmail.com CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> [ add uuids to the messages ] Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-04 13:12:06 +02:00
Ritesh Harjani	e7b2ec3d3d	btrfs: return value from btrfs_mark_extent_written() in case of error We always return 0 even in case of an error in btrfs_mark_extent_written(). Fix it to return proper error value in case of a failure. All callers handle it. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-04 13:11:58 +02:00
Naohiro Aota	5b434df877	btrfs: zoned: fix zone number to sector/physical calculation In btrfs_get_dev_zone_info(), we have "u32 sb_zone" and calculate "sector_t sector" by shifting it. But, this "sector" is calculated in 32bit, leading it to be 0 for the 2nd superblock copy. Since zone number is u32, shifting it to sector (sector_t) or physical address (u64) can easily trigger a missing cast bug like this. This commit introduces helpers to convert zone number to sector/LBA, so we won't fall into the same pitfall again. Reported-by: Dmitry Fomichev <Dmitry.Fomichev@wdc.com> Fixes: `12659251ca` ("btrfs: implement log-structured superblock for ZONED mode") CC: stable@vger.kernel.org # 5.11+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-04 13:11:50 +02:00
Josef Bacik	165ea85f14	btrfs: do not write supers if we have an fs error Error injection testing uncovered a pretty severe problem where we could end up committing a super that pointed to the wrong tree roots, resulting in transid mismatch errors. The way we commit the transaction is we update the super copy with the current generations and bytenrs of the important roots, and then copy that into our super_for_commit. Then we allow transactions to continue again, we write out the dirty pages for the transaction, and then we write the super. If the write out fails we'll bail and skip writing the supers. However since we've allowed a new transaction to start, we can have a log attempting to sync at this point, which would be blocked on fs_info->tree_log_mutex. Once the commit fails we're allowed to do the log tree commit, which uses super_for_commit, which now points at fs tree's that were not written out. Fix this by checking BTRFS_FS_STATE_ERROR once we acquire the tree_log_mutex. This way if the transaction commit fails we're sure to see this bit set and we can skip writing the super out. This patch fixes this specific transid mismatch error I was seeing with this particular error path. CC: stable@vger.kernel.org # 5.12+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-04 13:11:38 +02:00
Darrick J. Wong	c076ae7a93	xfs: refactor per-AG inode tagging functions In preparation for adding another incore inode tree tag, refactor the code that sets and clears tags from the per-AG inode tree and the tree of per-AG structures, and remove the open-coded versions used by the blockgc code. Note: For reclaim, we now rely on the radix tree tags instead of the reclaimable inode count more heavily than we used to. The conversion should be fine, but the logic isn't 100% identical. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-06-03 15:56:04 -07:00
Darrick J. Wong	f1bc5c5630	xfs: merge xfs_reclaim_inodes_ag into xfs_inode_walk_ag Merge these two inode walk loops together, since they're pretty similar now. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-06-03 15:56:04 -07:00
Darrick J. Wong	9d5ee83759	xfs: pass struct xfs_eofblocks to the inode scan callback Pass a pointer to the actual eofb structure around the inode scanner functions instead of a void pointer, now that none of the functions is used as a callback. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-06-03 15:56:04 -07:00
Darrick J. Wong	919a4ddb68	xfs: fix radix tree tag signs Radix tree tags are supposed to be unsigned ints, so fix the callers. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-06-03 15:56:04 -07:00
Darrick J. Wong	594ab00b76	xfs: make the icwalk processing functions clean up the grab state Soon we're going to be adding two new callers to the incore inode walk code: reclaim of incore inodes, and (later) inactivation of inodes. Both states operate on inodes that no longer have any VFS state, so we need to move the xfs_irele calls into the processing functions. In other words, icwalk processing functions are responsible for cleaning up whatever state changes are made by the corresponding icwalk igrab function that picked the inode for processing. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-06-03 15:56:03 -07:00
Darrick J. Wong	d20d5edcf9	xfs: clean up inode state flag tests in xfs_blockgc_igrab Clean up the definition of which inode states are not eligible for speculative preallocation garbage collecting by creating a private #define. The deferred inactivation patchset will add two new entries to the set of flags-to-ignore, so we want the definition not to end up a cluttered mess. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-06-03 15:56:03 -07:00
Darrick J. Wong	f427cf5c62	xfs: remove indirect calls from xfs_inode_walk{,_ag} It turns out that there is a 1:1 mapping between the execute and goal parameters that are passed to xfs_inode_walk_ag: xfs_blockgc_scan_inode <=> XFS_ICWALK_BLOCKGC xfs_dqrele_inode <=> XFS_ICWALK_DQRELE Because of this exact correspondence, we don't need the execute function pointer and can replace it with a direct call. For the price of a forward static declaration, we can eliminate the indirect function call. This likely has a negligible impact on performance (since the execute function runs transactions), but it also simplifies the function signature. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-06-03 15:56:03 -07:00
Darrick J. Wong	7fdff52623	xfs: remove iter_flags parameter from xfs_inode_walk_* The sole iter_flags is XFS_INODE_WALK_INEW_WAIT, and there are no users. Remove the flag, and the parameter, and all the code that used it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-06-03 15:56:03 -07:00
Darrick J. Wong	9d2793ceec	xfs: move xfs_inew_wait call into xfs_dqrele_inode Move the INEW wait into xfs_dqrele_inode so that we can drop the iter_flags parameter in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-06-03 15:56:03 -07:00
Darrick J. Wong	b9baaef42f	xfs: separate the dqrele_all inode grab logic from xfs_inode_walk_ag_grab Disentangle the dqrele_all inode grab code from the "generic" inode walk grabbing code, and and use the opportunity to document why the dqrele grab function does what it does. Since xfs_inode_walk_ag_grab is now only used for blockgc, rename it to reflect that. Ultimately, there will be four reasons to perform a walk of incore inodes: quotaoff dquote releasing (dqrele), garbage collection of speculative preallocations (blockgc), reclamation of incore inodes (reclaim), and deferred inactivation (inodegc). Each of these four have their own slightly different criteria for deciding if they want to handle an inode, so it makes more sense to have four cohesive igrab functions than one confusing parameteric grab function like we do now. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-06-03 15:56:03 -07:00
Darrick J. Wong	c809d7e948	xfs: pass the goal of the incore inode walk to xfs_inode_walk() As part of removing the indirect calls and radix tag implementation details from the incore inode walk loop, create an enum to represent the goal of the inode iteration. More immediately, this separate removes the need for the "ICI_NOTAG" define which makes little sense. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-06-03 15:56:02 -07:00
Darrick J. Wong	c1115c0cba	xfs: rename xfs_inode_walk functions to xfs_icwalk Shorten the prefix so that all the incore inode cache walk code has "xfs_icwalk" in the name somewhere. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-06-03 15:56:02 -07:00
Darrick J. Wong	df60019739	xfs: move the inode walk functions further down Move the inode walk functions further down in the file to limit the forward declarations to the two walk functions as we add new code that uses the inode walks. We'll clean them out later (i.e. after the deferred inode inactivation series). Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-06-03 15:56:02 -07:00
Darrick J. Wong	3ea06d73e3	xfs: detach inode dquots at the end of inactivation Once we're done with inactivating an inode, we're finished updating metadata for that inode. This means that we can detach the dquots at the end and not have to wait for reclaim to do it for us. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-06-03 15:56:02 -07:00
Darrick J. Wong	1ad2cfe0a5	xfs: move the quotaoff dqrele inode walk into xfs_icache.c The only external caller of xfs_inode_walk* happens in quotaoff, when we want to walk all the incore inodes to detach the dquots. Move this code to xfs_icache.c so that we can hide xfs_inode_walk as the starting step in more cleanups of inode walks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>	2021-06-03 15:56:02 -07:00
Linus Torvalds	ec95502396	io_uring-5.13-2021-06-03 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmC5BrwQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpq3tD/9FGANoxDDpLbQg/FCiK1pNoSf0EyoEWSdg ysTF5KPAPC3msQOmuPYwRZfRFCkvtOHmrexPAZAaorCxEYPjiVAZ9b/a0hBC4Zc1 vVW8RcTp6hSonAp1kk6VgLEHulJMcLANjAx3Me3NDRB/g0KGW5gevqkUXIJ+nXiR nqZcxaK7MD90v74IomO7y4P1GgwCbRhKYUL0JGQ4tXndYLxBYJnXBSnIKS2WdLZD PCBf+TDDFAZeioueZ/GrXRhWBmy97j8sEKUJLRqjI5YG8VVZSofgPlwNBi1e42C8 l3ZEmXldyk18O8KDsZCI2E8axt62gLjuD7Tu6+gv0GBJTcdyXP/FaZYbkBMWjMBH yq4Dk4QyJWfMFHJ886ukbGpwj1HJT1cJqzg4UUdkV3BlMNKtmZD8XrKTBw4HcPww EmB+yywRiuH+XqamxPglFXUEOa4bJH/EAsQ0R5NNxAT/X/9iIOLUDBDAGvtWtBr0 7cz+7jTQchqmV11gN+JcgN2LvG14m6Xq4Xtv5oHhIy/FHbRCNPPrC7KJ22TOBSaD d9mS5VM12+O9r9plYW7Cqdhdhnho/7/VfB+puiHg/lVcsXrMlrr0sc/WyrUixZeL AUlhDtmoROcyFpdcA49LCBEFvacu13ivEstkxIonx997Ct4MW7joYds2YfHCfuoO YlPVGdqeag== =3m3m -----END PGP SIGNATURE----- Merge tag 'io_uring-5.13-2021-06-03' of git://git.kernel.dk/linux-block Pull io_uring fix from Jens Axboe: "Just a single one-liner fix for an accounting regression in this release" * tag 'io_uring-5.13-2021-06-03' of git://git.kernel.dk/linux-block: io_uring: fix misaccounting fix buf pinned pages	2021-06-03 11:41:00 -07:00
Linus Torvalds	fd2ff2774e	for-5.13-rc4-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmC435cACgkQxWXV+ddt WDuh5w/+IGfsUFfKikJZpZUP7q/2gC0t0dzZemxeZMutJbT/KCZCDd4CjLf6YH6r oV9uYIgOWGd3aem9fe0R60ErJ4htgszIgeydCw3s2EuTms6WvAVA6Wp+wK/3UNx3 vQgYsqYkhMzIYKm/D4q8G+bqA2nPbBTDRNsXDIDrZYONxwSb+dNbQCGVknBRzRPa hiCqYhUSyXA7E6UZdlma7MvpDOquZN+iW3RRVx1AULLqVs01PCnG/CEN+0oQm2JE r9IyRxOZUvSeW6opT80yzZFCoboNSduMjPENTfzLY6Q1xzS/EtP4kM86fB/7AoJv UI0c3Sr84SC9vOsBsbGJaBHpxP3OpzxohKU///jVQgEDpGv4STPlkVfxk23BHcux Fdfg7wodkXeLU1Ff4dlJhvCqNYqc5V8lT5Kl52ai9Scct6D4yZBAq4KJp2LmYFC0 cHv6xFxBUv5zFZP1j6NMOmiLlCdDEkOruku2mMweQOBWYW/lHYNU469V5RCvfbLl HlbDrtZdnQ3m2IhpQrXiTnT47Ib4DPYWkhRVfWbyVJHA+CbcOV62RQfl+r95Bc7j FB1gM5vwUTJV7wgzErrq7+BD8quxG6/NuLDFjHYRcIj1kSIMK4/I1fOWruzuK+CL 6n7LLvBOojYfFo+ruQMSp2imDn3JJucBuh0/ssOlUWl2zsy6lDA= =8066 -----END PGP SIGNATURE----- Merge tag 'for-5.13-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "Error handling improvements, caught by error injection: - handle errors during checksum deletion - set error on mapping when ordered extent io cannot be finished - inode link count fixup in tree-log - missing return value checks for inode updates in tree-log - abort transaction in rename exchange if adding second reference fails Fixes: - fix fsync failure after writes to prealloc extents - fix deadlock when cloning inline extents and low on available space - fix compressed writes that cross stripe boundary" * tag 'for-5.13-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: MAINTAINERS: add btrfs IRC link btrfs: fix deadlock when cloning inline extents and low on available space btrfs: fix fsync failure and transaction abort after writes to prealloc extents btrfs: abort in rename_exchange if we fail to insert the second ref btrfs: check error value from btrfs_update_inode in tree log btrfs: fixup error handling in fixup_inode_link_counts btrfs: mark ordered extent and inode with error if we fail to finish btrfs: return errors from btrfs_del_csums in cleanup_ref_head btrfs: fix error handling in btrfs_del_csums btrfs: fix compressed writes that cross stripe boundary	2021-06-03 11:37:14 -07:00
Ingo Molnar	a9e906b71f	Merge branch 'sched/urgent' into sched/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2021-06-03 19:00:49 +02:00
Al Viro	8959a23924	fuse_fill_write_pages(): don't bother with iov_iter_single_seg_count() another rudiment of fault-in originally having been limited to the first segment, same as in generic_perform_write() and friends. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-06-03 10:34:55 -04:00
Trond Myklebust	c3aba897c6	NFSv4: Fix second deadlock in nfs4_evict_inode() If the inode is being evicted but has to return a layout first, then that too can cause a deadlock in the corner case where the server reboots. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-06-03 10:14:42 -04:00
Trond Myklebust	dfe1fe75e0	NFSv4: Fix deadlock between nfs4_evict_inode() and nfs4_opendata_get_inode() If the inode is being evicted, but has to return a delegation first, then it can cause a deadlock in the corner case where the server reboots before the delegreturn completes, but while the call to iget5_locked() in nfs4_opendata_get_inode() is waiting for the inode free to complete. Since the open call still holds a session slot, the reboot recovery cannot proceed. In order to break the logjam, we can turn the delegation return into a privileged operation for the case where we're evicting the inode. We know that in that case, there can be no other state recovery operation that conflicts. Reported-by: zhangxiaoxu (A) <zhangxiaoxu5@huawei.com> Fixes: `5fcdfacc01` ("NFSv4: Return delegations synchronously in evict_inode") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-06-03 10:14:42 -04:00
Chuck Lever	d1b5c230e9	NFS: FMODE_READ and friends are C macros, not enum types Address a sparse warning: CHECK fs/nfs/nfstrace.c fs/nfs/nfstrace.c: note: in included file (through /home/cel/src/linux/rpc-over-tls/include/trace/trace_events.h, /home/cel/src/linux/rpc-over-tls/include/trace/define_trace.h, ...): fs/nfs/./nfstrace.h:424:1: warning: incorrect type in initializer (different base types) fs/nfs/./nfstrace.h:424:1: expected unsigned long eval_value fs/nfs/./nfstrace.h:424:1: got restricted fmode_t [usertype] fs/nfs/./nfstrace.h:425:1: warning: incorrect type in initializer (different base types) fs/nfs/./nfstrace.h:425:1: expected unsigned long eval_value fs/nfs/./nfstrace.h:425:1: got restricted fmode_t [usertype] fs/nfs/./nfstrace.h:426:1: warning: incorrect type in initializer (different base types) fs/nfs/./nfstrace.h:426:1: expected unsigned long eval_value fs/nfs/./nfstrace.h:426:1: got restricted fmode_t [usertype] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-06-03 10:14:42 -04:00
Dan Carpenter	09226e8303	NFS: Fix a potential NULL dereference in nfs_get_client() None of the callers are expecting NULL returns from nfs_get_client() so this code will lead to an Oops. It's better to return an error pointer. I expect that this is dead code so hopefully no one is affected. Fixes: `31434f496a` ("nfs: check hostname in nfs_get_client") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-06-03 10:14:42 -04:00
Anna Schumaker	476bdb04c5	NFS: Fix use-after-free in nfs4_init_client() KASAN reports a use-after-free when attempting to mount two different exports through two different NICs that belong to the same server. Olga was able to hit this with kernels starting somewhere between 5.7 and 5.10, but I traced the patch that introduced the clear_bit() call to 4.13. So something must have changed in the refcounting of the clp pointer to make this call to nfs_put_client() the very last one. Fixes: `8dcbec6d20` ("NFSv41: Handle EXCHID4_FLAG_CONFIRMED_R during NFSv4.1 migration") Cc: stable@vger.kernel.org # 4.13+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-06-03 10:14:42 -04:00
Scott Mayhew	0b4f132b15	NFS: Ensure the NFS_CAP_SECURITY_LABEL capability is set when appropriate Commit `ce62b114bb` ("NFS: Split attribute support out from the server capabilities") removed the logic from _nfs4_server_capabilities() that sets the NFS_CAP_SECURITY_LABEL capability based on the presence of FATTR4_WORD2_SECURITY_LABEL in the attr_bitmask of the server's response. Now NFS_CAP_SECURITY_LABEL is never set, which breaks labelled NFS. This was replaced with logic that clears the NFS_ATTR_FATTR_V4_SECURITY_LABEL bit in the newly added fattr_valid field based on the absence of FATTR4_WORD2_SECURITY_LABEL in the attr_bitmask of the server's response. This essentially has no effect since there's nothing looks for that bit in fattr_supported. So revert that part of the commit, but adding the logic that sets NFS_CAP_SECURITY_LABEL near where the other capabilities are set in _nfs4_server_capabilities(). Fixes: `ce62b114bb` ("NFS: Split attribute support out from the server capabilities") Signed-off-by: Scott Mayhew <smayhew@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-06-03 10:14:42 -04:00
Ritesh Harjani	b45f189a19	ext4: fix accessing uninit percpu counter variable with fast_commit When running generic/527 with fast_commit configuration, the following issue is seen on Power. With fast_commit, during ext4_fc_replay() (which can be called from ext4_fill_super()), if inode eviction happens then it can access an uninitialized percpu counter variable. This patch adds the check before accessing the counters in ext4_free_inode() path. [ 321.165371] run fstests generic/527 at 2021-04-29 08:38:43 [ 323.027786] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: block_validity. Quota mode: none. [ 323.618772] BUG: Unable to handle kernel data access on read at 0x1fbd80000 [ 323.619767] Faulting instruction address: 0xc000000000bae78c cpu 0x1: Vector: 300 (Data Access) at [c000000010706ef0] pc: c000000000bae78c: percpu_counter_add_batch+0x3c/0x100 lr: c0000000006d0bb0: ext4_free_inode+0x780/0xb90 pid = 5593, comm = mount ext4_free_inode+0x780/0xb90 ext4_evict_inode+0xa8c/0xc60 evict+0xfc/0x1e0 ext4_fc_replay+0xc50/0x20f0 do_one_pass+0xfe0/0x1350 jbd2_journal_recover+0x184/0x2e0 jbd2_journal_load+0x1c0/0x4a0 ext4_fill_super+0x2458/0x4200 mount_bdev+0x1dc/0x290 ext4_mount+0x28/0x40 legacy_get_tree+0x4c/0xa0 vfs_get_tree+0x4c/0x120 path_mount+0xcf8/0xd70 do_mount+0x80/0xd0 sys_mount+0x3fc/0x490 system_call_exception+0x384/0x3d0 system_call_common+0xec/0x278 Cc: stable@kernel.org Fixes: `8016e29f43` ("ext4: fast commit recovery path") Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://lore.kernel.org/r/6cceb9a75c54bef8fa9696c1b08c8df5ff6169e2.1619692410.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2021-06-02 21:40:42 -04:00
Dave Chinner	977ec4ddf0	xfs: don't take a spinlock unconditionally in the DIO fastpath Because this happens at high thread counts on high IOPS devices doing mixed read/write AIO-DIO to a single file at about a million iops: 64.09% 0.21% [kernel] [k] io_submit_one - 63.87% io_submit_one - 44.33% aio_write - 42.70% xfs_file_write_iter - 41.32% xfs_file_dio_write_aligned - 25.51% xfs_file_write_checks - 21.60% _raw_spin_lock - 21.59% do_raw_spin_lock - 19.70% __pv_queued_spin_lock_slowpath This also happens of the IO completion IO path: 22.89% 0.69% [kernel] [k] xfs_dio_write_end_io - 22.49% xfs_dio_write_end_io - 21.79% _raw_spin_lock - 20.97% do_raw_spin_lock - 20.10% __pv_queued_spin_lock_slowpath IOWs, fio is burning ~14 whole CPUs on this spin lock. So, do an unlocked check against inode size first, then if we are at/beyond EOF, take the spinlock and recheck. This makes the spinlock disappear from the overwrite fastpath. I'd like to report that fixing this makes things go faster. It doesn't - it just exposes the the XFS_ILOCK as the next severe contention point doing extent mapping lookups, and that now burns all the 14 CPUs this spinlock was burning. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 15:00:38 -07:00
Christoph Hellwig	5a981e4ea8	xfs: mark xfs_bmap_set_attrforkoff static xfs_bmap_set_attrforkoff is only used inside of xfs_bmap.c, so mark it static. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 14:58:59 -07:00
Jiapeng Chong	9673261c32	xfs: Remove redundant assignment to busy Variable busy is set to false, but this value is never read as it is overwritten or not used later on, hence it is a redundant assignment and can be removed. Clean up the following clang-analyzer warning: fs/xfs/libxfs/xfs_alloc.c:1679:2: warning: Value stored to 'busy' is never read [clang-analyzer-deadcode.DeadStores]. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 14:56:29 -07:00
Shaokun Zhang	5f7fd75086	xfs: sort variable alphabetically to avoid repeated declaration Variable 'xfs_agf_buf_ops', 'xfs_agi_buf_ops', 'xfs_dquot_buf_ops' and 'xfs_symlink_buf_ops' are declared twice, so sort these variables alphabetically and remove the repeated declaration. Cc: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 14:54:09 -07:00
Al Viro	bc1bb416bb	generic_perform_write()/iomap_write_actor(): saner logics for short copy if we run into a short copy and ->write_end() refuses to advance at all, use the amount we'd managed to copy for the next iteration to handle. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-06-02 17:50:44 -04:00
Al Viro	9067931236	ntfs_copy_from_user_iter(): don't bother with copying iov_iter Advance the original, let the caller revert if it needs to. Don't mess with iov_iter_single_seg_count() in the caller - if we got a (non-zero) short copy, use the amount actually copied for the next pass, limit it to "up to the end of page" if nothing got copied at all. Originally fault-in only read the first iovec; back then it used to make sense to limit to the just one iovec for the pass after short copy. These days it's no long true. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-06-02 17:50:38 -04:00
Alexander Aring	d10a0b8875	fs: dlm: rename socket and app buffer defines This patch renames DEFAULT_BUFFER_SIZE to DLM_MAX_SOCKET_BUFSIZE and LOWCOMMS_MAX_TX_BUFFER_LEN to DLM_MAX_APP_BUFSIZE as they are proper names to define what's behind those values. The DLM_MAX_SOCKET_BUFSIZE defines the maximum size of buffer which can be handled on socket layer, the DLM_MAX_APP_BUFSIZE defines the maximum size of buffer which can be handled by the DLM application layer. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-06-02 11:53:04 -05:00
Alexander Aring	ac7d5d036d	fs: dlm: introduce proto values Currently the dlm protocol values are that TCP is 0 and everything else is SCTP. This makes it difficult to introduce possible other transport layers. The only one user space tool dlm_controld, which I am aware of, handles the protocol value 1 for SCTP. We change it now to handle SCTP as 1, this will break user space API but it will fix it so we can add possible other transport layers. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-06-02 11:53:04 -05:00
Alexander Aring	9a4139a794	fs: dlm: move dlm allow conn This patch checks if possible allowing new connections is allowed before queueing the listen socket to accept new connections. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-06-02 11:53:04 -05:00
Alexander Aring	6c6a1cc666	fs: dlm: use alloc_ordered_workqueue The proper way to allocate ordered workqueues is to use alloc_ordered_workqueue() function. The current way implies an ordered workqueue which is also required by dlm. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-06-02 11:53:04 -05:00
Alexander Aring	700ab1c363	fs: dlm: fix memory leak when fenced I got some kmemleak report when a node was fenced. The user space tool dlm_controld will therefore run some rmdir() in dlm configfs which was triggering some memleaks. This patch stores the sps and cms attributes which stores some handling for subdirectories of the configfs cluster entry and free them if they get released as the parent directory gets freed. unreferenced object 0xffff88810d9e3e00 (size 192): comm "dlm_controld", pid 342, jiffies 4294698126 (age 55438.801s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 73 70 61 63 65 73 00 00 ........spaces.. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000db8b640b>] make_cluster+0x5d/0x360 [<000000006a571db4>] configfs_mkdir+0x274/0x730 [<00000000b094501c>] vfs_mkdir+0x27e/0x340 [<0000000058b0adaf>] do_mkdirat+0xff/0x1b0 [<00000000d1ffd156>] do_syscall_64+0x40/0x80 [<00000000ab1408c8>] entry_SYSCALL_64_after_hwframe+0x44/0xae unreferenced object 0xffff88810d9e3a00 (size 192): comm "dlm_controld", pid 342, jiffies 4294698126 (age 55438.801s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 63 6f 6d 6d 73 00 00 00 ........comms... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000a7ef6ad2>] make_cluster+0x82/0x360 [<000000006a571db4>] configfs_mkdir+0x274/0x730 [<00000000b094501c>] vfs_mkdir+0x27e/0x340 [<0000000058b0adaf>] do_mkdirat+0xff/0x1b0 [<00000000d1ffd156>] do_syscall_64+0x40/0x80 [<00000000ab1408c8>] entry_SYSCALL_64_after_hwframe+0x44/0xae Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-06-02 11:53:04 -05:00
Alexander Aring	fcef0e6c27	fs: dlm: fix lowcomms_start error case This patch fixes the error path handling in lowcomms_start(). We need to cleanup some static allocated data structure and cleanup possible workqueue if these have started. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-06-02 11:53:04 -05:00
Dave Chinner	509201163f	xfs: remove xfs_perag_t Almost unused, gets rid of another typedef. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:51 +10:00
Dave Chinner	f40aadb2bb	xfs: use perag through unlink processing Unlinked lists are held in the perag, and freeing of inodes needs to be passed a perag, too, so look up the perag early in the unlink processing and use it throughout. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com>	2021-06-02 10:48:51 +10:00
Dave Chinner	8237fbf53d	xfs: clean up and simplify xfs_dialloc() Because it's a mess. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:24 +10:00
Dave Chinner	309161f660	xfs: inode allocation can use a single perag instance Now that we've internalised the two-phase inode allocation, we can now easily make the AG selection and allocation atomic from the perspective of a single perag context. This will ensure AGs going offline/away cannot occur between the selection and allocation steps. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:24 +10:00
Dave Chinner	b652afd937	xfs: get rid of xfs_dir_ialloc() This is just a simple wrapper around the per-ag inode allocation that doesn't need to exist. The internal mechanism to select and allocate within an AG does not need to be exposed outside xfs_ialloc.c, and it being exposed simply makes it harder to follow the code and simplify it. This is simplified by internalising xf_dialloc_select_ag() and xfs_dialloc_ag() into a single xfs_dialloc() function and then xfs_dir_ialloc() can go away. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:24 +10:00
Dave Chinner	89b1f55a29	xfs: collapse AG selection for inode allocation xfs_dialloc_select_ag() does a lot of repetitive work. It first calls xfs_ialloc_ag_select() to select the AG to start allocation attempts in, which can do up to two entire loops across the perags that inodes can be allocated in. This is simply checking if there is spce available to allocate inodes in an AG, and it returns when it finds the first candidate AG. xfs_dialloc_select_ag() then does it's own iterative walk across all the perags locking the AGIs and trying to allocate inodes from the locked AG. It also doesn't limit the search to mp->m_maxagi, so it will walk all AGs whether they can allocate inodes or not. Hence if we are really low on inodes, we could do almost 3 entire walks across the whole perag range before we find an allocation group we can allocate inodes in or report ENOSPC. Because xfs_ialloc_ag_select() returns on the first candidate AG it finds, we can simply do these checks directly in xfs_dialloc_select_ag() before we lock and try to allocate inodes. This reduces the inode allocation pass down to 2 perag sweeps at most - one for aligned inode cluster allocation and if we can't allocate full, aligned inode clusters anywhere we'll do another pass trying to do sparse inode cluster allocation. This also removes a big chunk of duplicate code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:24 +10:00
Dave Chinner	4268547305	xfs: simplify xfs_dialloc_select_ag() return values The only caller of xfs_dialloc_select_ag() will always return -ENOSPC to it's caller if the agbp returned from xfs_dialloc_select_ag() is NULL. IOWs, failure to find a candidate AGI we can allocate inodes from is always an ENOSPC condition, so move this logic up into xfs_dialloc_select_ag() so we can simplify the return logic in this function. xfs_dialloc_select_ag() now only ever returns 0 with a locked agbp, or an error with no agbp. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:24 +10:00
Dave Chinner	50f02fe333	xfs: remove agno from btree cursor Now that everything passes a perag, the agno is not needed anymore. Convert all the users to use pag->pag_agno instead and remove the agno from the cursor. This was largely done as an automated search and replace. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:24 +10:00
Dave Chinner	7b13c51551	xfs: use perag for ialloc btree cursors Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:24 +10:00
Dave Chinner	289d38d22c	xfs: convert allocbt cursors to use perags Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:24 +10:00
Dave Chinner	a81a06211f	xfs: convert refcount btree cursor to use perags Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:24 +10:00
Dave Chinner	fa9c3c1973	xfs: convert rmap btree cursor to using a perag Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:24 +10:00
Dave Chinner	be9fb17d88	xfs: add a perag to the btree cursor Which will eventually completely replace the agno in it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com>	2021-06-02 10:48:24 +10:00
Dave Chinner	58d43a7e32	xfs: pass perags around in fsmap data dev functions Needs a [from, to] ranged AG walk, and the perag to be stuffed into the info structure for callouts to use. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:24 +10:00
Dave Chinner	30933120ad	xfs: push perags through the ag reservation callouts We currently pass an agno from the AG reservation functions to the individual feature accounting functions, which in future may have to do perag lookups to access per-AG state. Instead, pre-emptively plumb the perag through from the highest AG reservation layer to the feature callouts so they won't have to look it up again. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com>	2021-06-02 10:48:24 +10:00
Dave Chinner	45d0662117	xfs: pass perags through to the busy extent code All of the callers of the busy extent API either have perag references available to use so we can pass a perag to the busy extent functions rather than having them have to do unnecessary lookups. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:24 +10:00
Dave Chinner	7f8d3b3ca6	xfs: convert secondary superblock walk to use perags Clean up the last external manual AG walk. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:24 +10:00
Dave Chinner	6f4118fc64	xfs: convert xfs_iwalk to use perag references Rather than manually walking the ags and passing agnunbers around, pass the perag for the AG we are currently working on around in the iwalk structure. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:24 +10:00
Dave Chinner	934933c3ee	xfs: convert raw ag walks to use for_each_perag Convert the raw walks to an iterator, pulling the current AG out of pag->pag_agno instead of the loop iterator variable. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:24 +10:00
Dave Chinner	f250eedcf7	xfs: make for_each_perag... a first class citizen for_each_perag_tag() is defined in xfs_icache.c for local use. Promote this to xfs_ag.h and define equivalent iteration functions so that we can use them to iterate AGs instead to replace open coded perag walks and perag lookups. We also convert as many of the straight forward open coded AG walks to use these iterators as possible. Anything that is not a direct conversion to an iterator is ignored and will be updated in future commits. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:24 +10:00
Dave Chinner	07b6403a68	xfs: move perag structure and setup to libxfs/xfs_ag.[ch] Move the xfs_perag infrastructure to the libxfs files that contain all the per AG infrastructure. This helps set up for passing perags around all the code instead of bare agnos with minimal extra includes for existing files. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:24 +10:00
Dave Chinner	61aa005a5b	xfs: prepare for moving perag definitions and support to libxfs The perag structures really need to be defined with the rest of the AG support infrastructure. The struct xfs_perag and init/teardown has been placed in xfs_mount.[ch] because there are differences in the structure between kernel and userspace. Mainly that userspace doesn't have a lot of the internal stuff that the kernel has for caches and discard and other such structures. However, it makes more sense to move this to libxfs than to keep this separation because we are now moving to use struct perags everywhere in the code instead of passing raw agnumber_t values about. Hence we shoudl really move the support infrastructure to libxfs/xfs_ag.[ch]. To do this without breaking userspace, first we need to rearrange the structures and code so that all the kernel specific code is located together. This makes it simple for userspace to ifdef out the all the parts it does not need, minimising the code differences between kernel and userspace. The next commit will do the move... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:24 +10:00
Dave Chinner	9bbafc7191	xfs: move xfs_perag_get/put to xfs_ag.[ch] They are AG functions, not superblock functions, so move them to the appropriate location. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-02 10:48:24 +10:00
Andreas Gruenbacher	d5b8145455	Revert "gfs2: Fix mmap locking for write faults" This reverts commit `b7f55d928e`. As explained by Linus in [], write faults on a mmap region are reads from a filesysten point of view, so taking the inode glock exclusively on write faults is incorrect. Instead, when a page is marked writable, the .page_mkwrite vm operation will be called, which is where the exclusive lock taking needs to happen. I got this wrong because of a broken test case that made me believe .page_mkwrite isn't getting called when it actually is. [] https://lore.kernel.org/lkml/CAHk-=wj8EWr_D65i4oRSj2FTbrc6RdNydNNCGxeabRnwtoU=3Q@mail.gmail.com/ Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-06-01 23:16:42 +02:00
Darrick J. Wong	20bd8e63f3	xfs: remove unnecessary shifts The superblock verifier already validates that (1 << blocklog) == blocksize, so use the value directly instead of doing math. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>	2021-06-01 12:53:59 -07:00
Darrick J. Wong	a7bcb147fe	xfs: clean up open-coded fs block unit conversions Replace some open-coded fs block unit conversions with the standard conversion macro. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>	2021-06-01 12:53:59 -07:00
Allison Henderson	4fd084dbbd	xfs: Clean up xfs_attr_node_addname_clear_incomplete We can use the helper function xfs_attr_node_remove_name to reduce duplicate code in this function Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-01 10:49:49 -07:00
Allison Henderson	0e6acf29db	xfs: Remove xfs_attr_rmtval_set This function is no longer used, so it is safe to remove Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>	2021-06-01 10:49:49 -07:00
Allison Henderson	8f502a4009	xfs: Add delay ready attr set routines This patch modifies the attr set routines to be delay ready. This means they no longer roll or commit transactions, but instead return -EAGAIN to have the calling routine roll and refresh the transaction. In this series, xfs_attr_set_args has become xfs_attr_set_iter, which uses a state machine like switch to keep track of where it was when EAGAIN was returned. See xfs_attr.h for a more detailed diagram of the states. Two new helper functions have been added: xfs_attr_rmtval_find_space and xfs_attr_rmtval_set_blk. They provide a subset of logic similar to xfs_attr_rmtval_set, but they store the current block in the delay attr context to allow the caller to roll the transaction between allocations. This helps to simplify and consolidate code used by xfs_attr_leaf_addname and xfs_attr_node_addname. xfs_attr_set_args has now become a simple loop to refresh the transaction until the operation is completed. Lastly, xfs_attr_rmtval_remove is no longer used, and is removed. Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com>	2021-06-01 10:49:48 -07:00
Allison Henderson	2b74b03c13	xfs: Add delay ready attr remove routines This patch modifies the attr remove routines to be delay ready. This means they no longer roll or commit transactions, but instead return -EAGAIN to have the calling routine roll and refresh the transaction. In this series, xfs_attr_remove_args is merged with xfs_attr_node_removename become a new function, xfs_attr_remove_iter. This new version uses a sort of state machine like switch to keep track of where it was when EAGAIN was returned. A new version of xfs_attr_remove_args consists of a simple loop to refresh the transaction until the operation is completed. A new XFS_DAC_DEFER_FINISH flag is used to finish the transaction where ever the existing code used to. Calls to xfs_attr_rmtval_remove are replaced with the delay ready version __xfs_attr_rmtval_remove. We will rename __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are done. xfs_attr_rmtval_remove itself is still in use by the set routines (used during a rename). For reasons of preserving existing function, we modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is set. Similar to how xfs_attr_remove_args does here. Once we transition the set routines to be delay ready, xfs_attr_rmtval_remove is no longer used and will be removed. This patch also adds a new struct xfs_delattr_context, which we will use to keep track of the current state of an attribute operation. The new xfs_delattr_state enum is used to track various operations that are in progress so that we know not to repeat them, and resume where we left off before EAGAIN was returned to cycle out the transaction. Other members take the place of local variables that need to retain their values across multiple function calls. See xfs_attr.h for a more detailed diagram of the states. Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-01 10:49:47 -07:00
Allison Henderson	3f562d092b	xfs: Hoist node transaction handling This patch basically hoists the node transaction handling around the leaf code we just hoisted. This will helps setup this area for the state machine since the goto is easily replaced with a state since it ends with a transaction roll. Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>	2021-06-01 10:49:46 -07:00
Allison Henderson	83c6e70789	xfs: Hoist xfs_attr_leaf_addname This patch hoists xfs_attr_leaf_addname into the calling function. The goal being to get all the code that will require state management into the same scope. This isn't particularly aesthetic right away, but it is a preliminary step to merging in the state machine code. Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2021-06-01 10:49:45 -07:00
Allison Henderson	5d954cc09f	xfs: Hoist xfs_attr_node_addname This patch hoists the later half of xfs_attr_node_addname into the calling function. We do this because it is this area that will need the most state management, and we want to keep such code in the same scope as much as possible Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>	2021-06-01 10:49:45 -07:00
Allison Henderson	6ca5a4a1f5	xfs: Add helper xfs_attr_node_addname_find_attr This patch separates the first half of xfs_attr_node_addname into a helper function xfs_attr_node_addname_find_attr. It also replaces the restart goto with an EAGAIN return code driven by a loop in the calling function. This looks odd now, but will clean up nicly once we introduce the state machine. It will also enable hoisting the last state out of xfs_attr_node_addname with out having to plumb in a "done" parameter to know if we need to move to the next state or not. Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>	2021-06-01 10:49:44 -07:00
Allison Henderson	f0f7c502c7	xfs: Separate xfs_attr_node_addname and xfs_attr_node_addname_clear_incomplete This patch separate xfs_attr_node_addname into two functions. This will help to make it easier to hoist parts of xfs_attr_node_addname that need state management Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-01 10:49:43 -07:00
Allison Henderson	6286514b63	xfs: Refactor xfs_attr_set_shortform This patch is actually the combination of patches from the previous version (v18). Initially patch 3 hoisted xfs_attr_set_shortform, and the next added the helper xfs_attr_set_fmt. xfs_attr_set_fmt is similar the old xfs_attr_set_shortform. It returns 0 when the attr has been set and no further action is needed. It returns -EAGAIN when shortform has been transformed to leaf, and the calling function should proceed the set the attr in leaf form. Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>	2021-06-01 10:49:42 -07:00
Allison Henderson	a8490f699f	xfs: Add xfs_attr_node_remove_name This patch pulls a new helper function xfs_attr_node_remove_name out of xfs_attr_node_remove_step. This helps to modularize xfs_attr_node_remove_step which will help make the delayed attribute code easier to follow Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com>	2021-06-01 10:48:41 -07:00
Allison Henderson	4126c06e25	xfs: Reverse apply `72b97ea40d` Originally we added this patch to help modularize the attr code in preparation for delayed attributes and the state machine it requires. However, later reviews found that this slightly alters the transaction handling as the helper function is ambiguous as to whether the transaction is diry or clean. This may cause a dirty transaction to be included in the next roll, where previously it had not. To preserve the existing code flow, we reverse apply this commit. Signed-off-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-01 10:48:19 -07:00
Dai Ngo	f8849e206e	NFSv4: nfs4_proc_set_acl needs to restore NFS_CAP_UIDGID_NOMAP on error. Currently if __nfs4_proc_set_acl fails with NFS4ERR_BADOWNER it re-enables the idmapper by clearing NFS_CAP_UIDGID_NOMAP before retrying again. The NFS_CAP_UIDGID_NOMAP remains cleared even if the retry fails. This causes problem for subsequent setattr requests for v4 server that does not have idmapping configured. This patch modifies nfs4_proc_set_acl to detect NFS4ERR_BADOWNER and NFS4ERR_BADNAME and skips the retry, since the kernel isn't involved in encoding the ACEs, and return -EINVAL. Steps to reproduce the problem: # mount -o vers=4.1,sec=sys server:/export/test /tmp/mnt # touch /tmp/mnt/file1 # chown 99 /tmp/mnt/file1 # nfs4_setfacl -a A::unknown.user@xyz.com:wrtncy /tmp/mnt/file1 Failed setxattr operation: Invalid argument # chown 99 /tmp/mnt/file1 chown: changing ownership of ‘/tmp/mnt/file1’: Invalid argument # umount /tmp/mnt # mount -o vers=4.1,sec=sys server:/export/test /tmp/mnt # chown 99 /tmp/mnt/file1 # v2: detect NFS4ERR_BADOWNER and NFS4ERR_BADNAME and skip retry in nfs4_proc_set_acl. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-06-01 13:16:17 -04:00
Jiapeng Chong	492109333c	fs/jfs: Fix missing error code in lmLogInit() The error code is missing in this code scenario, add the error code '-EINVAL' to the return value 'rc. Eliminate the follow smatch warning: fs/jfs/jfs_logmgr.c:1327 lmLogInit() warn: missing error code 'rc'. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>	2021-06-01 10:29:12 -05:00
Christoph Hellwig	ab4b57057d	block: move bd_part_count to struct gendisk The bd_part_count value only makes sense for whole devices, so move it to struct gendisk and give it a more descriptive name. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210525061301.2242282-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-01 07:45:27 -06:00
Christoph Hellwig	c8276b954d	block: split __blkdev_put Split __blkdev_put into one helper for the whole device, and one for partitions as well as another shared helper for flushing the block device inode mapping. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210525061301.2242282-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-01 07:44:56 -06:00
Christoph Hellwig	e54069acac	block: move adjusting bd_part_count out of __blkdev_get Keep in the callers and thus remove the for_part argument. This mirrors what is done on the blkdev_get side and slightly simplifies blkdev_get_part as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@rehat.com> Link: https://lore.kernel.org/r/20210525061301.2242282-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-01 07:44:56 -06:00
Christoph Hellwig	a8698707a1	block: move bd_mutex to struct gendisk Replace the per-block device bd_mutex with a per-gendisk open_mutex, thus simplifying locking wherever we deal with partitions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Link: https://lore.kernel.org/r/20210525061301.2242282-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-01 07:44:32 -06:00
Christoph Hellwig	210a6d756f	block: move sync_blockdev from __blkdev_put to blkdev_put Do the early unlocked syncing even earlier to move more code out of the recursive path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20210525061301.2242282-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-01 07:43:32 -06:00
Christoph Hellwig	362529d928	block: split __blkdev_get Split __blkdev_get into one helper for the whole device, and one for opening partitions. This removes the (bounded) recursion when opening a partition. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20210525061301.2242282-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-06-01 07:43:32 -06:00
Christian Brauner	dd8b477f9a	mount: Support "nosymfollow" in new mount api Commit `dab741e0e0` ("Add a "nosymfollow" mount option.") added support for the "nosymfollow" mount option allowing to block following symlinks when resolving paths. The mount option so far was only available in the old mount api. Make it available in the new mount api as well. Bonus is that it can be applied to a whole subtree not just a single mount. Cc: Christoph Hellwig <hch@lst.de> Cc: Mattias Nissler <mnissler@chromium.org> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ross Zwisler <zwisler@google.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>	2021-06-01 12:09:27 +02:00
Dave Chinner	e7d236a6fe	xfs: move page freeing into _xfs_buf_free_pages() Rather than open coding it just before we call _xfs_buf_free_pages(). Also, rename the function to xfs_buf_free_pages() as the leading underscore has no useful meaning. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-01 13:40:36 +10:00
Dave Chinner	02c5117386	xfs: merge _xfs_buf_get_pages() Only called from one place now, so merge it into xfs_buf_alloc_pages(). Because page array allocation is dependent on bp->b_pages being null, always ensure that when the pages array is freed we always set bp->b_pages to null. Also convert the page array to use kmalloc() rather than kmem_alloc() so we can use the gfp flags we've already calculated for the allocation context instead of hard coding KM_NOFS semantics. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-01 13:40:36 +10:00
Dave Chinner	c9fa563072	xfs: use alloc_pages_bulk_array() for buffers Because it's more efficient than allocating pages one at a time in a loop. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-01 13:40:36 +10:00
Dave Chinner	07b5c5add4	xfs: use xfs_buf_alloc_pages for uncached buffers Use the newly factored out page allocation code. This adds automatic buffer zeroing for non-read uncached buffers. This also allows us to greatly simply the error handling in xfs_buf_get_uncached(). Because xfs_buf_alloc_pages() cleans up partial allocation failure, we can just call xfs_buf_free() in all error cases now to clean up after failures. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-01 13:40:35 +10:00
Dave Chinner	0a683794ac	xfs: split up xfs_buf_allocate_memory Based on a patch from Christoph Hellwig. This splits out the heap allocation and page allocation portions of the buffer memory allocation into two separate helper functions. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>	2021-06-01 13:40:02 +10:00
Linus Torvalds	c2131f7e73	Various gfs2 fixes -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmC0vkAUHGFncnVlbmJh QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTqCGQ/+JiCdfHQao3/W9KsIeA5YO5fbsQXi tElY61L4eM7F+gEe1mbMzr8sefbejv73aAMGWJJD06gLPz/wIPeW/fYnC4/gcQEn +jLjVb7taGaxOn0fioCqjU+esGW4wstYrAXLp6XZmLnMETmr7PCbOhohRG7sK1TX m8si6riMOiNw20MOHhUK9DFZ3rF4Q5Rp/vYaTDwoGpcORIv5bpPoQKYT1FMCTD9h 5qI6ldOO2E4d9qXQXiCv2RqXElYqQxwxqvGP0Hj+HQLZQBCmJJYZNDqRwDunJTaN K9++1/XbTCFKEQz0UWz1x1k5fCDIewbzxX348aQjiLMkkpXr885AGhasAX8gRS3p D7Y4q6VCY3J5JzlCDfNWrTBd0abLJAjeJ70R71/kN/hgIY2PbU/CaPcyhUrp7rwH B6spZDXb2fBNdfYA5wmuUdSA9BRmw/MDpiGd9aQc5nv25YvZ5Apl9X4QSH2250vo MKTrlt90EyTmOgF6vRf28apVr41JO3PIXgMu+svZq769Ox2jSZJQT0UI4vzVThoP RGBsTDPtDL67OvNoC6H7Poc7ad+BRqtFxkwNCz7kkcwQlYkmVPUf49UC/pBnBV3M HtlkJdlhD7VEWqUPl3T02rTdLRXLuPIGw9Kk6gKiDCikONoD+icJ3fV7rWShMjhD O/KT/r3XM1V1sP8= =vV+Y -----END PGP SIGNATURE----- Merge tag 'gfs2-v5.13-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fixes from Andreas Gruenbacher: "Various gfs2 fixes" * tag 'gfs2-v5.13-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Fix use-after-free in gfs2_glock_shrink_scan gfs2: Fix mmap locking for write faults gfs2: Clean up revokes on normal withdraws gfs2: fix a deadlock on withdraw-during-mount gfs2: fix scheduling while atomic bug in glocks gfs2: Fix I_NEW check in gfs2_dinode_in gfs2: Prevent direct-I/O write fallback errors from getting lost	2021-05-31 05:57:22 -10:00
Linus Torvalds	36c795513a	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmC0vhsACgkQnJ2qBz9k QNlI9ggAjZSqIvNNs1w6VafSRY7XP5vItKAe0jhguD0o1ZtUI1gM1JlOJzbgt2z5 gpm/4v4485h5JUXNrB5TeQ1woOOvFKzlUcIr+ZgUiyq2UgZj6PzvK599u2TFf1vc gLMAUx5YgWafr048orhcSBqaYic04LESQ17op+9UjgBB7ATbNjJmEBb/+WvGh9os 8c4V9JrCTMdNJ5Rpc5+JsWAksgZKrW9VjTw8mHisWB0NIIPQWGCML8Z4ACzNObCW CrXL9xWgaQDov1okJSA0ZNkdatGhh4h/NxIZ2sLGg2F3bDfZwN+kFu6gqpxhTEVV v83aTAP3UxbK8bwRj0+lm/LImxULjA== =t4P5 -----END PGP SIGNATURE----- Merge tag 'fsnotify_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify fixes from Jan Kara: "A fix for permission checking with fanotify unpriviledged groups. Also there's a small update in MAINTAINERS file for fanotify" * tag 'fsnotify_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fanotify: fix permission model of unprivileged group MAINTAINERS: Add Matthew Bobrowski as a reviewer	2021-05-31 05:52:22 -10:00
Hillf Danton	1ab19c5de4	gfs2: Fix use-after-free in gfs2_glock_shrink_scan The GLF_LRU flag is checked under lru_lock in gfs2_glock_remove_from_lru() to remove the glock from the lru list in __gfs2_glock_put(). On the shrink scan path, the same flag is cleared under lru_lock but because of cond_resched_lock(&lru_lock) in gfs2_dispose_glock_lru(), progress on the put side can be made without deleting the glock from the lru list. Keep GLF_LRU across the race window opened by cond_resched_lock(&lru_lock) to ensure correct behavior on both sides - clear GLF_LRU after list_del under lru_lock. Reported-by: syzbot <syzbot+34ba7ddbf3021981a228@syzkaller.appspotmail.com> Signed-off-by: Hillf Danton <hdanton@sina.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-05-31 12:03:28 +02:00
Greg Kroah-Hartman	92722bac5f	Merge 5.13-rc4 into driver-core-next We need the driver core fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-31 09:10:03 +02:00
Linus Torvalds	75b9c727af	Fixes for 5.13-rc4: - Fix a bug where unmapping operations end earlier than expected, which can cause chaos on multi-block directory and symlink shrink operations. - Fix an erroneous assert that can trigger if we try to transition a bmap structure from btree format to extents format with zero extents. This was exposed by xfs/538. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmCvxs0ACgkQ+H93GTRK tOsfAQ//fAtDZkjKYKHhWUFoyG6kYNsIZr7wf+kow8jJgeWUibwtUYYQQV/RCRtJ zR+Tiys9ZorAReYpzq69s1LbZg/Zz1GT4bPgq/9Icni9x8EXIS6MVaWJHjnkFtKD 7IqDztV3tC3XgSuAEsjey5PA1V1xpSgxxVtaT1Q2BcY8zqf2bnEPzM/rpKdmE++x jlTYrgLBctI24nbmTX2Y/+Te1UWXjM4QiV/EBHiUPedAqJZhwA0hU7hJJv/9I/EG /GOjisxhAonKR7fr7wPE+LwJMaxxK4LAt3aLZmGKpm3smSYX8O6sGnJv9VI/stsS wRD9c3wzLvfmqL5MXeAYq83u3s5DuFsfqmYD2U49xHlFF9tvLTT5S0Pdi/Qiq962 n3wabi0slBCdzeY3xXXr9M4cCLL6utYY8Vfi7KvBiDHdtCRZUU33/SAwxZzvhHQv 0XN+2sqnIn3jM9xg342+/BZbi4+SX7h28qixmgxCo+hez96GHuwhdN5GUVa5lF0r 4uRPn+VVaOJPcNRx69/iTkrJ1R4YPqedCkgLShs6lZX5Ct92UtANLzqmm0xCZ6U5 Pe7WjXO6aVAugEsVd9qnPdx4o/+sabs6CEQ65BbYyKvOBVg1XWH0yUEeKyGRYDt9 21ol7QSKJ6+HIg473j8A3omM5s2XKUT8NZMmRm4EojNI+j4O3R4= =y7+K -----END PGP SIGNATURE----- Merge tag 'xfs-5.13-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Darrick Wong: "This week's pile mitigates some decades-old problems in how extent size hints interact with realtime volumes, fixes some failures in online shrink, and fixes a problem where directory and symlink shrinking on extremely fragmented filesystems could fail. The most user-notable change here is to point users at our (new) IRC channel on OFTC. Freedom isn't free, it costs folks like you and me; and if you don't kowtow, they'll expel everyone and take over your channel. (Ok, ok, that didn't fit the song lyrics...) Summary: - Fix a bug where unmapping operations end earlier than expected, which can cause chaos on multi-block directory and symlink shrink operations. - Fix an erroneous assert that can trigger if we try to transition a bmap structure from btree format to extents format with zero extents. This was exposed by xfs/538" * tag 'xfs-5.13-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: bunmapi has unnecessary AG lock ordering issues xfs: btree format inode forks can have zero extents xfs: add new IRC channel to MAINTAINERS xfs: validate extsz hints against rt extent size when rtinherit is set xfs: standardize extent size hint validation xfs: check free AG space when making per-AG reservations	2021-05-29 17:47:19 -10:00
Pavel Begunkov	216e583596	io_uring: fix misaccounting fix buf pinned pages As Andres reports "... io_sqe_buffer_register() doesn't initialize imu. io_buffer_account_pin() does imu->acct_pages++, before calling io_account_mem(ctx, imu->acct_pages).", leading to evevntual -ENOMEM. Initialise the field. Reported-by: Andres Freund <andres@anarazel.de> Fixes: `41edf1a5ec` ("io_uring: keep table of pointers to ubufs") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/438a6f46739ae5e05d9c75a0c8fa235320ff367c.1622285901.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-05-29 19:27:21 -06:00
Linus Torvalds	e1a9e3db3b	Driver core fixes for 5.13-rc4 Here are 3 small driver core / debugfs fixes for 5.13-rc4: - debugfs fix for incorrect "lockdown" mode for selinux accesses - 2 device link changes, one bugfix and one cleanup All of these have been in linux-next for over a week with no reported problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYLJMrQ8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ynf8ACgvsZCX7Wi3GYtFovfomHsCRKpZBsAn0sqfSAL TXHePEnj2tJ5c22TSqSt =Zx6Z -----END PGP SIGNATURE----- Merge tag 'driver-core-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are three small driver core / debugfs fixes for 5.13-rc4: - debugfs fix for incorrect "lockdown" mode for selinux accesses - two device link changes, one bugfix and one cleanup All of these have been in linux-next for over a week with no reported problems" * tag 'driver-core-5.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: drivers: base: Reduce device link removal code duplication drivers: base: Fix device link removal debugfs: fix security_locked_down() call for SELinux	2021-05-29 06:33:28 -10:00
Linus Torvalds	b3dbbae609	io_uring-5.13-2021-05-28 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmCxY4wQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpqJnD/sEHg2ZVzc3CUtvLI11C+O4nkqzUpetOD8I iKtvCYKYNTATOPLGQjsznNTTVcUhN4Mud9XWHjyR3nli98fwRrzLuK3EfJjuq1cL v6DZVuYKq4k6s0QN6K8yTMslYBQTmk85l8rvXs06jVqDadnnVc+JdfWWBDducs0e 56Wtmlse18PhzfDjqtsjAOQBjpv4bhQaJTrYOHcEIqFiih2ZpSvyP3SLED7/nvoe Q8MNF0Htff/oVbUEzp/NfhHoOFIZ17wwPV3fRC7zat2Dp4R9ZxpScmozLn8PkdO9 DW+rKpuCbYTYwY1p11cQ5EhiNWNfPMxX4YXovUP9z+M2cgGUK1IhWQRM83L9bAXt r/9Md5WjnNpeDr6/YW6uMe1lOrrEy2ZJfNJ2JJbiXo6CWiz+g2qfHLOxwVsEnfoy vZoSbDD8ItZDooaXDFGEp1PLpkka4vt/6Ebg0fUtEeG8QQ48eG5L9xpPMSjm90y9 /UKZdS1pvSl/x6he+RDPg4aVGBWIhGJhv+Q22hNTO3g5u5QE+hXLvFh0QvoOkDQK FGlhIa431EiOdm3rdFCG2I4kH1QzQTO6XLHpoVabGXJULPvS2ztnHCz3pYqOU9w1 Mh12t1RtWzvcTkyOutfsjVqszV3kTl6O6GkI8CiqqjomnbbfORj6CDsi7h9RFZI+ HtnY2GbSJg== =dfLl -----END PGP SIGNATURE----- Merge tag 'io_uring-5.13-2021-05-28' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "A few minor fixes: - Fix an issue with hashed wait removal on exit (Zqiang, Pavel) - Fix a recent data race introduced in this series (Marco)" * tag 'io_uring-5.13-2021-05-28' of git://git.kernel.dk/linux-block: io_uring: fix data race to avoid potential NULL-deref io-wq: Fix UAF when wakeup wqe in hash waitqueue io_uring/io-wq: close io-wq full-stop gap	2021-05-28 14:35:55 -10:00
Linus Torvalds	7c0ec89d31	3 SMB3 fixes, two for stable, and the other fixes a problem pointed out with a recently added ioctl -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmCxbv4ACgkQiiy9cAdy T1GkPwwAqq0tvNDZnQ6aAur1jwHMiOIAydvpgNKXlKkiXYu+qQbMRgOdUsOtjM00 Idi8PHKkID2X63HeiwLwoQTfTXGcs6I4UM1iOslEs2ZaX+Fkgo5PG25lIFRTAsqR tqYGqGi6yL6TWE7PlVJxr3QuwGeMr7B8X9A0lTZ7YJwslhByK8ymasPdF+jSgPQI zDOuAeXiYvlph8sCftWX7gF34aBfKgiH8LhA6M2SY5S16g7LwtXUJjq1PJactoD3 +nEPyCtRoN6ohScKNVnM8JDpOKIrM+mJ42RG28ZLo6//8so0SFcUdC8VhECBxOWL 9WkoL2GxRV0LoRnzCZS30EpAi/eQU+QlTrPueGp+n8GjauJMDPoxJ2l6UXox6CLm 8CqwxKATG6WbrdcGhaVbIxVbAWC7Ze271C/7L61R5K+RmDTXc6jI4vAIw1Pib4o+ CG6XtxHya5PM0zvyLgU28M6aY+WExbwnkSQKvI2FJZkOVG0xdFCy2O1QLLRBChmn a6hsA05a =8nZ2 -----END PGP SIGNATURE----- Merge tag '5.13-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "Three SMB3 fixes. Two for stable, and the other fixes a problem pointed out with a recently added ioctl" * tag '5.13-rc4-smb3' of git://git.samba.org/sfrench/cifs-2.6: cifs: change format of CIFS_FULL_KEY_DUMP ioctl cifs: fix string declarations and assignments in tracepoints cifs: set server->cipher_type to AES-128-CCM for SMB3.0	2021-05-28 14:15:47 -10:00
Linus Torvalds	5ff2756afd	NFS client bugfixes for Linux 5.13 Highlights include: Stable fixes - Fix v4.0/v4.1 SEEK_DATA return -ENOTSUPP when set NFS_V4_2 config - Fix Oops in xs_tcp_send_request() when transport is disconnected - Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return() Bugfixes - Fix instances where signal_pending() should be fatal_signal_pending() - fix an incorrect limit in filelayout_decode_layout() - Fixes for the SUNRPC backlogged RPC queue - Don't corrupt the value of pg_bytes_written in nfs_do_recoalesce() - Revert commit `586a0787ce` ("Clean up rpcrdma_prepare_readch()") -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmCvomgACgkQZwvnipYK APLg6xAAqlR/HLNYOLAToQ6d9wzrL6Po3x8Lx7VURjCkFaoKB/jMq3Zbu/K8mZ+X CC6/XFtB9AikloK7sle6lRPuwwPL6y+vOML0Ais/dkYPNkbhe9ylf0rsYQiPljXT 8PAcqn8FXTZ9fKpKU8Quw24X1Jfkk6zUEeMy50HYDBfTx+gYEojMEKa6cl4URGzO 2JpuBO4Ku/vWDOPj7bWBX9wi7wkrJjGSYDnx1A5SOgUdV87H8VJkbTo9vVdEwFoE OtE8MQmFhdton0u9+MKImFQdVfxoYLB1Ig1G45NXGHee91dwfYU0U05THj7E/xP9 RQWtmJcKdvY1w8sRK/PNEHo43Vkow4usffSrIWNBZ6aO5EkbQFn1tmKMSDtsrkZ2 ONMfKBiEhhQSy+QRXMR/RC86t4dsQ8SApu62qQT4VuuXqzYhrBum2DqkW0X6Zcti gi17+PfjRbgWNvul2yegBvDU016H324aCeT9nfWe0D9iwF7tPK4xsuNTYrWwbFOA YFAecIXoyBRtbIV6NZ95/+P5HEFBLAYewEVLpdAOBGQ9fjO023ERiC2sitl5P+ku v6V+4HAtBgcPfm/8BZwUYUYBpXnnTZFTizqdJdGGydPXeC671gANPJe6e4xOttCK frXFGd9OOqPSXdsRZLUVvhczTOFOGa/UVVG0GxIr4ggy8oKjDMk= =66ks -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: "Stable fixes: - Fix v4.0/v4.1 SEEK_DATA return -ENOTSUPP when set NFS_V4_2 config - Fix Oops in xs_tcp_send_request() when transport is disconnected - Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return() Bugfixes: - Fix instances where signal_pending() should be fatal_signal_pending() - fix an incorrect limit in filelayout_decode_layout() - Fixes for the SUNRPC backlogged RPC queue - Don't corrupt the value of pg_bytes_written in nfs_do_recoalesce() - Revert commit `586a0787ce` ("Clean up rpcrdma_prepare_readch()")" * tag 'nfs-for-5.13-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: Remove trailing semicolon in macros xprtrdma: Revert `586a0787ce` NFSv4: Fix v4.0/v4.1 SEEK_DATA return -ENOTSUPP when set NFS_V4_2 config NFS: Clean up reset of the mirror accounting variables NFS: Don't corrupt the value of pg_bytes_written in nfs_do_recoalesce() NFS: Fix an Oopsable condition in __nfs_pageio_add_request() SUNRPC: More fixes for backlog congestion SUNRPC: Fix Oops in xs_tcp_send_request() when transport is disconnected NFSv4: Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return() SUNRPC in case of backlog, hand free slots directly to waiting task pNFS/NFSv4: Remove redundant initialization of 'rd_size' NFS: fix an incorrect limit in filelayout_decode_layout() fs/nfs: Use fatal_signal_pending instead of signal_pending	2021-05-28 08:53:19 -10:00
Christian Brauner	cfe80306a0	open: don't silently ignore unknown O-flags in openat2() The new openat2() syscall verifies that no unknown O-flag values are set and returns an error to userspace if they are while the older open syscalls like open() and openat() simply ignore unknown flag values: #define O_FLAG_CURRENTLY_INVALID (1 << 31) struct open_how how = { .flags = O_RDONLY \| O_FLAG_CURRENTLY_INVALID, .resolve = 0, }; /* fails / fd = openat2(-EBADF, "/dev/null", &how, sizeof(how)); / succeeds / fd = openat(-EBADF, "/dev/null", O_RDONLY \| O_FLAG_CURRENTLY_INVALID); However, openat2() silently truncates the upper 32 bits meaning: #define O_FLAG_CURRENTLY_INVALID_LOWER32 (1 << 31) #define O_FLAG_CURRENTLY_INVALID_UPPER32 (1 << 40) struct open_how how_lowe32 = { .flags = O_RDONLY \| O_FLAG_CURRENTLY_INVALID_LOWER32, }; struct open_how how_upper32 = { .flags = O_RDONLY \| O_FLAG_CURRENTLY_INVALID_UPPER32, }; / fails / fd = openat2(-EBADF, "/dev/null", &how_lower32, sizeof(how_lower32)); / succeeds / fd = openat2(-EBADF, "/dev/null", &how_upper32, sizeof(how_upper32)); Fix this by preventing the immediate truncation in build_open_flags(). There's a snafu here though stripping FMODE_ directly from flags would cause the upper 32 bits to be truncated as well due to integer promotion rules since FMODE_* is unsigned int, O_* are signed ints (yuck). In addition, struct open_flags currently defines flags to be 32 bit which is reasonable. If we simply were to bump it to 64 bit we would need to change a lot of code preemptively which doesn't seem worth it. So simply add a compile-time check verifying that all currently known O_* flags are within the 32 bit range and fail to build if they aren't anymore. This change shouldn't regress old open syscalls since they silently truncate any unknown values anyway. It is a tiny semantic change for openat2() but it is very unlikely people pass ing > 32 bit unknown flags and the syscall is relatively new too. Link: https://lore.kernel.org/r/20210528092417.3942079-3-brauner@kernel.org Cc: Christoph Hellwig <hch@lst.de> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Reported-by: Richard Guy Briggs <rgb@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Aleksa Sarai <cyphar@cyphar.com> Reviewed-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>	2021-05-28 17:44:37 +02:00
Filipe Manana	76a6d5cd74	btrfs: fix deadlock when cloning inline extents and low on available space There are a few cases where cloning an inline extent requires copying data into a page of the destination inode. For these cases we are allocating the required data and metadata space while holding a leaf locked. This can result in a deadlock when we are low on available space because allocating the space may flush delalloc and two deadlock scenarios can happen: 1) When starting writeback for an inode with a very small dirty range that fits in an inline extent, we deadlock during the writeback when trying to insert the inline extent, at cow_file_range_inline(), if the extent is going to be located in the leaf for which we are already holding a read lock; 2) After successfully starting writeback, for non-inline extent cases, the async reclaim thread will hang waiting for an ordered extent to complete if the ordered extent completion needs to modify the leaf for which the clone task is holding a read lock (for adding or replacing file extent items). So the cloning task will wait forever on the async reclaim thread to make progress, which in turn is waiting for the ordered extent completion which in turn is waiting to acquire a write lock on the same leaf. So fix this by making sure we release the path (and therefore the leaf) every time we need to copy the inline extent's data into a page of the destination inode, as by that time we do not need to have the leaf locked. Fixes: `05a5a7621c` ("Btrfs: implement full reflink support for inline extents") CC: stable@vger.kernel.org # 5.10+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-27 23:31:52 +02:00
Filipe Manana	ea7036de0d	btrfs: fix fsync failure and transaction abort after writes to prealloc extents When doing a series of partial writes to different ranges of preallocated extents with transaction commits and fsyncs in between, we can end up with a checksum items in a log tree. This causes an fsync to fail with -EIO and abort the transaction, turning the filesystem to RO mode, when syncing the log. For this to happen, we need to have a full fsync of a file following one or more fast fsyncs. The following example reproduces the problem and explains how it happens: $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt # Create our test file with 2 preallocated extents. Leave a 1M hole # between them to ensure that we get two file extent items that will # never be merged into a single one. The extents are contiguous on disk, # which will later result in the checksums for their data to be merged # into a single checksum item in the csums btree. # $ xfs_io -f \ -c "falloc 0 1M" \ -c "falloc 3M 3M" \ /mnt/foobar # Now write to the second extent and leave only 1M of it as unwritten, # which corresponds to the file range [4M, 5M[. # # Then fsync the file to flush delalloc and to clear full sync flag from # the inode, so that a future fsync will use the fast code path. # # After the writeback triggered by the fsync we have 3 file extent items # that point to the second extent we previously allocated: # # 1) One file extent item of type BTRFS_FILE_EXTENT_REG that covers the # file range [3M, 4M[ # # 2) One file extent item of type BTRFS_FILE_EXTENT_PREALLOC that covers # the file range [4M, 5M[ # # 3) One file extent item of type BTRFS_FILE_EXTENT_REG that covers the # file range [5M, 6M[ # # All these file extent items have a generation of 6, which is the ID of # the transaction where they were created. The split of the original file # extent item is done at btrfs_mark_extent_written() when ordered extents # complete for the file ranges [3M, 4M[ and [5M, 6M[. # $ xfs_io -c "pwrite -S 0xab 3M 1M" \ -c "pwrite -S 0xef 5M 1M" \ -c "fsync" \ /mnt/foobar # Commit the current transaction. This wipes out the log tree created by # the previous fsync. sync # Now write to the unwritten range of the second extent we allocated, # corresponding to the file range [4M, 5M[, and fsync the file, which # triggers the fast fsync code path. # # The fast fsync code path sees that there is a new extent map covering # the file range [4M, 5M[ and therefore it will log a checksum item # covering the range [1M, 2M[ of the second extent we allocated. # # Also, after the fsync finishes we no longer have the 3 file extent # items that pointed to 3 sections of the second extent we allocated. # Instead we end up with a single file extent item pointing to the whole # extent, with a type of BTRFS_FILE_EXTENT_REG and a generation of 7 (the # current transaction ID). This is due to the file extent item merging we # do when completing ordered extents into ranges that point to unwritten # (preallocated) extents. This merging is done at # btrfs_mark_extent_written(). # $ xfs_io -c "pwrite -S 0xcd 4M 1M" \ -c "fsync" \ /mnt/foobar # Now do some write to our file outside the range of the second extent # that we allocated with fallocate() and truncate the file size from 6M # down to 5M. # # The truncate operation sets the full sync runtime flag on the inode, # forcing the next fsync to use the slow code path. It also changes the # length of the second file extent item so that it represents the file # range [3M, 5M[ and not the range [3M, 6M[ anymore. # # Finally fsync the file. Since this is a fsync that triggers the slow # code path, it will remove all items associated to the inode from the # log tree and then it will scan for file extent items in the # fs/subvolume tree that have a generation matching the current # transaction ID, which is 7. This means it will log 2 file extent # items: # # 1) One for the first extent we allocated, covering the file range # [0, 1M[ # # 2) Another for the first 2M of the second extent we allocated, # covering the file range [3M, 5M[ # # When logging the first file extent item we log a single checksum item # that has all the checksums for the entire extent. # # When logging the second file extent item, we also lookup for the # checksums that are associated with the range [0, 2M[ of the second # extent we allocated (file range [3M, 5M[), and then we log them with # btrfs_csum_file_blocks(). However that results in ending up with a log # that has two checksum items with ranges that overlap: # # 1) One for the range [1M, 2M[ of the second extent we allocated, # corresponding to the file range [4M, 5M[, which we logged in the # previous fsync that used the fast code path; # # 2) One for the ranges [0, 1M[ and [0, 2M[ of the first and second # extents, respectively, corresponding to the files ranges [0, 1M[ # and [3M, 5M[. This one was added during this last fsync that uses # the slow code path and overlaps with the previous one logged by # the previous fast fsync. # # This happens because when logging the checksums for the second # extent, we notice they start at an offset that matches the end of the # checksums item that we logged for the first extent, and because both # extents are contiguous on disk, btrfs_csum_file_blocks() decides to # extend that existing checksums item and append the checksums for the # second extent to this item. The end result is we end up with two # checksum items in the log tree that have overlapping ranges, as # listed before, resulting in the fsync to fail with -EIO and aborting # the transaction, turning the filesystem into RO mode. # $ xfs_io -c "pwrite -S 0xff 0 1M" \ -c "truncate 5M" \ -c "fsync" \ /mnt/foobar fsync: Input/output error After running the example, dmesg/syslog shows the tree checker complained about the checksum items with overlapping ranges and we aborted the transaction: $ dmesg (...) [756289.557487] BTRFS critical (device sdc): corrupt leaf: root=18446744073709551610 block=30720000 slot=5, csum end range (16777216) goes beyond the start range (15728640) of the next csum item [756289.560583] BTRFS info (device sdc): leaf 30720000 gen 7 total ptrs 7 free space 11677 owner 18446744073709551610 [756289.562435] BTRFS info (device sdc): refs 2 lock_owner 0 current 2303929 [756289.563654] item 0 key (257 1 0) itemoff 16123 itemsize 160 [756289.564649] inode generation 6 size 5242880 mode 100600 [756289.565636] item 1 key (257 12 256) itemoff 16107 itemsize 16 [756289.566694] item 2 key (257 108 0) itemoff 16054 itemsize 53 [756289.567725] extent data disk bytenr 13631488 nr 1048576 [756289.568697] extent data offset 0 nr 1048576 ram 1048576 [756289.569689] item 3 key (257 108 1048576) itemoff 16001 itemsize 53 [756289.570682] extent data disk bytenr 0 nr 0 [756289.571363] extent data offset 0 nr 2097152 ram 2097152 [756289.572213] item 4 key (257 108 3145728) itemoff 15948 itemsize 53 [756289.573246] extent data disk bytenr 14680064 nr 3145728 [756289.574121] extent data offset 0 nr 2097152 ram 3145728 [756289.574993] item 5 key (18446744073709551606 128 13631488) itemoff 12876 itemsize 3072 [756289.576113] item 6 key (18446744073709551606 128 15728640) itemoff 11852 itemsize 1024 [756289.577286] BTRFS error (device sdc): block=30720000 write time tree block corruption detected [756289.578644] ------------[ cut here ]------------ [756289.579376] WARNING: CPU: 0 PID: 2303929 at fs/btrfs/disk-io.c:465 csum_one_extent_buffer+0xed/0x100 [btrfs] [756289.580857] Modules linked in: btrfs dm_zero dm_dust loop dm_snapshot (...) [756289.591534] CPU: 0 PID: 2303929 Comm: xfs_io Tainted: G W 5.12.0-rc8-btrfs-next-87 #1 [756289.592580] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [756289.594161] RIP: 0010:csum_one_extent_buffer+0xed/0x100 [btrfs] [756289.595122] Code: 5d c3 e8 76 60 (...) [756289.597509] RSP: 0018:ffffb51b416cb898 EFLAGS: 00010282 [756289.598142] RAX: 0000000000000000 RBX: fffff02b8a365bc0 RCX: 0000000000000000 [756289.598970] RDX: 0000000000000000 RSI: ffffffffa9112421 RDI: 00000000ffffffff [756289.599798] RBP: ffffa06500880000 R08: 0000000000000000 R09: 0000000000000000 [756289.600619] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 [756289.601456] R13: ffffa0652b1d8980 R14: ffffa06500880000 R15: 0000000000000000 [756289.602278] FS: 00007f08b23c9800(0000) GS:ffffa0682be00000(0000) knlGS:0000000000000000 [756289.603217] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [756289.603892] CR2: 00005652f32d0138 CR3: 000000025d616003 CR4: 0000000000370ef0 [756289.604725] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [756289.605563] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [756289.606400] Call Trace: [756289.606704] btree_csum_one_bio+0x244/0x2b0 [btrfs] [756289.607313] btrfs_submit_metadata_bio+0xb7/0x100 [btrfs] [756289.608040] submit_one_bio+0x61/0x70 [btrfs] [756289.608587] btree_write_cache_pages+0x587/0x610 [btrfs] [756289.609258] ? free_debug_processing+0x1d5/0x240 [756289.609812] ? __module_address+0x28/0xf0 [756289.610298] ? lock_acquire+0x1a0/0x3e0 [756289.610754] ? lock_acquired+0x19f/0x430 [756289.611220] ? lock_acquire+0x1a0/0x3e0 [756289.611675] do_writepages+0x43/0xf0 [756289.612101] ? __filemap_fdatawrite_range+0xa4/0x100 [756289.612800] __filemap_fdatawrite_range+0xc5/0x100 [756289.613393] btrfs_write_marked_extents+0x68/0x160 [btrfs] [756289.614085] btrfs_sync_log+0x21c/0xf20 [btrfs] [756289.614661] ? finish_wait+0x90/0x90 [756289.615096] ? __mutex_unlock_slowpath+0x45/0x2a0 [756289.615661] ? btrfs_log_inode_parent+0x3c9/0xdc0 [btrfs] [756289.616338] ? lock_acquire+0x1a0/0x3e0 [756289.616801] ? lock_acquired+0x19f/0x430 [756289.617284] ? lock_acquire+0x1a0/0x3e0 [756289.617750] ? lock_release+0x214/0x470 [756289.618221] ? lock_acquired+0x19f/0x430 [756289.618704] ? dput+0x20/0x4a0 [756289.619079] ? dput+0x20/0x4a0 [756289.619452] ? lockref_put_or_lock+0x9/0x30 [756289.619969] ? lock_release+0x214/0x470 [756289.620445] ? lock_release+0x214/0x470 [756289.620924] ? lock_release+0x214/0x470 [756289.621415] btrfs_sync_file+0x46a/0x5b0 [btrfs] [756289.621982] do_fsync+0x38/0x70 [756289.622395] __x64_sys_fsync+0x10/0x20 [756289.622907] do_syscall_64+0x33/0x80 [756289.623438] entry_SYSCALL_64_after_hwframe+0x44/0xae [756289.624063] RIP: 0033:0x7f08b27fbb7b [756289.624588] Code: 0f 05 48 3d 00 (...) [756289.626760] RSP: 002b:00007ffe2583f940 EFLAGS: 00000293 ORIG_RAX: 000000000000004a [756289.627639] RAX: ffffffffffffffda RBX: 00005652f32cd0f0 RCX: 00007f08b27fbb7b [756289.628464] RDX: 00005652f32cbca0 RSI: 00005652f32cd110 RDI: 0000000000000003 [756289.629323] RBP: 00005652f32cd110 R08: 0000000000000000 R09: 00007f08b28c4be0 [756289.630172] R10: fffffffffffff39a R11: 0000000000000293 R12: 0000000000000001 [756289.631007] R13: 00005652f32cd0f0 R14: 0000000000000001 R15: 00005652f32cc480 [756289.631819] irq event stamp: 0 [756289.632188] hardirqs last enabled at (0): [<0000000000000000>] 0x0 [756289.632911] hardirqs last disabled at (0): [<ffffffffa7e97c29>] copy_process+0x879/0x1cc0 [756289.633893] softirqs last enabled at (0): [<ffffffffa7e97c29>] copy_process+0x879/0x1cc0 [756289.634871] softirqs last disabled at (0): [<0000000000000000>] 0x0 [756289.635606] ---[ end trace 0a039fdc16ff3fef ]--- [756289.636179] BTRFS: error (device sdc) in btrfs_sync_log:3136: errno=-5 IO failure [756289.637082] BTRFS info (device sdc): forced readonly Having checksum items covering ranges that overlap is dangerous as in some cases it can lead to having extent ranges for which we miss checksums after log replay or getting the wrong checksum item. There were some fixes in the past for bugs that resulted in this problem, and were explained and fixed by the following commits: `27b9a8122f` ("Btrfs: fix csum tree corruption, duplicate and outdated checksums") `b84b8390d6` ("Btrfs: fix file read corruption after extent cloning and fsync") `40e046acbd` ("Btrfs: fix missing data checksums after replaying a log tree") `e289f03ea7` ("btrfs: fix corrupt log due to concurrent fsync of inodes with shared extents") Fix the issue by making btrfs_csum_file_blocks() taking into account the start offset of the next checksum item when it decides to extend an existing checksum item, so that it never extends the checksum to end at a range that goes beyond the start range of the next checksum item. When we can not access the next checksum item without releasing the path, simply drop the optimization of extending the previous checksum item and fallback to inserting a new checksum item - this happens rarely and the optimization is not significant enough for a log tree in order to justify the extra complexity, as it would only save a few bytes (the size of a struct btrfs_item) of leaf space. This behaviour is only needed when inserting into a log tree because for the regular checksums tree we never have a case where we try to insert a range of checksums that overlap with a range that was previously inserted. A test case for fstests will follow soon. Reported-by: Philipp Fent <fent@in.tum.de> Link: https://lore.kernel.org/linux-btrfs/93c4600e-5263-5cba-adf0-6f47526e7561@in.tum.de/ CC: stable@vger.kernel.org # 5.4+ Tested-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-27 23:31:36 +02:00
Josef Bacik	dc09ef3562	btrfs: abort in rename_exchange if we fail to insert the second ref Error injection stress uncovered a problem where we'd leave a dangling inode ref if we failed during a rename_exchange. This happens because we insert the inode ref for one side of the rename, and then for the other side. If this second inode ref insert fails we'll leave the first one dangling and leave a corrupt file system behind. Fix this by aborting if we did the insert for the first inode ref. CC: stable@vger.kernel.org # 4.9+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-27 23:31:16 +02:00
Josef Bacik	f96d44743a	btrfs: check error value from btrfs_update_inode in tree log Error injection testing uncovered a case where we ended up with invalid link counts on an inode. This happened because we failed to notice an error when updating the inode while replaying the tree log, and committed the transaction with an invalid file system. Fix this by checking the return value of btrfs_update_inode. This resolved the link count errors I was seeing, and we already properly handle passing up the error values in these paths. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-27 23:31:13 +02:00
Josef Bacik	011b28acf9	btrfs: fixup error handling in fixup_inode_link_counts This function has the following pattern while (1) { ret = whatever(); if (ret) goto out; } ret = 0 out: return ret; However several places in this while loop we simply break; when there's a problem, thus clearing the return value, and in one case we do a return -EIO, and leak the memory for the path. Fix this by re-arranging the loop to deal with ret == 1 coming from btrfs_search_slot, and then simply delete the ret = 0; out: bit so everybody can break if there is an error, which will allow for proper error handling to occur. CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-27 23:31:08 +02:00
Josef Bacik	d61bec08b9	btrfs: mark ordered extent and inode with error if we fail to finish While doing error injection testing I saw that sometimes we'd get an abort that wouldn't stop the current transaction commit from completing. This abort was coming from finish ordered IO, but at this point in the transaction commit we should have gotten an error and stopped. It turns out the abort came from finish ordered io while trying to write out the free space cache. It occurred to me that any failure inside of finish_ordered_io isn't actually raised to the person doing the writing, so we could have any number of failures in this path and think the ordered extent completed successfully and the inode was fine. Fix this by marking the ordered extent with BTRFS_ORDERED_IOERR, and marking the mapping of the inode with mapping_set_error, so any callers that simply call fdatawait will also get the error. With this we're seeing the IO error on the free space inode when we fail to do the finish_ordered_io. CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-27 23:31:01 +02:00
Josef Bacik	856bd270dc	btrfs: return errors from btrfs_del_csums in cleanup_ref_head We are unconditionally returning 0 in cleanup_ref_head, despite the fact that btrfs_del_csums could fail. We need to return the error so the transaction gets aborted properly, fix this by returning ret from btrfs_del_csums in cleanup_ref_head. Reviewed-by: Qu Wenruo <wqu@suse.com> CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-27 23:30:55 +02:00
Josef Bacik	b86652be7c	btrfs: fix error handling in btrfs_del_csums Error injection stress would sometimes fail with checksums on disk that did not have a corresponding extent. This occurred because the pattern in btrfs_del_csums was while (1) { ret = btrfs_search_slot(); if (ret < 0) break; } ret = 0; out: btrfs_free_path(path); return ret; If we got an error from btrfs_search_slot we'd clear the error because we were breaking instead of goto out. Instead of using goto out, simply handle the cases where we may leave a random value in ret, and get rid of the ret = 0; out: pattern and simply allow break to have the proper error reporting. With this fix we properly abort the transaction and do not commit thinking we successfully deleted the csum. Reviewed-by: Qu Wenruo <wqu@suse.com> CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-27 23:30:49 +02:00
Qu Wenruo	4c80a97d7b	btrfs: fix compressed writes that cross stripe boundary [BUG] When running btrfs/027 with "-o compress" mount option, it always crashes with the following call trace: BTRFS critical (device dm-4): mapping failed logical 298901504 bio len 12288 len 8192 ------------[ cut here ]------------ kernel BUG at fs/btrfs/volumes.c:6651! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 5 PID: 31089 Comm: kworker/u24:10 Tainted: G OE 5.13.0-rc2-custom+ #26 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Workqueue: btrfs-delalloc btrfs_work_helper [btrfs] RIP: 0010:btrfs_map_bio.cold+0x58/0x5a [btrfs] Call Trace: btrfs_submit_compressed_write+0x2d7/0x470 [btrfs] submit_compressed_extents+0x3b0/0x470 [btrfs] ? mark_held_locks+0x49/0x70 btrfs_work_helper+0x131/0x3e0 [btrfs] process_one_work+0x28f/0x5d0 worker_thread+0x55/0x3c0 ? process_one_work+0x5d0/0x5d0 kthread+0x141/0x160 ? __kthread_bind_mask+0x60/0x60 ret_from_fork+0x22/0x30 ---[ end trace 63113a3a91f34e68 ]--- [CAUSE] The critical message before the crash means we have a bio at logical bytenr 298901504 length 12288, but only 8192 bytes can fit into one stripe, the remaining 4096 bytes go to another stripe. In btrfs, all bios are properly split to avoid cross stripe boundary, but commit `764c7c9a46` ("btrfs: zoned: fix parallel compressed writes") changed the behavior for compressed writes. Previously if we find our new page can't be fitted into current stripe, ie. "submit == 1" case, we submit current bio without adding current page. submit = btrfs_bio_fits_in_stripe(page, PAGE_SIZE, bio, 0); page->mapping = NULL; if (submit \|\| bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) { But after the modification, we will add the page no matter if it crosses stripe boundary, leading to the above crash. submit = btrfs_bio_fits_in_stripe(page, PAGE_SIZE, bio, 0); if (pg_index == 0 && use_append) len = bio_add_zone_append_page(bio, page, PAGE_SIZE, 0); else len = bio_add_page(bio, page, PAGE_SIZE, 0); page->mapping = NULL; if (submit \|\| len < PAGE_SIZE) { [FIX] It's no longer possible to revert to the original code style as we have two different bio_add__page() calls now. The new fix is to skip the bio_add__page() call if @submit is true. Also to avoid @len to be uninitialized, always initialize it to zero. If @submit is true, @len will not be checked. If @submit is not true, @len will be the return value of bio_add_*_page() call. Either way, the behavior is still the same as the old code. Reported-by: Josef Bacik <josef@toxicpanda.com> Fixes: `764c7c9a46` ("btrfs: zoned: fix parallel compressed writes") Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-27 23:30:38 +02:00
Aurelien Aptel	1bb5681067	cifs: change format of CIFS_FULL_KEY_DUMP ioctl Make CIFS_FULL_KEY_DUMP ioctl able to return variable-length keys. * userspace needs to pass the struct size along with optional session_id and some space at the end to store keys * if there is enough space kernel returns keys in the extra space and sets the length of each key via xyz_key_length fields This also fixes the build error for get_user() on ARM. Sample program: #include <stdlib.h> #include <stdio.h> #include <stdint.h> #include <sys/fcntl.h> #include <sys/ioctl.h> struct smb3_full_key_debug_info { uint32_t in_size; uint64_t session_id; uint16_t cipher_type; uint8_t session_key_length; uint8_t server_in_key_length; uint8_t server_out_key_length; uint8_t data[]; /* * return this struct with the keys appended at the end: * uint8_t session_key[session_key_length]; * uint8_t server_in_key[server_in_key_length]; * uint8_t server_out_key[server_out_key_length]; / } __attribute__((packed)); #define CIFS_IOCTL_MAGIC 0xCF #define CIFS_DUMP_FULL_KEY _IOWR(CIFS_IOCTL_MAGIC, 10, struct smb3_full_key_debug_info) void dump(const void p, size_t len) { const char hex = "0123456789ABCDEF"; const uint8_t b = p; for (int i = 0; i < len; i++) printf("%c%c ", hex[(b[i]>>4)&0xf], hex[b[i]&0xf]); putchar('\n'); } int main(int argc, char *argv) { struct smb3_full_key_debug_info keys; uint8_t buf[sizeof(keys)+1024] = {0}; size_t off = 0; int fd, rc; keys = (struct smb3_full_key_debug_info )&buf; keys->in_size = sizeof(buf); fd = open(argv[1], O_RDONLY); if (fd < 0) perror("open"), exit(1); rc = ioctl(fd, CIFS_DUMP_FULL_KEY, keys); if (rc < 0) perror("ioctl"), exit(1); printf("SessionId "); dump(&keys->session_id, 8); printf("Cipher %04x\n", keys->cipher_type); printf("SessionKey "); dump(keys->data+off, keys->session_key_length); off += keys->session_key_length; printf("ServerIn Key "); dump(keys->data+off, keys->server_in_key_length); off += keys->server_in_key_length; printf("ServerOut Key "); dump(keys->data+off, keys->server_out_key_length); return 0; } Usage: $ gcc -o dumpkeys dumpkeys.c Against Windows Server 2020 preview (with AES-256-GCM support): # mount.cifs //$ip/test /mnt -o "username=administrator,password=foo,vers=3.0,seal" # ./dumpkeys /mnt/somefile SessionId 0D 00 00 00 00 0C 00 00 Cipher 0002 SessionKey AB CD CC 0D E4 15 05 0C 6F 3C 92 90 19 F3 0D 25 ServerIn Key 73 C6 6A C8 6B 08 CF A2 CB 8E A5 7D 10 D1 5B DC ServerOut Key 6D 7E 2B A1 71 9D D7 2B 94 7B BA C4 F0 A5 A4 F8 # umount /mnt With 256 bit keys: # echo 1 > /sys/module/cifs/parameters/require_gcm_256 # mount.cifs //$ip/test /mnt -o "username=administrator,password=foo,vers=3.11,seal" # ./dumpkeys /mnt/somefile SessionId 09 00 00 00 00 0C 00 00 Cipher 0004 SessionKey 93 F5 82 3B 2F B7 2A 50 0B B9 BA 26 FB 8C 8B 03 ServerIn Key 6C 6A 89 B2 CB 7B 78 E8 04 93 37 DA 22 53 47 DF B3 2C 5F 02 26 70 43 DB 8D 33 7B DC 66 D3 75 A9 ServerOut Key 04 11 AA D7 52 C7 A8 0F ED E3 93 3A 65 FE 03 AD 3F 63 03 01 2B C0 1B D7 D7 E5 52 19 7F CC 46 B4 Signed-off-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-05-27 15:26:32 -05:00
Shyam Prasad N	eb06881805	cifs: fix string declarations and assignments in tracepoints We missed using the variable length string macros in several tracepoints. Fixed them in this change. There's probably more useful macros that we can use to print others like flags etc. But I'll submit sepawrate patches for those at a future date. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Cc: <stable@vger.kernel.org> # v5.12 Signed-off-by: Steve French <stfrench@microsoft.com>	2021-05-27 14:04:32 -05:00
Aurelien Aptel	6d2fcfe6b5	cifs: set server->cipher_type to AES-128-CCM for SMB3.0 SMB3.0 doesn't have encryption negotiate context but simply uses the SMB2_GLOBAL_CAP_ENCRYPTION flag. When that flag is present in the neg response cifs.ko uses AES-128-CCM which is the only cipher available in this context. cipher_type was set to the server cipher only when parsing encryption negotiate context (SMB3.1.1). For SMB3.0 it was set to 0. This means cipher_type value can be 0 or 1 for AES-128-CCM. Fix this by checking for SMB3.0 and encryption capability and setting cipher_type appropriately. Signed-off-by: Aurelien Aptel <aaptel@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-05-27 14:03:47 -05:00
David Howells	f610a5a29c	afs: Fix the nlink handling of dir-over-dir rename Fix rename of one directory over another such that the nlink on the deleted directory is cleared to 0 rather than being decremented to 1. This was causing the generic/035 xfstest to fail. Fixes: `e49c7b2f6d` ("afs: Build an abstraction around an "operation" concept") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> cc: linux-afs@lists.infradead.org Link: https://lore.kernel.org/r/162194384460.3999479.7605572278074191079.stgit@warthog.procyon.org.uk/ # v1 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-27 06:23:58 -10:00
Dave Chinner	0fe0bbe00a	xfs: bunmapi has unnecessary AG lock ordering issues large directory block size operations are assert failing because xfs_bunmapi() is not completely removing fragmented directory blocks like so: XFS: Assertion failed: done, file: fs/xfs/libxfs/xfs_dir2.c, line: 677 .... Call Trace: xfs_dir2_shrink_inode+0x1a8/0x210 xfs_dir2_block_to_sf+0x2ae/0x410 xfs_dir2_block_removename+0x21a/0x280 xfs_dir_removename+0x195/0x1d0 xfs_rename+0xb79/0xc50 ? avc_has_perm+0x8d/0x1a0 ? avc_has_perm_noaudit+0x9a/0x120 xfs_vn_rename+0xdb/0x150 vfs_rename+0x719/0xb50 ? __lookup_hash+0x6a/0xa0 do_renameat2+0x413/0x5e0 __x64_sys_rename+0x45/0x50 do_syscall_64+0x3a/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae We are aborting the bunmapi() pass because of this specific chunk of code: /* * Make sure we don't touch multiple AGF headers out of order * in a single transaction, as that could cause AB-BA deadlocks. */ if (!wasdel && !isrt) { agno = XFS_FSB_TO_AGNO(mp, del.br_startblock); if (prev_agno != NULLAGNUMBER && prev_agno > agno) break; prev_agno = agno; } This is designed to prevent deadlocks in AGF locking when freeing multiple extents by ensuring that we only ever lock in increasing AG number order. Unfortunately, this also violates the "bunmapi will always succeed" semantic that some high level callers depend on, such as xfs_dir2_shrink_inode(), xfs_da_shrink_inode() and xfs_inactive_symlink_rmt(). This AG lock ordering was introduced back in 2017 to fix deadlocks triggered by generic/299 as reported here: https://lore.kernel.org/linux-xfs/800468eb-3ded-9166-20a4-047de8018582@gmail.com/ This codebase is old enough that it was before we were defering all AG based extent freeing from within xfs_bunmapi(). THat is, we never actually lock AGs in xfs_bunmapi() any more - every non-rt based extent free is added to the defer ops list, as is all BMBT block freeing. And RT extents are not RT based, so there's no lock ordering issues associated with them. Hence this AGF lock ordering code is both broken and dead. Let's just remove it so that the large directory block code works reliably again. Tested against xfs/538 and generic/299 which is the original test that exposed the deadlocks that this code fixed. Fixes: `5b094d6dac` ("xfs: fix multi-AG deadlock in xfs_bunmapi") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-05-27 08:11:24 -07:00
Dave Chinner	991c2c5980	xfs: btree format inode forks can have zero extents xfs/538 is assert failing with this trace when testing with directory block sizes of 64kB: XFS: Assertion failed: !xfs_need_iread_extents(ifp), file: fs/xfs/libxfs/xfs_bmap.c, line: 608 .... Call Trace: xfs_bmap_btree_to_extents+0x2a9/0x470 ? kmem_cache_alloc+0xe7/0x220 __xfs_bunmapi+0x4ca/0xdf0 xfs_bunmapi+0x1a/0x30 xfs_dir2_shrink_inode+0x71/0x210 xfs_dir2_block_to_sf+0x2ae/0x410 xfs_dir2_block_removename+0x21a/0x280 xfs_dir_removename+0x195/0x1d0 xfs_remove+0x244/0x460 xfs_vn_unlink+0x53/0xa0 ? selinux_inode_unlink+0x13/0x20 vfs_unlink+0x117/0x220 do_unlinkat+0x1a2/0x2d0 __x64_sys_unlink+0x42/0x60 do_syscall_64+0x3a/0x70 entry_SYSCALL_64_after_hwframe+0x44/0xae This is a check to ensure that the extents have been read into memory before we are doing a ifork btree manipulation. This assert is bogus in the above case. We have a fragmented directory block that has more extents in it than can fit in extent format, so the inode data fork is in btree format. xfs_dir2_shrink_inode() asks to remove all remaining 16 filesystem blocks from the inode so it can convert to short form, and __xfs_bunmapi() removes all the extents. We now have a data fork in btree format but have zero extents in the fork. This incorrectly trips the xfs_need_iread_extents() assert because it assumes that an empty extent btree means the extent tree has not been read into memory yet. This is clearly not the case with xfs_bunmapi(), as it has an explicit call to xfs_iread_extents() in it to pull the extents into memory before it starts unmapping. Also, the assert directly after this bogus one is: ASSERT(ifp->if_format == XFS_DINODE_FMT_BTREE); Which covers the context in which it is legal to call xfs_bmap_btree_to_extents just fine. Hence we should just remove the bogus assert as it is clearly wrong and causes a regression. The returns the test behaviour to the pre-existing assert failure in xfs_dir2_shrink_inode() that indicates xfs_bunmapi() has failed to remove all the extents in the range it was asked to unmap. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>	2021-05-27 08:11:24 -07:00
Marco Elver	b16ef427ad	io_uring: fix data race to avoid potential NULL-deref Commit `ba5ef6dc8a` ("io_uring: fortify tctx/io_wq cleanup") introduced setting tctx->io_wq to NULL a bit earlier. This has caused KCSAN to detect a data race between accesses to tctx->io_wq: write to 0xffff88811d8df330 of 8 bytes by task 3709 on cpu 1: io_uring_clean_tctx fs/io_uring.c:9042 [inline] __io_uring_cancel fs/io_uring.c:9136 io_uring_files_cancel include/linux/io_uring.h:16 [inline] do_exit kernel/exit.c:781 do_group_exit kernel/exit.c:923 get_signal kernel/signal.c:2835 arch_do_signal_or_restart arch/x86/kernel/signal.c:789 handle_signal_work kernel/entry/common.c:147 [inline] exit_to_user_mode_loop kernel/entry/common.c:171 [inline] ... read to 0xffff88811d8df330 of 8 bytes by task 6412 on cpu 0: io_uring_try_cancel_iowq fs/io_uring.c:8911 [inline] io_uring_try_cancel_requests fs/io_uring.c:8933 io_ring_exit_work fs/io_uring.c:8736 process_one_work kernel/workqueue.c:2276 ... With the config used, KCSAN only reports data races with value changes: this implies that in the case here we also know that tctx->io_wq was non-NULL. Therefore, depending on interleaving, we may end up with: [CPU 0] \| [CPU 1] io_uring_try_cancel_iowq() \| io_uring_clean_tctx() if (!tctx->io_wq) // false \| ... ... \| tctx->io_wq = NULL io_wq_cancel_cb(tctx->io_wq, ...) \| ... -> NULL-deref \| Note: It is likely that thus far we've gotten lucky and the compiler optimizes the double-read into a single read into a register -- but this is never guaranteed, and can easily change with a different config! Fix the data race by restoring the previous behaviour, where both setting io_wq to NULL and put of the wq are _serialized_ after concurrent io_uring_try_cancel_iowq() via acquisition of the uring_lock and removal of the node in io_uring_del_task_file(). Fixes: `ba5ef6dc8a` ("io_uring: fortify tctx/io_wq cleanup") Suggested-by: Pavel Begunkov <asml.silence@gmail.com> Reported-by: syzbot+bf2b3d0435b9b728946c@syzkaller.appspotmail.com Signed-off-by: Marco Elver <elver@google.com> Cc: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20210527092547.2656514-1-elver@google.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-05-27 07:44:49 -06:00
Huilong Deng	a799b68a7c	nfs: Remove trailing semicolon in macros Macros should not use a trailing semicolon. Signed-off-by: Huilong Deng <denghuilong@cdjrlc.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-05-27 09:19:33 -04:00
Zhang Xiaoxu	e67afa7ee4	NFSv4: Fix v4.0/v4.1 SEEK_DATA return -ENOTSUPP when set NFS_V4_2 config Since commit `bdcc2cd14e` ("NFSv4.2: handle NFS-specific llseek errors"), nfs42_proc_llseek would return -EOPNOTSUPP rather than -ENOTSUPP when SEEK_DATA on NFSv4.0/v4.1. This will lead xfstests generic/285 not run on NFSv4.0/v4.1 when set the CONFIG_NFS_V4_2, rather than run failed. Fixes: `bdcc2cd14e` ("NFSv4.2: handle NFS-specific llseek errors") Cc: <stable.vger.kernel.org> # 4.2 Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-05-27 08:46:19 -04:00
Gustavo A. R. Silva	53004ee78d	xfs: Fix fall-through warnings for Clang In preparation to enable -Wimplicit-fallthrough for Clang, fix the following warnings by replacing /* fall through / comments, and its variants, with the new pseudo-keyword macro fallthrough: fs/xfs/libxfs/xfs_alloc.c:3167:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_da_btree.c:286:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_ag_resv.c:346:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/libxfs/xfs_ag_resv.c:388:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_bmap_util.c:246:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_export.c:88:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_export.c:96:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_file.c:867:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_ioctl.c:562:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_ioctl.c:1548:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_iomap.c:1040:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_inode.c:852:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_log.c:2627:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/xfs_trans_buf.c:298:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/bmap.c:275:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/btree.c:48:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/common.c:85:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/common.c:138:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/common.c:698:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/dabtree.c:51:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/repair.c:951:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] fs/xfs/scrub/agheader.c:89:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough] Notice that Clang doesn't recognize / fall through */ comments as implicit fall-through markings, so in order to globally enable -Wimplicit-fallthrough for Clang, these comments need to be replaced with fallthrough; in the whole codebase. Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>	2021-05-26 14:51:26 -05:00
Zqiang	3743c1723b	io-wq: Fix UAF when wakeup wqe in hash waitqueue BUG: KASAN: use-after-free in __wake_up_common+0x637/0x650 Read of size 8 at addr ffff8880304250d8 by task iou-wrk-28796/28802 Call Trace: __dump_stack [inline] dump_stack+0x141/0x1d7 print_address_description.constprop.0.cold+0x5b/0x2c6 __kasan_report [inline] kasan_report.cold+0x7c/0xd8 __wake_up_common+0x637/0x650 __wake_up_common_lock+0xd0/0x130 io_worker_handle_work+0x9dd/0x1790 io_wqe_worker+0xb2a/0xd40 ret_from_fork+0x1f/0x30 Allocated by task 28798: kzalloc_node [inline] io_wq_create+0x3c4/0xdd0 io_init_wq_offload [inline] io_uring_alloc_task_context+0x1bf/0x6b0 __io_uring_add_task_file+0x29a/0x3c0 io_uring_add_task_file [inline] io_uring_install_fd [inline] io_uring_create [inline] io_uring_setup+0x209a/0x2bd0 do_syscall_64+0x3a/0xb0 entry_SYSCALL_64_after_hwframe+0x44/0xae Freed by task 28798: kfree+0x106/0x2c0 io_wq_destroy+0x182/0x380 io_wq_put [inline] io_wq_put_and_exit+0x7a/0xa0 io_uring_clean_tctx [inline] __io_uring_cancel+0x428/0x530 io_uring_files_cancel do_exit+0x299/0x2a60 do_group_exit+0x125/0x310 get_signal+0x47f/0x2150 arch_do_signal_or_restart+0x2a8/0x1eb0 handle_signal_work[inline] exit_to_user_mode_loop [inline] exit_to_user_mode_prepare+0x171/0x280 __syscall_exit_to_user_mode_work [inline] syscall_exit_to_user_mode+0x19/0x60 do_syscall_64+0x47/0xb0 entry_SYSCALL_64_after_hwframe There are the following scenarios, hash waitqueue is shared by io-wq1 and io-wq2. (note: wqe is worker) io-wq1:worker2 \| locks bit1 io-wq2:worker1 \| waits bit1 io-wq1:worker3 \| waits bit1 io-wq1:worker2 \| completes all wqe bit1 work items io-wq1:worker2 \| drop bit1, exit io-wq2:worker1 \| locks bit1 io-wq1:worker3 \| can not locks bit1, waits bit1 and exit io-wq1 \| exit and free io-wq1 io-wq2:worker1 \| drops bit1 io-wq1:worker3 \| be waked up, even though wqe is freed After all iou-wrk belonging to io-wq1 have exited, remove wqe form hash waitqueue, it is guaranteed that there will be no more wqe belonging to io-wq1 in the hash waitqueue. Reported-by: syzbot+6cb11ade52aa17095297@syzkaller.appspotmail.com Signed-off-by: Zqiang <qiang.zhang@windriver.com> Link: https://lore.kernel.org/r/20210526050826.30500-1-qiang.zhang@windriver.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-05-26 09:03:56 -06:00
Colin Ian King	7d3848c03e	fs: dlm: Fix spelling mistake "stucked" -> "stuck" There are spelling mistake in log messages. Fix these. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-26 09:49:04 -05:00
Colin Ian King	f6089981d0	fs: dlm: Fix memory leak of object mh There is an error return path that is not kfree'ing mh after it has been successfully allocates. Fix this by moving the call to create_rcom to after the check on rc_in->rc_id check to avoid this. Thanks to Alexander Ahring Oder Aring for suggesting the correct way to fix this. Addresses-Coverity: ("Resource leak") Fixes: `a070a91cf1` ("fs: dlm: add more midcomms hooks") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-26 09:49:04 -05:00
Chao Yu	8939a8489c	f2fs: atgc: export entries for better tunability via sysfs This patch export below sysfs entries for better ATGC tunability. /sys/fs/f2fs/<disk>/atgc_candidate_ratio /sys/fs/f2fs/<disk>/atgc_candidate_count /sys/fs/f2fs/<disk>/atgc_age_weight /sys/fs/f2fs/<disk>/atgc_age_threshold Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2021-05-26 07:00:36 -07:00
Chao Yu	4a67d9b07a	f2fs: compress: fix to disallow temp extension This patch restricts to configure compress extension as format of: [filename + '.' + extension] rather than: [filename + '.' + extension + (optional: '.' + temp extension)] in order to avoid to enable compression incorrectly: 1. compress_extension=so 2. touch file.soa 3. touch file.so.tmp Fixes: `4c8ff7095b` ("f2fs: support data compression") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2021-05-26 07:00:36 -07:00
Jaegeuk Kim	e3c548323d	f2fs: let's allow compression for mmap files This patch allows to compress mmap files. E.g., for so files. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2021-05-26 07:00:36 -07:00
Chao Yu	0dd571785d	f2fs: add MODULE_SOFTDEP to ensure crc32 is included in the initramfs As marcosfrm reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=213089 Initramfs generators rely on "pre" softdeps (and "depends") to include additional required modules. F2FS does not declare "pre: crc32" softdep. Then every generator (dracut, mkinitcpio...) has to maintain a hardcoded list for this purpose. Hence let's use MODULE_SOFTDEP("pre: crc32") in f2fs code. Fixes: `43b6573bac` ("f2fs: use cryptoapi crc32 functions") Reported-by: marcosfrm <marcosfrm@gmail.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2021-05-26 07:00:36 -07:00
Tom Rix	4f55dc2a98	f2fs: return success if there is no work to do Static analysis reports this problem file.c:3206:2: warning: Undefined or garbage value returned to caller return err; ^~~~~~~~~~ err is only set if there is some work to do. Because the loop returns immediately on an error, if all the work was done, a 0 would be returned. Instead of checking the unlikely case that there was no work to do, change the return of err to 0. Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2021-05-26 07:00:35 -07:00
Trond Myklebust	70536bf4eb	NFS: Clean up reset of the mirror accounting variables Now that nfs_pageio_do_add_request() resets the pg_count, we don't need these other inlined resets. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-05-26 06:36:13 -04:00
Trond Myklebust	0d0ea30935	NFS: Don't corrupt the value of pg_bytes_written in nfs_do_recoalesce() The value of mirror->pg_bytes_written should only be updated after a successful attempt to flush out the requests on the list. Fixes: `a7d42ddb30` ("nfs: add mirroring support to pgio layer") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-05-26 06:36:13 -04:00
Trond Myklebust	56517ab958	NFS: Fix an Oopsable condition in __nfs_pageio_add_request() Ensure that nfs_pageio_error_cleanup() resets the mirror array contents, so that the structure reflects the fact that it is now empty. Also change the test in nfs_pageio_do_add_request() to be more robust by checking whether or not the list is empty rather than relying on the value of pg_count. Fixes: `a7d42ddb30` ("nfs: add mirroring support to pgio layer") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-05-26 06:36:13 -04:00
Pavel Begunkov	17a91051fe	io_uring/io-wq: close io-wq full-stop gap There is an old problem with io-wq cancellation where requests should be killed and are in io-wq but are not discoverable, e.g. in @next_hashed or @linked vars of io_worker_handle_work(). It adds some unreliability to individual request canellation, but also may potentially get __io_uring_cancel() stuck. For instance: 1) An __io_uring_cancel()'s cancellation round have not found any request but there are some as desribed. 2) __io_uring_cancel() goes to sleep 3) Then workers wake up and try to execute those hidden requests that happen to be unbound. As we already cancel all requests of io-wq there, set IO_WQ_BIT_EXIT in advance, so preventing 3) from executing unbound requests. The workers will initially break looping because of getting a signal as they are threads of the dying/exec()'ing user task. Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/abfcf8c54cb9e8f7bfbad7e9a0cc5433cc70bdc2.1621781238.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-05-25 19:39:58 -06:00
Dai Ngo	f4e44b3933	NFSD: delay unmount source's export after inter-server copy completed. Currently the source's export is mounted and unmounted on every inter-server copy operation. This patch is an enhancement to delay the unmount of the source export for a certain period of time to eliminate the mount and unmount overhead on subsequent copy operations. After a copy operation completes, a work entry is added to the delayed unmount list with an expiration time. This list is serviced by the laundromat thread to unmount the export of the expired entries. Each time the export is being used again, its expiration time is extended and the entry is re-inserted to the tail of the list. The unmount task and the mount operation of the copy request are synced to make sure the export is not unmounted while it's being used. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-25 17:06:51 -04:00
Olga Kornievskaia	eac0b17a77	NFSD add vfs_fsync after async copy is done Currently, the server does all copies as NFS_UNSTABLE. For synchronous copies linux client will append a COMMIT to the COPY compound but for async copies it does not (because COMMIT needs to be done after all bytes are copied and not as a reply to the COPY operation). However, in order to save the client doing a COMMIT as a separate rpc, the server can reply back with NFS_FILE_SYNC copy. This patch proposed to add vfs_fsync() call at the end of the async copy. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-25 17:06:51 -04:00
J. Bruce Fields	eeeadbb9bd	nfsd: move some commit_metadata()s outside the inode lock The commit may be time-consuming and there's no need to hold the lock for it. More of these are possible, these were just some easy ones. Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-25 17:06:51 -04:00
Yu Hsiang Huang	e5d74a2d0e	nfsd: Prevent truncation of an unlinked inode from blocking access to its directory Truncation of an unlinked inode may take a long time for I/O waiting, and it doesn't have to prevent access to the directory. Thus, let truncation occur outside the directory's mutex, just like do_unlinkat() does. Signed-off-by: Yu Hsiang Huang <nickhuang@synology.com> Signed-off-by: Bing Jing Chang <bingjingc@synology.com> Signed-off-by: Robbie Ko <robbieko@synology.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-25 17:06:51 -04:00
Kees Cook	bfb819ea20	proc: Check /proc/$pid/attr/ writes against file opener Fix another "confused deputy" weakness[1]. Writes to /proc/$pid/attr/ files need to check the opener credentials, since these fds do not transition state across execve(). Without this, it is possible to trick another process (which may have different credentials) to write to its own /proc/$pid/attr/ files, leading to unexpected and possibly exploitable behaviors. [1] https://www.kernel.org/doc/html/latest/security/credentials.html?highlight=confused#open-file-credentials Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-25 10:24:41 -10:00
Linus Torvalds	ad9f25d338	netfslib fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAmCs8qYACgkQ+7dXa6fL C2s3QhAAkRSWoZQEmisGMcRKOhDJQh8Qc7lV4aKXFIa4EaLdeBPvhJhnJqMld2KY m35g4bSU/RsUjzSCLXVnEiHa9jdFKyK0C/XWshyidzrTDUk0HN6NXsBpp3ztWKlq iMOvQYnKWKoWr4seIdC1fAKSFcQ3uRlVnDnmm0GtB5ahu5ThNQtqf8nYMSuULZbo K9SybNUVCrDsORqDu2595gfK63MCOVn72Hj066s8owHrbD8Io52Kf6Q7jP1CkMGL x6Kl0pwjql6usUsaDEaqmNT3ck7UjlLp5h1EZnt/7SWbgInpNzk6BLP33DwCis+4 rUpu+Zf8TEeOYDU5if8QpVszwsMyoKtkp9AjgjZkvxbedCqHkXJjxrnkk6/H7yJc 4Zvi8sIU52D9PcZO0bD8zP/8eYm/ZTVjMjDt8PvIbTA583oGNWsfRBbvJYi1huxB i3G0PNVbqH0U3Z78XH4dmrkE1oMxbq2O5fg9ZNCuStxqD2vrZyyo/CcfidElLnCq vcT+obEI+NYFphMzk7rwSL4pH4OPwPziJfiudKANmUDei8rOejQ8nrw18CVF7neN Ewj1XiHOdi4JgGq92owpmCmTvle7GG9KNuCvfd4U67S9KOJAPT5UrSD696PrJJN7 YpcBHJMqS9XLXwrGuKD7oDroxEEpvJEunRH+yt3YPa5OQtX3wIA= =poNo -----END PGP SIGNATURE----- Merge tag 'netfs-lib-fixes-20200525' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull netfs fixes from David Howells: "A couple of fixes to the new netfs lib: - Pass the AOP flags through from netfs_write_begin() into grab_cache_page_write_begin(). - Automatically enable in Kconfig netfs lib rather than presenting an option for manual enablement" * tag 'netfs-lib-fixes-20200525' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: netfs: Make CONFIG_NETFS_SUPPORT auto-selected rather than manual netfs: Pass flags through to grab_cache_page_write_begin()	2021-05-25 07:31:49 -10:00
Gustavo A. R. Silva	b2db6c35ba	afs: Fix fall-through warnings for Clang In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple warnings by explicitly adding multiple fallthrough pseudo-keywords in places where the code is intended to fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeffrey Altman <jaltman@auristor.com> cc: linux-afs@lists.infradead.org cc: linux-hardening@vger.kernel.org Link: https://lore.kernel.org/r/51150b54e0b0431a2c401cd54f2c4e7f50e94601.1605896059.git.gustavoars@kernel.org/ # v1 Link: https://lore.kernel.org/r/20210420211615.GA51432@embeddedor/ # v2 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-25 07:30:34 -10:00
Alexander Aring	706474fbc5	fs: dlm: don't allow half transmitted messages This patch will clean a dirty page buffer if a reconnect occurs. If a page buffer was half transmitted we cannot start inside the middle of a dlm message if a node connects again. I observed invalid length receptions errors and was guessing that this behaviour occurs, after this patch I never saw an invalid message length again. This patch might drops more messages for dlm version 3.1 but 3.1 can't deal with half messages as well, for 3.2 it might trigger more re-transmissions but will not leave dlm in a broken state. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	5b2f981fde	fs: dlm: add midcomms debugfs functionality This patch adds functionality to debug midcomms per connection state inside a comms directory which is similar like dlm configfs. Currently there exists the possibility to read out two attributes which is the send queue counter and the version of each midcomms node state. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	489d8e559c	fs: dlm: add reliable connection if reconnect This patch introduce to make a tcp lowcomms connection reliable even if reconnects occurs. This is done by an application layer re-transmission handling and sequence numbers in dlm protocols. There are three new dlm commands: DLM_OPTS: This will encapsulate an existing dlm message (and rcom message if they don't have an own application side re-transmission handling). As optional handling additional tlv's (type length fields) can be appended. This can be for example a sequence number field. However because in DLM_OPTS the lockspace field is unused and a sequence number is a mandatory field it isn't made as a tlv and we put the sequence number inside the lockspace id. The possibility to add optional options are still there for future purposes. DLM_ACK: Just a dlm header to acknowledge the receive of a DLM_OPTS message to it's sender. DLM_FIN: This provides a 4 way handshake for connection termination inclusive support for half-closed connections. It's provided on application layer because SCTP doesn't support half-closed sockets, the shutdown() call can interrupted by e.g. TCP resets itself and a hard logic to implement it because the othercon paradigm in lowcomms. The 4-way termination handshake also solve problems to synchronize peer EOF arrival and that the cluster manager removes the peer in the node membership handling of DLM. In some cases messages can be still transmitted in this time and we need to wait for the node membership event. To provide a reliable connection the node will retransmit all unacknowledges message to it's peer on reconnect. The receiver will then filtering out the next received message and drop all messages which are duplicates. As RCOM_STATUS and RCOM_NAMES messages are the first messages which are exchanged and they have they own re-transmission handling, there exists logic that these messages must be first. If these messages arrives we store the dlm version field. This handling is on DLM 3.1 and after this patch 3.2 the same. A backwards compatibility handling has been added which seems to work on tests without tcpkill, however it's not recommended to use DLM 3.1 and 3.2 at the same time, because DLM 3.2 tries to fix long term bugs in the DLM protocol. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	8e2e40860c	fs: dlm: add union in dlm header for lockspace id This patch adds union inside the lockspace id to handle it also for another use case for a different dlm command. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	37a247da51	fs: dlm: move out some hash functionality This patch moves out some lowcomms hash functionality into lowcomms header to provide them to other layers like midcomms as well. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	2874d1a68c	fs: dlm: add functionality to re-transmit a message This patch introduces a retransmit functionality for a lowcomms message handle. It's just allocates a new buffer and transmit it again, no special handling about prioritize it because keeping bytestream in order. To avoid another connection look some refactor was done to make a new buffer allocation with a preexisting connection pointer. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	8f2dc78dbc	fs: dlm: make buffer handling per msg This patch makes the void pointer handle for lowcomms functionality per message and not per page allocation entry. A refcount handling for the handle was added to keep the message alive until the user doesn't need it anymore. There exists now a per message callback which will be called when allocating a new buffer. This callback will be guaranteed to be called according the order of the sending buffer, which can be used that the caller increments a sequence number for the dlm message handle. For transition process we cast the dlm_mhandle to dlm_msg and vice versa until the midcomms layer will implement a specific dlm_mhandle structure. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	a070a91cf1	fs: dlm: add more midcomms hooks This patch prepares hooks to redirect to the midcomms layer which will be used by the midcomms re-transmit handling. There exists the new concept of stateless buffers allocation and commits. This can be used to bypass the midcomms re-transmit handling. It is used by RCOM_STATUS and RCOM_NAMES messages, because they have their own ping-like re-transmit handling. As well these two messages will be used to determine the DLM version per node, because these two messages are per observation the first messages which are exchanged. Cluster manager events for node membership are added to add support for half-closed connections in cases that the peer connection get to an end of file but DLM still holds membership of the node. In this time DLM can still trigger new message which we should allow. After the cluster manager node removal event occurs it safe to close the connection. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	6fb5cf9d42	fs: dlm: public header in out utility This patch allows to use header_out() and header_in() outside of dlm util functionality. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	8aa31cbf20	fs: dlm: fix connection tcp EOF handling This patch fixes the EOF handling for TCP that if and EOF is received we will close the socket next time the writequeue runs empty. This is a half-closed socket functionality which doesn't exists in SCTP. The midcomms layer will do a half closed socket functionality on DLM side to solve this problem for the SCTP case. However there is still the last ack flying around but other reset functionality will take care of it if it got lost. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	c6aa00e3d2	fs: dlm: cancel work sync othercon These rx tx flags arguments are for signaling close_connection() from which worker they are called. Obviously the receive worker cannot cancel itself and vice versa for swork. For the othercon the receive worker should only be used, however to avoid deadlocks we should pass the same flags as the original close_connection() was called. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	ba868d9dea	fs: dlm: reconnect if socket error report occurs This patch will change the reconnect handling that if an error occurs if a socket error callback is occurred. This will also handle reconnects in a non blocking connecting case which is currently missing. If error ECONNREFUSED is reported we delay the reconnect by one second. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	7443bc9625	fs: dlm: set is othercon flag There is a is othercon flag which is never used, this patch will set it and printout a warning if the othercon ever sends a dlm message which should never be the case. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	b38bc9c2b3	fs: dlm: fix srcu read lock usage This patch holds the srcu connection read lock in cases where we lookup the connections and accessing it. We don't hold the srcu lock in workers function where the scheduled worker is part of the connection itself. The connection should not be freed if any worker is scheduled or pending. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	2df6b7627a	fs: dlm: add dlm macros for ratelimit log This patch add ratelimit macro to dlm subsystem and will set the connecting log message to ratelimit. In non blocking connecting cases it will print out this message a lot. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
Alexander Aring	c937aabbd7	fs: dlm: always run complete for possible waiters This patch changes the ping_members() result that we always run complete() for possible waiters. We handle the -EINTR error code as successful. This error code is returned if the recovery is stopped which is likely that a new recovery is triggered with a new members configuration and ping_members() runs again. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>	2021-05-25 09:22:20 -05:00
David Howells	b71c791254	netfs: Make CONFIG_NETFS_SUPPORT auto-selected rather than manual Make the netfs helper library selected automatically by the things that use it rather than being manually configured, even though it's required[1]. Fixes: `3a5829fefd` ("netfs: Make a netfs helper module") Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/CAMuHMdXJZ7iNQE964CdBOU=vRKVMFzo=YF_eiwsGgqzuvZ+TuA@mail.gmail.com [1] Link: https://lore.kernel.org/r/162090298141.3166007.2971118149366779916.stgit@warthog.procyon.org.uk # v1	2021-05-25 13:48:04 +01:00
David Howells	19dee61381	netfs: Pass flags through to grab_cache_page_write_begin() In netfs_write_begin(), pass the AOP flags through to grab_cache_page_write_begin() so that a request to use GFP_NOFS is honoured. Fixes: `e1b1240c1f` ("netfs: Add write_begin helper") Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/162090295383.3165945.13595101698295243662.stgit@warthog.procyon.org.uk # v1	2021-05-25 13:46:32 +01:00
Amir Goldstein	a8b98c808e	fanotify: fix permission model of unprivileged group Reporting event->pid should depend on the privileges of the user that initialized the group, not the privileges of the user reading the events. Use an internal group flag FANOTIFY_UNPRIV to record the fact that the group was initialized by an unprivileged user. To be on the safe side, the premissions to setup filesystem and mount marks now require that both the user that initialized the group and the user setting up the mark have CAP_SYS_ADMIN. Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxiA77_P5vtv7e83g0+9d7B5W9ZTE4GfQEYbWmfT1rA=VA@mail.gmail.com/ Fixes: `7cea2a3c50` ("fanotify: support limited functionality for unprivileged users") Cc: <Stable@vger.kernel.org> # v5.12+ Link: https://lore.kernel.org/r/20210524135321.2190062-1-amir73il@gmail.com Reviewed-by: Matthew Bobrowski <repnop@google.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2021-05-25 12:21:14 +02:00
Bart Van Assche	7fe1e79b59	configfs: implement the .read_iter and .write_iter methods Configfs is one of the few filesystems that does not yet support the .read_iter and .write_iter callbacks. This patch adds support for these methods in configfs. Signed-off-by: Bart Van Assche <bvanassche@acm.org> [hch: split out a separate fix] Signed-off-by: Christoph Hellwig <hch@lst.de>	2021-05-25 10:30:56 +02:00
Christoph Hellwig	44b9a000df	configfs: drop pointless kerneldoc comments file.c has a bunch of kerneldoc comments for static functions that do not document any API but just list what is done. Drop them. Signed-off-by: Christoph Hellwig <hch@lst.de>	2021-05-25 10:30:31 +02:00
Bart Van Assche	dd33f1f7aa	configfs: fix the kerneldoc comment for configfs_create_bin_file Mention the correct argument name. Signed-off-by: Bart Van Assche <bvanassche@acm.org> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de>	2021-05-25 10:30:31 +02:00
Darrick J. Wong	603f000b15	xfs: validate extsz hints against rt extent size when rtinherit is set The RTINHERIT bit can be set on a directory so that newly created regular files will have the REALTIME bit set to store their data on the realtime volume. If an extent size hint (and EXTSZINHERIT) are set on the directory, the hint will also be copied into the new file. As pointed out in previous patches, for realtime files we require the extent size hint be an integer multiple of the realtime extent, but we don't perform the same validation on a directory with both RTINHERIT and EXTSZINHERIT set, even though the only use-case of that combination is to propagate extent size hints into new realtime files. This leads to inode corruption errors when the bad values are propagated. Because there may be existing filesystems with such a configuration, we cannot simply amend the inode verifier to trip on these directories and call it a day because that will cause previously "working" filesystems to start throwing errors abruptly. Note that it's valid to have directories with rtinherit set even if there is no realtime volume, in which case the problem does not manifest because rtinherit is ignored if there's no realtime device; and it's possible that someone set the flag, crashed, repaired the filesystem (which clears the hint on the realtime file) and continued. Therefore, mitigate this issue in several ways: First, if we try to write out an inode with both rtinherit/extszinherit set and an unaligned extent size hint, turn off the hint to correct the error. Second, if someone tries to misconfigure a directory via the fssetxattr ioctl, fail the ioctl. Third, reverify both extent size hint values when we propagate heritable inode attributes from parent to child, to prevent misconfigurations from spreading. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com>	2021-05-24 18:01:04 -07:00
Darrick J. Wong	6b69e48589	xfs: standardize extent size hint validation While chasing a bug involving invalid extent size hints being propagated into newly created realtime files, I noticed that the xfs_ioctl_setattr checks for the extent size hints weren't the same as the ones now encoded in libxfs and used for validation in repair and mkfs. Because the checks in libxfs are more stringent than the ones in the ioctl, it's possible for a live system to set inode flags that immediately result in corruption warnings. Specifically, it's possible to set an extent size hint on an rtinherit directory without checking if the hint is aligned to the realtime extent size, which makes no sense since that combination is used only to seed new realtime files. Replace the open-coded and inadequate checks with the libxfs verifier versions and update the code comments a bit. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2021-05-24 18:01:04 -07:00
Darrick J. Wong	0f9342513c	xfs: check free AG space when making per-AG reservations The new online shrink code exposed a gap in the per-AG reservation code, which is that we only return ENOSPC to callers if the entire fs doesn't have enough free blocks. Except for debugging mode, the reservation init code doesn't ever check that there's enough free space in that AG to cover the reservation. Not having enough space is not considered an immediate fatal error that requires filesystem offlining because (a) it's shouldn't be possible to wind up in that state through normal file operations and (b) even if one did, freeing data blocks would recover the situation. However, online shrink now needs to know if shrinking would not leave enough space so that it can abort the shrink operation. Hence we need to promote this assertion into an actual error return. Observed by running xfs/168 with a 1k block size, though in theory this could happen with any configuration. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2021-05-24 18:01:04 -07:00
zhangyi (F)	12e0613715	block_dump: remove block_dump feature in mark_inode_dirty() block_dump is an old debugging interface, one of it's functions is used to print the information about who write which file on disk. If we enable block_dump through /proc/sys/vm/block_dump and turn on debug log level, we can gather information about write process name, target file name and disk from kernel message. This feature is realized in block_dump___mark_inode_dirty(), it print above information into kernel message directly when marking inode dirty, so it is noisy and can easily trigger log storm. At the same time, get the dentry refcount is also not safe, we found it will lead to deadlock on ext4 file system with data=journal mode. After tracepoints has been introduced into the kernel, we got a tracepoint in __mark_inode_dirty(), which is a better replacement of block_dump___mark_inode_dirty(). The only downside is that it only trace the inode number and not a file name, but it probably doesn't matter because the original printed file name in block_dump is not accurate in some cases, and we can still find it through the inode number and device id. So this patch delete the dirting inode part of block_dump feature. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210313030146.2882027-2-yi.zhang@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-05-24 06:47:21 -06:00
YueHaibing	21e4e15a84	reiserfs: Remove unneed check in reiserfs_write_full_page() Condition !A \|\| A && B is equivalent to !A \|\| B. Generated by: scripts/coccinelle/misc/excluded_middle.cocci Link: https://lore.kernel.org/r/20210523090258.27696-1-yuehaibing@huawei.com Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz>	2021-05-24 11:00:56 +02:00
Mike Kravetz	e32905e573	userfaultfd: hugetlbfs: fix new flag usage in error path In commit `d6995da311` ("hugetlb: use page.private for hugetlb specific page flags") the use of PagePrivate to indicate a reservation count should be restored at free time was changed to the hugetlb specific flag HPageRestoreReserve. Changes to a userfaultfd error path as well as a VM_BUG_ON() in remove_inode_hugepages() were overlooked. Users could see incorrect hugetlb reserve counts if they experience an error with a UFFDIO_COPY operation. Specifically, this would be the result of an unlikely copy_huge_page_from_user error. There is not an increased chance of hitting the VM_BUG_ON. Link: https://lkml.kernel.org/r/20210521233952.236434-1-mike.kravetz@oracle.com Fixes: `d6995da311` ("hugetlb: use page.private for hugetlb specific page flags") Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: Mina Almasry <almasry.mina@google.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: David Hildenbrand <david@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Mina Almasry <almasrymina@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-22 15:09:07 -10:00
Linus Torvalds	4ff2473bdb	block-5.13-2021-05-22 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmCpPO4QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpgI0EACitV5OwfX+saZdQEj3LF4dAo7uZkMV0cZK GJ3m1NWsMDXJJofcczyVTEs0iNT4fpb1dKE9cyOVjAFDoH8Dn7C+UZ163QWu+SCk WGgyiY+Qdwr7cyl6+2+WQkLBeLcyuFVjGtYHTxYWY2O+DpyhRw94Oiih1bfnI/6i KZTpaA3z+pZs/KFIE7eUnkI/iWC39VShZ1T8/gXO9vmIhUkA67j1o9i3LYpGYnXx Awza8Lpql7s3tfWcDL6FNHQmFPUjiowCSUNupzdnHgjggWwUCosJTTcL+mfdTHOJ YuYM3qRuzTbIeXXy/5JTZUt5AOkS8SCre7BpclSDrhZBiL/dkvAndN43ce/6vc7i FrgvnbY/Ik2PWQwcbxiXZzcEKxT9dzXbsyJG08ePZwQ5s+8M5KVZv+ElrV+T7/nJ DYjnWahQ674tHv2Z7Bp4hAjnchwiypxqie8OnOKBI+WseT2D8Pjs2sinUHSYKYDk 3m2e0BVsw+FAYt3bcdhocDQnrJwMNrhSuA9Rtyh6qeMG34yxOXJmZvrHNrbg2fG/ a/xgVewn/P4sDxGCwS3XH/zILYgvJAwTFWIfDeRXE4epqsPZ9h8FBq3Fzl5asL7V yl9iQlWuE1+Ks8IQMjunbJfQSTEghPCjJWHVQQVJm+rT33qI80Ac4a0vdd99TaXh 8P58LE+0jg== =ADzj -----END PGP SIGNATURE----- Merge tag 'block-5.13-2021-05-22' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: - Fix BLKRRPART and deletion race (Gulam, Christoph) - NVMe pull request (Christoph): - nvme-tcp corruption and timeout fixes (Sagi Grimberg, Keith Busch) - nvme-fc teardown fix (James Smart) - nvmet/nvme-loop memory leak fixes (Wu Bo)" * tag 'block-5.13-2021-05-22' of git://git.kernel.dk/linux-block: block: fix a race between del_gendisk and BLKRRPART block: prevent block device lookups at the beginning of del_gendisk nvme-fc: clear q_live at beginning of association teardown nvme-tcp: rerun io_work if req_list is not empty nvme-tcp: fix possible use-after-completion nvme-loop: fix memory leak in nvme_loop_create_ctrl() nvmet: fix memory leak in nvmet_alloc_ctrl()	2021-05-22 07:40:34 -10:00
Linus Torvalds	b9231dfbcb	io_uring-5.13-2021-05-22 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmCpPQcQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgptBfD/0XV2+UXT7GTq3zfAgeCRojxzjH8YSh/fu1 uSYA2fME0J7B2lpgxwHmDwgW/JkxkQ9oal+QxoNUJnmTF4CN3c7edQYxaA+QAnb/ XiEY6s5slpBtopJCXQqPlE6dUnn1yjc0wNIm3EjvmmMaFjb6MVfMZWqWn9AANNTd yiRtk8a7KQuYdBeQMPVQG4v1ue37VTL5B9D9tM3p03W2ngNhtWw2Yxy5k/ePseip HYhPm+SKcbpmSFS+KN/a4aBLHyW89FRnhBWZF50sBmdUD+HLgz09IyFdSqyo9s2f wb7h3u3FbzTk3JofcJhYfqoQXkwmYhHrwNGCMjhK/zy+qloCIOu8Nw4jkcH+VYwK Rf7cFu+CZDRgcIu4Op/W5CPHNPY680Rxd/yBKlG/n4aZ/zxuuOu08992Z5BSaxfw UpIFMOWMuDvbBRUk71R34ME0o1wNhWL75Rljh97dAMRZLez1h8CmGdktT9g4keuo 71Swq51AQk7fWXW0yQK2kIpbTjazfh6+AEvdF4c/Njss83K7PHCq00xeKI9PeNXN aQvPBpFifTeN1B1IENH5wEHO8F7e38eU45WHPwgNJUuSEpuBoXQGoLBlf6WXzNUS sIt6UjGDCFTZddIYwVfVISl7+DLBLCxYRYnw0Mx99x1shUpH+6q0HdpPgOxiKmoH ZgdG/q8rVg== =tipG -----END PGP SIGNATURE----- Merge tag 'io_uring-5.13-2021-05-22' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "One fix for a regression with poll in this merge window, and another just hardens the io-wq exit path a bit" * tag 'io_uring-5.13-2021-05-22' of git://git.kernel.dk/linux-block: io_uring: fortify tctx/io_wq cleanup io_uring: don't modify req->poll for rw	2021-05-22 07:36:36 -10:00
Linus Torvalds	a3969ef463	Fixes for 5.13-rc3: - Fix some math errors in the realtime allocator when extent size hints are applied. - Fix unnecessary short writes to realtime files when free space is fragmented. - Fix a crash when using scrub tracepoints. - Restore ioctl uapi definitions that were accidentally removed in 5.13-rc1. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmCmgSoACgkQ+H93GTRK tOvInhAAj9JbQLz/BPt14qVdteNDNBpdFdl/SC5Bow5ABOWi8FSOu9N/F32nMy9y fGPXP2G//sYzfW5jwE+ZPEEq7e+K55rLZHxHsbtWXjCL96t9edEDGFS5p6LmgnRW aJwE1QxhFHFJZEX1+Y08zmB+fSnwHJ4HzZihYb1f9sTI5cJRh7pvvj26HiqfUk9I FUT5Oo/Dy8gSSRmNCPWlLWhXJurrSwAYHmoE44vNHoEYHcodVwAK+ZR/Dj/5DQaS DTYgLeHZQmPijcW7/B/RKcEz96hMQ9afg7vBdgZwhG/oFCRKV2m3rYjjjo/AgKv4 4FkmUoP5+CTT1vt9UKrdyl9uLTMqxiHuU1qvb82DM5XbbsLEFBBa+e3QrWmJqR8D 3lBp6ogtOmbNEJpgLxCdbVl80HOjB+yaIWUB536nauz4USZvCcRGvdYDQ922q6ig 1eT5Q6KCNgO3e6WIQ3W5kNJsM+/gXlNvwwhN/jHCQKy//bgVRnNYbODC6K46mjp/ H8+NWyzQXpFjRWjmQ60LD2/RlyJVbbLbMkPTO1g/vjyZWjgp0fV2wtG6Ag/FlwVF DERUdp/0m6V+lPRcKbWCasjEc+pis6TDnvn+tCftXthh3fXGcalC3bnmv3yH9rkP YAejnDvuuLVtgPyVARA9DTQR5ch5CSRuFc6sU9SFL5nv2p59RtU= =IvUb -----END PGP SIGNATURE----- Merge tag 'xfs-5.13-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs fixes from Darrick Wong: - Fix some math errors in the realtime allocator when extent size hints are applied. - Fix unnecessary short writes to realtime files when free space is fragmented. - Fix a crash when using scrub tracepoints. - Restore ioctl uapi definitions that were accidentally removed in 5.13-rc1. * tag 'xfs-5.13-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: restore old ioctl definitions xfs: fix deadlock retry tracepoint arguments xfs: retry allocations when locality-based search fails xfs: adjust rt allocation minlen when extszhint > rtextsize	2021-05-21 18:45:09 -10:00
Linus Torvalds	45af60e7ce	for-5.13-rc2-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmCoEQkACgkQxWXV+ddt WDsn6Q//XXQVextL6g6Wjx0SR9b5C1ndSV841jNY+KQ0drBPSOBs+0SXI+nIWAK1 iTpmj3s2qrRElZZ6DT4fKP28KnbUJed9+CcirNnN3IMOeauI760CLobXZLsw1wGH o0HKKgcPhw/v9o9jqX22rSfzDZ2Rx2KhZ8iEb1ZXIG5iJNFcnXCCoFOqk4I+UEvH /5734KU8RI3sCRhziSf/vDCF50p+BIWr8VilQkmZUzi0oa6Y1wXm0qd9j0unhICR NxcBk1NYdOosAvVRhSqync1BNLhXSctg4rwhLlSI5SDvt/Ivz5tguNr9HcizOvmW zyb0g1c3Pq0p2wQJLybbs1zn67d0+7Q23UPWx1C+IKU3nmX5mGWzToxjVOQASYaZ 8UbzYAjUHtJpLDB4dp6+k5Pv/yfVGyhxXI+qLMWow77qRPPf7/vw5nEwTXmjcPRH 9st0TopZVXI4IEpZP+HeNFdNONuPL3CqV0t1+MnC73WMhmUfXR5E8Yq5H3MscuFl smkrWUq/g+cmkiOw5r4MyadFuN1MsXGw4rOdbYjY4JqVht6gPkOp3P73Hme5rD3H Txw/1WKEl+w3I6wS0Dl/NFcMGOyl8gEv4rATDyRWkxfmCue2mcTGS/3jjjWWguu4 +Q7e6p1390PLAvMV/rEDoYmFCoPSYp6trvupW+5fkZdOyei1SZM= =98LW -----END PGP SIGNATURE----- Merge tag 'for-5.13-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more fixes: - fix unaligned compressed writes in zoned mode - fix false positive lockdep warning when cloning inline extent - remove wrong BUG_ON in tree-log error handling" * tag 'for-5.13-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: fix parallel compressed writes btrfs: zoned: pass start block to btrfs_use_zone_append btrfs: do not BUG_ON in link_to_fixup_dir btrfs: release path before starting transaction when cloning inline extent	2021-05-21 13:24:12 -10:00
Linus Torvalds	8bb14ca171	7 SMB3 fixes, one for stable, 3 others fix problems found in testing handle leases, and a compounded request fix -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmCn2x8ACgkQiiy9cAdy T1HjnQv+M87Xx++VVaJzeLQQlKGA/vfkhM7YLEkIwxmbUpt8JURORoK91xVa/RZA eS/K2tYOilAuuV7VXXw6ng6WNCWE/l+BNT5FHZ4WJt71pE1/tN/NIACtOhBB01GO r+JhAE08zYLu8vA1Ax1EBtSSBjTLUjDX0fWMfwD4C/BBABw5VZISnkSEj2lC6wT9 vovEalU9amMRrvlhK9Z+MRJRJFzxY4LingiEVlFIdLczCGia5PgSl3NXRY1//rNO wc//34cCGxBNc5Su5Bvn1kTZT5mdBFR98mLOuD+Dw55LlIlShKDnhZHGQDGPyQGT ey2w2b+pNAr3rwVNtU6JNmI7AiUllNHiDu5UsyB0ctDWJljzrILd4uPaWofcNXAh 5qPRvuGsqjo3D/10DPshla1pJtmFr8eKXy8o6UVfMYQSHDo1LbqMll7ArGgV3Fxn B2g5N+ax1+DXZlykKJGhYBBkvGANuUBU/tq810i5BvLhfrc1dx+pJlZAeO5OxCSA SBUiirq4 =neWC -----END PGP SIGNATURE----- Merge tag '5.13-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "Seven smb3 fixes: one for stable, three others fix problems found in testing handle leases, and a compounded request fix" * tag '5.13-rc3-smb3' of git://git.samba.org/sfrench/cifs-2.6: Fix KASAN identified use-after-free issue. Defer close only when lease is enabled. Fix kernel oops when CONFIG_DEBUG_ATOMIC_SLEEP is enabled. cifs: Fix inconsistent indenting cifs: fix memory leak in smb2_copychunk_range SMB3: incorrect file id in requests compounded with open cifs: remove deadstore in cifs_close_all_deferred_files()	2021-05-21 13:12:51 -10:00
Greg Kroah-Hartman	fb05b14c5b	debugfs: remove return value of debugfs_create_ulong() No one checks the return value of debugfs_create_ulong(), as it's not needed, so make the return value void, so that no one tries to do so in the future. Link: https://lore.kernel.org/r/20210521184340.1348539-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-21 20:59:55 +02:00
Greg Kroah-Hartman	393b06383f	debugfs: remove return value of debugfs_create_bool() No one checks the return value of debugfs_create_bool(), as it's not needed, so make the return value void, so that no one tries to do so in the future. Link: https://lore.kernel.org/r/20210521184519.1356639-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-21 20:59:03 +02:00
Linus Torvalds	a0e31f3a38	Merge branch 'for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo fix from Eric Biederman: "During the merge window an issue with si_perf and the siginfo ABI came up. The alpha and sparc siginfo structure layout had changed with the addition of SIGTRAP TRAP_PERF and the new field si_perf. The reason only alpha and sparc were affected is that they are the only architectures that use si_trapno. Looking deeper it was discovered that si_trapno is used for only a few select signals on alpha and sparc, and that none of the other _sigfault fields past si_addr are used at all. Which means technically no regression on alpha and sparc. While the alignment concerns might be dismissed the abuse of si_errno by SIGTRAP TRAP_PERF does have the potential to cause regressions in existing userspace. While we still have time before userspace starts using and depending on the new definition siginfo for SIGTRAP TRAP_PERF this set of changes cleans up siginfo_t. - The si_trapno field is demoted from magic alpha and sparc status and made an ordinary union member of the _sigfault member of siginfo_t. Without moving it of course. - si_perf is replaced with si_perf_data and si_perf_type ending the abuse of si_errno. - Unnecessary additions to signalfd_siginfo are removed" * 'for-v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: signalfd: Remove SIL_PERF_EVENT fields from signalfd_siginfo signal: Deliver all of the siginfo perf data in _perf signal: Factor force_sig_perf out of perf_sigtrap signal: Implement SIL_FAULT_TRAPNO siginfo: Move si_trapno inside the union inside _si_fault	2021-05-21 06:12:52 -10:00
Huilong Deng	cf1031ed47	jfs: Remove trailing semicolon in macros Macros should not use a trailing semicolon. Signed-off-by: Huilong Deng <denghuilong@cdjrlc.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>	2021-05-21 10:36:22 -05:00
zuoqilin	577ebd195f	fs: Fix typo issue Change 'inacitve' to 'inactive'. Signed-off-by: zuoqilin <zuoqilin@yulong.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>	2021-05-21 10:36:00 -05:00
Phillip Potter	a8867f4e38	ext4: fix memory leak in ext4_mb_init_backend on error path. Fix a memory leak discovered by syzbot when a file system is corrupted with an illegally large s_log_groups_per_flex. Reported-by: syzbot+aa12d6106ea4ca1b6aae@syzkaller.appspotmail.com Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Cc: stable@kernel.org Link: https://lore.kernel.org/r/20210412073837.1686-1-phil@philpotter.co.uk Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2021-05-20 23:29:32 -04:00
Andreas Gruenbacher	b7f55d928e	gfs2: Fix mmap locking for write faults When a write fault occurs, we need to take the inode glock of the underlying inode in exclusive mode. Otherwise, there's no guarantee that the dirty page will be written back to disk. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-05-21 05:16:38 +02:00
Rohith Surabattula	9687c85dfb	Fix KASAN identified use-after-free issue. [ 612.157429] ================================================================== [ 612.158275] BUG: KASAN: use-after-free in process_one_work+0x90/0x9b0 [ 612.158801] Read of size 8 at addr ffff88810a31ca60 by task kworker/2:9/2382 [ 612.159611] CPU: 2 PID: 2382 Comm: kworker/2:9 Tainted: G OE 5.13.0-rc2+ #98 [ 612.159623] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014 [ 612.159640] Workqueue: 0x0 (deferredclose) [ 612.159669] Call Trace: [ 612.159685] dump_stack+0xbb/0x107 [ 612.159711] print_address_description.constprop.0+0x18/0x140 [ 612.159733] ? process_one_work+0x90/0x9b0 [ 612.159743] ? process_one_work+0x90/0x9b0 [ 612.159754] kasan_report.cold+0x7c/0xd8 [ 612.159778] ? lock_is_held_type+0x80/0x130 [ 612.159789] ? process_one_work+0x90/0x9b0 [ 612.159812] kasan_check_range+0x145/0x1a0 [ 612.159834] process_one_work+0x90/0x9b0 [ 612.159877] ? pwq_dec_nr_in_flight+0x110/0x110 [ 612.159914] ? spin_bug+0x90/0x90 [ 612.159967] worker_thread+0x3b6/0x6c0 [ 612.160023] ? process_one_work+0x9b0/0x9b0 [ 612.160038] kthread+0x1dc/0x200 [ 612.160051] ? kthread_create_worker_on_cpu+0xd0/0xd0 [ 612.160092] ret_from_fork+0x1f/0x30 [ 612.160399] Allocated by task 2358: [ 612.160757] kasan_save_stack+0x1b/0x40 [ 612.160768] __kasan_kmalloc+0x9b/0xd0 [ 612.160778] cifs_new_fileinfo+0xb0/0x960 [cifs] [ 612.161170] cifs_open+0xadf/0xf20 [cifs] [ 612.161421] do_dentry_open+0x2aa/0x6b0 [ 612.161432] path_openat+0xbd9/0xfa0 [ 612.161441] do_filp_open+0x11d/0x230 [ 612.161450] do_sys_openat2+0x115/0x240 [ 612.161460] __x64_sys_openat+0xce/0x140 When mod_delayed_work is called to modify the delay of pending work, it might return false and queue a new work when pending work is already scheduled or when try to grab pending work failed. So, Increase the reference count when new work is scheduled to avoid use-after-free. Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-05-20 12:20:42 -05:00
Linus Torvalds	50f09a3dd5	Char/misc driver fixes for 5.13-rc3 Here is a big set of char/misc/other driver fixes for 5.13-rc3. The majority here is the fallout of the umn.edu re-review of all prior submissions. That resulted in a bunch of reverts along with the "correct" changes made, such that there is no regression of any of the potential fixes that were made by those individuals. I would like to thank the over 80 different developers who helped with the review and fixes for this mess. Other than that, there's a few habanna driver fixes for reported issues, and some dyndbg fixes for reported problems. All of these have been in linux-next for a while with no reported problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYKZCBg8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ynhRQCdGk6ri4oluyn/Z/2KAjvXDOmTmvgAn12VP42d S1Zmh4qRH2OWaLOBg7c2 =qtxj -----END PGP SIGNATURE----- Merge tag 'char-misc-5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here is a big set of char/misc/other driver fixes for 5.13-rc3. The majority here is the fallout of the umn.edu re-review of all prior submissions. That resulted in a bunch of reverts along with the "correct" changes made, such that there is no regression of any of the potential fixes that were made by those individuals. I would like to thank the over 80 different developers who helped with the review and fixes for this mess. Other than that, there's a few habanna driver fixes for reported issues, and some dyndbg fixes for reported problems. All of these have been in linux-next for a while with no reported problems" * tag 'char-misc-5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (82 commits) misc: eeprom: at24: check suspend status before disable regulator uio_hv_generic: Fix another memory leak in error handling paths uio_hv_generic: Fix a memory leak in error handling paths uio/uio_pci_generic: fix return value changed in refactoring Revert "Revert "ALSA: usx2y: Fix potential NULL pointer dereference"" dyndbg: drop uninformative vpr_info dyndbg: avoid calling dyndbg_emit_prefix when it has no work binder: Return EFAULT if we fail BINDER_ENABLE_ONEWAY_SPAM_DETECTION cdrom: gdrom: initialize global variable at init time brcmfmac: properly check for bus register errors Revert "brcmfmac: add a check for the status of usb_register" video: imsttfb: check for ioremap() failures Revert "video: imsttfb: fix potential NULL pointer dereferences" net: liquidio: Add missing null pointer checks Revert "net: liquidio: fix a NULL pointer dereference" media: gspca: properly check for errors in po1030_probe() Revert "media: gspca: Check the return value of write_bridge for timeout" media: gspca: mt9m111: Check write_bridge for timeout Revert "media: gspca: mt9m111: Check write_bridge for timeout" media: dvb: Add check on sp8870_readreg return ...	2021-05-20 06:31:52 -10:00
Linus Torvalds	7ac177143c	\n -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmCmN9AACgkQnJ2qBz9k QNn5ZwgAwnLdgBuILDqJwPaYpXOzvMhjjG8AwBDzhMYhhpt+OOCUevoRm7mDU7J2 t/DlwWGMhpp80ku+x+AURR/ltOfFvw4QAHeIXPWjkoieFKcLOEvAjWWZP6oIFC12 5e/QVXqK58fuRJwveYp4jZ+AXvDMoHJrDXsoTFezjBDIQQgzlIlrMzPavS/6UzUN mAF2sapE9lcQoRMfU8kktBWPVM/GpFkus2Q48EYFCZ1rp3aRyw/aahTVuvSUZCV0 XiY6f2F7qgFLtomK6UurlxTc7rPsrG+UmNvGWuXf3R81UawegmKQeG5zcaMGrZs1 kHyJQcP9nGYPLDXt/4kW9cY0s8oOKg== =RbOE -----END PGP SIGNATURE----- Merge tag 'quota_for_v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull quota fixes from Jan Kara: "The most important part in the pull is disablement of the new syscall quotactl_path() which was added in rc1. The reason is some people at LWN discussion pointed out dirfd would be useful for this path based syscall and Christian Brauner agreed. Without dirfd it may be indeed problematic for containers. So let's just disable the syscall for now when it doesn't have users yet so that we have more time to mull over how to best specify the filesystem we want to work on" * tag 'quota_for_v5.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: Disable quotactl_path syscall quota: Use 'hlist_for_each_entry' to simplify code	2021-05-20 06:20:15 -10:00
Anna Schumaker	a421d21860	NFSv4: Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return() Commit `de144ff423` changes _pnfs_return_layout() to call pnfs_mark_matching_lsegs_return() passing NULL as the struct pnfs_layout_range argument. Unfortunately, pnfs_mark_matching_lsegs_return() doesn't check if we have a value here before dereferencing it, causing an oops. I'm able to hit this crash consistently when running connectathon basic tests on NFS v4.1/v4.2 against Ontap. Fixes: `de144ff423` ("NFSv4: Don't discard segments marked for return in _pnfs_return_layout()") Cc: stable@vger.kernel.org Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-05-20 12:17:08 -04:00
Yang Li	d1d973950a	pNFS/NFSv4: Remove redundant initialization of 'rd_size' Variable 'rd_size' is being initialized however this value is never read as 'rd_size' is assigned a new value in for statement. Remove the redundant assignment. Clean up clang warning: fs/nfs/pnfs.c:2681:6: warning: Value stored to 'rd_size' during its initialization is never read [clang-analyzer-deadcode.DeadStores] Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-05-20 12:17:08 -04:00
Dan Carpenter	769b01ea68	NFS: fix an incorrect limit in filelayout_decode_layout() The "sizeof(struct nfs_fh)" is two bytes too large and could lead to memory corruption. It should be NFS_MAXFHSIZE because that's the size of the ->data[] buffer. I reversed the size of the arguments to put the variable on the left. Fixes: `16b374ca43` ("NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-05-20 12:17:08 -04:00
zhouchuangao	bb00238890	fs/nfs: Use fatal_signal_pending instead of signal_pending We set the state of the current process to TASK_KILLABLE via prepare_to_wait(). Should we use fatal_signal_pending() to detect the signal here? Fixes: `b4868b44c5` ("NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE") Signed-off-by: zhouchuangao <zhouchuangao@vivo.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>	2021-05-20 12:15:35 -04:00
Darrick J. Wong	e3c2b04747	xfs: restore old ioctl definitions These ioctl definitions in xfs_fs.h are part of the userspace ABI and were mistakenly removed during the 5.13 merge window. Fixes: `9fefd5db08` ("xfs: convert to fileattr") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>	2021-05-20 08:31:22 -07:00
Darrick J. Wong	16c9de54dc	xfs: fix deadlock retry tracepoint arguments sc->ip is the inode that's being scrubbed, which means that it's not set for scrub types that don't involve inodes. If one of those scrubbers (e.g. inode btrees) returns EDEADLOCK, we'll trip over the null pointer. Fix that by reporting either the file being examined or the file that was used to call scrub. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com>	2021-05-20 08:31:22 -07:00
Darrick J. Wong	676a659b60	xfs: retry allocations when locality-based search fails If a realtime allocation fails because we can't find a sufficiently large free extent satisfying locality rules, relax the locality rules and try again. This reduces the occurrence of short writes to realtime files when the write size is large and the free space is fragmented. This was originally discovered by running generic/186 with the realtime reflink patchset and a 128k cow extent size hint, but the short write symptoms can manifest with a 128k extent size hint and no reflink, so apply the fix now. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>	2021-05-20 08:28:34 -07:00
Gulam Mohamed	bc6a385132	block: fix a race between del_gendisk and BLKRRPART When BLKRRPART is called concurrently with del_gendisk, the partitions rescan can create a stale partition that will never be be cleaned up. Fix this by checking the the disk is up before rescanning partitions while under bd_mutex. Signed-off-by: Gulam Mohamed <gulam.mohamed@oracle.com> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210514131842.1600568-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-05-20 07:59:35 -06:00
Christoph Hellwig	6c60ff048c	block: prevent block device lookups at the beginning of del_gendisk As an artifact of how gendisk lookup used to work in earlier kernels, GENHD_FL_UP is only cleared very late in del_gendisk, and a global lock is used to prevent opens from succeeding while del_gendisk is tearing down the gendisk. Switch to clearing the flag early and under bd_mutex so that callers can use bd_mutex to stabilize the flag, which removes the need for the global mutex. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210514131842.1600568-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-05-20 07:59:35 -06:00
Johannes Thumshirn	764c7c9a46	btrfs: zoned: fix parallel compressed writes When multiple processes write data to the same block group on a compressed zoned filesystem, the underlying device could report I/O errors and data corruption is possible. This happens because on a zoned file system, compressed data writes where sent to the device via a REQ_OP_WRITE instead of a REQ_OP_ZONE_APPEND operation. But with REQ_OP_WRITE and parallel submission it cannot be guaranteed that the data is always submitted aligned to the underlying zone's write pointer. The change to using REQ_OP_ZONE_APPEND instead of REQ_OP_WRITE on a zoned filesystem is non intrusive on a regular file system or when submitting to a conventional zone on a zoned filesystem, as it is guarded by btrfs_use_zone_append. Reported-by: David Sterba <dsterba@suse.com> Fixes: `9d294a685f` ("btrfs: zoned: enable to mount ZONED incompat flag") CC: stable@vger.kernel.org # 5.12.x: e380adfc213a13: btrfs: zoned: pass start block to btrfs_use_zone_append CC: stable@vger.kernel.org # 5.12.x Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-20 15:51:07 +02:00
Johannes Thumshirn	e380adfc21	btrfs: zoned: pass start block to btrfs_use_zone_append btrfs_use_zone_append only needs the passed in extent_map's block_start member, so there's no need to pass in the full extent map. This also enables the use of btrfs_use_zone_append in places where we only have a start byte but no extent_map. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-20 15:50:49 +02:00
Pavel Begunkov	ba5ef6dc8a	io_uring: fortify tctx/io_wq cleanup We don't want anyone poking into tctx->io_wq awhile it's being destroyed by io_wq_put_and_exit(), and even though it shouldn't even happen, if buggy would be preferable to get a NULL-deref instead of subtle delayed failure or UAF. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/827b021de17926fd807610b3e53a5a5fa8530856.1621513214.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-05-20 07:29:11 -06:00
Bob Peterson	f5456b5d67	gfs2: Clean up revokes on normal withdraws Before this patch, the system ail lists were cleaned up if the logd process withdrew, but on other withdraws, they were not cleaned up. This included the cleaning up of the revokes as well. This patch reorganizes things a bit so that all withdraws (not just logd) clean up the ail lists, including any pending revokes. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-05-20 13:31:37 +02:00
Bob Peterson	865cc3e9cc	gfs2: fix a deadlock on withdraw-during-mount Before this patch, gfs2 would deadlock because of the following sequence during mount: mount gfs2_fill_super gfs2_make_fs_rw <--- Detects IO error with glock kthread_stop(sdp->sd_quotad_process); <--- Blocked waiting for quotad to finish logd Detects IO error and the need to withdraw calls gfs2_withdraw gfs2_make_fs_ro kthread_stop(sdp->sd_quotad_process); <--- Blocked waiting for quotad to finish gfs2_quotad gfs2_statfs_sync gfs2_glock_wait <---- Blocked waiting for statfs glock to be granted glock_work_func do_xmote <---Detects IO error, can't release glock: blocked on withdraw glops->go_inval glock_blocked_by_withdraw requeue glock work & exit <--- work requeued, blocked by withdraw This patch makes a special exception for the statfs system inode glock, which allows the statfs glock UNLOCK to proceed normally. That allows the quotad daemon to exit during the withdraw, which allows the logd daemon to exit during the withdraw, which allows the mount to exit. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-05-20 13:31:37 +02:00
Bob Peterson	20265d9a67	gfs2: fix scheduling while atomic bug in glocks Before this patch, in the unlikely event that gfs2_glock_dq encountered a withdraw, it would do a wait_on_bit to wait for its journal to be recovered, but it never released the glock's spin_lock, which caused a scheduling-while-atomic error. This patch unlocks the lockref spin_lock before waiting for recovery. Fixes: `601ef0d52e` ("gfs2: Force withdraw to replay journals and wait for it to finish") Cc: stable@vger.kernel.org # v5.7+ Reported-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-05-20 13:31:37 +02:00
Bob Peterson	4194dec4b4	gfs2: Fix I_NEW check in gfs2_dinode_in Patch `4a378d8a0d` added a new check for I_NEW inodes, but unfortunately it used the wrong variable, i_flags. This caused GFS2 to withdraw when gfs2_lookup_by_inum needed to refresh an I_NEW inode. This patch switches to use the correct variable, i_state. Fixes: `4a378d8a0d` ("gfs2: be careful with inode refresh") Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-05-20 13:31:37 +02:00
Andreas Gruenbacher	43a511c44e	gfs2: Prevent direct-I/O write fallback errors from getting lost When a direct I/O write falls entirely and falls back to buffered I/O and the buffered I/O fails, the write failed with return value 0 instead of the error number reported by the buffered I/O. Fix that. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>	2021-05-20 13:31:36 +02:00
Arturo Giusti	fa236c2b2d	udf: Fix NULL pointer dereference in udf_symlink function In function udf_symlink, epos.bh is assigned with the value returned by udf_tgetblk. The function udf_tgetblk is defined in udf/misc.c and returns the value of sb_getblk function that could be NULL. Then, epos.bh is used without any check, causing a possible NULL pointer dereference when sb_getblk fails. This fix adds a check to validate the value of epos.bh. Link: https://bugzilla.kernel.org/show_bug.cgi?id=213083 Signed-off-by: Arturo Giusti <koredump@protonmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2021-05-20 12:14:44 +02:00
Rohith Surabattula	0ab95c2510	Defer close only when lease is enabled. When smb2 lease parameter is disabled on server. Server grants batch oplock instead of RHW lease by default on open, inode page cache needs to be zapped immediatley upon close as cache is not valid. Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-05-19 21:11:28 -05:00
Rohith Surabattula	860b69a9d7	Fix kernel oops when CONFIG_DEBUG_ATOMIC_SLEEP is enabled. Removed oplock_break_received flag which was added to achieve synchronization between oplock handler and open handler by earlier commit. It is not needed because there is an existing lock open_file_lock to achieve the same. find_readable_file takes open_file_lock and then traverses the openFileList. Similarly, cifs_oplock_break while closing the deferred handle (i.e cifsFileInfo_put) takes open_file_lock and then sends close to the server. Added comments for better readability. Signed-off-by: Rohith Surabattula <rohiths@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-05-19 21:11:26 -05:00
Jiapeng Chong	e83aa3528a	cifs: Fix inconsistent indenting Eliminate the follow smatch warning: fs/cifs/fs_context.c:1148 smb3_fs_context_parse_param() warn: inconsistent indenting. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-05-19 21:11:09 -05:00
Ronnie Sahlberg	d201d7631c	cifs: fix memory leak in smb2_copychunk_range When using smb2_copychunk_range() for large ranges we will run through several iterations of a loop calling SMB2_ioctl() but never actually free the returned buffer except for the final iteration. This leads to memory leaks everytime a large copychunk is requested. Fixes: `9bf0c9cd43` ("CIFS: Fix SMB2/SMB3 Copy offload support (refcopy) for large files") Cc: <stable@vger.kernel.org> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-05-19 19:19:20 -05:00
Linus Torvalds	c3d0e3fd41	fs.idmapped.mount_setattr.v5.13-rc3 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYKT8wgAKCRCRxhvAZXjc op2mAP9hyc3sp2/HvEuTYDc6LmljPNCqdKeCP1eiX5SZR0yMVwEAwR5xym7YeqYZ LRj+xjuvOmrSNhcxpZqLpXuPOcY78wM= =E4a/ -----END PGP SIGNATURE----- Merge tag 'fs.idmapped.mount_setattr.v5.13-rc3' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux Pull mount_setattr fix from Christian Brauner: "This makes an underlying idmapping assumption more explicit. We currently don't have any filesystems that support idmapped mounts which are mountable inside a user namespace, i.e. where s_user_ns != init_user_ns. That was a deliberate decision for now as userns root can just mount the filesystem themselves. Express this restriction explicitly and enforce it until there's a real use-case for this. This way we can notice it and will have a chance to adapt and audit our translation helpers and fstests appropriately if we need to support such filesystems" * tag 'fs.idmapped.mount_setattr.v5.13-rc3' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux: fs/mount_setattr: tighten permission checks	2021-05-19 06:12:31 -10:00
Steve French	c0d46717b9	SMB3: incorrect file id in requests compounded with open See MS-SMB2 3.2.4.1.4, file ids in compounded requests should be set to 0xFFFFFFFFFFFFFFFF (we were treating it as u32 not u64 and setting it incorrectly). Signed-off-by: Steve French <stfrench@microsoft.com> Reported-by: Stefan Metzmacher <metze@samba.org> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>	2021-05-19 10:10:58 -05:00
Al Viro	e4b2755318	getcwd(2): clean up error handling Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-05-18 20:15:58 -04:00
Al Viro	cf4febc1ad	d_path: prepend_path() is unlikely to return non-zero Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-05-18 20:15:58 -04:00
Al Viro	008673ff74	d_path: prepend_path(): lift the inner loop into a new helper ... and leave the rename_lock/mount_lock handling in prepend_path() itself Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-05-18 20:15:57 -04:00
Al Viro	2dac0ad175	d_path: prepend_path(): lift resetting b in case when we'd return 3 out of loop preparation to extracting the loop into helper (next commit) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-05-18 20:15:56 -04:00
Al Viro	7c0d552fd5	d_path: prepend_path(): get rid of vfsmnt it's kept equal to &mnt->mnt all along. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-05-18 20:15:56 -04:00
Al Viro	ad08ae5865	d_path: introduce struct prepend_buffer We've a lot of places where we have pairs of form (pointer to end of buffer, amount of space left in front of that). These sit in pairs of variables located next to each other and usually passed by reference. Turn those into instances of new type (struct prepend_buffer) and pass reference to the pair instead of pairs of references to its fields. Declared and initialized by DECLARE_BUFFER(name, buf, buflen). extract_string(prepend_buffer) returns the buffer contents if no overflow has happened, ERR_PTR(ENAMETOOLONG) otherwise. All places where we used to have that boilerplate converted to use of that helper. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-05-18 20:14:53 -04:00
Al Viro	95b55c42f6	d_path: make prepend_name() boolean It returns only 0 or -ENAMETOOLONG and both callers only check if the result is negative. Might as well return true on success and false on failure... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-05-18 20:08:12 -04:00
Al Viro	01a4428ee7	d_path: lift -ENAMETOOLONG handling into callers of prepend_path() The only negative value ever returned by prepend_path() is -ENAMETOOLONG and callers can recognize that situation (overflow) by looking at the sign of buflen. Lift that into the callers; we already have the same logics (buf if buflen is non-negative, ERR_PTR(-ENAMETOOLONG) otherwise) in several places and that'll become a new primitive several commits down the road. Make prepend_path() return 0 instead of -ENAMETOOLONG. That makes for saner calling conventions (0/1/2/3/-ENAMETOOLONG is obnoxious) and callers actually get simpler, especially once the aforementioned primitive gets added. In prepend_path() itself we switch prepending the / (in case of empty path) to use of prepend() - no need to open-code that, compiler will do the right thing. It's exactly the same logics as in __dentry_path(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-05-18 20:08:12 -04:00
Al Viro	d8548232ea	d_path: don't bother with return value of prepend() Only simple_dname() checks it, and there we can simply do those calls and check for overflow (by looking of negative buflen) in the end. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-05-18 20:08:11 -04:00
Al Viro	a0378fb9b3	getcwd(2): saner logics around prepend_path() call The only negative value that might get returned by prepend_path() is -ENAMETOOLONG, and that happens only on overflow. The same goes for prepend_unreachable(). Overflow is detectable by observing negative buflen, so we can simplify the control flow around the prepend_path() call. Expand prepend_unreachable(), while we are at it - that's the only caller. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-05-18 20:08:11 -04:00
Al Viro	9024348f53	d_path: get rid of path_with_deleted() expand in the sole caller; transform the initial prepends similar to what we'd done in dentry_path() (prepend_path() will fail the right way if we call it with negative buflen, same as __dentry_path() does). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-05-18 20:08:10 -04:00
Al Viro	3acca04326	d_path: regularize handling of root dentry in __dentry_path() All path-forming primitives boil down to sequence of prepend_name() on dentries encountered along the way toward root. Each time we prepend / + dentry name to the buffer. Normally that does exactly what we want, but there's a corner case when we don't call prepend_name() at all (in case of __dentry_path() that happens if we are given root dentry). We obviously want to end up with "/", rather than "", so this corner case needs to be handled. __dentry_path() used to manually put '/' in the end of buffer before doing anything else, to be overwritten by the first call of prepend_name() if one happens and to be left in place if we don't call prepend_name() at all. That required manually checking that we had space in the buffer (prepend_name() and prepend() take care of such checks themselves) and lead to clumsy keeping track of return value. A better approach is to check if the main loop has added anything into the buffer and prepend "/" if it hasn't. A side benefit of using prepend() is that it does the right thing if we'd already run out of buffer, making the overflow-handling logics simpler. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-05-18 20:06:36 -04:00
Eric W. Biederman	922e301304	signalfd: Remove SIL_PERF_EVENT fields from signalfd_siginfo With the addition of ssi_perf_data and ssi_perf_type struct signalfd_siginfo is dangerously close to running out of space. All that remains is just enough space for two additional 64bit fields. A practice of adding all possible siginfo_t fields into struct singalfd_siginfo can not be supported as adding the missing fields ssi_lower, ssi_upper, and ssi_pkey would require two 64bit fields and one 32bit fields. In practice the fields ssi_perf_data and ssi_perf_type can never be used by signalfd as the signal that generates them always delivers them synchronously to the thread that triggers them. Therefore until someone actually needs the fields ssi_perf_data and ssi_perf_type in signalfd_siginfo remove them. This leaves a bit more room for future expansion. v1: https://lkml.kernel.org/r/20210503203814.25487-12-ebiederm@xmission.com v2: https://lkml.kernel.org/r/20210505141101.11519-12-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-5-ebiederm@xmission.com Reviewed-by: Marco Elver <elver@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2021-05-18 16:20:54 -05:00
Eric W. Biederman	0683b53197	signal: Deliver all of the siginfo perf data in _perf Don't abuse si_errno and deliver all of the perf data in _perf member of siginfo_t. Note: The data field in the perf data structures in a u64 to allow a pointer to be encoded without needed to implement a 32bit and 64bit version of the same structure. There already exists a 32bit and 64bit versions siginfo_t, and the 32bit version can not include a 64bit member as it only has 32bit alignment. So unsigned long is used in siginfo_t instead of a u64 as unsigned long can encode a pointer on all architectures linux supports. v1: https://lkml.kernel.org/r/m11rarqqx2.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210503203814.25487-10-ebiederm@xmission.com v3: https://lkml.kernel.org/r/20210505141101.11519-11-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-4-ebiederm@xmission.com Reviewed-by: Marco Elver <elver@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2021-05-18 16:20:54 -05:00
Eric W. Biederman	9abcabe311	signal: Implement SIL_FAULT_TRAPNO Now that si_trapno is part of the union in _si_fault and available on all architectures, add SIL_FAULT_TRAPNO and update siginfo_layout to return SIL_FAULT_TRAPNO when the code assumes si_trapno is valid. There is room for future changes to reduce when si_trapno is valid but this is all that is needed to make si_trapno and the other members of the the union in _sigfault mutually exclusive. Update the code that uses siginfo_layout to deal with SIL_FAULT_TRAPNO and have the same code ignore si_trapno in in all other cases. v1: https://lkml.kernel.org/r/m1o8dvs7s7.fsf_-_@fess.ebiederm.org v2: https://lkml.kernel.org/r/20210505141101.11519-6-ebiederm@xmission.com Link: https://lkml.kernel.org/r/20210517195748.8880-2-ebiederm@xmission.com Reviewed-by: Marco Elver <elver@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2021-05-18 16:20:34 -05:00
Chuck Lever	d6cbe98ff3	NFSD: Update nfsd_cb_args tracepoint Clean-up: Re-order the display of IP address and client ID to be consistent with other _cb_ tracepoints. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:04 -04:00
Chuck Lever	1d2bf65983	NFSD: Remove the nfsd_cb_work and nfsd_cb_done tracepoints Clean up: These are noise in properly working systems. If you really need to observe the operation of the callback mechanism, use the sunrpc:rpc\* tracepoints along with the workqueue tracepoints. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:04 -04:00
Chuck Lever	4ade892ae1	NFSD: Add an nfsd_cb_probe tracepoint Record a tracepoint event when the server performs a callback probe. This event can be enabled as a group with other nfsd_cb tracepoints. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:04 -04:00
Chuck Lever	17d76ddf76	NFSD: Replace the nfsd_deleg_break tracepoint Renamed so it can be enabled as a set with the other nfsd_cb_ tracepoints. And, consistent with those tracepoints, report the address of the client, the client ID the server has given it, and the state ID being recalled. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:04 -04:00
Chuck Lever	87512386e9	NFSD: Add an nfsd_cb_offload tracepoint Record the arguments of CB_OFFLOAD callbacks so we can better observe asynchronous copy-offload behavior. For example: nfsd-995 [008] 7721.934222: nfsd_cb_offload: addr=192.168.2.51:0 client 6092a47c:35a43fc1 fh_hash=0x8739113a count=116528 status=0 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Olga Kornievskaia <kolga@netapp.com> Cc: Dai Ngo <Dai.Ngo@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:04 -04:00
Chuck Lever	2cde7f8118	NFSD: Add an nfsd_cb_lm_notify tracepoint When the server kicks off a CB_LM_NOTIFY callback, record its arguments so we can better observe asynchronous locking behavior. For example: nfsd-998 [002] 1471.705873: nfsd_cb_notify_lock: addr=192.168.2.51:0 client 6092a47c:35a43fc1 fh_hash=0x8950b23a Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:04 -04:00
Chuck Lever	3c92fba557	NFSD: Enhance the nfsd_cb_setup tracepoint Display the transport protocol and authentication flavor so admins can see what they might be getting wrong. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:04 -04:00
Chuck Lever	9f57c6062b	NFSD: Remove spurious cb_setup_err tracepoint This path is not really an error path, so the tracepoint I added there is just noise. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:04 -04:00
Chuck Lever	b200f0e353	NFSD: Adjust cb_shutdown tracepoint Show when the upper layer requested a shutdown. RPC tracepoints can already show when rpc_shutdown_client() is called. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:04 -04:00
Chuck Lever	806d65b617	NFSD: Add cb_lost tracepoint Provide more clarity about when the callback channel is in trouble. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:04 -04:00
Chuck Lever	167145cc64	NFSD: Drop TRACE_DEFINE_ENUM for NFSD4_CB_<state> macros TRACE_DEFINE_ENUM() is necessary for enum {} but not for C macros. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:04 -04:00
Chuck Lever	8476c69a7f	NFSD: Capture every CB state transition We were missing one. As a clean-up, add a helper that sets the new CB state and fires a tracepoint. The tracepoint fires only when the state changes, to help reduce trace log noise. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:03 -04:00
Chuck Lever	1736aec82a	NFSD: Constify @fh argument of knfsd_fh_hash() Enable knfsd_fh_hash() to be invoked in functions where the filehandle pointer is a const. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:03 -04:00
Chuck Lever	e8f80c5545	NFSD: Add tracepoints for EXCHANGEID edge cases Some of the most common cases are traced. Enough infrastructure is now in place that more can be added later, as needed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:03 -04:00
Chuck Lever	237f91c85a	NFSD: Add tracepoints for SETCLIENTID edge cases Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:03 -04:00
Chuck Lever	2958d2ee71	NFSD: Add a couple more nfsd_clid_expired call sites Improve observation of NFSv4 lease expiry. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:03 -04:00
Chuck Lever	c41a9b7a90	NFSD: Add nfsd_clid_destroyed tracepoint Record client-requested termination of client IDs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:03 -04:00
Chuck Lever	cee8aa0742	NFSD: Add nfsd_clid_reclaim_complete tracepoint Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:03 -04:00
Chuck Lever	7e3b32ace6	NFSD: Add nfsd_clid_confirmed tracepoint This replaces a dprintk call site in order to get greater visibility on when client IDs are confirmed or re-used. Simple example: nfsd-995 [000] 126.622975: nfsd_compound: xid=0x3a34e2b1 opcnt=1 nfsd-995 [000] 126.623005: nfsd_cb_args: addr=192.168.2.51:45901 client 60958e3b:9213ef0e prog=1073741824 ident=1 nfsd-995 [000] 126.623007: nfsd_compound_status: op=1/1 OP_SETCLIENTID status=0 nfsd-996 [001] 126.623142: nfsd_compound: xid=0x3b34e2b1 opcnt=1 >>>> nfsd-996 [001] 126.623146: nfsd_clid_confirmed: client 60958e3b:9213ef0e nfsd-996 [001] 126.623148: nfsd_cb_probe: addr=192.168.2.51:45901 client 60958e3b:9213ef0e state=UNKNOWN nfsd-996 [001] 126.623154: nfsd_compound_status: op=1/1 OP_SETCLIENTID_CONFIRM status=0 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:03 -04:00
Chuck Lever	0bfaacac57	NFSD: Remove trace_nfsd_clid_inuse_err This tracepoint has been replaced by nfsd_clid_cred_mismatch and nfsd_clid_verf_mismatch, and can simply be removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:03 -04:00
Chuck Lever	744ea54c86	NFSD: Add nfsd_clid_verf_mismatch tracepoint Record when a client presents a different boot verifier than the one we know about. Typically this is a sign the client has rebooted, but sometimes it signals a conflicting client ID, which the client's administrator will need to address. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:03 -04:00
Chuck Lever	27787733ef	NFSD: Add nfsd_clid_cred_mismatch tracepoint Record when a client tries to establish a lease record but uses an unexpected credential. This is often a sign of a configuration problem. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:02 -04:00
Chuck Lever	87b2394d60	NFSD: Add an RPC authflavor tracepoint display helper To be used in subsequent patches. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:02 -04:00
Chuck Lever	a948b1142c	NFSD: Fix TP_printk() format specifier in nfsd_clid_class Since commit `9a6944fee6` ("tracing: Add a verifier to check string pointers for trace events"), which was merged in v5.13-rc1, TP_printk() no longer tacitly supports the "%.*s" format specifier. These are low value tracepoints, so just remove them. Reported-by: David Wysochanski <dwysocha@redhat.com> Fixes: `dd5e3fbc1f` ("NFSD: Add tracepoints to the NFSD state management code") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2021-05-18 13:44:02 -04:00
Ondrej Mosnacek	5881fa8dc2	debugfs: fix security_locked_down() call for SELinux When (ia->ia_valid & (ATTR_MODE \| ATTR_UID \| ATTR_GID)) is zero, then the SELinux implementation of the locked_down hook might report a denial even though the operation would actually be allowed. To fix this, make sure that security_locked_down() is called only when the return value will be taken into account (i.e. when changing one of the problematic attributes). Note: this was introduced by commit `5496197f9b` ("debugfs: Restrict debugfs when the kernel is locked down"), but it didn't matter at that time, as the SELinux support came in later. Fixes: `59438b4647` ("security,lockdown,selinux: implement SELinux lockdown") Cc: stable <stable@vger.kernel.org> Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Link: https://lore.kernel.org/r/20210507125304.144394-1-omosnace@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-18 18:05:59 +02:00
Al Viro	3a291c974c	d_path: saner calling conventions for __dentry_path() 1) lift NUL-termination into the callers 2) pass pointer to the end of buffer instead of that to beginning. (1) allows to simplify dentry_path() - we don't need to play silly games with restoring the leading / of "//deleted" after __dentry_path() would've overwritten it with NUL. We also do not need to check if (either) prepend() in there fails - if the buffer is not large enough, we'll end with negative buflen after prepend() and __dentry_path() will return the right value (ERR_PTR(-ENAMETOOLONG)) just fine. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-05-17 21:13:57 -04:00
Al Viro	dfe5087675	d_path: "\0" is {0,0}, not {0} Single-element array consisting of one NUL is spelled ""... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-05-17 21:11:45 -04:00
Gustavo A. R. Silva	c3754da3b7	reiserfs: Fix fall-through warnings for Clang In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning by explicitly adding a break statement instead of letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>	2021-05-17 19:01:38 -05:00
Linus Torvalds	8ac91e6c60	for-5.13-rc2-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmCibywACgkQxWXV+ddt WDs8QhAAlJ1INZGF01lP2mUhzesVIctIAPGBf/77Zsxmcu0rA6E66RVVsYMgGU54 +FWd+LwuFCtC1364OnDa2DnmYtvHfgR4If7EGowpk3qzZFeZQSLqayOFa5tZLYPG tJStjY32QTerfZRoxPJ1QPcoWjxNMxYqYw/s68G3tTTSHEYtlH9zNHbLm9ny507x uPHpxqKXdv3/LYHLt6XUypFqsZkMoDW98oOKvo0MZE/fjcqiDcrvAoYe+y8raFC3 FztlfA2TBmmp/PouDXLCspXAksLpVo9mgTQ0kW4K7152cC0X/zWXYNH01uQ+qTAS OFNKt2DSRIq5TR56ZmReYvRgq0FNMotYpRpxoePSF/rwL+wnsTl7QI3r/d/h/uxQ IzBeBv1Wd+1ZJcqnmEGx8Mws3nGswKyl4W65x8yin41djVoHgM4tYu3nGqielu+w ifEBmU5tUGo05z2HA5kpLjDzc6MwWaCIduQvjH/I4Vgo9fhDo6pQO2dyPC50Nkk5 DQ5jfxiXJ/ZSh5NbWtIkB/OQuwkVL1nDy2jtj3qnK06HDKstK1zui5nccFKFNOiX wtYjnGqd3+vIGIZniMuu9rbPLtG4CCerq44v1gyS6LSEycNW9/r2cOXRaiQk5pej CoYMdnmAqzwidtn4FZPRNQ7JgyckKCXQQSGCazN2vvLCXisCUrw= =ue6o -----END PGP SIGNATURE----- Merge tag 'for-5.13-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "A few more fixes: - fix fiemap to print extents that could get misreported due to internal extent splitting and logical merging for fiemap output - fix RCU stalls during delayed iputs - fix removed dentries still existing after log is synced" * tag 'for-5.13-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix removed dentries still existing after log is synced btrfs: return whole extents in fiemap btrfs: avoid RCU stalls while running delayed iputs btrfs: return 0 for dev_extent_hole_check_zoned hole_start in case of error	2021-05-17 09:55:10 -07:00
Josef Bacik	91df99a6eb	btrfs: do not BUG_ON in link_to_fixup_dir While doing error injection testing I got the following panic kernel BUG at fs/btrfs/tree-log.c:1862! invalid opcode: 0000 [#1] SMP NOPTI CPU: 1 PID: 7836 Comm: mount Not tainted 5.13.0-rc1+ #305 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-2.fc32 04/01/2014 RIP: 0010:link_to_fixup_dir+0xd5/0xe0 RSP: 0018:ffffb5800180fa30 EFLAGS: 00010216 RAX: fffffffffffffffb RBX: 00000000fffffffb RCX: ffff8f595287faf0 RDX: ffffb5800180fa37 RSI: ffff8f5954978800 RDI: 0000000000000000 RBP: ffff8f5953af9450 R08: 0000000000000019 R09: 0000000000000001 R10: 000151f408682970 R11: 0000000120021001 R12: ffff8f5954978800 R13: ffff8f595287faf0 R14: ffff8f5953c77dd0 R15: 0000000000000065 FS: 00007fc5284c8c40(0000) GS:ffff8f59bbd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fc5287f47c0 CR3: 000000011275e002 CR4: 0000000000370ee0 Call Trace: replay_one_buffer+0x409/0x470 ? btree_read_extent_buffer_pages+0xd0/0x110 walk_up_log_tree+0x157/0x1e0 walk_log_tree+0xa6/0x1d0 btrfs_recover_log_trees+0x1da/0x360 ? replay_one_extent+0x7b0/0x7b0 open_ctree+0x1486/0x1720 btrfs_mount_root.cold+0x12/0xea ? __kmalloc_track_caller+0x12f/0x240 legacy_get_tree+0x24/0x40 vfs_get_tree+0x22/0xb0 vfs_kern_mount.part.0+0x71/0xb0 btrfs_mount+0x10d/0x380 ? vfs_parse_fs_string+0x4d/0x90 legacy_get_tree+0x24/0x40 vfs_get_tree+0x22/0xb0 path_mount+0x433/0xa10 __x64_sys_mount+0xe3/0x120 do_syscall_64+0x3d/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae We can get -EIO or any number of legitimate errors from btrfs_search_slot(), panicing here is not the appropriate response. The error path for this code handles errors properly, simply return the error. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-17 15:49:24 +02:00
Filipe Manana	6416954ca7	btrfs: release path before starting transaction when cloning inline extent When cloning an inline extent there are a few cases, such as when we have an implicit hole at file offset 0, where we start a transaction while holding a read lock on a leaf. Starting the transaction results in a call to sb_start_intwrite(), which results in doing a read lock on a percpu semaphore. Lockdep doesn't like this and complains about it: [46.580704] ====================================================== [46.580752] WARNING: possible circular locking dependency detected [46.580799] 5.13.0-rc1 #28 Not tainted [46.580832] ------------------------------------------------------ [46.580877] cloner/3835 is trying to acquire lock: [46.580918] c00000001301d638 (sb_internal#2){.+.+}-{0:0}, at: clone_copy_inline_extent+0xe4/0x5a0 [46.581167] [46.581167] but task is already holding lock: [46.581217] c000000007fa2550 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x70/0x1d0 [46.581293] [46.581293] which lock already depends on the new lock. [46.581293] [46.581351] [46.581351] the existing dependency chain (in reverse order) is: [46.581410] [46.581410] -> #1 (btrfs-tree-00){++++}-{3:3}: [46.581464] down_read_nested+0x68/0x200 [46.581536] __btrfs_tree_read_lock+0x70/0x1d0 [46.581577] btrfs_read_lock_root_node+0x88/0x200 [46.581623] btrfs_search_slot+0x298/0xb70 [46.581665] btrfs_set_inode_index+0xfc/0x260 [46.581708] btrfs_new_inode+0x26c/0x950 [46.581749] btrfs_create+0xf4/0x2b0 [46.581782] lookup_open.isra.57+0x55c/0x6a0 [46.581855] path_openat+0x418/0xd20 [46.581888] do_filp_open+0x9c/0x130 [46.581920] do_sys_openat2+0x2ec/0x430 [46.581961] do_sys_open+0x90/0xc0 [46.581993] system_call_exception+0x3d4/0x410 [46.582037] system_call_common+0xec/0x278 [46.582078] [46.582078] -> #0 (sb_internal#2){.+.+}-{0:0}: [46.582135] __lock_acquire+0x1e90/0x2c50 [46.582176] lock_acquire+0x2b4/0x5b0 [46.582263] start_transaction+0x3cc/0x950 [46.582308] clone_copy_inline_extent+0xe4/0x5a0 [46.582353] btrfs_clone+0x5fc/0x880 [46.582388] btrfs_clone_files+0xd8/0x1c0 [46.582434] btrfs_remap_file_range+0x3d8/0x590 [46.582481] do_clone_file_range+0x10c/0x270 [46.582558] vfs_clone_file_range+0x1b0/0x310 [46.582605] ioctl_file_clone+0x90/0x130 [46.582651] do_vfs_ioctl+0x874/0x1ac0 [46.582697] sys_ioctl+0x6c/0x120 [46.582733] system_call_exception+0x3d4/0x410 [46.582777] system_call_common+0xec/0x278 [46.582822] [46.582822] other info that might help us debug this: [46.582822] [46.582888] Possible unsafe locking scenario: [46.582888] [46.582942] CPU0 CPU1 [46.582984] ---- ---- [46.583028] lock(btrfs-tree-00); [46.583062] lock(sb_internal#2); [46.583119] lock(btrfs-tree-00); [46.583174] lock(sb_internal#2); [46.583212] [46.583212] * DEADLOCK * [46.583212] [46.583266] 6 locks held by cloner/3835: [46.583299] #0: c00000001301d448 (sb_writers#12){.+.+}-{0:0}, at: ioctl_file_clone+0x90/0x130 [46.583382] #1: c00000000f6d3768 (&sb->s_type->i_mutex_key#15){+.+.}-{3:3}, at: lock_two_nondirectories+0x58/0xc0 [46.583477] #2: c00000000f6d72a8 (&sb->s_type->i_mutex_key#15/4){+.+.}-{3:3}, at: lock_two_nondirectories+0x9c/0xc0 [46.583574] #3: c00000000f6d7138 (&ei->i_mmap_lock){+.+.}-{3:3}, at: btrfs_remap_file_range+0xd0/0x590 [46.583657] #4: c00000000f6d35f8 (&ei->i_mmap_lock/1){+.+.}-{3:3}, at: btrfs_remap_file_range+0xe0/0x590 [46.583743] #5: c000000007fa2550 (btrfs-tree-00){++++}-{3:3}, at: __btrfs_tree_read_lock+0x70/0x1d0 [46.583828] [46.583828] stack backtrace: [46.583872] CPU: 1 PID: 3835 Comm: cloner Not tainted 5.13.0-rc1 #28 [46.583931] Call Trace: [46.583955] [c0000000167c7200] [c000000000c1ee78] dump_stack+0xec/0x144 (unreliable) [46.584052] [c0000000167c7240] [c000000000274058] print_circular_bug.isra.32+0x3a8/0x400 [46.584123] [c0000000167c72e0] [c0000000002741f4] check_noncircular+0x144/0x190 [46.584191] [c0000000167c73b0] [c000000000278fc0] __lock_acquire+0x1e90/0x2c50 [46.584259] [c0000000167c74f0] [c00000000027aa94] lock_acquire+0x2b4/0x5b0 [46.584317] [c0000000167c75e0] [c000000000a0d6cc] start_transaction+0x3cc/0x950 [46.584388] [c0000000167c7690] [c000000000af47a4] clone_copy_inline_extent+0xe4/0x5a0 [46.584457] [c0000000167c77c0] [c000000000af525c] btrfs_clone+0x5fc/0x880 [46.584514] [c0000000167c7990] [c000000000af5698] btrfs_clone_files+0xd8/0x1c0 [46.584583] [c0000000167c7a00] [c000000000af5b58] btrfs_remap_file_range+0x3d8/0x590 [46.584652] [c0000000167c7ae0] [c0000000005d81dc] do_clone_file_range+0x10c/0x270 [46.584722] [c0000000167c7b40] [c0000000005d84f0] vfs_clone_file_range+0x1b0/0x310 [46.584793] [c0000000167c7bb0] [c00000000058bf80] ioctl_file_clone+0x90/0x130 [46.584861] [c0000000167c7c10] [c00000000058c894] do_vfs_ioctl+0x874/0x1ac0 [46.584922] [c0000000167c7d10] [c00000000058db4c] sys_ioctl+0x6c/0x120 [46.584978] [c0000000167c7d60] [c0000000000364a4] system_call_exception+0x3d4/0x410 [46.585046] [c0000000167c7e10] [c00000000000d45c] system_call_common+0xec/0x278 [46.585114] --- interrupt: c00 at 0x7ffff7e22990 [46.585160] NIP: 00007ffff7e22990 LR: 00000001000010ec CTR: 0000000000000000 [46.585224] REGS: c0000000167c7e80 TRAP: 0c00 Not tainted (5.13.0-rc1) [46.585280] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 28000244 XER: 00000000 [46.585374] IRQMASK: 0 [46.585374] GPR00: 0000000000000036 00007fffffffdec0 00007ffff7f17100 0000000000000004 [46.585374] GPR04: 000000008020940d 00007fffffffdf40 0000000000000000 0000000000000000 [46.585374] GPR08: 0000000000000004 0000000000000000 0000000000000000 0000000000000000 [46.585374] GPR12: 0000000000000000 00007ffff7ffa940 0000000000000000 0000000000000000 [46.585374] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [46.585374] GPR20: 0000000000000000 000000009123683e 00007fffffffdf40 0000000000000000 [46.585374] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000004 [46.585374] GPR28: 0000000100030260 0000000100030280 0000000000000003 000000000000005f [46.585919] NIP [00007ffff7e22990] 0x7ffff7e22990 [46.585964] LR [00000001000010ec] 0x1000010ec [46.586010] --- interrupt: c00 This should be a false positive, as both locks are acquired in read mode. Nevertheless, we don't need to hold a leaf locked when we start the transaction, so just release the leaf (path) before starting it. Reported-by: Ritesh Harjani <riteshh@linux.ibm.com> Link: https://lore.kernel.org/linux-btrfs/20210513214404.xks77p566fglzgum@riteshh-domain/ Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-17 15:49:19 +02:00
Pavel Begunkov	7a27472770	io_uring: don't modify req->poll for rw __io_queue_proc() is used by both poll and apoll, so we should not access req->poll directly but selecting right struct io_poll_iocb depending on use case. Reported-and-tested-by: syzbot+a84b8783366ecb1c65d0@syzkaller.appspotmail.com Fixes: `ea6a693d86` ("io_uring: disable multishot poll for double poll add cases") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/4a6a1de31142d8e0250fe2dfd4c8923d82a5bbfc.1621251795.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-05-17 07:28:48 -06:00
Pavel Skripkin	a149127be5	reiserfs: add check for invalid 1st journal block syzbot reported divide error in reiserfs. The problem was in incorrect journal 1st block. Syzbot's reproducer manualy generated wrong superblock with incorrect 1st block. In journal_init() wasn't any checks about this particular case. For example, if 1st journal block is before superblock 1st block, it can cause zeroing important superblock members in do_journal_end(). Link: https://lore.kernel.org/r/20210517121545.29645-1-paskripkin@gmail.com Reported-by: syzbot+0ba9909df31c6a36974d@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>	2021-05-17 15:07:54 +02:00
Andy Shevchenko	776797f1bd	nilfs2: Switch to use %ptTs Use %ptTs instead of open coded variant to print contents of time64_t type in human readable form. Use sysfs_emit() at the same time in the changed functions. Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: linux-nilfs@vger.kernel.org Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20210511153958.34527-3-andriy.shevchenko@linux.intel.com	2021-05-17 12:01:46 +02:00
Greg Kroah-Hartman	0e9e37d042	Merge 5.13-rc2 into driver-core-next We need the driver core fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-17 09:44:25 +02:00
wenhuizhang	4236a26a6b	cifs: remove deadstore in cifs_close_all_deferred_files() Deadstore detected by Lukas Bulwahn's CodeChecker Tool (ELISA group). line 741 struct cifsInodeInfo *cinode; line 747 cinode = CIFS_I(d_inode(cfile->dentry)); could be deleted. cinode on filesystem should not be deleted when files are closed, they are representations of some data fields on a physical disk, thus no further action is required. The virtual inode on vfs will be handled by vfs automatically, and the denotation is inode, which is different from the cinode. Signed-off-by: wenhuizhang <wenhui@gwmail.gwu.edu> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-05-16 23:05:46 -05:00
Darrick J. Wong	9d5e8492ee	xfs: adjust rt allocation minlen when extszhint > rtextsize xfs_bmap_rtalloc doesn't handle realtime extent files with extent size hints larger than the rt volume's extent size properly, because xfs_bmap_extsize_align can adjust the offset/length parameters to try to fit the extent size hint. Under these conditions, minlen has to be large enough so that any allocation returned by xfs_rtallocate_extent will be large enough to cover at least one of the blocks that the caller asked for. If the allocation is too short, bmapi_write will return no mapping for the requested range, which causes ENOSPC errors in other parts of the filesystem. Therefore, adjust minlen upwards to fix this. This can be found by running generic/263 (g/127 or g/522) with a realtime extent size hint that's larger than the rt volume extent size. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Allison Henderson <allison.henderson@oracle.com>	2021-05-16 18:45:03 -07:00
Linus Torvalds	a4147415bd	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "13 patches. Subsystems affected by this patch series: resource, squashfs, hfsplus, modprobe, and mm (hugetlb, slub, userfaultfd, ksm, pagealloc, kasan, pagemap, and ioremap)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm/ioremap: fix iomap_max_page_shift docs: admin-guide: update description for kernel.modprobe sysctl hfsplus: prevent corruption in shrinking truncate mm/filemap: fix readahead return types kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled mm: fix struct page layout on 32-bit systems ksm: revert "use GET_KSM_PAGE_NOLOCK to get ksm page in remove_rmap_item_from_tree()" userfaultfd: release page in error path to avoid BUG_ON squashfs: fix divide error in calculate_skip() kernel/resource: fix return code check in __request_free_mem_region mm, slub: move slub_debug static key enabling outside slab_mutex mm/hugetlb: fix cow where page writtable in child mm/hugetlb: fix F_SEAL_FUTURE_WRITE	2021-05-15 09:42:27 -07:00
Linus Torvalds	5601591035	io_uring-5.13-2021-05-14 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmCew+oQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpnAAD/0RGU6BTpYX0AjSuHtHPsGxAWlLroe7Yvew BBXX58uL9LqSYDe+FfCherA7GbyLdXrN9yvbeKVEZH7wmFV0u6dGX/RiK8lpWvfd pMTSf14QkASkoQ5bMQURdSv73OruzKQN7CisN3btwD2sDcqZqz7RsFuWf5Fuxs4r UjyQdJpt+sNs1UTvHBjqQcCrAipEVWePH93/jhayx8iBykab4+aNKFtysjqYJdD0 LL5NG5LihP5G2WECD5Q7vDmb9+km3cN5TJLhHSDsmQg4Ln6U9zd4X3bnvEXNtlWk edNNhKVmS8rtwK2qiZCoVlR4HrSjCCjUg/0h6hyOL8AYNV9vPup/0EuWfRKxLE+3 l3TRTO02/SM8Tjdu27lYtxFYnIkIgRv+w2/ZmURzwnPpIvjwbdfth5DN+10bFnUV IPKcEvMXhbgdyQ5OtA1oPk3udWesrk836s2W6kqBLSEeqFrb0UbI8A40VXxoAfVQ Ig5LmuuDAlZzt4fCu3GYhVZS1jj2CXuBsGrsbSVZaJGbMu9MPbmMUoz6XBS3lsY6 gnhYv2paMuOo/hD6q4XeCH4j1jveLXgzenW3fzEP4E0wxfvMkybyWCwfW14a15Q+ Sr8VEEUTc74RfW5pP0ZTvrYGnR+oJwB1RacdbU5WpOrB01A5bWkmb0fRNHfj8vjH h49oIdqZKw== =5+hs -----END PGP SIGNATURE----- Merge tag 'io_uring-5.13-2021-05-14' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "Just a few minor fixes/changes: - Fix issue with double free race for linked timeout completions - Fix reference issue with timeouts - Remove last few places that make SQPOLL special, since it's just an io thread now. - Bump maximum allowed registered buffers, as we don't allocate as much anymore" * tag 'io_uring-5.13-2021-05-14' of git://git.kernel.dk/linux-block: io_uring: increase max number of reg buffers io_uring: further remove sqpoll limits on opcodes io_uring: fix ltout double free on completion race io_uring: fix link timeout refs	2021-05-15 08:43:44 -07:00
Linus Torvalds	41f035c062	Changes since last update: - update documentation to fix the broken illustration due to ReST conversion by accident at that time and complete the big pcluster introduction; - fix 1 lcluster-sized pclusters for the big pcluster feature. -----BEGIN PGP SIGNATURE----- iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCYJ8DGBEceGlhbmdAa2Vy bmVsLm9yZwAKCRA5NzHcH7XmBAC0AQDaap8fSTWMLroLLBCcr1MwTqoS6wf44tx8 iq2FFcU/hQD+PqrnCFJW7wjWjMC84weOudRvh2/lu/GKH2a5LgJ5Xgs= =UTkq -----END PGP SIGNATURE----- Merge tag 'erofs-for-5.13-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: "This mainly fixes 1 lcluster-sized pclusters for the big pcluster feature, which can be forcely generated by mkfs as a specific on-disk case for per-(sub)file compression strategies but missed to handle in runtime properly. Also, documentation updates are included to fix the broken illustration due to the ReST conversion by accident and complete the big pcluster introduction. Summary: - update documentation to fix the broken illustration due to ReST conversion by accident at that time and complete the big pcluster introduction - fix 1 lcluster-sized pclusters for the big pcluster feature" * tag 'erofs-for-5.13-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: fix 1 lcluster-sized pcluster for big pcluster erofs: update documentation about data compression erofs: fix broken illustration in documentation	2021-05-15 08:37:21 -07:00
Linus Torvalds	393f42f113	dax fixes for 5.13-rc2 - Fix a hang condition (missed wakeups with virtiofs when invalidating entries) -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEf41QbsdZzFdA8EfZHtKRamZ9iAIFAmCfBboACgkQHtKRamZ9 iAIiHQ/+LqD0USAXxWQFcDupTATVy0Z/hpUCBWcEKII/ljluUWLLkGUT2/Gy3TXE 0HZmJBWyJyqNRyWtzNZ8hu4FpxSawtYVkqTv0/ODAjrpva9m8p4eVYFp0UpTHn3d KL/DD+VeLWs1yoPIXgqd2dSwV2YsAJSEYYXcF0CYeHOWH4BVGrOglQBL7kJyra6n IQsnXGJQMXkOoDMB/5xTI7LgYD0R09OevsHE6Eupxm9SI8ud2qUQlBLde8Eh+7qb pMhkeNNjG2w461C8215rhGPzCweMMasiBwUz1EHXDpXebZSsDfURwBWMCFbe/H7p x3u0s3hlJydTZmUnaMeWje+wR1Ku8YXiBeelMobpXi4RzNyebhZ0Fap3fMDbrR8/ 5mro6H9blEYGZ1kISHSdvZUfh6uzWiL8hs+uBb/ANICZouValjyVrHuTauwncyQP PHaKZYo/kh6Hj3j1LYDHbMs69Cbr+E0x/JFnYAxIkZSggYJeXN9+3K9hhUXcQNIf Lh4p1F/t7DmIXzljFu6qwJl9JmCC+yx4PcSgOqa6vPvm2H6KEH+rMCLHtu+WgaXq 1Gj9EI1sshTXgot8Y1xlPCCTLNqxhV0O30L+EsasmjNCjWwVRi2zz+FjkgFAeDvo 7LZUNVepC9YMffknBNGkfNibfVBn5/DxbGR/9SWygHy8ahECoLc= =cWwB -----END PGP SIGNATURE----- Merge tag 'dax-fixes-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull dax fixes from Dan Williams: "A fix for a hang condition due to missed wakeups in the filesystem-dax core when exercised by virtiofs. This bug has been there from the beginning, but the condition has not triggered on other filesystems since they hold a lock over invalidation events" * tag 'dax-fixes-5.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: Wake up all waiters after invalidating dax entry dax: Add a wakeup mode parameter to put_unlocked_entry() dax: Add an enum for specifying dax wakup mode	2021-05-15 08:28:08 -07:00
Jouni Roivas	c3187cf322	hfsplus: prevent corruption in shrinking truncate I believe there are some issues introduced by commit `31651c6071` ("hfsplus: avoid deadlock on file truncation") HFS+ has extent records which always contains 8 extents. In case the first extent record in catalog file gets full, new ones are allocated from extents overflow file. In case shrinking truncate happens to middle of an extent record which locates in extents overflow file, the logic in hfsplus_file_truncate() was changed so that call to hfs_brec_remove() is not guarded any more. Right action would be just freeing the extents that exceed the new size inside extent record by calling hfsplus_free_extents(), and then check if the whole extent record should be removed. However since the guard (blk_cnt > start) is now after the call to hfs_brec_remove(), this has unfortunate effect that the last matching extent record is removed unconditionally. To reproduce this issue, create a file which has at least 10 extents, and then perform shrinking truncate into middle of the last extent record, so that the number of remaining extents is not under or divisible by 8. This causes the last extent record (8 extents) to be removed totally instead of truncating into middle of it. Thus this causes corruption, and lost data. Fix for this is simply checking if the new truncated end is below the start of this extent record, making it safe to remove the full extent record. However call to hfs_brec_remove() can't be moved to it's previous place since we're dropping ->tree_lock and it can cause a race condition and the cached info being invalidated possibly corrupting the node data. Another issue is related to this one. When entering into the block (blk_cnt > start) we are not holding the ->tree_lock. We break out from the loop not holding the lock, but hfs_find_exit() does unlock it. Not sure if it's possible for someone else to take the lock under our feet, but it can cause hard to debug errors and premature unlocking. Even if there's no real risk of it, the locking should still always be kept in balance. Thus taking the lock now just before the check. Link: https://lkml.kernel.org/r/20210429165139.3082828-1-jouni.roivas@tuxera.com Fixes: `31651c6071` ("hfsplus: avoid deadlock on file truncation") Signed-off-by: Jouni Roivas <jouni.roivas@tuxera.com> Reviewed-by: Anton Altaparmakov <anton@tuxera.com> Cc: Anatoly Trosinenko <anatoly.trosinenko@gmail.com> Cc: Viacheslav Dubeyko <slava@dubeyko.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-14 19:41:32 -07:00
Matthew Wilcox (Oracle)	076171a677	mm/filemap: fix readahead return types A readahead request will not allocate more memory than can be represented by a size_t, even on systems that have HIGHMEM available. Change the length functions from returning an loff_t to a size_t. Link: https://lkml.kernel.org/r/20210510201201.1558972-1-willy@infradead.org Fixes: `32c0a6bcaa` ("btrfs: add and use readahead_batch_length") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-14 19:41:32 -07:00
Phillip Lougher	d6e621de1f	squashfs: fix divide error in calculate_skip() Sysbot has reported a "divide error" which has been identified as being caused by a corrupted file_size value within the file inode. This value has been corrupted to a much larger value than expected. Calculate_skip() is passed i_size_read(inode) >> msblk->block_log. Due to the file_size value corruption this overflows the int argument/variable in that function, leading to the divide error. This patch changes the function to use u64. This will accommodate any unexpectedly large values due to corruption. The value returned from calculate_skip() is clamped to be never more than SQUASHFS_CACHED_BLKS - 1, or 7. So file_size corruption does not lead to an unexpectedly large return result here. Link: https://lkml.kernel.org/r/20210507152618.9447-1-phillip@squashfs.org.uk Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk> Reported-by: <syzbot+e8f781243ce16ac2f962@syzkaller.appspotmail.com> Reported-by: <syzbot+7b98870d4fec9447b951@syzkaller.appspotmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-14 19:41:32 -07:00
Peter Xu	22247efd82	mm/hugetlb: fix F_SEAL_FUTURE_WRITE Patch series "mm/hugetlb: Fix issues on file sealing and fork", v2. Hugh reported issue with F_SEAL_FUTURE_WRITE not applied correctly to hugetlbfs, which I can easily verify using the memfd_test program, which seems that the program is hardly run with hugetlbfs pages (as by default shmem). Meanwhile I found another probably even more severe issue on that hugetlb fork won't wr-protect child cow pages, so child can potentially write to parent private pages. Patch 2 addresses that. After this series applied, "memfd_test hugetlbfs" should start to pass. This patch (of 2): F_SEAL_FUTURE_WRITE is missing for hugetlb starting from the first day. There is a test program for that and it fails constantly. $ ./memfd_test hugetlbfs memfd-hugetlb: CREATE memfd-hugetlb: BASIC memfd-hugetlb: SEAL-WRITE memfd-hugetlb: SEAL-FUTURE-WRITE mmap() didn't fail as expected Aborted (core dumped) I think it's probably because no one is really running the hugetlbfs test. Fix it by checking FUTURE_WRITE also in hugetlbfs_file_mmap() as what we do in shmem_mmap(). Generalize a helper for that. Link: https://lkml.kernel.org/r/20210503234356.9097-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20210503234356.9097-2-peterx@redhat.com Fixes: `ab3948f58f` ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd") Signed-off-by: Peter Xu <peterx@redhat.com> Reported-by: Hugh Dickins <hughd@google.com> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-14 19:41:32 -07:00
Chao Yu	91f0fb6903	f2fs: compress: clean up parameter of __f2fs_cluster_blocks() Previously, in order to reuse __f2fs_cluster_blocks(), f2fs_is_compressed_cluster() assigned a compress_ctx type variable, which is used to pass few parameters (cc.inode, cc.cluster_size, cc.cluster_idx), it's wasteful to allocate such large space in stack. Let's clean up parameters of __f2fs_cluster_blocks() to avoid that. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2021-05-14 11:22:09 -07:00
Chao Yu	fbec3b963a	f2fs: compress: remove unneeded f2fs_put_dnode() If we don't initialize dn.inode_page for f2fs_get_block(), f2fs_get_block() will call f2fs_put_dnode() itself, so let's remove unneeded f2fs_put_dnode() in f2fs_vm_page_mkwrite(). Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2021-05-14 11:22:08 -07:00
Chao Yu	89e53ff165	f2fs: atgc: fix to set default age threshold Default age threshold value is missed to set, fix it. Fixes: `093749e296` ("f2fs: support age threshold based garbage collection") Reported-by: Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2021-05-14 11:22:08 -07:00
Shin'ichiro Kawasaki	d927ccfccb	f2fs: Prevent swap file in LFS mode The kernel writes to swap files on f2fs directly without the assistance of the filesystem. This direct write by kernel can be non-sequential even when the f2fs is in LFS mode. Such non-sequential write conflicts with the LFS semantics. Especially when f2fs is set up on zoned block devices, the non-sequential write causes unaligned write command errors. To avoid the non-sequential writes to swap files, prevent swap file activation when the filesystem is in LFS mode. Fixes: `4969c06a0d` ("f2fs: support swap file w/ DIO") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Cc: stable@vger.kernel.org # v5.10+ Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2021-05-14 11:22:08 -07:00
Chao Yu	cad83c968c	f2fs: fix to avoid racing on fsync_entry_slab by multi filesystem instances As syzbot reported, there is an use-after-free issue during f2fs recovery: Use-after-free write at 0xffff88823bc16040 (in kfence-#10): kmem_cache_destroy+0x1f/0x120 mm/slab_common.c:486 f2fs_recover_fsync_data+0x75b0/0x8380 fs/f2fs/recovery.c:869 f2fs_fill_super+0x9393/0xa420 fs/f2fs/super.c:3945 mount_bdev+0x26c/0x3a0 fs/super.c:1367 legacy_get_tree+0xea/0x180 fs/fs_context.c:592 vfs_get_tree+0x86/0x270 fs/super.c:1497 do_new_mount fs/namespace.c:2905 [inline] path_mount+0x196f/0x2be0 fs/namespace.c:3235 do_mount fs/namespace.c:3248 [inline] __do_sys_mount fs/namespace.c:3456 [inline] __se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3433 do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae The root cause is multi f2fs filesystem instances can race on accessing global fsync_entry_slab pointer, result in use-after-free issue of slab cache, fixes to init/destroy this slab cache only once during module init/destroy procedure to avoid this issue. Reported-by: syzbot+9d90dad32dd9727ed084@syzkaller.appspotmail.com Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2021-05-14 11:22:08 -07:00
Chao Yu	b763f3bedc	f2fs: restructure f2fs page.private layout Restruct f2fs page private layout for below reasons: There are some cases that f2fs wants to set a flag in a page to indicate a specified status of page: a) page is in transaction list for atomic write b) page contains dummy data for aligned write c) page is migrating for GC d) page contains inline data for inline inode flush e) page belongs to merkle tree, and is verified for fsverity f) page is dirty and has filesystem/inode reference count for writeback g) page is temporary and has decompress io context reference for compression There are existed places in page structure we can use to store f2fs private status/data: - page.flags: PG_checked, PG_private - page.private However it was a mess when we using them, which may cause potential confliction: page.private PG_private PG_checked page._refcount (+1 at most) a) -1 set +1 b) -2 set c), d), e) set f) 0 set +1 g) pointer set The other problem is page.flags has no free slot, if we can avoid set zero to page.private and set PG_private flag, then we use non-zero value to indicate PG_private status, so that we may have chance to reclaim PG_private slot for other usage. [1] The other concern is f2fs has bad scalability in aspect of indicating more page status. So in this patch, let's restructure f2fs' page.private as below to solve above issues: Layout A: lowest bit should be 1 \| bit0 = 1 \| bit1 \| bit2 \| ... \| bit MAX \| private data .... \| bit 0 PAGE_PRIVATE_NOT_POINTER bit 1 PAGE_PRIVATE_ATOMIC_WRITE bit 2 PAGE_PRIVATE_DUMMY_WRITE bit 3 PAGE_PRIVATE_ONGOING_MIGRATION bit 4 PAGE_PRIVATE_INLINE_INODE bit 5 PAGE_PRIVATE_REF_RESOURCE bit 6- f2fs private data Layout B: lowest bit should be 0 page.private is a wrapped pointer. After the change: page.private PG_private PG_checked page._refcount (+1 at most) a) 11 set +1 b) 101 set +1 c) 1001 set +1 d) 10001 set +1 e) set f) 100001 set +1 g) pointer set +1 [1] https://lore.kernel.org/linux-f2fs-devel/20210422154705.GO3596236@casper.infradead.org/T/#u Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2021-05-14 11:22:08 -07:00
Chao Yu	ee68d27181	f2fs: add cp_error check in f2fs_write_compressed_pages This patch adds cp_error check in f2fs_write_compressed_pages() like we did in f2fs_write_single_data_page() Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2021-05-14 11:22:07 -07:00
Chao Yu	5db479f049	f2fs: compress: rename __cluster_may_compress This patch renames __cluster_may_compress() to cluster_has_invalid_data() for better readability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2021-05-14 11:22:07 -07:00
Linus Torvalds	ac524ece21	f2fs-5.13-rc1-fix This series of patches fix some critical bugs such as memory leak in compression flows, kernel panic when handling errors, and swapon failure due to newly added condition check. -----BEGIN PGP SIGNATURE----- iQIyBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmCeTNcACgkQQBSofoJI UNKc0Q/3X9Tngns5DpnzQw9kbWochucG/7Vf1HjdvV2gE7IxruDPofGtBdhPdSYV uR+nP9dxhXLQQNfWS2KICzdj0yceKCKq8xpnnNdq9SGjVJmUjCD39ByGJV3GMOGM fY6dizcywltH7iBQboMZ0Eh3ivPh6ugl5klDo20WzYsH6F/UF7CPSuSJ3K5ezGuY T6R3NkqG8v1cS6+5u+teDpmdCCHOCBEeizBFQ6XskNDBavbw7KEA0liwKOv6eghB PdoeqYemg1VfOHAKqP6F3o+eSlsT5Ljs0Zmc5x8h8qRS76JK/hb9REFngrcERaIw GPDnMPCHniCHMd61z90oqGtMf4PFqLUVnBTdxhxLK5G4+u874dlZLciKWGDIGTLv eNU2W+8c9s+KdJAZFJbYN5zVoyJUR5SW7RcYTuvZWt8wX38Ch+FZEGqOC8FxWyWU i1WHXZiGeifIlIeqUOPJP5sbslL2hfK5OqMYJotAeIW/E2RyJWnc+Yo2UwvmvXVU xPOKFOn9nAAQz2GGgSpvGaWAVfMNqhoLw7/gzwdkabP0EASIzHuW2PCJhp59c1NO Fb9eUi7yhgd94vDbYftXRBgAhrUmCd0u+/gySyou4vujWtfWcO+AvOEreV7VRFfL Su0bGkBJ03ThFEFAZnY14RenydGSwpk5Fd9wkRy4Qk7mBv9sRg== =NeKH -----END PGP SIGNATURE----- Merge tag 'f2fs-5.13-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: "This fixes some critical bugs such as memory leak in compression flows, kernel panic when handling errors, and swapon failure due to newly added condition check" * tag 'f2fs-5.13-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: return EINVAL for hole cases in swap file f2fs: avoid swapon failure by giving a warning first f2fs: compress: fix to assign cc.cluster_idx correctly f2fs: compress: fix race condition of overwrite vs truncate f2fs: compress: fix to free compress page correctly f2fs: support iflag change given the mask f2fs: avoid null pointer access when handling IPU error	2021-05-14 10:49:20 -07:00
Pavel Begunkov	489809e2e2	io_uring: increase max number of reg buffers Since recent changes instead of storing a large array of struct io_mapped_ubuf, we store pointers to them, that is 4 times slimmer and we should not to so worry about restricting max number of registererd buffer slots, increase the limit 4 times. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d3dee1da37f46da416aa96a16bf9e5094e10584d.1620990371.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-05-14 06:06:34 -06:00
Pavel Begunkov	2d74d0421e	io_uring: further remove sqpoll limits on opcodes There are three types of requests that left disabled for sqpoll, namely epoll ctx, statx, and resources update. Since SQPOLL task is now closely mimics a userspace thread, remove the restrictions. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/909b52d70c45636d8d7897582474ea5aab5eed34.1620990306.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-05-14 06:06:23 -06:00
Pavel Begunkov	447c19f3b5	io_uring: fix ltout double free on completion race Always remove linked timeout on io_link_timeout_fn() from the master request link list, otherwise we may get use-after-free when first io_link_timeout_fn() puts linked timeout in the fail path, and then will be found and put on master's free. Cc: stable@vger.kernel.org # 5.10+ Fixes: `90cd7e4249` ("io_uring: track link timeout's master explicitly") Reported-and-tested-by: syzbot+5a864149dd970b546223@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/69c46bf6ce37fec4fdcd98f0882e18eb07ce693a.1620990121.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-05-14 06:06:15 -06:00
Wolfram Sang	d616f56d34	debugfs: only accept read attributes for blobs Blobs can only be read. So, keep only 'read' file attributes because the others will not work and only confuse users. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20210504131350.46586-1-wsa+renesas@sang-engineering.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-14 13:36:18 +02:00
Filipe Manana	54a40fc3a1	btrfs: fix removed dentries still existing after log is synced When we move one inode from one directory to another and both the inode and its previous parent directory were logged before, we are not supposed to have the dentry for the old parent if we have a power failure after the log is synced. Only the new dentry is supposed to exist. Generally this works correctly, however there is a scenario where this is not currently working, because the old parent of the file/directory that was moved is not authoritative for a range that includes the dir index and dir item keys of the old dentry. This case is better explained with the following example and reproducer: # The test requires a very specific layout of keys and items in the # fs/subvolume btree to trigger the bug. So we want to make sure that # on whatever platform we are, we have the same leaf/node size. # # Currently in btrfs the node/leaf size can not be smaller than the page # size (but it can be greater than the page size). So use the largest # supported node/leaf size (64K). $ mkfs.btrfs -f -n 65536 /dev/sdc $ mount /dev/sdc /mnt # "testdir" is inode 257. $ mkdir /mnt/testdir $ chmod 755 /mnt/testdir # Create several empty files to have the directory "testdir" with its # items spread over several leaves (7 in this case). $ for ((i = 1; i <= 1200; i++)); do echo -n > /mnt/testdir/file$i done # Create our test directory "dira", inode number 1458, which gets all # its items in leaf 7. # # The BTRFS_DIR_ITEM_KEY item for inode 257 ("testdir") that points to # the entry named "dira" is in leaf 2, while the BTRFS_DIR_INDEX_KEY # item that points to that entry is in leaf 3. # # For this particular filesystem node size (64K), file count and file # names, we endup with the directory entry items from inode 257 in # leaves 2 and 3, as previously mentioned - what matters for triggering # the bug exercised by this test case is that those items are not placed # in leaf 1, they must be placed in a leaf different from the one # containing the inode item for inode 257. # # The corresponding BTRFS_DIR_ITEM_KEY and BTRFS_DIR_INDEX_KEY items for # the parent inode (257) are the following: # # item 460 key (257 DIR_ITEM 3724298081) itemoff 48344 itemsize 34 # location key (1458 INODE_ITEM 0) type DIR # transid 6 data_len 0 name_len 4 # name: dira # # and: # # item 771 key (257 DIR_INDEX 1202) itemoff 36673 itemsize 34 # location key (1458 INODE_ITEM 0) type DIR # transid 6 data_len 0 name_len 4 # name: dira $ mkdir /mnt/testdir/dira # Make sure everything done so far is durably persisted. $ sync # Now do a change to inode 257 ("testdir") that does not result in # COWing leaves 2 and 3 - the leaves that contain the directory items # pointing to inode 1458 (directory "dira"). # # Changing permissions, the owner/group, updating or adding a xattr, # etc, will not change (COW) leaves 2 and 3. So for the sake of # simplicity change the permissions of inode 257, which results in # updating its inode item and therefore change (COW) only leaf 1. $ chmod 700 /mnt/testdir # Now fsync directory inode 257. # # Since only the first leaf was changed/COWed, we log the inode item of # inode 257 and only the dentries found in the first leaf, all have a # key type of BTRFS_DIR_ITEM_KEY, and no keys of type # BTRFS_DIR_INDEX_KEY, because they sort after the former type and none # exist in the first leaf. # # We also log 3 items that represent ranges for dir items and dir # indexes for which the log is authoritative: # # 1) a key of type BTRFS_DIR_LOG_ITEM_KEY, which indicates the log is # authoritative for all BTRFS_DIR_ITEM_KEY keys that have an offset # in the range [0, 2285968570] (the offset here is the crc32c of the # dentry's name). The value 2285968570 corresponds to the offset of # the first key of leaf 2 (which is of type BTRFS_DIR_ITEM_KEY); # # 2) a key of type BTRFS_DIR_LOG_ITEM_KEY, which indicates the log is # authoritative for all BTRFS_DIR_ITEM_KEY keys that have an offset # in the range [4293818216, (u64)-1] (the offset here is the crc32c # of the dentry's name). The value 4293818216 corresponds to the # offset of the highest key of type BTRFS_DIR_ITEM_KEY plus 1 # (4293818215 + 1), which is located in leaf 2; # # 3) a key of type BTRFS_DIR_LOG_INDEX_KEY, with an offset of 1203, # which indicates the log is authoritative for all keys of type # BTRFS_DIR_INDEX_KEY that have an offset in the range # [1203, (u64)-1]. The value 1203 corresponds to the offset of the # last key of type BTRFS_DIR_INDEX_KEY plus 1 (1202 + 1), which is # located in leaf 3; # # Also, because "testdir" is a directory and inode 1458 ("dira") is a # child directory, we log inode 1458 too. $ xfs_io -c "fsync" /mnt/testdir # Now move "dira", inode 1458, to be a child of the root directory # (inode 256). # # Because this inode was previously logged, when "testdir" was fsynced, # the log is updated so that the old inode reference, referring to inode # 257 as the parent, is deleted and the new inode reference, referring # to inode 256 as the parent, is added to the log. $ mv /mnt/testdir/dira /mnt # Now change some file and fsync it. This guarantees the log changes # made by the previous move/rename operation are persisted. We do not # need to do any special modification to the file, just any change to # any file and sync the log. $ xfs_io -c "pwrite -S 0xab 0 64K" -c "fsync" /mnt/testdir/file1 # Simulate a power failure and then mount again the filesystem to # replay the log tree. We want to verify that we are able to mount the # filesystem, meaning log replay was successful, and that directory # inode 1458 ("dira") only has inode 256 (the filesystem's root) as # its parent (and no longer a child of inode 257). # # It used to happen that during log replay we would end up having # inode 1458 (directory "dira") with 2 hard links, being a child of # inode 257 ("testdir") and inode 256 (the filesystem's root). This # resulted in the tree checker detecting the issue and causing the # mount operation to fail (with -EIO). # # This happened because in the log we have the new name/parent for # inode 1458, which results in adding the new dentry with inode 256 # as the parent, but the previous dentry, under inode 257 was never # removed - this is because the ranges for dir items and dir indexes # of inode 257 for which the log is authoritative do not include the # old dir item and dir index for the dentry of inode 257 referring to # inode 1458: # # - for dir items, the log is authoritative for the ranges # [0, 2285968570] and [4293818216, (u64)-1]. The dir item at inode 257 # pointing to inode 1458 has a key of (257 DIR_ITEM 3724298081), as # previously mentioned, so the dir item is not deleted when the log # replay procedure processes the authoritative ranges, as 3724298081 # is outside both ranges; # # - for dir indexes, the log is authoritative for the range # [1203, (u64)-1], and the dir index item of inode 257 pointing to # inode 1458 has a key of (257 DIR_INDEX 1202), as previously # mentioned, so the dir index item is not deleted when the log # replay procedure processes the authoritative range. <power failure> $ mount /dev/sdc /mnt mount: /mnt: can't read superblock on /dev/sdc. $ dmesg (...) [87849.840509] BTRFS info (device sdc): start tree-log replay [87849.875719] BTRFS critical (device sdc): corrupt leaf: root=5 block=30539776 slot=554 ino=1458, invalid nlink: has 2 expect no more than 1 for dir [87849.878084] BTRFS info (device sdc): leaf 30539776 gen 7 total ptrs 557 free space 2092 owner 5 [87849.879516] BTRFS info (device sdc): refs 1 lock_owner 0 current 2099108 [87849.880613] item 0 key (1181 1 0) itemoff 65275 itemsize 160 [87849.881544] inode generation 6 size 0 mode 100644 [87849.882692] item 1 key (1181 12 257) itemoff 65258 itemsize 17 (...) [87850.562549] item 556 key (1458 12 257) itemoff 16017 itemsize 14 [87850.563349] BTRFS error (device dm-0): block=30539776 write time tree block corruption detected [87850.564386] ------------[ cut here ]------------ [87850.564920] WARNING: CPU: 3 PID: 2099108 at fs/btrfs/disk-io.c:465 csum_one_extent_buffer+0xed/0x100 [btrfs] [87850.566129] Modules linked in: btrfs dm_zero dm_snapshot (...) [87850.573789] CPU: 3 PID: 2099108 Comm: mount Not tainted 5.12.0-rc8-btrfs-next-86 #1 (...) [87850.587481] Call Trace: [87850.587768] btree_csum_one_bio+0x244/0x2b0 [btrfs] [87850.588354] ? btrfs_bio_fits_in_stripe+0xd8/0x110 [btrfs] [87850.589003] btrfs_submit_metadata_bio+0xb7/0x100 [btrfs] [87850.589654] submit_one_bio+0x61/0x70 [btrfs] [87850.590248] submit_extent_page+0x91/0x2f0 [btrfs] [87850.590842] write_one_eb+0x175/0x440 [btrfs] [87850.591370] ? find_extent_buffer_nolock+0x1c0/0x1c0 [btrfs] [87850.592036] btree_write_cache_pages+0x1e6/0x610 [btrfs] [87850.592665] ? free_debug_processing+0x1d5/0x240 [87850.593209] do_writepages+0x43/0xf0 [87850.593798] ? __filemap_fdatawrite_range+0xa4/0x100 [87850.594391] __filemap_fdatawrite_range+0xc5/0x100 [87850.595196] btrfs_write_marked_extents+0x68/0x160 [btrfs] [87850.596202] btrfs_write_and_wait_transaction.isra.0+0x4d/0xd0 [btrfs] [87850.597377] btrfs_commit_transaction+0x794/0xca0 [btrfs] [87850.598455] ? _raw_spin_unlock_irqrestore+0x32/0x60 [87850.599305] ? kmem_cache_free+0x15a/0x3d0 [87850.600029] btrfs_recover_log_trees+0x346/0x380 [btrfs] [87850.601021] ? replay_one_extent+0x7d0/0x7d0 [btrfs] [87850.601988] open_ctree+0x13c9/0x1698 [btrfs] [87850.602846] btrfs_mount_root.cold+0x13/0xed [btrfs] [87850.603771] ? kmem_cache_alloc_trace+0x7c9/0x930 [87850.604576] ? vfs_parse_fs_string+0x5d/0xb0 [87850.605293] ? kfree+0x276/0x3f0 [87850.605857] legacy_get_tree+0x30/0x50 [87850.606540] vfs_get_tree+0x28/0xc0 [87850.607163] fc_mount+0xe/0x40 [87850.607695] vfs_kern_mount.part.0+0x71/0x90 [87850.608440] btrfs_mount+0x13b/0x3e0 [btrfs] (...) [87850.629477] ---[ end trace 68802022b99a1ea0 ]--- [87850.630849] BTRFS: error (device sdc) in btrfs_commit_transaction:2381: errno=-5 IO failure (Error while writing out transaction) [87850.632422] BTRFS warning (device sdc): Skipping commit of aborted transaction. [87850.633416] BTRFS: error (device sdc) in cleanup_transaction:1978: errno=-5 IO failure [87850.634553] BTRFS: error (device sdc) in btrfs_replay_log:2431: errno=-5 IO failure (Failed to recover log tree) [87850.637529] BTRFS error (device sdc): open_ctree failed In this example the inode we moved was a directory, so it was easy to detect the problem because directories can only have one hard link and the tree checker immediately detects that. If the moved inode was a file, then the log replay would succeed and we would end up having both the new hard link (/mnt/foo) and the old hard link (/mnt/testdir/foo) present, but only the new one should be present. Fix this by forcing re-logging of the old parent directory when logging the new name during a rename operation. This ensures we end up with a log that is authoritative for a range covering the keys for the old dentry, therefore causing the old dentry do be deleted when replaying the log. A test case for fstests will follow up soon. Fixes: `64d6b281ba` ("btrfs: remove unnecessary check_parent_dirs_for_sync()") CC: stable@vger.kernel.org # 5.12+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-14 01:23:04 +02:00
Boris Burkov	15c7745c9a	btrfs: return whole extents in fiemap `xfs_io -c 'fiemap <off> <len>' <file>` can give surprising results on btrfs that differ from xfs. btrfs prints out extents trimmed to fit the user input. If the user's fiemap request has an offset, then rather than returning each whole extent which intersects that range, we also trim the start extent to not have start < off. Documentation in filesystems/fiemap.txt and the xfs_io man page suggests that returning the whole extent is expected. Some cases which all yield the same fiemap in xfs, but not btrfs: dd if=/dev/zero of=$f bs=4k count=1 sudo xfs_io -c 'fiemap 0 1024' $f 0: [0..7]: 26624..26631 sudo xfs_io -c 'fiemap 2048 1024' $f 0: [4..7]: 26628..26631 sudo xfs_io -c 'fiemap 2048 4096' $f 0: [4..7]: 26628..26631 sudo xfs_io -c 'fiemap 3584 512' $f 0: [7..7]: 26631..26631 sudo xfs_io -c 'fiemap 4091 5' $f 0: [7..6]: 26631..26630 I believe this is a consequence of the logic for merging contiguous extents represented by separate extent items. That logic needs to track the last offset as it loops through the extent items, which happens to pick up the start offset on the first iteration, and trim off the beginning of the full extent. To fix it, start `off` at 0 rather than `start` so that we keep the iteration/merging intact without cutting off the start of the extent. after the fix, all the above commands give: 0: [0..7]: 26624..26631 The merging logic is exercised by fstest generic/483, and I have written a new fstest for checking we don't have backwards or zero-length fiemaps for cases like those above. Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-14 01:23:00 +02:00
Josef Bacik	71795ee590	btrfs: avoid RCU stalls while running delayed iputs Generally a delayed iput is added when we might do the final iput, so usually we'll end up sleeping while processing the delayed iputs naturally. However there's no guarantee of this, especially for small files. In production we noticed 5 instances of RCU stalls while testing a kernel release overnight across 1000 machines, so this is relatively common: host count: 5 rcu: INFO: rcu_sched self-detected stall on CPU rcu: ....: (20998 ticks this GP) idle=59e/1/0x4000000000000002 softirq=12333372/12333372 fqs=3208 (t=21031 jiffies g=27810193 q=41075) NMI backtrace for cpu 1 CPU: 1 PID: 1713 Comm: btrfs-cleaner Kdump: loaded Not tainted 5.6.13-0_fbk12_rc1_5520_gec92bffc1ec9 #1 Call Trace: <IRQ> dump_stack+0x50/0x70 nmi_cpu_backtrace.cold.6+0x30/0x65 ? lapic_can_unplug_cpu.cold.30+0x40/0x40 nmi_trigger_cpumask_backtrace+0xba/0xca rcu_dump_cpu_stacks+0x99/0xc7 rcu_sched_clock_irq.cold.90+0x1b2/0x3a3 ? trigger_load_balance+0x5c/0x200 ? tick_sched_do_timer+0x60/0x60 ? tick_sched_do_timer+0x60/0x60 update_process_times+0x24/0x50 tick_sched_timer+0x37/0x70 __hrtimer_run_queues+0xfe/0x270 hrtimer_interrupt+0xf4/0x210 smp_apic_timer_interrupt+0x5e/0x120 apic_timer_interrupt+0xf/0x20 </IRQ> RIP: 0010:queued_spin_lock_slowpath+0x17d/0x1b0 RSP: 0018:ffffc9000da5fe48 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RAX: 0000000000000000 RBX: ffff889fa81d0cd8 RCX: 0000000000000029 RDX: ffff889fff86c0c0 RSI: 0000000000080000 RDI: ffff88bfc2da7200 RBP: ffff888f2dcdd768 R08: 0000000001040000 R09: 0000000000000000 R10: 0000000000000001 R11: ffffffff82a55560 R12: ffff88bfc2da7200 R13: 0000000000000000 R14: ffff88bff6c2a360 R15: ffffffff814bd870 ? kzalloc.constprop.57+0x30/0x30 list_lru_add+0x5a/0x100 inode_lru_list_add+0x20/0x40 iput+0x1c1/0x1f0 run_delayed_iput_locked+0x46/0x90 btrfs_run_delayed_iputs+0x3f/0x60 cleaner_kthread+0xf2/0x120 kthread+0x10b/0x130 Fix this by adding a cond_resched_lock() to the loop processing delayed iputs so we can avoid these sort of stalls. CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Rik van Riel <riel@surriel.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-14 01:22:53 +02:00
Johannes Thumshirn	d6f67afbdf	btrfs: return 0 for dev_extent_hole_check_zoned hole_start in case of error Commit `7000babdda` ("btrfs: assign proper values to a bool variable in dev_extent_hole_check_zoned") assigned false to the hole_start parameter of dev_extent_hole_check_zoned(). The hole_start parameter is not boolean and returns the start location of the found hole. Fixes: `7000babdda` ("btrfs: assign proper values to a bool variable in dev_extent_hole_check_zoned") Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-14 01:22:48 +02:00
Phillip Potter	c6052f09c1	fs: ecryptfs: remove BUG_ON from crypt_scatterlist crypt_stat memory itself is allocated when inode is created, in ecryptfs_alloc_inode, which returns NULL on failure and is handled by callers, which would prevent us getting to this point. It then calls ecryptfs_init_crypt_stat which allocates crypt_stat->tfm checking for and likewise handling allocation failure. Finally, crypt_stat->flags has ECRYPTFS_STRUCT_INITIALIZED merged into it in ecryptfs_init_crypt_stat as well. Simply put, the conditions that the BUG_ON checks for will never be triggered, as to even get to this function, the relevant conditions will have already been fulfilled (or the inode allocation would fail in the first place and thus no call to this function or those above it). Cc: Tyler Hicks <code@tyhicks.com> Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Link: https://lore.kernel.org/r/20210503115736.2104747-50-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-13 18:32:26 +02:00
Greg Kroah-Hartman	e1436df2f2	Revert "ecryptfs: replace BUG_ON with error handling code" This reverts commit `2c2a7552dd`. Because of recent interactions with developers from @umn.edu, all commits from them have been recently re-reviewed to ensure if they were correct or not. Upon review, this commit was found to be incorrect for the reasons below, so it must be reverted. It will be fixed up "correctly" in a later kernel change. The original commit log for this change was incorrect, no "error handling code" was added, things will blow up just as badly as before if any of these cases ever were true. As this BUG_ON() never fired, and most of these checks are "obviously" never going to be true, let's just revert to the original code for now until this gets unwound to be done correctly in the future. Cc: Aditya Pakki <pakki001@umn.edu> Fixes: `2c2a7552dd` ("ecryptfs: replace BUG_ON with error handling code") Cc: stable <stable@vger.kernel.org> Acked-by: Tyler Hicks <code@tyhicks.com> Link: https://lore.kernel.org/r/20210503115736.2104747-49-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-05-13 18:32:24 +02:00
Gao Xiang	0852b6ca94	erofs: fix 1 lcluster-sized pcluster for big pcluster If the 1st NONHEAD lcluster of a pcluster isn't CBLKCNT lcluster type rather than a HEAD or PLAIN type instead, which means its pclustersize _must_ be 1 lcluster (since its uncompressed size < 2 lclusters), as illustrated below: HEAD HEAD / PLAIN lcluster type ____________ ____________ \|_:__________\|_________:__\| file data (uncompressed) . . .____________. \|____________\| pcluster data (compressed) Such on-disk case was explained before [1] but missed to be handled properly in the runtime implementation. It can be observed if manually generating 1 lcluster-sized pcluster with 2 lclusters (thus CBLKCNT doesn't exist.) Let's fix it now. [1] https://lore.kernel.org/r/20210407043927.10623-1-xiang@kernel.org Link: https://lore.kernel.org/r/20210510064715.29123-1-xiang@kernel.org Fixes: `cec6e93bea` ("erofs: support parsing big pcluster compress indexes") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Gao Xiang <xiang@kernel.org>	2021-05-13 15:58:46 +08:00
Alexey Dobriyan	9745516841	sched: Make nr_iowait() return 32-bit value Creating 2**32 tasks to wait in D-state is impossible and wasteful. Return "unsigned int" and save on REX prefixes. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210422200228.1423391-2-adobriyan@gmail.com	2021-05-12 21:34:15 +02:00
Alexey Dobriyan	01aee8fd7f	sched: Make nr_running() return 32-bit value Creating 2**32 tasks is impossible due to futex pid limits and wasteful anyway. Nobody has done it. Bring nr_running() into 32-bit world to save on REX prefixes. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210422200228.1423391-1-adobriyan@gmail.com	2021-05-12 21:34:14 +02:00
Jaegeuk Kim	f395183f95	f2fs: return EINVAL for hole cases in swap file This tries to fix xfstests/generic/495. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2021-05-12 07:38:00 -07:00
Christian Brauner	2ca4dcc490	fs/mount_setattr: tighten permission checks We currently don't have any filesystems that support idmapped mounts which are mountable inside a user namespace. That was a deliberate decision for now as a userns root can just mount the filesystem themselves. So enforce this restriction explicitly until there's a real use-case for this. This way we can notice it and will have a chance to adapt and audit our translation helpers and fstests appropriately if we need to support such filesystems. Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org CC: linux-fsdevel@vger.kernel.org Suggested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>	2021-05-12 14:13:16 +02:00
Jaegeuk Kim	ca298241bc	f2fs: avoid swapon failure by giving a warning first The final solution can be migrating blocks to form a section-aligned file internally. Meanwhile, let's ask users to do that when preparing the swap file initially like: 1) create() 2) ioctl(F2FS_IOC_SET_PIN_FILE) 3) fallocate() Reported-by: kernel test robot <oliver.sang@intel.com> Fixes: `36e4d95891` ("f2fs: check if swapfile is section-alligned") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2021-05-11 20:51:53 -07:00
Chao Yu	8bfbfb0ddd	f2fs: compress: fix to assign cc.cluster_idx correctly In f2fs_destroy_compress_ctx(), after f2fs_destroy_compress_ctx(), cc.cluster_idx will be cleared w/ NULL_CLUSTER, f2fs_cluster_blocks() may check wrong cluster metadata, fix it. Fixes: `4c8ff7095b` ("f2fs: support data compression") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2021-05-11 14:48:12 -07:00
Chao Yu	a949dc5f2c	f2fs: compress: fix race condition of overwrite vs truncate pos_fsstress testcase complains a panic as belew: ------------[ cut here ]------------ kernel BUG at fs/f2fs/compress.c:1082! invalid opcode: 0000 [#1] SMP PTI CPU: 4 PID: 2753477 Comm: kworker/u16:2 Tainted: G OE 5.12.0-rc1-custom #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Workqueue: writeback wb_workfn (flush-252:16) RIP: 0010:prepare_compress_overwrite+0x4c0/0x760 [f2fs] Call Trace: f2fs_prepare_compress_overwrite+0x5f/0x80 [f2fs] f2fs_write_cache_pages+0x468/0x8a0 [f2fs] f2fs_write_data_pages+0x2a4/0x2f0 [f2fs] do_writepages+0x38/0xc0 __writeback_single_inode+0x44/0x2a0 writeback_sb_inodes+0x223/0x4d0 __writeback_inodes_wb+0x56/0xf0 wb_writeback+0x1dd/0x290 wb_workfn+0x309/0x500 process_one_work+0x220/0x3c0 worker_thread+0x53/0x420 kthread+0x12f/0x150 ret_from_fork+0x22/0x30 The root cause is truncate() may race with overwrite as below, so that one reference count left in page can not guarantee the page attaching in mapping tree all the time, after truncation, later find_lock_page() may return NULL pointer. - prepare_compress_overwrite - f2fs_pagecache_get_page - unlock_page - f2fs_setattr - truncate_setsize - truncate_inode_page - delete_from_page_cache - find_lock_page Fix this by avoiding referencing updated page. Fixes: `4c8ff7095b` ("f2fs: support data compression") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2021-05-11 14:48:12 -07:00
Chao Yu	a12cc5b423	f2fs: compress: fix to free compress page correctly In error path of f2fs_write_compressed_pages(), it needs to call f2fs_compress_free_page() to release temporary page. Fixes: `5e6bbde959` ("f2fs: introduce mempool for {,de}compress intermediate page allocation") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2021-05-11 14:48:12 -07:00
Jaegeuk Kim	a753103909	f2fs: support iflag change given the mask In f2fs_fileattr_set(), if (!fa->flags_valid) mask &= FS_COMMON_FL; In this case, we can set supported flags by mask only instead of BUG_ON. /* Flags shared betwen flags/xflags */ (FS_SYNC_FL \| FS_IMMUTABLE_FL \| FS_APPEND_FL \| \ FS_NODUMP_FL \| FS_NOATIME_FL \| FS_DAX_FL \| \ FS_PROJINHERIT_FL) Fixes: `9b1bb01c8a` ("f2fs: convert to fileattr") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2021-05-11 14:48:11 -07:00
Jaegeuk Kim	349c4d6c75	f2fs: avoid null pointer access when handling IPU error Unable to handle kernel NULL pointer dereference at virtual address 000000000000001a pc : f2fs_inplace_write_data+0x144/0x208 lr : f2fs_inplace_write_data+0x134/0x208 Call trace: f2fs_inplace_write_data+0x144/0x208 f2fs_do_write_data_page+0x270/0x770 f2fs_write_single_data_page+0x47c/0x830 __f2fs_write_data_pages+0x444/0x98c f2fs_write_data_pages.llvm.16514453770497736882+0x2c/0x38 do_writepages+0x58/0x118 __writeback_single_inode+0x44/0x300 writeback_sb_inodes+0x4b8/0x9c8 wb_writeback+0x148/0x42c wb_do_writeback+0xc8/0x390 wb_workfn+0xb0/0x2f4 process_one_work+0x1fc/0x444 worker_thread+0x268/0x4b4 kthread+0x13c/0x158 ret_from_fork+0x10/0x18 Fixes: `9557727876` ("f2fs: drop inplace IO if fs status is abnormal") Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2021-05-11 14:48:07 -07:00
Linus Torvalds	88b06399c9	for-5.13-rc1-part2-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmCaiuQACgkQxWXV+ddt WDv3Ww//bDUlNXqAYEoLKePohy1bupiqG8lKYX4s4bGEq0x0cyh4qVER/Q/lU2l2 AMf8t6Pwr/iBOPwfckreLDuFrhacvWq0K4eMkgpf++3P0Mzbj2sIBX0+XnrWluRL yFCZudJej+cpM55Ve4l6M8zrk1nbzYJLFPRRdOIFe4HonWkhI/zY6RD7kFybQevW mAxqMgIpUQAjoj5F/EhwXQ9dk6PXSZj+gaOoNrmQmN7mZMqNgSLHBEoJUHrotm1K rDlEwIRUTtNPV+rcPxcXD1GFiUxU0cZhg0jts252z89Mvaqb2g/YKaHPAR/IVIt5 enf4llZzoEeiMnHuSj9zCg4HxOvCCFV8zZYXlO7/9IqdgLJjQkElZoqTz45obWdE aoJrHAWWlulS2jPocJfJ/Zti2xBYGLjQASH0kYS+vjVxjKyqz3fuM1Tsasaf9Mcp +M2m6yMBjJ0nJMTL2CgBksCd0dHwfiBZ/YYClrMSjYlzYSU6ofA2b2hej0OjqZ4X FmpEmCBK4lySdJI+JlJKikeneOOxKSpT0xGqU+OMmbpwFH3k1N3oseu0hrG8Xreo RU1xNbekGTwRbCcCA9l5HQ/RYptT7rt/KqkC70UFEvdIijCNcptOGaTAoYvLS14O T+yu0Cizt7O0Fdg5E+MAS/qaI2yacXxBfIkMDbPxHGUg7+vUteM= =Phtq -----END PGP SIGNATURE----- Merge tag 'for-5.13-rc1-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "Handle transaction start error in btrfs_fileattr_set() This is fix for code introduced by the new fileattr merge" * tag 'for-5.13-rc1-part2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: handle transaction start error in btrfs_fileattr_set	2021-05-11 09:43:16 -07:00
Ritesh Harjani	9b8a233bc2	btrfs: handle transaction start error in btrfs_fileattr_set Add error handling in btrfs_fileattr_set in case of an error while starting a transaction. This fixes btrfs/232 which otherwise used to fail with below signature on Power. btrfs/232 [ 1119.474650] run fstests btrfs/232 at 2021-04-21 02:21:22 <...> [ 1366.638585] BUG: Unable to handle kernel data access on read at 0xffffffffffffff86 [ 1366.638768] Faulting instruction address: 0xc0000000009a5c88 cpu 0x0: Vector: 380 (Data SLB Access) at [c000000014f177b0] pc: c0000000009a5c88: btrfs_update_root_times+0x58/0xc0 lr: c0000000009a5c84: btrfs_update_root_times+0x54/0xc0 <...> pid = 24881, comm = fsstress btrfs_update_inode+0xa0/0x140 btrfs_fileattr_set+0x5d0/0x6f0 vfs_fileattr_set+0x2a8/0x390 do_vfs_ioctl+0x1290/0x1ac0 sys_ioctl+0x6c/0x120 system_call_exception+0x3d4/0x410 system_call_common+0xec/0x278 Fixes: `97fc297754` ("btrfs: convert to fileattr") Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-11 15:35:57 +02:00
Linus Torvalds	142b507f91	for-5.13-rc1-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmCZnCIACgkQxWXV+ddt WDuEvhAAmC+Mkrz25GbQnSIp2FKYCCQK34D0rdghml0Bc0cJcDh3yhgIB6ZTHZ7e Z+UZu84ISK31OHKDzXtX0MINN2wuU4u4kd6PHtYj0wSVl3cX6E/K5j6YcThfI1Ru vCW5O87V9SCV5NnykIFt3sbYvsPKtF9lhgPQprj4np+wxaSyNlEF2c+zLTI3J7NV +8OlM4oi8GocZd1aAwGpVM3qUPyQSHEb9oUEp6aV1ERuAs6LIyeGks3Cag6gjPnq dYz3jV9HyZB5GtX0dmv4LeRFIog1uFi+SIEFl5RpqhB3sXN3n6XHMka4x20FXiWy PfX9+Nf4bQGx6F9rGsgHNHQP5dVhHAkZcq3E0n0yshIfNe8wDHBRlmk0wbfj4K7I VYv85SxEYpigG8KzF5gjiar4EqsaJVQcJioMxVE7z9vrW6xlOWD1lf/ViUZnB3wd IQEyGz2qOe9eqJD+dnyN7QkN9WKGSUr2p1Q/DngCIwFzKWf1qIlETNXrIL+AZ97r v4G5mMq9dCxs3s8c5SGbdF9qqK8gEuaV3iWQAoKOciuy6fbc553Q90I1v3OhW+by j2yVoo3nJbBJBuLBNWPDUlwxQF/EHPQ6nh3fvxNRgwksXgRmqywdJb5dQ8hcKgSH RsvinJhtKo5rTgtgGgmNvmLAjKIieW1lIVG4ha0O/m49HeaohDE= =GNNs -----END PGP SIGNATURE----- Merge tag 'for-5.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fixes from David Sterba: "First batch of various fixes, here's a list of notable ones: - fix unmountable seed device after fstrim - fix silent data loss in zoned mode due to ordered extent splitting - fix race leading to unpersisted data and metadata on fsync - fix deadlock when cloning inline extents and using qgroups" * tag 'for-5.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: initialize return variable in cleanup_free_space_cache_v1 btrfs: zoned: sanity check zone type btrfs: fix unmountable seed device after fstrim btrfs: fix deadlock when cloning inline extents and using qgroups btrfs: fix race leading to unpersisted data and metadata on fsync btrfs: do not consider send context as valid when trying to flush qgroups btrfs: zoned: fix silent data loss after failure splitting ordered extent	2021-05-10 14:10:42 -07:00
Christophe JAILLET	8c721cb0f7	quota: Use 'hlist_for_each_entry' to simplify code Use 'hlist_for_each_entry' instead of hand writing it. This saves a few lines of code. Link: https://lore.kernel.org/r/f82d3e33964dcbd2aac19866735e0a8381c8a735.1619599407.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jan Kara <jack@suse.cz>	2021-05-10 16:27:49 +02:00
Linus Torvalds	0a55a1fbed	3 small SMB3 chmultichannel related changesets (also for stable) -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmCXTY4ACgkQiiy9cAdy T1GTwAv8DHmrUOSMPY79kH+vYrg67S8bnCEE2b0ChP0CyU/Gp05CRUTH+C9bstb+ hou6Wx7tvAa++PpgdwBfNcj+Be76PcPLxCQoXglOzckpuAdvyCLgzcOhrPoOYAyR gbLsfkIXMPGFLnGltZMQl9TCXEIpF6xjizZUyHVSq0UBhIyfBOdsDeoCNbVOuB1Y t6KtBRTro+P2DIj9zITKYCnKFU3OXae/gBwv+wQU066mNq/IxlMqX84Av728govJ sEd8CSLwnwX4/NsvADjiy4+M6iLvWtXN9xDr6S5Hyo+1Y/SkU/C1qo8Uc8H74gdT D9LSoGTBx36X97eSisrE8vOFt3SIwEy9fU/yU87qrDuwwCjBYNvYCxK+VaVUA0vH 1CjScSifrG7MZAAw7h4o6Ug6q4Otobabj+ODq4exkjW+GYb0wQ/DzjqVuoIvqP7F TLoRisVWpg/MY7FcEU1ZpYdHAFyKKTcdOLlmX6Y40sKue0mwfpcv0rI+isNUqCYX R3CJ5tPp =9Egy -----END PGP SIGNATURE----- Merge tag '5.13-rc-smb3-part3' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "Three small SMB3 chmultichannel related changesets (also for stable) from the SMB3 test event this week. The other fixes are still in review/testing" * tag '5.13-rc-smb3-part3' of git://git.samba.org/sfrench/cifs-2.6: smb3: if max_channels set to more than one channel request multichannel smb3: do not attempt multichannel to server which does not support it smb3: when mounting with multichannel include it in requested capabilities	2021-05-09 13:19:29 -07:00
Pavel Begunkov	a298232ee6	io_uring: fix link timeout refs WARNING: CPU: 0 PID: 10242 at lib/refcount.c:28 refcount_warn_saturate+0x15b/0x1a0 lib/refcount.c:28 RIP: 0010:refcount_warn_saturate+0x15b/0x1a0 lib/refcount.c:28 Call Trace: __refcount_sub_and_test include/linux/refcount.h:283 [inline] __refcount_dec_and_test include/linux/refcount.h:315 [inline] refcount_dec_and_test include/linux/refcount.h:333 [inline] io_put_req fs/io_uring.c:2140 [inline] io_queue_linked_timeout fs/io_uring.c:6300 [inline] __io_queue_sqe+0xbef/0xec0 fs/io_uring.c:6354 io_submit_sqe fs/io_uring.c:6534 [inline] io_submit_sqes+0x2bbd/0x7c50 fs/io_uring.c:6660 __do_sys_io_uring_enter fs/io_uring.c:9240 [inline] __se_sys_io_uring_enter+0x256/0x1d60 fs/io_uring.c:9182 io_link_timeout_fn() should put only one reference of the linked timeout request, however in case of racing with the master request's completion first io_req_complete() puts one and then io_put_req_deferred() is called. Cc: stable@vger.kernel.org # 5.12+ Fixes: `9ae1f8dd37` ("io_uring: fix inconsistent lock state") Reported-by: syzbot+a2910119328ce8e7996f@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ff51018ff29de5ffa76f09273ef48cb24c720368.1620417627.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-05-08 22:11:49 -06:00
Linus Torvalds	0f979d815c	Kbuild updates for v5.13 (2nd) - Convert sh and sparc to use generic shell scripts to generate the syscall headers - refactor .gitignore files - Update kernel/config_data.gz only when the content of the .config is really changed, which avoids the unneeded re-link of vmlinux - move "remove stale files" workarounds to scripts/remove-stale-files - suppress unused-but-set-variable warnings by default for Clang as well - fix locale setting LANG=C to LC_ALL=C - improve 'make distclean' - always keep intermediate objects from scripts/link-vmlinux.sh - move IF_ENABLED out of <linux/kconfig.h> to make it self-contained - misc cleanups -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmCWrucVHG1hc2FoaXJv eUBrZXJuZWwub3JnAAoJED2LAQed4NsGRLkQAJ8t7PfMJLSh/VcgDXp3Z7fZ/V2M RUGbOeRYErR1gylejuip/R19mS5MiBNecU60VrugZyDOMf98+mx61mI/ykpPeX92 sE3VU5MPXEwmv758QUr4gH014TZshMtHHo+tXA+NVUbqFp7RTnkZMDjOXGthYDHG NhDou4LZ2P0CUKm8vb58SJPqB7ZdYOT9eEQEdHevm18Gx0KProCxRziup7loldy7 ET770okQ23if90ufCSVmnM6Ee6opoKYvXS5lv8V/a4xV/VbicbUclpzIZsHF7L2i mIfr6dy480ncOaQlfWnX9ACgIeeqiFPOeZbAu7HAtwXzP5vCahgQ9FKVC7KPt+BP Lf3LgdBrfSP5A7f7FrtkkPmP7pl1j6/Bq3+PhCur9XimtRIsvTOx7m7nuvsY4yHC /wmBXFZgqE5DGyzpHXz1az8JHWw2AesP9L2f536BhfvRtdXaoOxPtZ/rmO1lfcMV fWMa9f1em8lXwCiD1dR8UkBrIxItty+qqPffu2S/DlEepbiZrCg1gD827Fy7Mm3n 5rvrzYMOY2YK0yW1jtm+w3NlPlmG91BDUTP8tEcDxrTOIXezwqJf7fw8qIgGIy7W 3WzuBfgSvpT977ByMsB0YYugo2Xie+R1jpOWt7tv6KHM4varNBu0WpVhQhrKQr5o agJiuvzsf3b+64oP =935P -----END PGP SIGNATURE----- Merge tag 'kbuild-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: - Convert sh and sparc to use generic shell scripts to generate the syscall headers - refactor .gitignore files - Update kernel/config_data.gz only when the content of the .config is really changed, which avoids the unneeded re-link of vmlinux - move "remove stale files" workarounds to scripts/remove-stale-files - suppress unused-but-set-variable warnings by default for Clang as well - fix locale setting LANG=C to LC_ALL=C - improve 'make distclean' - always keep intermediate objects from scripts/link-vmlinux.sh - move IF_ENABLED out of <linux/kconfig.h> to make it self-contained - misc cleanups * tag 'kbuild-v5.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (25 commits) linux/kconfig.h: replace IF_ENABLED() with PTR_IF() in <linux/kernel.h> kbuild: Don't remove link-vmlinux temporary files on exit/signal kbuild: remove the unneeded comments for external module builds kbuild: make distclean remove tag files in sub-directories kbuild: make distclean work against $(objtree) instead of $(srctree) kbuild: refactor modname-multi by using suffix-search kbuild: refactor fdtoverlay rule kbuild: parameterize the .o part of suffix-search arch: use cross_compiling to check whether it is a cross build or not kbuild: remove ARCH=sh64 support from top Makefile .gitignore: prefix local generated files with a slash kbuild: replace LANG=C with LC_ALL=C Makefile: Move -Wno-unused-but-set-variable out of GCC only block kbuild: add a script to remove stale generated files kbuild: update config_data.gz only when the content of .config is changed .gitignore: ignore only top-level modules.builtin .gitignore: move tags and TAGS close to other tag files kernel/.gitgnore: remove stale timeconst.h and hz.bc usr/include: refactor .gitignore genksyms: fix stale comment ...	2021-05-08 10:00:11 -07:00
Steve French	c1f8a398b6	smb3: if max_channels set to more than one channel request multichannel Mounting with "multichannel" is obviously implied if user requested more than one channel on mount (ie mount parm max_channels>1). Currently both have to be specified. Fix that so that if max_channels is greater than 1 on mount, enable multichannel rather than silently falling back to non-multichannel. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-By: Tom Talpey <tom@talpey.com> Cc: <stable@vger.kernel.org> # v5.11+ Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>	2021-05-08 10:51:06 -05:00
Steve French	9c2dc11df5	smb3: do not attempt multichannel to server which does not support it We were ignoring CAP_MULTI_CHANNEL in the server response - if the server doesn't support multichannel we should not be attempting it. See MS-SMB2 section 3.2.5.2 Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-By: Tom Talpey <tom@talpey.com> Cc: <stable@vger.kernel.org> # v5.8+ Signed-off-by: Steve French <stfrench@microsoft.com>	2021-05-08 10:50:53 -05:00
Steve French	679971e721	smb3: when mounting with multichannel include it in requested capabilities In the SMB3/SMB3.1.1 negotiate protocol request, we are supposed to advertise CAP_MULTICHANNEL capability when establishing multiple channels has been requested by the user doing the mount. See MS-SMB2 sections 2.2.3 and 3.2.5.2 Without setting it there is some risk that multichannel could fail if the server interpreted the field strictly. Reviewed-By: Tom Talpey <tom@talpey.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Cc: <stable@vger.kernel.org> # v5.8+ Signed-off-by: Steve French <stfrench@microsoft.com>	2021-05-08 10:44:11 -05:00
Vivek Goyal	237388320d	dax: Wake up all waiters after invalidating dax entry I am seeing missed wakeups which ultimately lead to a deadlock when I am using virtiofs with DAX enabled and running "make -j". I had to mount virtiofs as rootfs and also reduce to dax window size to 256M to reproduce the problem consistently. So here is the problem. put_unlocked_entry() wakes up waiters only if entry is not null as well as !dax_is_conflict(entry). But if I call multiple instances of invalidate_inode_pages2() in parallel, then I can run into a situation where there are waiters on this index but nobody will wake these waiters. invalidate_inode_pages2() invalidate_inode_pages2_range() invalidate_exceptional_entry2() dax_invalidate_mapping_entry_sync() __dax_invalidate_entry() { xas_lock_irq(&xas); entry = get_unlocked_entry(&xas, 0); ... ... dax_disassociate_entry(entry, mapping, trunc); xas_store(&xas, NULL); ... ... put_unlocked_entry(&xas, entry); xas_unlock_irq(&xas); } Say a fault in in progress and it has locked entry at offset say "0x1c". Now say three instances of invalidate_inode_pages2() are in progress (A, B, C) and they all try to invalidate entry at offset "0x1c". Given dax entry is locked, all tree instances A, B, C will wait in wait queue. When dax fault finishes, say A is woken up. It will store NULL entry at index "0x1c" and wake up B. When B comes along it will find "entry=0" at page offset 0x1c and it will call put_unlocked_entry(&xas, 0). And this means put_unlocked_entry() will not wake up next waiter, given the current code. And that means C continues to wait and is not woken up. This patch fixes the issue by waking up all waiters when a dax entry has been invalidated. This seems to fix the deadlock I am facing and I can make forward progress. Reported-by: Sergio Lopez <slp@redhat.com> Fixes: `ac401cc782` ("dax: New fault locking") Reviewed-by: Jan Kara <jack@suse.cz> Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Link: https://lore.kernel.org/r/20210428190314.1865312-4-vgoyal@redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2021-05-07 15:55:44 -07:00
Vivek Goyal	4c3d043d27	dax: Add a wakeup mode parameter to put_unlocked_entry() As of now put_unlocked_entry() always wakes up next waiter. In next patches we want to wake up all waiters at one callsite. Hence, add a parameter to the function. This patch does not introduce any change of behavior. Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Jan Kara <jack@suse.cz> Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Link: https://lore.kernel.org/r/20210428190314.1865312-3-vgoyal@redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2021-05-07 15:55:44 -07:00
Vivek Goyal	698ab77aeb	dax: Add an enum for specifying dax wakup mode Dan mentioned that he is not very fond of passing around a boolean true/false to specify if only next waiter should be woken up or all waiters should be woken up. He instead prefers that we introduce an enum and make it very explicity at the callsite itself. Easier to read code. This patch should not introduce any change of behavior. Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Jan Kara <jack@suse.cz> Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Link: https://lore.kernel.org/r/20210428190314.1865312-2-vgoyal@redhat.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2021-05-07 15:55:44 -07:00
Linus Torvalds	bd313968fd	block-5.13-2021-05-07 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmCVVnQQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgps0ND/0SL4zWQJ5fh+NVCyQJFLm0E+ejqWg6Ykmk EE1Dzhgr9lgxZU19UCXKtN0lF9icWPfoVDxvqsB2luJLc89GciOmla3PaknCgY6N QZ/GJh/2Kwb9ybVblzKvUNnGSZOZ8gplpAAXu4zlbFXl7xoGBb12kql78fjw84rS S4IG+nKvTdC6ENVTPwFMj0UREL5nccVJycvsuZgzYsSQ//5i5zViDz7mfdCujAo4 g3rt8rctBqYoF684BG4OVkDp7ivJUFvMW93PVqvx8vw2sAOB11v+sAKvX5cZIsdM Z01a3C5nY8IQcpXhoI7n6Kgg4VY0ubeiOrlIBssNQWJszquAHPN7s5uiiSFaIKwg mCyo69Ofmk4wYm2UO0hM8y7x94QvUNKmlcVxb4ls5OEaAKS/v7chnjoovp8s8Me/ 2w1BMBB4qPcF99+K2GF9KyT/gKrXDRXkr9ERTtLLPpCf2uIXtFcU+X+Y64cOivhf ImN1kbN8fQm1ItiEntn5tVd9u9cDnfqTJhzutBolLP33jjarK3TblJ4cUZqN/xAC uH5k1IXZGHbrE9LuXUJQwFs752m21LElSkfG7OxzlktfJcKxJriM9o/dw0mgEmLv 0i1meb55VMbtYT/dNWZEa2FRVtelFIngfoiLSgH0IHXU7sKgTEpgyLmSu4PrySez kRVUsF1Lfw== =Sv+q -----END PGP SIGNATURE----- Merge tag 'block-5.13-2021-05-07' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: - dasd spelling fixes (Bhaskar) - Limit bio max size on multi-page bvecs to the hardware limit, to avoid overly large bio's (and hence latencies). Originally queued for the merge window, but needed a fix and was dropped from the initial pull (Changheun) - NVMe pull request (Christoph): - reset the bdev to ns head when failover (Daniel Wagner) - remove unsupported command noise (Keith Busch) - misc passthrough improvements (Kanchan Joshi) - fix controller ioctl through ns_head (Minwoo Im) - fix controller timeouts during reset (Tao Chiu) - rnbd fixes/cleanups (Gioh, Md, Dima) - Fix iov_iter re-expansion (yangerkun) * tag 'block-5.13-2021-05-07' of git://git.kernel.dk/linux-block: block: reexpand iov_iter after read/write nvmet: remove unsupported command noise nvme-multipath: reset bdev to ns head when failover nvme-pci: fix controller reset hang when racing with nvme_timeout nvme: move the fabrics queue ready check routines to core nvme: avoid memset for passthrough requests nvme: add nvme_get_ns helper nvme: fix controller ioctl through ns_head bio: limit bio max size RDMA/rtrs: fix uninitialized symbol 'cnt' s390: dasd: Mundane spelling fixes block/rnbd: Remove all likely and unlikely block/rnbd-clt: Check the return value of the function rtrs_clt_query block/rnbd: Fix style issues block/rnbd-clt: Change queue_depth type in rnbd_clt_session to size_t	2021-05-07 11:35:12 -07:00
Linus Torvalds	28b4afeb59	io_uring-5.13-2021-05-07 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmCVVmMQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpv7yEAC/WV1alcH9XdEqLrc2aDwlaScMmSlrMQhY ihtDCR9BsX11E3QcUB7D+VYjBo68uKR+ksa1/GN2Xp+vvqmdjQvZindgto/5b6u1 ko0Dradl2zulCAc7QIdjb2tbmL+Q+JOX5wxv14/+2XabEcce3OegWIvIgX+56NFW ZHg80SQzXUhEtQcAUVCoPeBN+H+xzadgz38VlOI08gOG7/M6tS965GH3tZqTjh2K P7dLjUn0WcxZ3euAYAsQzNN2O2ObJfpCsQtsG2eSf8DGpanPe4gQjAud1BstDtN0 CJ0+b6DHgzQYOAgPFjm7l0jjs+VnIYIMnoBBxm5EkIoktsj0hHdqTnEugoz4wTnS T8WgojaU6jYNx+Jj6vciCLk0lb5c3O3nxmw3w84/rtTwtaEChCAbWdAkl4cleNaw 3/Z2bksCVrQWDVskmu4FP7+kGYpjpV+ZiA2+6OGwILTCN+W7vi079NByQAzdLaRb K/4lEGM7VYEXtq/I7C6VzjtY7gq46TJmpFW+OdQnPIguavp+7vlUl2pLV3oTeGBc E6c+xltgIN+sbbDc/57EJEvhHQod4A6HYOGwBMyjHrhr/sdQ4xvUaJPNmG9HfqRK SM3TOlwpHRWFTgbO+6qoJQSMvACQyE/SDqiPi08q75zFVTNCcYM7uYV3fJMsQ9sj vA+5HAaRKQ== =YwTw -----END PGP SIGNATURE----- Merge tag 'io_uring-5.13-2021-05-07' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "Mostly fixes for merge window merged code. In detail: - Error case memory leak fixes (Colin, Zqiang) - Add the tools/io_uring/ to the list of maintained files (Lukas) - Set of fixes for the modified buffer registration API (Pavel) - Sanitize io thread setup on x86 (Stefan) - Ensure we truncate transfer count for registered buffers (Thadeu)" * tag 'io_uring-5.13-2021-05-07' of git://git.kernel.dk/linux-block: x86/process: setup io_threads more like normal user space threads MAINTAINERS: add io_uring tool to IO_URING io_uring: truncate lengths larger than MAX_RW_COUNT on provide buffers io_uring: Fix memory leak in io_sqe_buffers_register() io_uring: Fix premature return from loop and memory leak io_uring: fix unchecked error in switch_start() io_uring: allow empty slots for reg buffers io_uring: add more build check for uapi io_uring: dont overlap internal and user req flags io_uring: fix drain with rsrc CQEs	2021-05-07 11:29:23 -07:00
Linus Torvalds	a647034fe2	NFS client updates for Linux 5.13 Highlights include: Stable fixes: - Add validation of the UDP retrans parameter to prevent shift out-of-bounds - Don't discard pNFS layout segments that are marked for return Bugfixes: - Fix a NULL dereference crash in xprt_complete_bc_request() when the NFSv4.1 server misbehaves. - Fix the handling of NFS READDIR cookie verifiers - Sundry fixes to ensure attribute revalidation works correctly when the server does not return post-op attributes. - nfs4_bitmask_adjust() must not change the server global bitmasks - Fix major timeout handling in the RPC code. - NFSv4.2 fallocate() fixes. - Fix the NFSv4.2 SEEK_HOLE/SEEK_DATA end-of-file handling - Copy offload attribute revalidation fixes - Fix an incorrect filehandle size check in the pNFS flexfiles driver - Fix several RDMA transport setup/teardown races - Fix several RDMA queue wrapping issues - Fix a misplaced memory read barrier in sunrpc's call_decode() Features: - Micro optimisation of the TCP transmission queue using TCP_CORK - statx() performance improvements by further splitting up the tracking of invalid cached file metadata. - Support the NFSv4.2 "change_attr_type" attribute and use it to optimise handling of change attribute updates. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmCVLooACgkQZwvnipYK APJB5BAAtIJyhx40ooMBzcucDmXd1qovlKsb8ZlvnSI6c7wvHhFPNk9z4zwThnjL FpVYzJzK6XzAQY/PtgbrPwnSUmW925ngPWYR/hiYe+OGPBnYV+tXP8izCyEkNgMg 45goDOxojGWl7AGTuAJiKcDSdH9PyIrbvt28iwcNSGjslasGSbAoL/836l4OIGr1 Ymxs/NDML11dPco8GIKLGtHd8leFGleDx089VeNsgud8MdaFErp16O5Iz8DdzRKd W1l2zDMb05j8eDZIfy3w3FyrLkDXA+KgLSADiC8TcpxoadPaQJMeCvoIq8oqVndn bZBoxduXdLgf54Aec0WnNKFAOyc7pGvZoSNmFouT7EGV73g+g1LQ+ZbEE1bb8fCQ XHqCVaBt2+47NiTUgdxjXlZRfcn8fYKx0tVxfG3mQVMXUAWfsjmMyQMNgijDRJI2 8Wz3lZMRGMILbR9j4QpP1biVy/2zGNWG/TB5ZZyZMSY4uT+aOpzlqdknb4UsRaSp f7MfmB7xEWpS4DJr9RIBrJ/hIdnMu1mNInxDPFo5Kl5HNp4TaPm2dPir2ZD2wMZI daURTX7giUhpE15ZebQDBqWD+mTR0bVDqLLeo131JRmMfMEHugNrr49xe+NkBu/R QWnFzgkGdQsOeiKRRwEUuhsi74JspqfwzdZzHqcRM5WuXVvBLcA= =h01b -----END PGP SIGNATURE----- Merge tag 'nfs-for-5.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: Stable fixes: - Add validation of the UDP retrans parameter to prevent shift out-of-bounds - Don't discard pNFS layout segments that are marked for return Bugfixes: - Fix a NULL dereference crash in xprt_complete_bc_request() when the NFSv4.1 server misbehaves. - Fix the handling of NFS READDIR cookie verifiers - Sundry fixes to ensure attribute revalidation works correctly when the server does not return post-op attributes. - nfs4_bitmask_adjust() must not change the server global bitmasks - Fix major timeout handling in the RPC code. - NFSv4.2 fallocate() fixes. - Fix the NFSv4.2 SEEK_HOLE/SEEK_DATA end-of-file handling - Copy offload attribute revalidation fixes - Fix an incorrect filehandle size check in the pNFS flexfiles driver - Fix several RDMA transport setup/teardown races - Fix several RDMA queue wrapping issues - Fix a misplaced memory read barrier in sunrpc's call_decode() Features: - Micro optimisation of the TCP transmission queue using TCP_CORK - statx() performance improvements by further splitting up the tracking of invalid cached file metadata. - Support the NFSv4.2 'change_attr_type' attribute and use it to optimise handling of change attribute updates" * tag 'nfs-for-5.13-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (85 commits) xprtrdma: Fix a NULL dereference in frwr_unmap_sync() sunrpc: Fix misplaced barrier in call_decode NFSv4.2: Remove ifdef CONFIG_NFSD from NFSv4.2 client SSC code. xprtrdma: Move fr_mr field to struct rpcrdma_mr xprtrdma: Move the Work Request union to struct rpcrdma_mr xprtrdma: Move fr_linv_done field to struct rpcrdma_mr xprtrdma: Move cqe to struct rpcrdma_mr xprtrdma: Move fr_cid to struct rpcrdma_mr xprtrdma: Remove the RPC/RDMA QP event handler xprtrdma: Don't display r_xprt memory addresses in tracepoints xprtrdma: Add an rpcrdma_mr_completion_class xprtrdma: Add tracepoints showing FastReg WRs and remote invalidation xprtrdma: Avoid Send Queue wrapping xprtrdma: Do not wake RPC consumer on a failed LocalInv xprtrdma: Do not recycle MR after FastReg/LocalInv flushes xprtrdma: Clarify use of barrier in frwr_wc_localinv_done() xprtrdma: Rename frwr_release_mr() xprtrdma: rpcrdma_mr_pop() already does list_del_init() xprtrdma: Delete rpcrdma_recv_buffer_put() xprtrdma: Fix cwnd update ordering ...	2021-05-07 11:23:41 -07:00
Linus Torvalds	e22e983279	9p for 5.13-rc1 an error handling fix and const optimization -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE/IPbcYBuWt0zoYhOq06b7GqY5nAFAmCVJa4ACgkQq06b7GqY 5nDUwxAAkBS34dEfhWgENugMY73Rj1sZ7VAN6HqpT9+7KrQHPXSfCnVac+q1/JzA KuWg1zJaUeYt8VyDMUDPR4l4u6X7NM7ED7qwYlVolP4zfRtYgvo7ZbGHTqryV6wl A0/TIznPAYphOWYqiLPbtuUfmDYQedsGR8CC55jR3FIA0JBMj28b7m5aSSigDUU0 SX5Erkq6PPRT5yAStPQBhwcpckceo+cVpfdO+9llZ35BfZVxdhMKudU54XIOiwX6 1AJk+naHeLN4cCZJeWiiMHMKfBdylAJV2/dG0Po2SRo2nsytTC1eCRSvMtqZOf3m T0cEM6LFJgiqvG0c3w2At1ZkhDTZmyKapMgexRMsuxWhy3InAvfpXIurGuLkdI5T J99cd+LDp5bvH8u3tk3QSzYWp3ZACaW16X4rlV+iBk9huA0EiSHNVdTGU8tvEkPj s9gQPgIzgxdkDZmzTvsJDiPASORblJiLdmuvz4vojRey7Bqr/1ilZ2cnX/OjDA6p d4K73VWG3YSecm63mpfv8KCxAFUaQT09/oPBNDIJ/SYZTOUaunoBse3EOvUkX5Od Ar2ehkPOt9Q8Gu3N2F194Rd30vbADiYMezPkTdY0NdJXI6sUUa2SJj0nEsbMqNIg wauDYYzyuRp+k0J6+Oj3tVH074a8/q6trPzuA6C5MuyPx/kgJ00= =B3Ef -----END PGP SIGNATURE----- Merge tag '9p-for-5.13-rc1' of git://github.com/martinetd/linux Pull 9p updates from Dominique Martinet: "An error handling fix and constification" * tag '9p-for-5.13-rc1' of git://github.com/martinetd/linux: fs: 9p: fix v9fs_file_open writeback fid error check 9p: Constify static struct v9fs_attr_group	2021-05-07 11:18:52 -07:00
Linus Torvalds	a48b0872e6	Merge branch 'akpm' (patches from Andrew) Merge yet more updates from Andrew Morton: "This is everything else from -mm for this merge window. 90 patches. Subsystems affected by this patch series: mm (cleanups and slub), alpha, procfs, sysctl, misc, core-kernel, bitmap, lib, compat, checkpatch, epoll, isofs, nilfs2, hpfs, exit, fork, kexec, gcov, panic, delayacct, gdb, resource, selftests, async, initramfs, ipc, drivers/char, and spelling" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (90 commits) mm: fix typos in comments mm: fix typos in comments treewide: remove editor modelines and cruft ipc/sem.c: spelling fix fs: fat: fix spelling typo of values kernel/sys.c: fix typo kernel/up.c: fix typo kernel/user_namespace.c: fix typos kernel/umh.c: fix some spelling mistakes include/linux/pgtable.h: few spelling fixes mm/slab.c: fix spelling mistake "disired" -> "desired" scripts/spelling.txt: add "overflw" scripts/spelling.txt: Add "diabled" typo scripts/spelling.txt: add "overlfow" arm: print alloc free paths for address in registers mm/vmalloc: remove vwrite() mm: remove xlate_dev_kmem_ptr() drivers/char: remove /dev/kmem for good mm: fix some typos and code style problems ipc/sem.c: mundane typo fixes ...	2021-05-07 00:34:51 -07:00
Masahiro Yamada	fa60ce2cb4	treewide: remove editor modelines and cruft The section "19) Editor modelines and other cruft" in Documentation/process/coding-style.rst clearly says, "Do not include any of these in source files." I recently receive a patch to explicitly add a new one. Let's do treewide cleanups, otherwise some people follow the existing code and attempt to upstream their favoriate editor setups. It is even nicer if scripts/checkpatch.pl can check it. If we like to impose coding style in an editor-independent manner, I think editorconfig (patch [1]) is a saner solution. [1] https://lore.kernel.org/lkml/20200703073143.423557-1-danny@kdrag0n.dev/ Link: https://lkml.kernel.org/r/20210324054457.1477489-1-masahiroy@kernel.org Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Miguel Ojeda <ojeda@kernel.org> [auxdisplay] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-07 00:26:34 -07:00
dingsenjie	a109ae2a02	fs: fat: fix spelling typo of values vaules -> values Link: https://lkml.kernel.org/r/20210302034817.30384-1-dingsenjie@163.com Signed-off-by: dingsenjie <dingsenjie@yulong.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-07 00:26:34 -07:00
Linus Torvalds	05da1f643f	More new code for 5.13-rc1: - Remove the now unused "io_private" field from struct iomap_ioend, for a modest savings in memory allocation. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmCRdGYACgkQ+H93GTRK tOuhBxAAptj7EjQivnQno69kWqbhjOOKZOH40BuwMsufq8XhZkn36rnkt7y3P3B8 WUeWkrBwhFW83UIg+L4Sb7BSCzVXqjqbnCffi8g9MTyuysuHk+3DlAHX0125x1DI z/16F+RVBajD6Ee8D3OIIhJQmFiNw7ERhHHbDuwpc+n4Wown1UwzROTp1S8DIvdJ LFGi0JzbE1++vngARkRidLjp2digS8fioyw+dIeTzLG+fSgnb00ZdybE/g/b5ZqQ PJH/23GFBlo5AuDhxDuhNOzqqC9ensG+n9hUNdaKzxAiYD5T7WSh7y69f/zmZJE/ xLNgXE76QNtkjGUzeCil9lQ9muQxUNBDNnpHJim4ILI8YwaNuvVbrNpURGskcCrT gT1LsAv+8gbcm+SgYE4gAMIEMZlA+uh8qmz+8pDHSMuHHUr2+EUEkWUTY7ioycOW dZgZO1ZKYlXk8vRcvGDwbR1dhmv+jR8hWBHfCLpfLOUE6KRTthA6c4JhwnFpddhM cSJPKqZ+uGASuDGK3WuJVIuGlYUPRS3Gyj2X2Eg43T3zTe2wz/sAAkLLC2TkSeGj QLZEhq/pp2/PWM2LWujdEAiX8zFBJoJjrlR42egNqk27JQ80fVe9fHZruuCYo5SZ ftBDXUJRTahhvW6xFrQcdRyoMG8zlvM8dOjQM38GzkuIFCKp8u8= =vlas -----END PGP SIGNATURE----- Merge tag 'iomap-5.13-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull more iomap updates from Darrick Wong: "Remove the now unused 'io_private' field from struct iomap_ioend, for a modest savings in memory allocation" * tag 'iomap-5.13-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: iomap: remove unused private field from ioend	2021-05-06 23:54:12 -07:00
Linus Torvalds	af120709b1	More new code for 5.13: - Rename the log timestamp struct. - Remove broken transaction counter debugging that wasn't working correctly on very old filesystems. - Various fixes to make pre-lazysbcount filesystems work properly again. - Fix a free space accounting problem where we neglected to consider free space btree blocks that track metadata reservation space when deciding whether or not to allow caller to reserve space for a metadata update. - Fix incorrect pagecache clearing behavior during FUNSHARE ops. - Don't allow log writes if the data device is readonly. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAmCRa+UACgkQ+H93GTRK tOvBLw/+PWgbb/sudVRk51f0bN0NgOBHM/pcW918Xo7TrASjxlRFeJit3TBvKiEi JqRdeUe8OPk6bhrCk1o1qo1zqK4BxDgsS6hn9/ruZAvG/Rh9oDyFQ9YTwvwRGCEs y8aALdlbrCT+4nQ/ORjWlZjTBuuj4N6sT2U21vtqmVjisFkVPhe5FH/Ntd1IXXOs FKVU3pC9SsAiEGWIEH+ZmB6ED1PIqFAqOEPDkP3t2UdN7iV3w1LaLBkYJcCHVZHT h2OX2bkmnDEuX2HKyMgJBOBrQtq/ZLunP+rfh8EjoBb7zBzToI6pAhH9dbmTarsM nV/lydkpSWdy3DIiANEGUpmIOShL5QRf2qwjEnew23scN52xDazZicPNPvEgU/YD EVvtOXbvVCzIs9ft3zMm6zhg3u/u07G7k3e08WO5x6SVe7ys5Z0Do7uESePC+3H+ n9IdN4+EP6RgNPKTRr1NlIuqTYc7wf63vj27QkBr0e7Q2vtoiquBOzrzWgINL90I AvLKrMsniMFBSKLayEhLSWXsm/1VxE2QiYRtfe4igMl4Nfu8dHXwezi4Awv70ibI tLf0Fjm2CK+CMP4SFa7hUzwQ29ZRqVE43ghlHqnZQtOVG1avZJ3mipIxXeO+O9pJ mOgJfZjud5TfsO2dUar1qr+efzCuZ4a/qfVjPlrh0LHJM2sRK5Y= =yoyk -----END PGP SIGNATURE----- Merge tag 'xfs-5.13-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull more xfs updates from Darrick Wong: "Except for the timestamp struct renaming patches, everything else in here are bug fixes: - Rename the log timestamp struct. - Remove broken transaction counter debugging that wasn't working correctly on very old filesystems. - Various fixes to make pre-lazysbcount filesystems work properly again. - Fix a free space accounting problem where we neglected to consider free space btree blocks that track metadata reservation space when deciding whether or not to allow caller to reserve space for a metadata update. - Fix incorrect pagecache clearing behavior during FUNSHARE ops. - Don't allow log writes if the data device is readonly" * tag 'xfs-5.13-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: don't allow log writes if the data device is readonly xfs: fix xfs_reflink_unshare usage of filemap_write_and_wait_range xfs: set aside allocation btree blocks from block reservation xfs: introduce in-core global counter of allocbt blocks xfs: unconditionally read all AGFs on mounts with perag reservation xfs: count free space btree blocks when scrubbing pre-lazysbcount fses xfs: update superblock counters correctly for !lazysbcount xfs: don't check agf_btreeblks on pre-lazysbcount filesystems xfs: remove obsolete AGF counter debugging xfs: rename struct xfs_legacy_ictimestamp xfs: rename xfs_ictimestamp_t	2021-05-06 23:46:46 -07:00
Gustavo A. R. Silva	c1e4726f46	hpfs: replace one-element array with flexible-array member There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. Also, this helps with the ongoing efforts to enable -Warray-bounds by fixing the following warning: CC [M] fs/hpfs/dir.o fs/hpfs/dir.c: In function `hpfs_readdir': fs/hpfs/dir.c:163:41: warning: array subscript 1 is above array bounds of `u8[1]' {aka `unsigned char[1]'} [-Warray-bounds] 163 \| \|\| de ->name[0] != 1 \|\| de->name[1] != 1)) \| ~~~~~~~~^~~ [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.10/process/deprecated.html#zero-length-and-one-element-arrays Link: https://github.com/KSPP/linux/issues/79 Link: https://github.com/KSPP/linux/issues/109 Link: https://lkml.kernel.org/r/20210326173510.GA81212@embeddedor Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-06 19:24:13 -07:00
Lu Jialin	312f79c486	nilfs2: fix typos in comments numer -> number in fs/nilfs2/cpfile.c Decription -> Description in fs/nilfs2/ioctl.c isntance -> instance in fs/nilfs2/the_nilfs.c Link: https://lkml.kernel.org/r/1617942951-14631-1-git-send-email-konishi.ryusuke@gmail.com Link: https://lore.kernel.org/r/20210409022519.176988-1-lujialin4@huawei.com Signed-off-by: Lu Jialin <lujialin4@huawei.com> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-06 19:24:13 -07:00
Liu xuzhi	300563e6e0	fs/nilfs2: fix misspellings using codespell tool Two typos are found out by codespell tool \ in 2217th and 2254th lines of segment.c: $ codespell ./fs/nilfs2/ ./segment.c:2217 :retured ==> returned ./segment.c:2254: retured ==> returned Fix two typos found by codespell. Link: https://lkml.kernel.org/r/1617864087-8198-1-git-send-email-konishi.ryusuke@gmail.com Signed-off-by: Liu xuzhi <liu.xuzhi@zte.com.cn> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-06 19:24:13 -07:00
Gustavo A. R. Silva	b4ca4c0178	isofs: fix fall-through warnings for Clang In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning by explicitly adding a break statement instead of just letting the code fall through to the next case. Link: https://github.com/KSPP/linux/issues/115 Link: https://lkml.kernel.org/r/5b7caa73958588065fabc59032c340179b409ef5.1605896059.git.gustavoars@kernel.org Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-06 19:24:13 -07:00
Davidlohr Bueso	7fab29e356	fs/epoll: restore waking from ep_done_scan() Commit `339ddb53d3` ("fs/epoll: remove unnecessary wakeups of nested epoll") changed the userspace visible behavior of exclusive waiters blocked on a common epoll descriptor upon a single event becoming ready. Previously, all tasks doing epoll_wait would awake, and now only one is awoken, potentially causing missed wakeups on applications that rely on this behavior, such as Apache Qpid. While the aforementioned commit aims at having only a wakeup single path in ep_poll_callback (with the exceptions of epoll_ctl cases), we need to restore the wakeup in what was the old ep_scan_ready_list() such that the next thread can be awoken, in a cascading style, after the waker's corresponding ep_send_events(). Link: https://lkml.kernel.org/r/20210405231025.33829-3-dave@stgolabs.net Fixes: `339ddb53d3` ("fs/epoll: remove unnecessary wakeups of nested epoll") Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jason Baron <jbaron@akamai.com> Cc: Roman Penyaev <rpenyaev@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-06 19:24:13 -07:00
zhouchuangao	5b31a7dfa3	proc/sysctl: fix function name error in comments The function name should be modified to register_sysctl_paths instead of register_sysctl_table_path. Link: https://lkml.kernel.org/r/1615807194-79646-1-git-send-email-zhouchuangao@vivo.com Signed-off-by: zhouchuangao <zhouchuangao@vivo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-06 19:24:11 -07:00
Alexey Dobriyan	1dcdd7ef96	proc: delete redundant subset=pid check Two checks in lookup and readdir code should be enough to not have third check in open code. Can't open what can't be looked up? Link: https://lkml.kernel.org/r/YFYYwIBIkytqnkxP@localhost.localdomain Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Alexey Gladkov <gladkov.alexey@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-06 19:24:11 -07:00
Alexey Dobriyan	d4455faccd	proc: mandate ->proc_lseek in "struct proc_ops" Now that proc_ops are separate from file_operations and other operations it easy to check all instances to have ->proc_lseek hook and remove check in main code. Note: nonseekable_open() files naturally don't require ->proc_lseek. Garbage collect pde_lseek() function. [adobriyan@gmail.com: smoke test lseek()] Link: https://lkml.kernel.org/r/YG4OIhChOrVTPgdN@localhost.localdomain Link: https://lkml.kernel.org/r/YFYX0Bzwxlc7aBa/@localhost.localdomain Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-06 19:24:11 -07:00
Alexey Dobriyan	b793cd9ab3	proc: save LOC in __xlate_proc_name() Can't look at this verbosity anymore. Link: https://lkml.kernel.org/r/YFYXAp/fgq405qcy@localhost.localdomain Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-06 19:24:11 -07:00
Colin Ian King	f4bf74d829	fs/proc/generic.c: fix incorrect pde_is_permanent check Currently the pde_is_permanent() check is being run on root multiple times rather than on the next proc directory entry. This looks like a copy-paste error. Fix this by replacing root with next. Addresses-Coverity: ("Copy-paste error") Link: https://lkml.kernel.org/r/20210318122633.14222-1-colin.king@canonical.com Fixes: `d919b33daf` ("proc: faster open/read/close with "permanent" files") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Greg Kroah-Hartman <gregkh@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-06 19:24:11 -07:00
Linus Torvalds	7ac86b3dca	Notable items here are a series to take advantage of David Howells' netfs helper library from Jeff, three new filesystem client metrics from Xiubo, ceph.dir.rsnaps vxattr from Yanhu and two auth-related fixes from myself, marked for stable. Interspersed is a smattering of assorted fixes and cleanups across the filesystem. -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmCT8IITHGlkcnlvbW92 QGdtYWlsLmNvbQAKCRBKf944AhHzizgqCACYbyY4Yr/2C8fZsn+P9rd97zRTbcC6 eufTZwnlECLnc89BxJQRk9a2UpDJfC8RMM3/9tmiulc8G4M+ggVbdFQTCzsZox3c vLAunGeVyfKIY+16Bv2RNuoO3KeeZm5aB3jXJ5QcUPcXmd4XnHKI1FU2ebC56UJb pxxfHpE6fb59r6Ek1e5uUFyta4KDMrvwXozghuAPEgT1GpKeA9zMIGI0CkQbBHlW PWHpcahTiT6GWa/d9ud0CnfssiBxVydWyKTz9xppYC6LNdsZUf9tBmYYGRklcjoA yAwPSuqxNmg+7uWubEawc0+a/3fXORgp2SF7Rbp1XYE+HpfnMF1J+nIn =IO5c -----END PGP SIGNATURE----- Merge tag 'ceph-for-5.13-rc1' of git://github.com/ceph/ceph-client Pull ceph updates from Ilya Dryomov: "Notable items here are - a series to take advantage of David Howells' netfs helper library from Jeff - three new filesystem client metrics from Xiubo - ceph.dir.rsnaps vxattr from Yanhu - two auth-related fixes from myself, marked for stable. Interspersed is a smattering of assorted fixes and cleanups across the filesystem" * tag 'ceph-for-5.13-rc1' of git://github.com/ceph/ceph-client: (24 commits) libceph: allow addrvecs with a single NONE/blank address libceph: don't set global_id until we get an auth ticket libceph: bump CephXAuthenticate encoding version ceph: don't allow access to MDS-private inodes ceph: fix up some bare fetches of i_size ceph: convert some PAGE_SIZE invocations to thp_size() ceph: support getting ceph.dir.rsnaps vxattr ceph: drop pinned_page parameter from ceph_get_caps ceph: fix inode leak on getattr error in __fh_to_dentry ceph: only check pool permissions for regular files ceph: send opened files/pinned caps/opened inodes metrics to MDS daemon ceph: avoid counting the same request twice or more ceph: rename the metric helpers ceph: fix kerneldoc copypasta over ceph_start_io_direct ceph: use attach/detach_page_private for tracking snap context ceph: don't use d_add in ceph_handle_snapdir ceph: don't clobber i_snap_caps on non-I_NEW inode ceph: fix fall-through warnings for Clang ceph: convert ceph_readpages to ceph_readahead ceph: convert ceph_write_begin to netfs_write_begin ...	2021-05-06 10:27:02 -07:00
Linus Torvalds	682a8e2b41	Code cleanups and a bug fix - W=1 compiler warning cleanups - Mutex initialization simplification - Protect against NULL pointer exception during mount -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEKvuQkp28KvPJn/fVfL7QslYSS/0FAmCTTQoRHGNvZGVAdHlo aWNrcy5jb20ACgkQfL7QslYSS/3M4RAAlpWKtZlUd83gr2wS7jKX+WR6Bmc9yu2Q hH9eJRca1O3LQHzRfsh6Y4PQmQD10kPOZUD4Qfy5FgYebNu/yapjx+2O7VpXeiiu b+Ien7l5f2qyBUjjIypW7m0T52YHDV5vc2nWkGoxYl3g6ecla3THdCmzpMD7iAUa 7tNdDd+zF+I2Lm0nVW2r/wLSUHoWs1d5u5GKOWkWgyEa80A4F6nHR6g+1GuUYhpE 8jfo5NnGGYjM3M+uH2MUBGan7K99YH+bTTsCSeEOSx7MK1X7zeH9pKaHIpPIYI6n m7iymS0qAfkdlK8GEMLVTgjVPY3Jh6f7iLug2+js4y+j2sXsBZvJzQHRS6uKu4YP dW/Da/0qntnnIyE0gD/SnwRGjTfv3LBbnw2vJSv2d9z9N50gIHJB43ZFP4XYfK0n WWHce7W7O+lo+4ZPTDd4G6Nz96Fi9Bl6bMA71Pw6P/J9KWnl6LOyp95YHifDlZ+f asYkxOvHDJagA6mKos0lnc/GT8IPAe65p6Jsq0IE32hplS4Cq9ajrUZ+3+VHsADr ZvG84IWe2vK2fWgvcaSpxLx3BhJTh/qkN/ISOykzVMWbet114XeXCC4hOQOVqU/A fsJgq6TeMMlpBeUnHobRN0AihGpatgiaMq7GtlZyEmk4Rkotf65vw/gmnR301pDq tTVVPw1XTdQ= =aHZx -----END PGP SIGNATURE----- Merge tag 'ecryptfs-5.13-rc1-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull ecryptfs updates from Tyler Hicks: "Code cleanups and a bug fix - W=1 compiler warning cleanups - Mutex initialization simplification - Protect against NULL pointer exception during mount" * tag 'ecryptfs-5.13-rc1-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: ecryptfs: fix kernel panic with null dev_name ecryptfs: remove unused helpers ecryptfs: Fix typo in message eCryptfs: Use DEFINE_MUTEX() for mutex lock ecryptfs: keystore: Fix some kernel-doc issues and demote non-conformant headers ecryptfs: inode: Help out nearly-there header and demote non-conformant ones ecryptfs: mmap: Help out one function header and demote other abuses ecryptfs: crypto: Supply some missing param descriptions and demote abuses ecryptfs: miscdev: File headers are not good kernel-doc candidates ecryptfs: main: Demote a bunch of non-conformant kernel-doc headers ecryptfs: messaging: Add missing param descriptions and demote abuses ecryptfs: super: Fix formatting, naming and kernel-doc abuses ecryptfs: file: Demote kernel-doc abuses ecryptfs: kthread: Demote file header and provide description for 'cred' ecryptfs: dentry: File headers are not good candidates for kernel-doc ecryptfs: debug: Demote a couple of kernel-doc abuses ecryptfs: read_write: File headers do not make good candidates for kernel-doc ecryptfs: use DEFINE_MUTEX() for mutex lock eCryptfs: add a semicolon	2021-05-06 10:06:39 -07:00
yangerkun	cf7b39a0cb	block: reexpand iov_iter after read/write We get a bug: BUG: KASAN: slab-out-of-bounds in iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 Read of size 8 at addr ffff0000d3fb11f8 by task CPU: 0 PID: 12582 Comm: syz-executor.2 Not tainted 5.10.0-00843-g352c8610ccd2 #2 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x0/0x2d0 arch/arm64/kernel/stacktrace.c:132 show_stack+0x28/0x34 arch/arm64/kernel/stacktrace.c:196 __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x110/0x164 lib/dump_stack.c:118 print_address_description+0x78/0x5c8 mm/kasan/report.c:385 __kasan_report mm/kasan/report.c:545 [inline] kasan_report+0x148/0x1e4 mm/kasan/report.c:562 check_memory_region_inline mm/kasan/generic.c:183 [inline] __asan_load8+0xb4/0xbc mm/kasan/generic.c:252 iov_iter_revert+0x11c/0x404 lib/iov_iter.c:1139 io_read fs/io_uring.c:3421 [inline] io_issue_sqe+0x2344/0x2d64 fs/io_uring.c:5943 __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 io_submit_sqe fs/io_uring.c:6395 [inline] io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 Allocated by task 12570: stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track mm/kasan/common.c:56 [inline] __kasan_kmalloc+0xdc/0x120 mm/kasan/common.c:461 kasan_kmalloc+0xc/0x14 mm/kasan/common.c:475 __kmalloc+0x23c/0x334 mm/slub.c:3970 kmalloc include/linux/slab.h:557 [inline] __io_alloc_async_data+0x68/0x9c fs/io_uring.c:3210 io_setup_async_rw fs/io_uring.c:3229 [inline] io_read fs/io_uring.c:3436 [inline] io_issue_sqe+0x2954/0x2d64 fs/io_uring.c:5943 __io_queue_sqe+0x19c/0x520 fs/io_uring.c:6260 io_queue_sqe+0x2a4/0x590 fs/io_uring.c:6326 io_submit_sqe fs/io_uring.c:6395 [inline] io_submit_sqes+0x4c0/0xa04 fs/io_uring.c:6624 __do_sys_io_uring_enter fs/io_uring.c:9013 [inline] __se_sys_io_uring_enter fs/io_uring.c:8960 [inline] __arm64_sys_io_uring_enter+0x190/0x708 fs/io_uring.c:8960 __invoke_syscall arch/arm64/kernel/syscall.c:36 [inline] invoke_syscall arch/arm64/kernel/syscall.c:48 [inline] el0_svc_common arch/arm64/kernel/syscall.c:158 [inline] do_el0_svc+0x120/0x290 arch/arm64/kernel/syscall.c:227 el0_svc+0x1c/0x28 arch/arm64/kernel/entry-common.c:367 el0_sync_handler+0x98/0x170 arch/arm64/kernel/entry-common.c:383 el0_sync+0x140/0x180 arch/arm64/kernel/entry.S:670 Freed by task 12570: stack_trace_save+0x80/0xb8 kernel/stacktrace.c:121 kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track+0x38/0x6c mm/kasan/common.c:56 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:355 __kasan_slab_free+0x124/0x150 mm/kasan/common.c:422 kasan_slab_free+0x10/0x1c mm/kasan/common.c:431 slab_free_hook mm/slub.c:1544 [inline] slab_free_freelist_hook mm/slub.c:1577 [inline] slab_free mm/slub.c:3142 [inline] kfree+0x104/0x38c mm/slub.c:4124 io_dismantle_req fs/io_uring.c:1855 [inline] __io_free_req+0x70/0x254 fs/io_uring.c:1867 io_put_req_find_next fs/io_uring.c:2173 [inline] __io_queue_sqe+0x1fc/0x520 fs/io_uring.c:6279 __io_req_task_submit+0x154/0x21c fs/io_uring.c:2051 io_req_task_submit+0x2c/0x44 fs/io_uring.c:2063 task_work_run+0xdc/0x128 kernel/task_work.c:151 get_signal+0x6f8/0x980 kernel/signal.c:2562 do_signal+0x108/0x3a4 arch/arm64/kernel/signal.c:658 do_notify_resume+0xbc/0x25c arch/arm64/kernel/signal.c:722 work_pending+0xc/0x180 blkdev_read_iter can truncate iov_iter's count since the count + pos may exceed the size of the blkdev. This will confuse io_read that we have consume the iovec. And once we do the iov_iter_revert in io_read, we will trigger the slab-out-of-bounds. Fix it by reexpand the count with size has been truncated. blkdev_write_iter can trigger the problem too. Signed-off-by: yangerkun <yangerkun@huawei.com> Acked-by: Pavel Begunkov <asml.silencec@gmail.com> Link: https://lore.kernel.org/r/20210401071807.3328235-1-yangerkun@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-05-06 09:24:03 -06:00
Thadeu Lima de Souza Cascardo	d1f8280887	io_uring: truncate lengths larger than MAX_RW_COUNT on provide buffers Read and write operations are capped to MAX_RW_COUNT. Some read ops rely on that limit, and that is not guaranteed by the IORING_OP_PROVIDE_BUFFERS. Truncate those lengths when doing io_add_buffers, so buffer addresses still use the uncapped length. Also, take the chance and change struct io_buffer len member to __u32, so it matches struct io_provide_buffer len member. This fixes CVE-2021-3491, also reported as ZDI-CAN-13546. Fixes: `ddf0322db7` ("io_uring: add IORING_OP_PROVIDE_BUFFERS") Reported-by: Billy Jheng Bing-Jhong (@st424204) Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-05-05 15:17:35 -06:00
Linus Torvalds	8404c9fbc8	Merge branch 'akpm' (patches from Andrew) Merge more updates from Andrew Morton: "The remainder of the main mm/ queue. 143 patches. Subsystems affected by this patch series (all mm): pagecache, hugetlb, userfaultfd, vmscan, compaction, migration, cma, ksm, vmstat, mmap, kconfig, util, memory-hotplug, zswap, zsmalloc, highmem, cleanups, and kfence" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (143 commits) kfence: use power-efficient work queue to run delayed work kfence: maximize allocation wait timeout duration kfence: await for allocation using wait_event kfence: zero guard page after out-of-bounds access mm/process_vm_access.c: remove duplicate include mm/mempool: minor coding style tweaks mm/highmem.c: fix coding style issue btrfs: use memzero_page() instead of open coded kmap pattern iov_iter: lift memzero_page() to highmem.h mm/zsmalloc: use BUG_ON instead of if condition followed by BUG. mm/zswap.c: switch from strlcpy to strscpy arm64/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE x86/Kconfig: introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE mm,memory_hotplug: add kernel boot option to enable memmap_on_memory acpi,memhotplug: enable MHP_MEMMAP_ON_MEMORY when supported mm,memory_hotplug: allocate memmap from the added memory range mm,memory_hotplug: factor out adjusting present pages into adjust_present_page_count() mm,memory_hotplug: relax fully spanned sections check drivers/base/memory: introduce memory_block_{online,offline} mm/memory_hotplug: remove broken locking of zone PCP structures during hot remove ...	2021-05-05 13:50:15 -07:00
Linus Torvalds	a79cdfba68	Additional fixes and clean-ups for NFSD since tags/nfsd-5.13, including a fix to grant read delegations for files open for writing. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmCJz0UACgkQM2qzM29m f5einQ//ZqErt5sYcvQw5Onkt+lDHp13XgjIVGo1DrAegrdoTMT+jpUfYSbDLEuC B+G2+rUGHpNZ017mzoAmzoeA+pKsdRX+YAy/i8K+7r/cr6T9v78yoX9rx1rbEQEq QFJm0fGrFLydzaxRpVq5by7yCKD2DaCQL6DefcXQitfKlfRJ8i/D/vXVBb4FJcmg 4qRJ7RCcck5gqfInFJ+ZKRjC/9Oj9bNUJz2Ph9mWH1qDDKachgnfWYqrnFQdjYTr /Tb+6gyqnRplHU7LmPYSREZqrS3CuvPX0MSXKcFhITj0teaF3b7MArIsSrpw/GGi kKrc/K+46COA/Ej0stdGev+Fe3GRlPKUk7UgdD3uWvQrDZ5WdcvN1N7xyCHk90qO pOmU3iQuFIBJLaHfwzDaPUJZKMsEO+hsd+liwJjBg6WD4DDLYSQT7jglwYwCxeV4 ywJi9C3DKaM8kpSBbnMUreHdIIz1d8hNifM4PKgtKGpaXaVlO+rxbkQfZjVAF7Sk uRXIegRi+YSJY7RJIhT+NcmmJbyQOEXu9UyUJmqpIzbzmiLF/K2qUk5jPxFLgBpq CHmdEIfcoGhA1UqAlynplk5+I5QvhzjxENZJ2Bz8Xwn/uDebKlNhrQeXQP1mQ8dK 3kJ3RUN/yQxgYCXIQWg/ug51hSZ5Y6c7RzaJeW359V5DbPKBQOU= =HB+N -----END PGP SIGNATURE----- Merge tag 'nfsd-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull more nfsd updates from Chuck Lever: "Additional fixes and clean-ups for NFSD since tags/nfsd-5.13, including a fix to grant read delegations for files open for writing" * tag 'nfsd-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Fix null pointer dereference in svc_rqst_free() SUNRPC: fix ternary sign expansion bug in tracing nfsd: Fix fall-through warnings for Clang nfsd: grant read delegations to clients holding writes nfsd: reshuffle some code nfsd: track filehandle aliasing in nfs4_files nfsd: hash nfs4_files by inode number nfsd: ensure new clients break delegations nfsd: removed unused argument in nfsd_startup_generic() nfsd: remove unused function svcrdma: Pass a useful error code to the send_err tracepoint svcrdma: Rename goto labels in svc_rdma_sendto() svcrdma: Don't leak send_ctxt on Send errors	2021-05-05 13:44:19 -07:00
Linus Torvalds	7c9e41e0ef	10 CIFS/SMB3 changesets including some important multichannel fixes, as well as support for handle leases (deferred close) and shutdown support -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmCSKIIACgkQiiy9cAdy T1F+Hgv+L3NkOFwMvBgGjHP9b+Lkv/YWGKeJkLwQW1xqoHIUHn0/+C5+9ScJGBZc WVuzp4pqEIgv4my4UQiyqVwzcmz4BqY2KTDJzBYtqANt6pVp1w6YtC2GplgJE3J2 qoQh1RwZaqSXfjcoPSRnv5EiSF6DbHlBUhPMd53qOE9pwaf/38i9/M3d9G7EIB8h rRNmpGtFuzBHdtGQ2b+4+8ftCIpEBDu/OXcA6QXMUMcvKGaruU39NOxBuW6a/VO5 9P47Nsof3dlN758uesoQT2VMEc0pcpwAs9BwOkinfXWGUyNqJmbPNvddIOlaP/dv vG58n/+JqvWUKgEnrNk5h+wD7wmXpgxpQ523sD5k6bID1hc+vh4lXf+O+iltbYtc 1ce9ITglSVxA7z4qwFWhtawBy1j1YyvltTAGvhnzdtKZLRk6e5AYIFOUn9O+AMJw Eofk4lD0kNTdXyMkveGluRMBXrOzKMdmfw5FW/9hObYgebEGpTQkGyIMpaStraZM 8hDNAGTk =BFOj -----END PGP SIGNATURE----- Merge tag '5.13-rc-smb3-part2' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs updates from Steve French: "Ten CIFS/SMB3 changes - including two marked for stable - including some important multichannel fixes, as well as support for handle leases (deferred close) and shutdown support: - some important multichannel fixes - support for handle leases (deferred close) - shutdown support (which is also helpful since it enables multiple xfstests) - enable negotiating stronger encryption by default (GCM256) - improve wireshark debugging by allowing more options for root to dump decryption keys SambaXP and the SMB3 Plugfest test event are going on now so I am expecting more patches over the next few days due to extra testing (including more multichannel fixes)" * tag '5.13-rc-smb3-part2' of git://git.samba.org/sfrench/cifs-2.6: fs/cifs: Fix resource leak Cifs: Fix kernel oops caused by deferred close for files. cifs: fix regression when mounting shares with prefix paths cifs: use echo_interval even when connection not ready. cifs: detect dead connections only when echoes are enabled. smb3.1.1: allow dumping keys for multiuser mounts smb3.1.1: allow dumping GCM256 keys to improve debugging of encrypted shares cifs: add shutdown support cifs: Deferred close for files smb3.1.1: enable negotiating stronger encryption by default	2021-05-05 13:37:07 -07:00
Ira Weiny	d048b9c2a7	btrfs: use memzero_page() instead of open coded kmap pattern There are many places where kmap/memset/kunmap patterns occur. Use the newly lifted memzero_page() to eliminate direct uses of kmap and leverage the new core functions use of kmap_local_page(). The development of this patch was aided by the following coccinelle script: // <smpl> // SPDX-License-Identifier: GPL-2.0-only // Find kmap/memset/kunmap pattern and replace with memset*page calls // // NOTE: Offsets and other expressions may be more complex than what the script // will automatically generate. Therefore a catchall rule is provided to find // the pattern which then must be evaluated by hand. // // Confidence: Low // Copyright: (C) 2021 Intel Corporation // URL: http://coccinelle.lip6.fr/ // Comments: // Options: // // Then the memset pattern // @ memset_rule1 @ expression page, V, L, Off; identifier ptr; type VP; @@ ( -VP ptr = kmap(page); \| -ptr = kmap(page); \| -VP ptr = kmap_atomic(page); \| -ptr = kmap_atomic(page); ) <+... ( -memset(ptr, 0, L); +memzero_page(page, 0, L); \| -memset(ptr + Off, 0, L); +memzero_page(page, Off, L); \| -memset(ptr, V, L); +memset_page(page, V, 0, L); \| -memset(ptr + Off, V, L); +memset_page(page, V, Off, L); ) ...+> ( -kunmap(page); \| -kunmap_atomic(ptr); ) // Remove any pointers left unused @ depends on memset_rule1 @ identifier memset_rule1.ptr; type VP, VP1; @@ -VP ptr; ... when != ptr; ? VP1 ptr; // // Catch all // @ memset_rule2 @ expression page; identifier ptr; expression GenTo, GenSize, GenValue; type VP; @@ ( -VP ptr = kmap(page); \| -ptr = kmap(page); \| -VP ptr = kmap_atomic(page); \| -ptr = kmap_atomic(page); ) <+... ( // // Some call sites have complex expressions within the memset/memcpy // The follow are catch alls which need to be evaluated by hand. // -memset(GenTo, 0, GenSize); +memzero_pageExtra(page, GenTo, GenSize); \| -memset(GenTo, GenValue, GenSize); +memset_pageExtra(page, GenValue, GenTo, GenSize); ) ...+> ( -kunmap(page); \| -kunmap_atomic(ptr); ) // Remove any pointers left unused @ depends on memset_rule2 @ identifier memset_rule2.ptr; type VP, VP1; @@ -VP ptr; ... when != ptr; ? VP1 ptr; // </smpl> Link: https://lkml.kernel.org/r/20210309212137.2610186-4-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: David Sterba <dsterba@suse.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-05-05 11:27:27 -07:00

... 8 9 10 11 12 ...

71563 Коммитов