WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Michal Hocko	c15aeead24	mm, oom: do not trigger out_of_memory from the #PF commit `60e2793d44` upstream. Any allocation failure during the #PF path will return with VM_FAULT_OOM which in turn results in pagefault_out_of_memory. This can happen for 2 different reasons. a) Memcg is out of memory and we rely on mem_cgroup_oom_synchronize to perform the memcg OOM handling or b) normal allocation fails. The latter is quite problematic because allocation paths already trigger out_of_memory and the page allocator tries really hard to not fail allocations. Anyway, if the OOM killer has been already invoked there is no reason to invoke it again from the #PF path. Especially when the OOM condition might be gone by that time and we have no way to find out other than allocate. Moreover if the allocation failed and the OOM killer hasn't been invoked then we are unlikely to do the right thing from the #PF context because we have already lost the allocation context and restictions and therefore might oom kill a task from a different NUMA domain. This all suggests that there is no legitimate reason to trigger out_of_memory from pagefault_out_of_memory so drop it. Just to be sure that no #PF path returns with VM_FAULT_OOM without allocation print a warning that this is happening before we restart the #PF. [VvS: #PF allocation can hit into limit of cgroup v1 kmem controller. This is a local problem related to memcg, however, it causes unnecessary global OOM kills that are repeated over and over again and escalate into a real disaster. This has been broken since kmem accounting has been introduced for cgroup v1 (3.8). There was no kmem specific reclaim for the separate limit so the only way to handle kmem hard limit was to return with ENOMEM. In upstream the problem will be fixed by removing the outdated kmem limit, however stable and LTS kernels cannot do it and are still affected. This patch fixes the problem and should be backported into stable/LTS.] Link: https://lkml.kernel.org/r/f5fd8dd8-0ad4-c524-5f65-920b01972a42@virtuozzo.com Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:16 +01:00
Vasily Averin	487a4c60c5	mm, oom: pagefault_out_of_memory: don't force global OOM for dying tasks commit `0b28179a61` upstream. Patch series "memcg: prohibit unconditional exceeding the limit of dying tasks", v3. Memory cgroup charging allows killed or exiting tasks to exceed the hard limit. It can be misused and allowed to trigger global OOM from inside a memcg-limited container. On the other hand if memcg fails allocation, called from inside #PF handler it triggers global OOM from inside pagefault_out_of_memory(). To prevent these problems this patchset: (a) removes execution of out_of_memory() from pagefault_out_of_memory(), becasue nobody can explain why it is necessary. (b) allow memcg to fail allocation of dying/killed tasks. This patch (of 3): Any allocation failure during the #PF path will return with VM_FAULT_OOM which in turn results in pagefault_out_of_memory which in turn executes out_out_memory() and can kill a random task. An allocation might fail when the current task is the oom victim and there are no memory reserves left. The OOM killer is already handled at the page allocator level for the global OOM and at the charging level for the memcg one. Both have much more information about the scope of allocation/charge request. This means that either the OOM killer has been invoked properly and didn't lead to the allocation success or it has been skipped because it couldn't have been invoked. In both cases triggering it from here is pointless and even harmful. It makes much more sense to let the killed task die rather than to wake up an eternally hungry oom-killer and send him to choose a fatter victim for breakfast. Link: https://lkml.kernel.org/r/0828a149-786e-7c06-b70a-52d086818ea3@virtuozzo.com Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Suggested-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Roman Gushchin <guro@fb.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:16 +01:00
Vasily Averin	f1e83db27a	memcg: prohibit unconditional exceeding the limit of dying tasks commit `a4ebf1b6ca` upstream. Memory cgroup charging allows killed or exiting tasks to exceed the hard limit. It is assumed that the amount of the memory charged by those tasks is bound and most of the memory will get released while the task is exiting. This is resembling a heuristic for the global OOM situation when tasks get access to memory reserves. There is no global memory shortage at the memcg level so the memcg heuristic is more relieved. The above assumption is overly optimistic though. E.g. vmalloc can scale to really large requests and the heuristic would allow that. We used to have an early break in the vmalloc allocator for killed tasks but this has been reverted by commit `b8c8a338f7` ("Revert "vmalloc: back off when the current task is killed""). There are likely other similar code paths which do not check for fatal signals in an allocation&charge loop. Also there are some kernel objects charged to a memcg which are not bound to a process life time. It has been observed that it is not really hard to trigger these bypasses and cause global OOM situation. One potential way to address these runaways would be to limit the amount of excess (similar to the global OOM with limited oom reserves). This is certainly possible but it is not really clear how much of an excess is desirable and still protects from global OOMs as that would have to consider the overall memcg configuration. This patch is addressing the problem by removing the heuristic altogether. Bypass is only allowed for requests which either cannot fail or where the failure is not desirable while excess should be still limited (e.g. atomic requests). Implementation wise a killed or dying task fails to charge if it has passed the OOM killer stage. That should give all forms of reclaim chance to restore the limit before the failure (ENOMEM) and tell the caller to back off. In addition, this patch renames should_force_charge() helper to task_is_dying() because now its use is not associated witch forced charging. This patch depends on pagefault_out_of_memory() to not trigger out_of_memory(), because then a memcg failure can unwind to VM_FAULT_OOM and cause a global OOM killer. Link: https://lkml.kernel.org/r/8f5cebbb-06da-4902-91f0-6566fc4b4203@virtuozzo.com Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Suggested-by: Michal Hocko <mhocko@suse.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Roman Gushchin <guro@fb.com> Cc: Uladzislau Rezki <urezki@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Shakeel Butt <shakeelb@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:16 +01:00
Matthew Wilcox (Oracle)	6560e8cd86	mm/filemap.c: remove bogus VM_BUG_ON commit `d417b49fff` upstream. It is not safe to check page->index without holding the page lock. It can be changed if the page is moved between the swap cache and the page cache for a shmem file, for example. There is a VM_BUG_ON below which checks page->index is correct after taking the page lock. Link: https://lkml.kernel.org/r/20210818144932.940640-1-willy@infradead.org Fixes: `5c211ba29d` ("mm: add and use find_lock_entries") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reported-by: <syzbot+c87be4f669d920c76330@syzkaller.appspotmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:16 +01:00
Dominique Martinet	6847c42862	9p/net: fix missing error check in p9_check_errors commit `27eb4c3144` upstream. Link: https://lkml.kernel.org/r/99338965-d36c-886e-cd0e-1d8fff2b4746@gmail.com Reported-by: syzbot+06472778c97ed94af66d@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Dominique Martinet <asmadeus@codewreck.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:16 +01:00
Daniel Borkmann	1f03911876	net, neigh: Enable state migration between NUD_PERMANENT and NTF_USE [ Upstream commit `3dc20f4762` ] Currently, it is not possible to migrate a neighbor entry between NUD_PERMANENT state and NTF_USE flag with a dynamic NUD state from a user space control plane. Similarly, it is not possible to add/remove NTF_EXT_LEARNED flag from an existing neighbor entry in combination with NTF_USE flag. This is due to the latter directly calling into neigh_event_send() without any meta data updates as happening in __neigh_update(). Thus, to enable this use case, extend the latter with a NEIGH_UPDATE_F_USE flag where we break the NUD_PERMANENT state in particular so that a latter neigh_event_send() is able to re-resolve a neighbor entry. Before fix, NUD_PERMANENT -> NUD_* & NTF_USE: # ./ip/ip n replace 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT [...] # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT [...] As can be seen, despite the admin-triggered replace, the entry remains in the NUD_PERMANENT state. After fix, NUD_PERMANENT -> NUD_* & NTF_USE: # ./ip/ip n replace 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT [...] # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn REACHABLE [...] # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn STALE [...] # ./ip/ip n replace 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a PERMANENT [...] After the fix, the admin-triggered replace switches to a dynamic state from the NTF_USE flag which triggered a new neighbor resolution. Likewise, we can transition back from there, if needed, into NUD_PERMANENT. Similar before/after behavior can be observed for below transitions: Before fix, NTF_USE -> NTF_USE \| NTF_EXT_LEARNED -> NTF_USE: # ./ip/ip n replace 192.168.178.30 dev enp5s0 use # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE [...] # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE [...] After fix, NTF_USE -> NTF_USE \| NTF_EXT_LEARNED -> NTF_USE: # ./ip/ip n replace 192.168.178.30 dev enp5s0 use # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE [...] # ./ip/ip n replace 192.168.178.30 dev enp5s0 use extern_learn # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a extern_learn REACHABLE [...] # ./ip/ip n replace 192.168.178.30 dev enp5s0 use # ./ip/ip n 192.168.178.30 dev enp5s0 lladdr f4:8c:50:5e:71:9a REACHABLE [..] Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Roopa Prabhu <roopa@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-18 19:17:16 +01:00
Anatolij Gustschin	6a85f01a89	dmaengine: bestcomm: fix system boot lockups commit `adec566b05` upstream. memset() and memcpy() on an MMIO region like here results in a lockup at startup on mpc5200 platform (since this first happens during probing of the ATA and Ethernet drivers). Use memset_io() and memcpy_toio() instead. Fixes: `2f9ea1bde0` ("bestcomm: core bestcomm support for Freescale MPC5200") Cc: stable@vger.kernel.org # v5.14+ Signed-off-by: Anatolij Gustschin <agust@denx.de> Link: https://lore.kernel.org/r/20211014094012.21286-1-agust@denx.de Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:16 +01:00
Kishon Vijay Abraham I	c4cd9e5acc	dmaengine: ti: k3-udma: Set r/tchan or rflow to NULL if request fail commit `eb91224e47` upstream. udma_get_() checks if rchan/tchan/rflow is already allocated by checking if it has a NON NULL value. For the error cases, rchan/tchan/rflow will have error value and udma_get_() considers this as already allocated (PASS) since the error values are NON NULL. This results in NULL pointer dereference error while de-referencing rchan/tchan/rflow. Reset the value of rchan/tchan/rflow to NULL if a channel request fails. CC: stable@vger.kernel.org Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Link: https://lore.kernel.org/r/20211031032411.27235-3-kishon@ti.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:16 +01:00
Kishon Vijay Abraham I	68ae6ae143	dmaengine: ti: k3-udma: Set bchan to NULL if a channel request fail commit `5c6c6d60e4` upstream. bcdma_get_() checks if bchan is already allocated by checking if it has a NON NULL value. For the error cases, bchan will have error value and bcdma_get_() considers this as already allocated (PASS) since the error values are NON NULL. This results in NULL pointer dereference error while de-referencing bchan. Reset the value of bchan to NULL if a channel request fails. CC: stable@vger.kernel.org Acked-by: Peter Ujfalusi <peter.ujfalusi@gmail.com> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Link: https://lore.kernel.org/r/20211031032411.27235-2-kishon@ti.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:16 +01:00
Namjae Jeon	1dd578e985	ksmbd: don't need 8byte alignment for request length in ksmbd_check_message commit `b53ad8107e` upstream. When validating request length in ksmbd_check_message, 8byte alignment is not needed for compound request. It can cause wrong validation of request length. Fixes: `e2f34481b2` ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org # v5.15 Acked-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:15 +01:00
Marios Makassikis	aacb2ddb67	ksmbd: Fix buffer length check in fsctl_validate_negotiate_info() commit `78f1688a64` upstream. The validate_negotiate_info_req struct definition includes an extra field to access the data coming after the header. This causes the check in fsctl_validate_negotiate_info() to count the first element of the array twice. This in turn makes some valid requests fail, depending on whether they include padding or not. Fixes: `f7db8fd03a` ("ksmbd: add validation in smb2_ioctl") Cc: stable@vger.kernel.org # v5.15 Acked-by: Namjae Jeon <linkinjeon@kernel.org> Acked-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Marios Makassikis <mmakassikis@freebox.fr> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:15 +01:00
Shin'ichiro Kawasaki	5e84e9d61d	block: Hold invalidate_lock in BLKRESETZONE ioctl commit `86399ea071` upstream. When BLKRESETZONE ioctl and data read race, the data read leaves stale page cache. The commit `e511350590` ("block: Discard page cache of zone reset target range") added page cache truncation to avoid stale page cache after the ioctl. However, the stale page cache still can be read during the reset zone operation for the ioctl. To avoid the stale page cache completely, hold invalidate_lock of the block device file mapping. Fixes: `e511350590` ("block: Discard page cache of zone reset target range") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Cc: stable@vger.kernel.org # v5.15 Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211111085238.942492-1-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:15 +01:00
Shin'ichiro Kawasaki	373c2bfecb	block: Hold invalidate_lock in BLKZEROOUT ioctl commit `35e4c6c1a2` upstream. When BLKZEROOUT ioctl and data read race, the data read leaves stale page cache. To avoid the stale page cache, hold invalidate_lock of the block device file mapping. The stale page cache is observed when blktests test case block/009 is modified to call "blkdiscard -z" command and repeated hundreds of times. This patch can be applied back to the stable kernel version v5.15.y. Rework is required for older stable kernels. Fixes: `22dd6d3566` ("block: invalidate the page cache when issuing BLKZEROOUT") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Cc: stable@vger.kernel.org # v5.15 Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211109104723.835533-3-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:15 +01:00
Shin'ichiro Kawasaki	f9bed86a35	block: Hold invalidate_lock in BLKDISCARD ioctl commit `7607c44c15` upstream. When BLKDISCARD ioctl and data read race, the data read leaves stale page cache. To avoid the stale page cache, hold invalidate_lock of the block device file mapping. The stale page cache is observed when blktests test case block/009 is repeated hundreds of times. This patch can be applied back to the stable kernel version v5.15.y with slight patch edit. Rework is required for older stable kernels. Fixes: `351499a172` ("block: Invalidate cache on discard v2") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Cc: stable@vger.kernel.org # v5.15 Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20211109104723.835533-2-shinichiro.kawasaki@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:15 +01:00
Matthew Brost	876d6242d2	drm/i915/guc: Fix blocked context accounting commit `fc30a6764a` upstream. Prior to this patch the blocked context counter was cleared on init_sched_state (used during registering a context & resets) which is incorrect. This state needs to be persistent or the counter can read the incorrect value resulting in scheduling never getting enabled again. Fixes: `62eaf0ae21` ("drm/i915/guc: Support request cancellation") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: <stable@vger.kernel.org> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210909164744.31249-2-matthew.brost@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:15 +01:00
Gao Xiang	f609789699	erofs: fix unsafe pagevec reuse of hooked pclusters commit `86432a6dca` upstream. There are pclusters in runtime marked with Z_EROFS_PCLUSTER_TAIL before actual I/O submission. Thus, the decompression chain can be extended if the following pcluster chain hooks such tail pcluster. As the related comment mentioned, if some page is made of a hooked pcluster and another followed pcluster, it can be reused for in-place I/O (since I/O should be submitted anyway): _______________________________________________________________ \| tail (partial) page \| head (partial) page \| \|_____PRIMARY_HOOKED___\|____________PRIMARY_FOLLOWED____________\| However, it's by no means safe to reuse as pagevec since if such PRIMARY_HOOKED pclusters finally move into bypass chain without I/O submission. It's somewhat hard to reproduce with LZ4 and I just found it (general protection fault) by ro_fsstressing a LZMA image for long time. I'm going to actively clean up related code together with multi-page folio adaption in the next few months. Let's address it directly for easier backporting for now. Call trace for reference: z_erofs_decompress_pcluster+0x10a/0x8a0 [erofs] z_erofs_decompress_queue.isra.36+0x3c/0x60 [erofs] z_erofs_runqueue+0x5f3/0x840 [erofs] z_erofs_readahead+0x1e8/0x320 [erofs] read_pages+0x91/0x270 page_cache_ra_unbounded+0x18b/0x240 filemap_get_pages+0x10a/0x5f0 filemap_read+0xa9/0x330 new_sync_read+0x11b/0x1a0 vfs_read+0xf1/0x190 Link: https://lore.kernel.org/r/20211103182006.4040-1-xiang@kernel.org Fixes: `3883a79abd` ("staging: erofs: introduce VLE decompression support") Cc: <stable@vger.kernel.org> # 4.19+ Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:15 +01:00
Xiubo Li	11a102de53	ceph: fix mdsmap decode when there are MDS's beyond max_mds commit `0e24421ac4` upstream. If the max_mds is decreased in a cephfs cluster, there is a window of time before the MDSs are removed. If a map goes out during this period, the mdsmap may show the decreased max_mds but still shows those MDSes as in or in the export target list. Ensure that we don't fail the map decode in that case. Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/52436 Fixes: `d517b3983d` ("ceph: reconnect to the export targets on new mdsmaps") Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:15 +01:00
Dongliang Mu	5e1b901dd4	f2fs: fix UAF in f2fs_available_free_memory commit `5429c9dbc9` upstream. if2fs_fill_super -> f2fs_build_segment_manager -> create_discard_cmd_control -> f2fs_start_discard_thread It invokes kthread_run to create a thread and run issue_discard_thread. However, if f2fs_build_node_manager fails, the control flow goes to free_nm and calls f2fs_destroy_node_manager. This function will free sbi->nm_info. However, if issue_discard_thread accesses sbi->nm_info after the deallocation, but before the f2fs_stop_discard_thread, it will cause UAF(Use-after-free). -> f2fs_destroy_segment_manager -> destroy_discard_cmd_control -> f2fs_stop_discard_thread Fix this by stopping discard thread before f2fs_destroy_node_manager. Note that, the commit `d6d2b491a8` introduces the call of f2fs_available_free_memory into issue_discard_thread. Cc: stable@vger.kernel.org Fixes: `d6d2b491a8` ("f2fs: allow to change discard policy based on cached discard cmds") Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:15 +01:00
Daeho Jeong	6fd542665e	f2fs: include non-compressed blocks in compr_written_block commit `09631cf323` upstream. Need to include non-compressed blocks in compr_written_block to estimate average compression ratio more accurately. Fixes: `5ac443e26a` ("f2fs: add sysfs nodes to get runtime compression stat") Cc: stable@vger.kernel.org Signed-off-by: Daeho Jeong <daehojeong@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:15 +01:00
Jaegeuk Kim	035302003c	f2fs: should use GFP_NOFS for directory inodes commit `92d602bc71` upstream. We use inline_dentry which requires to allocate dentry page when adding a link. If we allow to reclaim memory from filesystem, we do down_read(&sbi->cp_rwsem) twice by f2fs_lock_op(). I think this should be okay, but how about stopping the lockdep complaint [1]? f2fs_create() - f2fs_lock_op() - f2fs_do_add_link() - __f2fs_find_entry - f2fs_get_read_data_page() -> kswapd - shrink_node - f2fs_evict_inode - f2fs_lock_op() [1] fs_reclaim ){+.+.}-{0:0} : kswapd0: lock_acquire+0x114/0x394 kswapd0: __fs_reclaim_acquire+0x40/0x50 kswapd0: prepare_alloc_pages+0x94/0x1ec kswapd0: __alloc_pages_nodemask+0x78/0x1b0 kswapd0: pagecache_get_page+0x2e0/0x57c kswapd0: f2fs_get_read_data_page+0xc0/0x394 kswapd0: f2fs_find_data_page+0xa4/0x23c kswapd0: find_in_level+0x1a8/0x36c kswapd0: __f2fs_find_entry+0x70/0x100 kswapd0: f2fs_do_add_link+0x84/0x1ec kswapd0: f2fs_mkdir+0xe4/0x1e4 kswapd0: vfs_mkdir+0x110/0x1c0 kswapd0: do_mkdirat+0xa4/0x160 kswapd0: __arm64_sys_mkdirat+0x24/0x34 kswapd0: el0_svc_common.llvm.17258447499513131576+0xc4/0x1e8 kswapd0: do_el0_svc+0x28/0xa0 kswapd0: el0_svc+0x24/0x38 kswapd0: el0_sync_handler+0x88/0xec kswapd0: el0_sync+0x1c0/0x200 kswapd0: -> #1 ( &sbi->cp_rwsem ){++++}-{3:3} : kswapd0: lock_acquire+0x114/0x394 kswapd0: down_read+0x7c/0x98 kswapd0: f2fs_do_truncate_blocks+0x78/0x3dc kswapd0: f2fs_truncate+0xc8/0x128 kswapd0: f2fs_evict_inode+0x2b8/0x8b8 kswapd0: evict+0xd4/0x2f8 kswapd0: iput+0x1c0/0x258 kswapd0: do_unlinkat+0x170/0x2a0 kswapd0: __arm64_sys_unlinkat+0x4c/0x68 kswapd0: el0_svc_common.llvm.17258447499513131576+0xc4/0x1e8 kswapd0: do_el0_svc+0x28/0xa0 kswapd0: el0_svc+0x24/0x38 kswapd0: el0_sync_handler+0x88/0xec kswapd0: el0_sync+0x1c0/0x200 Cc: stable@vger.kernel.org Fixes: `bdbc90fa55` ("f2fs: don't put dentry page in pagecache into highmem") Reviewed-by: Chao Yu <chao@kernel.org> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Light Hsieh <light.hsieh@mediatek.com> Tested-by: Light Hsieh <light.hsieh@mediatek.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:15 +01:00
Guo Ren	910ea7dd1e	irqchip/sifive-plic: Fixup EOI failed when masked commit `69ea463021` upstream. When using "devm_request_threaded_irq(,,,,IRQF_ONESHOT,,)" in a driver, only the first interrupt is handled, and following interrupts are never delivered (initially reported in [1]). That's because the RISC-V PLIC cannot EOI masked interrupts, as explained in the description of Interrupt Completion in the PLIC spec [2]: <quote> The PLIC signals it has completed executing an interrupt handler by writing the interrupt ID it received from the claim to the claim/complete register. The PLIC does not check whether the completion ID is the same as the last claim ID for that target. If the completion ID does not match an interrupt source that is currently enabled for the target, the completion is silently ignored. </quote> Re-enable the interrupt before completion if it has been masked during the handling, and remask it afterwards. [1] http://lists.infradead.org/pipermail/linux-riscv/2021-July/007441.html [2] `8bc15a35d0/riscv-plic.adoc` Fixes: `bb0fed1c60` ("irqchip/sifive-plic: Switch to fasteoi flow") Reported-by: Vincent Pelletier <plr.vincent@gmail.com> Tested-by: Nikita Shubin <nikita.shubin@maquefel.me> Signed-off-by: Guo Ren <guoren@linux.alibaba.com> Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Atish Patra <atish.patra@wdc.com> Reviewed-by: Anup Patel <anup@brainfault.org> [maz: amended commit message] Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20211105094748.3894453-1-guoren@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:14 +01:00
Michael Pratt	d5d21724af	posix-cpu-timers: Clear task::posix_cputimers_work in copy_process() commit `ca7752caea` upstream. copy_process currently copies task_struct.posix_cputimers_work as-is. If a timer interrupt arrives while handling clone and before dup_task_struct completes then the child task will have: 1. posix_cputimers_work.scheduled = true 2. posix_cputimers_work.work queued. copy_process clears task_struct.task_works, so (2) will have no effect and posix_cpu_timers_work will never run (not to mention it doesn't make sense for two tasks to share a common linked list). Since posix_cpu_timers_work never runs, posix_cputimers_work.scheduled is never cleared. Since scheduled is set, future timer interrupts will skip scheduling work, with the ultimate result that the task will never receive timer expirations. Together, the complete flow is: 1. Task 1 calls clone(), enters kernel. 2. Timer interrupt fires, schedules task work on Task 1. 2a. task_struct.posix_cputimers_work.scheduled = true 2b. task_struct.posix_cputimers_work.work added to task_struct.task_works. 3. dup_task_struct() copies Task 1 to Task 2. 4. copy_process() clears task_struct.task_works for Task 2. 5. Future timer interrupts on Task 2 see task_struct.posix_cputimers_work.scheduled = true and skip scheduling work. Fix this by explicitly clearing contents of task_struct.posix_cputimers_work in copy_process(). This was never meant to be shared or inherited across tasks in the first place. Fixes: `1fb497dd00` ("posix-cpu-timers: Provide mechanisms to defer timer handling to task_work") Reported-by: Rhys Hiltner <rhys@justin.tv> Signed-off-by: Michael Pratt <mpratt@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20211101210615.716522-1-mpratt@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:14 +01:00
Paolo Bonzini	fb58e9a26c	KVM: x86: move guest_pv_has out of user_access section commit `3e067fd850` upstream. When UBSAN is enabled, the code emitted for the call to guest_pv_has includes a call to __ubsan_handle_load_invalid_value. objtool complains that this call happens with UACCESS enabled; to avoid the warning, pull the calls to user_access_begin into both arms of the "if" statement, after the check for guest_pv_has. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:14 +01:00
Thomas Gleixner	d5e79d872b	PCI/MSI: Destroy sysfs before freeing entries commit `3735459037` upstream. free_msi_irqs() frees the MSI entries before destroying the sysfs entries which are exposing them. Nothing prevents a concurrent free while a sysfs file is read and accesses the possibly freed entry. Move the sysfs release ahead of freeing the entries. Fixes: `1c51b50c29` ("PCI/MSI: Export MSI mode using attributes, not kobjects") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87sfw5305m.ffs@tglx Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:14 +01:00
Thomas Gleixner	ab40a2e5e2	PCI/MSI: Move non-mask check back into low level accessors commit `9c8e9c9681` upstream. The recent rework of PCI/MSI[X] masking moved the non-mask checks from the low level accessors into the higher level mask/unmask functions. This missed the fact that these accessors can be invoked from other places as well. The missing checks break XEN-PV which sets pci_msi_ignore_mask and also violates the virtual MSIX and the msi_attrib.maskbit protections. Instead of sprinkling checks all over the place, lift them back into the low level accessor functions. To avoid checking three different conditions combine them into one property of msi_desc::msi_attrib. [ josef: Fixed the missed conversion in the core code ] Fixes: `fcacdfbef5` ("PCI/MSI: Provide a new set of mask and unmask functions") Reported-by: Josef Johansson <josef@oderland.se> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Josef Johansson <josef@oderland.se> Cc: Bjorn Helgaas <helgaas@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:14 +01:00
Dave Jones	affa136164	x86/mce: Add errata workaround for Skylake SKX37 commit `e629fc1407` upstream. Errata SKX37 is word-for-word identical to the other errata listed in this workaround. I happened to notice this after investigating a CMCI storm on a Skylake host. While I can't confirm this was the root cause, spurious corrected errors does sound like a likely suspect. Fixes: `2976908e41` ("x86/mce: Do not log spurious corrected mce errors") Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20211029205759.GA7385@codemonkey.org.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:14 +01:00
Maciej W. Rozycki	3cd12b61c3	MIPS: Fix assembly error from MIPSr2 code used within MIPS_ISA_ARCH_LEVEL commit `a923a2676e` upstream. Fix assembly errors like: {standard input}: Assembler messages: {standard input}:287: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32' {standard input}:680: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32' {standard input}:1274: Error: opcode not supported on this processor: mips3 (mips3) `dins $12,$9,32,32' {standard input}:2175: Error: opcode not supported on this processor: mips3 (mips3) `dins $10,$7,32,32' make[1]: *** [scripts/Makefile.build:277: mm/highmem.o] Error 1 with code produced from `__cmpxchg64' for MIPS64r2 CPU configurations using CONFIG_32BIT and CONFIG_PHYS_ADDR_T_64BIT. This is due to MIPS_ISA_ARCH_LEVEL downgrading the assembly architecture to `r4000' i.e. MIPS III for MIPS64r2 configurations, while there is a block of code containing a DINS MIPS64r2 instruction conditionalized on MIPS_ISA_REV >= 2 within the scope of the downgrade. The assembly architecture override code pattern has been put there for LL/SC instructions, so that code compiles for configurations that select a processor to build for that does not support these instructions while still providing run-time support for processors that do, dynamically switched by non-constant `cpu_has_llsc'. It went in with linux-mips.org commit `aac8aa7717` ("Enable a suitable ISA for the assembler around ll/sc so that code builds even for processors that don't support the instructions. Plus minor formatting fixes.") back in 2005. Fix the problem by wrapping these instructions along with the adjacent SYNC instructions only, following the practice established with commit `cfd54de3b0` ("MIPS: Avoid move psuedo-instruction whilst using MIPS_ISA_LEVEL") and commit `378ed6f0e3` ("MIPS: Avoid using .set mips0 to restore ISA"). Strictly speaking the SYNC instructions do not have to be wrapped as they are only used as a Loongson3 erratum workaround, so they will be enabled in the assembler by default, but do this so as to keep code consistent with other places. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Fixes: `c7e2d71dda` ("MIPS: Fix set_pte() for Netlogic XLR using cmpxchg64()") Cc: stable@vger.kernel.org # v5.1+ Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:14 +01:00
Masahiro Yamada	9b1b68eaad	MIPS: fix -pkg builds for loongson2ef platform commit `0706f74f71` upstream. Since commit `805b2e1d42` ("kbuild: include Makefile.compiler only when compiler is needed"), package builds for the loongson2f platform fail. $ make ARCH=mips CROSS_COMPILE=mips64-linux- lemote2f_defconfig bindeb-pkg [ snip ] sh ./scripts/package/builddeb arch/mips/loongson2ef//Platform:36: only binutils >= 2.20.2 have needed option -mfix-loongson2f-nop. Stop. cp: cannot stat '': No such file or directory make[5]: * [scripts/Makefile.package:87: intdeb-pkg] Error 1 make[4]: * [Makefile:1558: intdeb-pkg] Error 2 make[3]: * [debian/rules:13: binary-arch] Error 2 dpkg-buildpackage: error: debian/rules binary subprocess returned exit status 2 make[2]: * [scripts/Makefile.package:83: bindeb-pkg] Error 2 make[1]: * [Makefile:1558: bindeb-pkg] Error 2 make: * [Makefile:350: __build_one_by_one] Error 2 The reason is because "make image_name" fails. $ make ARCH=mips CROSS_COMPILE=mips64-linux- image_name arch/mips/loongson2ef//Platform:36: * only binutils >= 2.20.2 have needed option -mfix-loongson2f-nop. Stop. In general, adding $(error ...) in the parse stage is troublesome, and it is pointless to check toolchains even if we are not building anything. Do not include Kbuild.platform in such cases. Fixes: `805b2e1d42` ("kbuild: include Makefile.compiler only when compiler is needed") Reported-by: Jason Self <jason@bluehome.net> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:14 +01:00
Masahiro Yamada	7089587935	MIPS: fix duplicated slashes for Platform file path commit `cca2aac8ac` upstream. platform-y accumulates platform names with a slash appended. The current $(patsubst ...) ends up with doubling slashes. GNU Make still include Platform files, but in case of an error, a clumsy file path is displayed: arch/mips/loongson2ef//Platform:36: *** only binutils >= 2.20.2 have needed option -mfix-loongson2f-nop. Stop. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Jason Self <jason@bluehome.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:14 +01:00
John David Anglin	8301503a72	parisc: Flush kernel data mapping in set_pte_at() when installing pte for user page commit `38860b2c8b` upstream. For years, there have been random segmentation faults in userspace on SMP PA-RISC machines. It occurred to me that this might be a problem in set_pte_at(). MIPS and some other architectures do cache flushes when installing PTEs with the present bit set. Here I have adapted the code in update_mmu_cache() to flush the kernel mapping when the kernel flush is deferred, or when the kernel mapping may alias with the user mapping. This simplifies calls to update_mmu_cache(). I also changed the barrier in set_pte() from a compiler barrier to a full memory barrier. I know this change is not sufficient to fix the problem. It might not be needed. I have had a few days of operation with 5.14.16 to 5.15.1 and haven't seen any random segmentation faults on rp3440 or c8000 so far. Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@kernel.org # 5.12+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:14 +01:00
Helge Deller	1dc7ce007a	parisc: Fix backtrace to always include init funtion names commit `279917e27e` upstream. I noticed that sometimes at kernel startup the backtraces did not included the function names of init functions. Their address were not resolved to function names and instead only the address was printed. Debugging shows that the culprit is is_ksym_addr() which is called by the backtrace functions to check if an address belongs to a function in the kernel. The problem occurs only for CONFIG_KALLSYMS_ALL=y. When looking at is_ksym_addr() one can see that for CONFIG_KALLSYMS_ALL=y the function only tries to resolve the address via is_kernel() function, which checks like this: if (addr >= _stext && addr <= _end) return 1; On parisc the init functions are located before _stext, so this check fails. Other platforms seem to have all functions (including init functions) behind _stext. The following patch moves the _stext symbol at the beginning of the kernel and thus includes the init section. This fixes the check and does not seem to have any negative side effects on where the kernel mapping happens in the map_pages() function in arch/parisc/mm/init.c. Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@kernel.org # 5.4+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:13 +01:00
Arnd Bergmann	ca8f29dc8b	ARM: 9156/1: drop cc-option fallbacks for architecture selection commit `418ace9992` upstream. Naresh and Antonio ran into a build failure with latest Debian armhf compilers, with lots of output like tmp/ccY3nOAs.s:2215: Error: selected processor does not support `cpsid i' in ARM mode As it turns out, $(cc-option) fails early here when the FPU is not selected before CPU architecture is selected, as the compiler option check runs before enabling -msoft-float, which causes a problem when testing a target architecture level without an FPU: cc1: error: '-mfloat-abi=hard': selected architecture lacks an FPU Passing e.g. -march=armv6k+fp in place of -march=armv6k would avoid this issue, but the fallback logic is already broken because all supported compilers (gcc-5 and higher) are much more recent than these options, and building with -march=armv5t as a fallback no longer works. The best way forward that I see is to just remove all the checks, which also has the nice side-effect of slightly improving the startup time for 'make'. The -mtune=marvell-f option was apparently never supported by any mainline compiler, and the custom Codesourcery gcc build that did support is now too old to build kernels, so just use -mtune=xscale unconditionally for those. This should be safe to apply on all stable kernels, and will be required in order to keep building them with gcc-11 and higher. Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=996419 Reported-by: Antonio Terceiro <antonio.terceiro@linaro.org> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Reported-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Tested-by: Sebastian Reichel <sebastian.reichel@collabora.com> Tested-by: Klaus Kudielka <klaus.kudielka@gmail.com> Cc: Matthias Klose <doko@debian.org> Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:13 +01:00
Michał Mirosław	12f0dc47f2	ARM: 9155/1: fix early early_iounmap() commit `0d08e7bf0d` upstream. Currently __set_fixmap() bails out with a warning when called in early boot from early_iounmap(). Fix it, and while at it, make the comment a bit easier to understand. Cc: <stable@vger.kernel.org> Fixes: `b089c31c51` ("ARM: 8667/3: Fix memory attribute inconsistencies when using fixmap") Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:13 +01:00
Steve French	93114d5b3a	smb3: do not error on fsync when readonly commit `71e6864eac` upstream. Linux allows doing a flush/fsync on a file open for read-only, but the protocol does not allow that. If the file passed in on the flush is read-only try to find a writeable handle for the same inode, if that is not possible skip sending the fsync call to the server to avoid breaking the apps. Reported-by: Julian Sikorski <belegdol@gmail.com> Tested-by: Julian Sikorski <belegdol@gmail.com> Suggested-by: Jeremy Allison <jra@samba.org> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2021-11-18 19:17:13 +01:00
Linus Torvalds	c3f2809643	thermal: int340x: fix build on 32-bit targets [ Upstream commit `d9c8e52ff9` ] Commit `aeb58c860d` ("thermal/drivers/int340x: processor_thermal: Suppot 64 bit RFIM responses") started using 'readq()' to read 64-bit status responses from the int340x hardware. That's all fine and good, but on 32-bit targets a 64-bit 'readq()' is ambiguous, since it's no longer an atomic access. Some hardware might require 64-bit accesses, and other hardware might want low word first or high word first. It's quite likely that the driver isn't relevant in a 32-bit environment any more, and there's a patch floating around to just make it depend on X86_64, but let's make it buildable on x86-32 anyway. The driver previously just read the low 32 bits, so the hardware certainly is ok with 32-bit reads, and in a little-endian environment the low word first model is the natural one. So just add the include for the 'io-64-nonatomic-lo-hi.h' version. Fixes: `aeb58c860d` ("thermal/drivers/int340x: processor_thermal: Suppot 64 bit RFIM responses") Reported-by: Jakub Kicinski <kuba@kernel.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-18 19:17:13 +01:00
Willem de Bruijn	2cc4450b53	selftests/net: udpgso_bench_rx: fix port argument [ Upstream commit `d336509cb9` ] The below commit added optional support for passing a bind address. It configures the sockaddr bind arguments before parsing options and reconfigures on options -b and -4. This broke support for passing port (-p) on its own. Configure sockaddr after parsing all arguments. Fixes: `3327a9c463` ("selftests: add functionals test for UDP GRO") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-18 19:17:13 +01:00
Rahul Lakkireddy	3ce70c3455	cxgb4: fix eeprom len when diagnostics not implemented [ Upstream commit `4ca110bf8d` ] Ensure diagnostics monitoring support is implemented for the SFF 8472 compliant port module and set the correct length for ethtool port module eeprom read. Fixes: `f56ec6766d` ("cxgb4: Add support for ethtool i2c dump") Signed-off-by: Manoj Malviya <manojmalviya@chelsio.com> Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-18 19:17:13 +01:00
Dust Li	ed8b7355e3	net/smc: fix sk_refcnt underflow on linkdown and fallback [ Upstream commit `e5d5aadcf3` ] We got the following WARNING when running ab/nginx test with RDMA link flapping (up-down-up). The reason is when smc_sock fallback and at linkdown happens simultaneously, we may got the following situation: __smc_lgr_terminate() --> smc_conn_kill() --> smc_close_active_abort() smc_sock->sk_state = SMC_CLOSED sock_put(smc_sock) smc_sock was set to SMC_CLOSED and sock_put() been called when terminate the link group. But later application call close() on the socket, then we got: __smc_release(): if (smc_sock->fallback) smc_sock->sk_state = SMC_CLOSED sock_put(smc_sock) Again we set the smc_sock to CLOSED through it's already in CLOSED state, and double put the refcnt, so the following warning happens: refcount_t: underflow; use-after-free. WARNING: CPU: 5 PID: 860 at lib/refcount.c:28 refcount_warn_saturate+0x8d/0xf0 Modules linked in: CPU: 5 PID: 860 Comm: nginx Not tainted 5.10.46+ #403 Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 8c24b4c 04/01/2014 RIP: 0010:refcount_warn_saturate+0x8d/0xf0 Code: 05 5c 1e b5 01 01 e8 52 25 bc ff 0f 0b c3 80 3d 4f 1e b5 01 00 75 ad 48 RSP: 0018:ffffc90000527e50 EFLAGS: 00010286 RAX: 0000000000000026 RBX: ffff8881300df2c0 RCX: 0000000000000027 RDX: 0000000000000000 RSI: ffff88813bd58040 RDI: ffff88813bd58048 RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000001 R10: ffff8881300df2c0 R11: ffffc90000527c78 R12: ffff8881300df340 R13: ffff8881300df930 R14: ffff88810b3dad80 R15: ffff8881300df4f8 FS: 00007f739de8fb80(0000) GS:ffff88813bd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000a01b008 CR3: 0000000111b64003 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: smc_release+0x353/0x3f0 __sock_release+0x3d/0xb0 sock_close+0x11/0x20 __fput+0x93/0x230 task_work_run+0x65/0xa0 exit_to_user_mode_prepare+0xf9/0x100 syscall_exit_to_user_mode+0x27/0x190 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This patch adds check in __smc_release() to make sure we won't do an extra sock_put() and set the socket to CLOSED when its already in CLOSED state. Fixes: `51f1de79ad` (net/smc: replace sock_put worker by socket refcounting) Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Acked-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-18 19:17:13 +01:00
Eiichi Tsukata	244048220c	vsock: prevent unnecessary refcnt inc for nonblocking connect [ Upstream commit `c7cd82b905` ] Currently vosck_connect() increments sock refcount for nonblocking socket each time it's called, which can lead to memory leak if it's called multiple times because connect timeout function decrements sock refcount only once. Fixes it by making vsock_connect() return -EALREADY immediately when sock state is already SS_CONNECTING. Fixes: `d021c34405` ("VSOCK: Introduce VM Sockets") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-18 19:17:13 +01:00
Marek Behún	c8234e6086	net: marvell: mvpp2: Fix wrong SerDes reconfiguration order [ Upstream commit `bb7bbb6e36` ] Commit `bfe301ebbc` ("net: mvpp2: convert to use mac_prepare()/mac_finish()") introduced a bug wherein it leaves the MAC RESET register asserted after mac_finish(), due to wrong order of function calls. Before it was: .mac_config() mvpp22_mode_reconfigure() assert reset mvpp2_xlg_config() deassert reset Now it is: .mac_prepare() .mac_config() mvpp2_xlg_config() deassert reset .mac_finish() mvpp2_xlg_config() assert reset Obviously this is wrong. This bug is triggered when phylink tries to change the PHY interface mode from a GMAC mode (sgmii, 1000base-x, 2500base-x) to XLG mode (10gbase-r, xaui). The XLG mode does not work since reset is left asserted. Only after ifconfig down && ifconfig up is called will the XLG mode work. Move the call to mvpp22_mode_reconfigure() to .mac_prepare() implementation. Since some of the subsequent functions need to know whether the interface is being changed, we unfortunately also need to pass around the new interface mode before setting port->phy_interface. Fixes: `bfe301ebbc` ("net: mvpp2: convert to use mac_prepare()/mac_finish()") Signed-off-by: Marek Behún <kabel@kernel.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-18 19:17:13 +01:00
Christophe JAILLET	6fb190ff57	net: ethernet: ti: cpsw_ale: Fix access to un-initialized memory [ Upstream commit `7a166854b4` ] It is spurious to allocate a bitmap without initializing it. So, better safe than sorry, initialize it to 0 at least to have some known values. While at it, switch to the devm_bitmap_ API which is less verbose. Fixes: `4b41d34367` ("net: ethernet: ti: cpsw: allow untagged traffic on host port") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-18 19:17:12 +01:00
Vladimir Oltean	0c8ee89e35	net: stmmac: allow a tc-taprio base-time of zero [ Upstream commit `f64ab8e4f3` ] Commit `fe28c53ed7` ("net: stmmac: fix taprio configuration when base_time is in the past") allowed some base time values in the past, but apparently not all, the base-time value of 0 (Jan 1st 1970) is still explicitly denied by the driver. Remove the bogus check. Fixes: `b60189e039` ("net: stmmac: Integrate EST with TAPRIO scheduler API") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-18 19:17:12 +01:00
Guangbin Huang	cb85093e0c	net: hns3: allow configure ETS bandwidth of all TCs [ Upstream commit `688db0c7a4` ] Currently, driver only allow configuring ETS bandwidth of TCs according to the max TC number queried from firmware. However, the hardware actually supports 8 TCs and users may need to configure ETS bandwidth of all TCs, so remove the restriction. Fixes: `330baff542` ("net: hns3: add ETS TC weight setting in SSU module") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-18 19:17:12 +01:00
Yufeng Mo	3d3f131b40	net: hns3: fix kernel crash when unload VF while it is being reset [ Upstream commit `e140c7983e` ] When fully configure VLANs for a VF, then unload the VF while triggering a reset to PF, will cause a kernel crash because the irq is already uninit. [ 293.177579] ------------[ cut here ]------------ [ 293.183502] kernel BUG at drivers/pci/msi.c:352! [ 293.189547] Internal error: Oops - BUG: 0 [#1] SMP ...... [ 293.390124] Workqueue: hclgevf hclgevf_service_task [hclgevf] [ 293.402627] pstate: 80c00009 (Nzcv daif +PAN +UAO) [ 293.414324] pc : free_msi_irqs+0x19c/0x1b8 [ 293.425429] lr : free_msi_irqs+0x18c/0x1b8 [ 293.436545] sp : ffff00002716fbb0 [ 293.446950] x29: ffff00002716fbb0 x28: 0000000000000000 [ 293.459519] x27: 0000000000000000 x26: ffff45b91ea16b00 [ 293.472183] x25: 0000000000000000 x24: ffffa587b08f4700 [ 293.484717] x23: ffffc591ac30e000 x22: ffffa587b08f8428 [ 293.497190] x21: ffffc591ac30e300 x20: 0000000000000000 [ 293.509594] x19: ffffa58a062a8300 x18: 0000000000000000 [ 293.521949] x17: 0000000000000000 x16: ffff45b91dcc3f48 [ 293.534013] x15: 0000000000000000 x14: 0000000000000000 [ 293.545883] x13: 0000000000000040 x12: 0000000000000228 [ 293.557508] x11: 0000000000000020 x10: 0000000000000040 [ 293.568889] x9 : ffff45b91ea1e190 x8 : ffffc591802d0000 [ 293.580123] x7 : ffffc591802d0148 x6 : 0000000000000120 [ 293.591190] x5 : ffffc591802d0000 x4 : 0000000000000000 [ 293.602015] x3 : 0000000000000000 x2 : 0000000000000000 [ 293.612624] x1 : 00000000000004a4 x0 : ffffa58a1e0c6b80 [ 293.623028] Call trace: [ 293.630340] free_msi_irqs+0x19c/0x1b8 [ 293.638849] pci_disable_msix+0x118/0x140 [ 293.647452] pci_free_irq_vectors+0x20/0x38 [ 293.656081] hclgevf_uninit_msi+0x44/0x58 [hclgevf] [ 293.665309] hclgevf_reset_rebuild+0x1ac/0x2e0 [hclgevf] [ 293.674866] hclgevf_reset+0x358/0x400 [hclgevf] [ 293.683545] hclgevf_reset_service_task+0xd0/0x1b0 [hclgevf] [ 293.693325] hclgevf_service_task+0x4c/0x2e8 [hclgevf] [ 293.702307] process_one_work+0x1b0/0x448 [ 293.710034] worker_thread+0x54/0x468 [ 293.717331] kthread+0x134/0x138 [ 293.724114] ret_from_fork+0x10/0x18 [ 293.731324] Code: f940b000 b4ffff00 a903e7b8 f90017b6 (d4210000) This patch fixes the problem by waiting for the VF reset done while unloading the VF. Fixes: `e2cb1dec97` ("net: hns3: Add HNS3 VF HCL(Hardware Compatibility Layer) Support") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-18 19:17:12 +01:00
Jie Wang	5c05136690	net: hns3: fix pfc packet number incorrect after querying pfc parameters [ Upstream commit `0b653a81a2` ] Currently, driver will send command to firmware to query pfc packet number when user uses dcb tool to get pfc parameters. However, the periodic service task will also periodically query and record MAC statistics, including pfc packet number. As the hardware registers of statistics is cleared after reading, it will cause pfc packet number of MAC statistics are not correct after using dcb tool to get pfc parameters. To fix this problem, when user uses dcb tool to get pfc parameters, driver updates MAC statistics firstly and then get pfc packet number from MAC statistics. Fixes: `64fd2300fc` ("net: hns3: add support for querying pfc puase packets statistic") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-18 19:17:12 +01:00
Jie Wang	bcbee7cf14	net: hns3: fix ROCE base interrupt vector initialization bug [ Upstream commit `beb27ca451` ] Currently, NIC init ROCE interrupt vector with MSIX interrupt. But ROCE use pci_irq_vector() to get interrupt vector, which adds the relative interrupt vector again and gets wrong interrupt vector. So fixes it by assign relative interrupt vector to ROCE instead of MSIX interrupt vector and delete the unused struct member base_msi_vector declaration of hclgevf_dev. Fixes: `46a3df9f97` ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-18 19:17:12 +01:00
Eric Dumazet	3e13ce88a3	net/sched: sch_taprio: fix undefined behavior in ktime_mono_to_any [ Upstream commit `6dc25401cb` ] 1) if q->tk_offset == TK_OFFS_MAX, then get_tcp_tstamp() calls ktime_mono_to_any() with out-of-bound value. 2) if q->tk_offset is changed in taprio_parse_clockid(), taprio_get_time() might also call ktime_mono_to_any() with out-of-bound value as sysbot found: UBSAN: array-index-out-of-bounds in kernel/time/timekeeping.c:908:27 index 3 is out of range for type 'ktime_t *[3]' CPU: 1 PID: 25668 Comm: kworker/u4:0 Not tainted 5.15.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: bat_events batadv_iv_send_outstanding_bat_ogm_packet Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 ubsan_epilogue+0xb/0x5a lib/ubsan.c:151 __ubsan_handle_out_of_bounds.cold+0x62/0x6c lib/ubsan.c:291 ktime_mono_to_any+0x1d4/0x1e0 kernel/time/timekeeping.c:908 get_tcp_tstamp net/sched/sch_taprio.c:322 [inline] get_packet_txtime net/sched/sch_taprio.c:353 [inline] taprio_enqueue_one+0x5b0/0x1460 net/sched/sch_taprio.c:420 taprio_enqueue+0x3b1/0x730 net/sched/sch_taprio.c:485 dev_qdisc_enqueue+0x40/0x300 net/core/dev.c:3785 __dev_xmit_skb net/core/dev.c:3869 [inline] __dev_queue_xmit+0x1f6e/0x3630 net/core/dev.c:4194 batadv_send_skb_packet+0x4a9/0x5f0 net/batman-adv/send.c:108 batadv_iv_ogm_send_to_if net/batman-adv/bat_iv_ogm.c:393 [inline] batadv_iv_ogm_emit net/batman-adv/bat_iv_ogm.c:421 [inline] batadv_iv_send_outstanding_bat_ogm_packet+0x6d7/0x8e0 net/batman-adv/bat_iv_ogm.c:1701 process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298 worker_thread+0x658/0x11f0 kernel/workqueue.c:2445 kthread+0x405/0x4f0 kernel/kthread.c:327 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 Fixes: `7ede7b0348` ("taprio: make clock reference conversions easier") Fixes: `5400206610` ("taprio: Adjust timestamps for TCP packets") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Vedang Patel <vedang.patel@intel.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Link: https://lore.kernel.org/r/20211108180815.1822479-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-18 19:17:12 +01:00
Marek Behún	422fb87996	net: dsa: mv88e6xxx: Don't support >1G speeds on 6191X on ports other than 10 [ Upstream commit `dc2fc9f03c` ] Model 88E6191X only supports >1G speeds on port 10. Port 0 and 9 are only 1G. Fixes: `de776d0d31` ("net: dsa: mv88e6xxx: add support for mv88e6393x family") Signed-off-by: Marek Behún <kabel@kernel.org> Cc: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20211104171747.10509-1-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-18 19:17:12 +01:00
Evan Quan	f17e9e8100	drm/amdgpu: fix uvd crash on Polaris12 during driver unloading [ Upstream commit `4fc30ea780` ] There was a change(below) target for such issue: `d82e2c249c` ("drm/amdgpu: Fix crash on device remove/driver unload") But the fix for VI ASICs was missing there. This is a supplement for that. Fixes: `d82e2c249c` ("drm/amdgpu: Fix crash on device remove/driver unload") Signed-off-by: Evan Quan <evan.quan@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-18 19:17:12 +01:00
Muchun Song	f00e054b29	seq_file: fix passing wrong private data [ Upstream commit `10a6de19ca` ] DEFINE_PROC_SHOW_ATTRIBUTE() is supposed to be used to define a series of functions and variables to register proc file easily. And the users can use proc_create_data() to pass their own private data and get it via seq->private in the callback. Unfortunately, the proc file system use PDE_DATA() to get private data instead of inode->i_private. So fix it. Fortunately, there only one user of it which does not pass any private data, so this bug does not break any in-tree codes. Link: https://lkml.kernel.org/r/20211029032638.84884-1-songmuchun@bytedance.com Fixes: `97a32539b9` ("proc: convert everything to "struct proc_ops"") Signed-off-by: Muchun Song <songmuchun@bytedance.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Florent Revest <revest@chromium.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>	2021-11-18 19:17:12 +01:00

1 2 3 4 5 ...

1045825 Коммитов Все ветки Поиск

1045825 Коммитов

Все ветки