WSL2-Linux-Kernel

The source for the Linux kernel used in Windows Subsystem for Linux 2 (WSL2)

Перейти к файлу

Sagi Grimberg ecca390e80 nvme: fix deadlock in disconnect during scan_work and/or ana_work A deadlock happens in the following scenario with multipath: 1) scan_work(nvme0) detects a new nsid while nvme0 is an optimized path to it, path nvme1 happens to be inaccessible. 2) Before scan_work is complete nvme0 disconnect is initiated nvme_delete_ctrl_sync() sets nvme0 state to NVME_CTRL_DELETING 3) scan_work(1) attempts to submit IO, but nvme_path_is_optimized() observes nvme0 is not LIVE. Since nvme1 is a possible path IO is requeued and scan_work hangs. -- Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: efi_partition+0x1e6/0x708 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: nvme_scan_work+0x24f/0x380 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x249/0x400 kernel: kthread+0x104/0x140 -- 4) Delete also hangs in flush_work(ctrl->scan_work) from nvme_remove_namespaces(). Similiarly a deadlock with ana_work may happen: if ana_work has started and calls nvme_mpath_set_live and device_add_disk, it will trigger I/O. When we trigger disconnect I/O will block because our accessible (optimized) path is disconnecting, but the alternate path is inaccessible, so I/O blocks. Then disconnect tries to flush the ana_work and hangs. [ 605.550896] Workqueue: nvme-wq nvme_ana_work [nvme_core] [ 605.552087] Call Trace: [ 605.552683] __schedule+0x2b9/0x6c0 [ 605.553507] schedule+0x42/0xb0 [ 605.554201] io_schedule+0x16/0x40 [ 605.555012] do_read_cache_page+0x438/0x830 [ 605.556925] read_cache_page+0x12/0x20 [ 605.557757] read_dev_sector+0x27/0xc0 [ 605.558587] amiga_partition+0x4d/0x4c5 [ 605.561278] check_partition+0x154/0x244 [ 605.562138] rescan_partitions+0xae/0x280 [ 605.563076] __blkdev_get+0x40f/0x560 [ 605.563830] blkdev_get+0x3d/0x140 [ 605.564500] __device_add_disk+0x388/0x480 [ 605.565316] device_add_disk+0x13/0x20 [ 605.566070] nvme_mpath_set_live+0x5e/0x130 [nvme_core] [ 605.567114] nvme_update_ns_ana_state+0x2c/0x30 [nvme_core] [ 605.568197] nvme_update_ana_state+0xca/0xe0 [nvme_core] [ 605.569360] nvme_parse_ana_log+0xa1/0x180 [nvme_core] [ 605.571385] nvme_read_ana_log+0x76/0x100 [nvme_core] [ 605.572376] nvme_ana_work+0x15/0x20 [nvme_core] [ 605.573330] process_one_work+0x1db/0x380 [ 605.574144] worker_thread+0x4d/0x400 [ 605.574896] kthread+0x104/0x140 [ 605.577205] ret_from_fork+0x35/0x40 [ 605.577955] INFO: task nvme:14044 blocked for more than 120 seconds. [ 605.579239] Tainted: G OE 5.3.5-050305-generic #201910071830 [ 605.580712] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 605.582320] nvme D 0 14044 14043 0x00000000 [ 605.583424] Call Trace: [ 605.583935] __schedule+0x2b9/0x6c0 [ 605.584625] schedule+0x42/0xb0 [ 605.585290] schedule_timeout+0x203/0x2f0 [ 605.588493] wait_for_completion+0xb1/0x120 [ 605.590066] __flush_work+0x123/0x1d0 [ 605.591758] __cancel_work_timer+0x10e/0x190 [ 605.593542] cancel_work_sync+0x10/0x20 [ 605.594347] nvme_mpath_stop+0x2f/0x40 [nvme_core] [ 605.595328] nvme_stop_ctrl+0x12/0x50 [nvme_core] [ 605.596262] nvme_do_delete_ctrl+0x3f/0x90 [nvme_core] [ 605.597333] nvme_sysfs_delete+0x5c/0x70 [nvme_core] [ 605.598320] dev_attr_store+0x17/0x30 Fix this by introducing a new state: NVME_CTRL_DELETE_NOIO, which will indicate the phase of controller deletion where I/O cannot be allowed to access the namespace. NVME_CTRL_DELETING still allows mpath I/O to be issued to the bottom device, and only after we flush the ana_work and scan_work (after nvme_stop_ctrl and nvme_prep_remove_namespaces) we change the state to NVME_CTRL_DELETING_NOIO. Also we prevent ana_work from re-firing by aborting early if we are not LIVE, so we should be safe here. In addition, change the transport drivers to follow the updated state machine. Fixes: `0d0b660f21` ("nvme: add ANA support") Reported-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>		2020-07-29 07:45:19 +02:00
Documentation	block: add max_active_zones to blk-sysfs	2020-07-15 14:26:11 -06:00
LICENSES	LICENSES: Rename other to deprecated	2019-05-03 06:34:32 -06:00
arch	Linux 5.8-rc4	2020-07-08 08:02:13 -06:00
block	block: add max_active_zones to blk-sysfs	2020-07-15 14:26:11 -06:00
certs	.gitignore: add SPDX License Identifier	2020-03-25 11:50:48 +01:00
crypto	crypto: af_alg - fix use-after-free in af_alg_accept() due to bh_lock_sock()	2020-06-18 17:09:54 +10:00
drivers	nvme: fix deadlock in disconnect during scan_work and/or ana_work	2020-07-29 07:45:19 +02:00
fs	Linux 5.8-rc4	2020-07-08 08:02:13 -06:00
include	bcache: add bucket_size_hi into struct cache_sb_disk for large bucket	2020-07-25 07:38:21 -06:00
init	kbuild: fix CONFIG_CC_CAN_LINK(_STATIC) for cross-compilation with Clang	2020-07-02 00:57:45 +09:00
ipc	mmap locking API: use coccinelle to convert mmap_sem rwsem call sites	2020-06-09 09:39:14 -07:00
kernel	Linux 5.8-rc4	2020-07-08 08:02:13 -06:00
lib	Linux 5.8-rc4	2020-07-08 08:02:13 -06:00
mm	Linux 5.8-rc4	2020-07-08 08:02:13 -06:00
net	Fixes for a umask bug on exported filesystems lacking ACL support, a	2020-07-02 20:35:33 -07:00
samples	samples/vfs: avoid warning in statx override	2020-07-03 16:15:25 -07:00
scripts	Kbuild fixes for v5.8 (2nd)	2020-07-05 12:14:24 -07:00
security	Linux 5.8-rc4	2020-07-08 08:02:13 -06:00
sound	sound fixes for 5.8-rc3	2020-06-25 09:15:24 -07:00
tools	A series of fixes for x86:	2020-07-05 12:23:49 -07:00
usr	bpfilter: match bit size of bpfilter_umh to that of the kernel	2020-05-17 18:52:01 +09:00
virt	MIPS:	2020-06-12 11:05:52 -07:00
.clang-format	block: add bio_for_each_bvec_all()	2020-05-25 11:25:24 +02:00
.cocciconfig	…
.get_maintainer.ignore	Opt out of scripts/get_maintainer.pl	2019-05-16 10:53:40 -07:00
.gitattributes	.gitattributes: use 'dts' diff driver for dts files	2019-12-04 19:44:11 -08:00
.gitignore	.gitignore: Do not track `defconfig` from `make savedefconfig`	2020-07-05 16:15:46 +09:00
.mailmap	A fair amount of stuff this time around, dominated by yet another massive	2020-06-01 15:45:27 -07:00
COPYING	COPYING: state that all contributions really are covered by this file	2020-02-10 13:32:20 -08:00
CREDITS	mailmap: change email for Ricardo Ribalda	2020-05-25 18:59:59 -06:00
Kbuild	kbuild: rename hostprogs-y/always to hostprogs/always-y	2020-02-04 01:53:07 +09:00
Kconfig	kbuild: ensure full rebuild when the compiler is updated	2020-05-12 13:28:33 +09:00
MAINTAINERS	Devicetree fixes for v5.8, take 2:	2020-07-02 22:46:05 -07:00
Makefile	Linux 5.8-rc4	2020-07-05 16:20:22 -07:00
README	Drop all 00-INDEX files from Documentation/	2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.