WSL2-Linux-Kernel/drivers/md
Logan Gunthorpe 9e86dffd0b md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d
[ Upstream commit 5e2cf333b7 ]

A complicated deadlock exists when using the journal and an elevated
group_thrtead_cnt. It was found with loop devices, but its not clear
whether it can be seen with real disks. The deadlock can occur simply
by writing data with an fio script.

When the deadlock occurs, multiple threads will hang in different ways:

 1) The group threads will hang in the blk-wbt code with bios waiting to
    be submitted to the block layer:

        io_schedule+0x70/0xb0
        rq_qos_wait+0x153/0x210
        wbt_wait+0x115/0x1b0
        io_schedule+0x70/0xb0
        rq_qos_wait+0x153/0x210
        wbt_wait+0x115/0x1b0
        __rq_qos_throttle+0x38/0x60
        blk_mq_submit_bio+0x589/0xcd0
        wbt_wait+0x115/0x1b0
        __rq_qos_throttle+0x38/0x60
        blk_mq_submit_bio+0x589/0xcd0
        __submit_bio+0xe6/0x100
        submit_bio_noacct_nocheck+0x42e/0x470
        submit_bio_noacct+0x4c2/0xbb0
        ops_run_io+0x46b/0x1a30
        handle_stripe+0xcd3/0x36b0
        handle_active_stripes.constprop.0+0x6f6/0xa60
        raid5_do_work+0x177/0x330

    Or:
        io_schedule+0x70/0xb0
        rq_qos_wait+0x153/0x210
        wbt_wait+0x115/0x1b0
        __rq_qos_throttle+0x38/0x60
        blk_mq_submit_bio+0x589/0xcd0
        __submit_bio+0xe6/0x100
        submit_bio_noacct_nocheck+0x42e/0x470
        submit_bio_noacct+0x4c2/0xbb0
        flush_deferred_bios+0x136/0x170
        raid5_do_work+0x262/0x330

 2) The r5l_reclaim thread will hang in the same way, submitting a
    bio to the block layer:

        io_schedule+0x70/0xb0
        rq_qos_wait+0x153/0x210
        wbt_wait+0x115/0x1b0
        __rq_qos_throttle+0x38/0x60
        blk_mq_submit_bio+0x589/0xcd0
        __submit_bio+0xe6/0x100
        submit_bio_noacct_nocheck+0x42e/0x470
        submit_bio_noacct+0x4c2/0xbb0
        submit_bio+0x3f/0xf0
        md_super_write+0x12f/0x1b0
        md_update_sb.part.0+0x7c6/0xff0
        md_update_sb+0x30/0x60
        r5l_do_reclaim+0x4f9/0x5e0
        r5l_reclaim_thread+0x69/0x30b

    However, before hanging, the MD_SB_CHANGE_PENDING flag will be
    set for sb_flags in r5l_write_super_and_discard_space(). This
    flag will never be cleared because the submit_bio() call never
    returns.

 3) Due to the MD_SB_CHANGE_PENDING flag being set, handle_stripe()
    will do no processing on any pending stripes and re-set
    STRIPE_HANDLE. This will cause the raid5d thread to enter an
    infinite loop, constantly trying to handle the same stripes
    stuck in the queue.

    The raid5d thread has a blk_plug that holds a number of bios
    that are also stuck waiting seeing the thread is in a loop
    that never schedules. These bios have been accounted for by
    blk-wbt thus preventing the other threads above from
    continuing when they try to submit bios. --Deadlock.

To fix this, add the same wait_event() that is used in raid5_do_work()
to raid5d() such that if MD_SB_CHANGE_PENDING is set, the thread will
schedule and wait until the flag is cleared. The schedule action will
flush the plug which will allow the r5l_reclaim thread to continue,
thus preventing the deadlock.

However, md_check_recovery() calls can also clear MD_SB_CHANGE_PENDING
from the same thread and can thus deadlock if the thread is put to
sleep. So avoid waiting if md_check_recovery() is being called in the
loop.

It's not clear when the deadlock was introduced, but the similar
wait_event() call in raid5_do_work() was added in 2017 by this
commit:

    16d997b78b ("md/raid5: simplfy delaying of writes while metadata
                   is updated.")

Link: https://lore.kernel.org/r/7f3b87b6-b52a-f737-51d7-a4eec5c44112@deltatee.com
Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26 12:35:49 +02:00
..
bcache bcache: fix set_at_max_writeback_rate() for multiple attached devices 2022-10-26 12:35:48 +02:00
persistent-data dm space map common: add bounds check to sm_ll_lookup_bitmap() 2022-01-27 11:04:53 +01:00
Kconfig dm: make EBS depend on !HIGHMEM 2021-08-16 10:50:32 -06:00
Makefile dm ima: measure data on table load 2021-08-10 13:32:40 -04:00
dm-bio-prison-v1.c dm bio prison: replace spin_lock_irqsave with spin_lock_irq 2019-11-05 14:53:03 -05:00
dm-bio-prison-v1.h
dm-bio-prison-v2.c dm bio prison v2: use true/false for bool variable 2020-01-07 12:07:08 -05:00
dm-bio-prison-v2.h
dm-bio-record.h block: store a block_device pointer in struct bio 2021-01-24 18:17:20 -07:00
dm-bufio.c dm bufio: subtract the number of initial sectors in dm_bufio_get_device_size 2021-03-04 14:53:54 -05:00
dm-builtin.c
dm-cache-background-tracker.c
dm-cache-background-tracker.h
dm-cache-block-types.h
dm-cache-metadata.c dm: use bdev_read_only to check if a device is read-only 2021-01-24 18:15:57 -07:00
dm-cache-metadata.h
dm-cache-policy-internal.h
dm-cache-policy-smq.c
dm-cache-policy.c
dm-cache-policy.h
dm-cache-target.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-clone-metadata.c dm clone metadata: remove unused function 2021-04-19 13:20:31 -04:00
dm-clone-metadata.h dm clone metadata: Fix return type of dm_clone_nr_of_hydrated_regions() 2020-03-27 14:42:51 -04:00
dm-clone-target.c dm clone: make array 'descs' static 2021-10-12 13:54:10 -04:00
dm-core.h dm: interlock pending dm_io and dm_wait_for_bios_completion 2022-04-08 14:22:57 +02:00
dm-crypt.c dm crypt: make printing of the key constant-time 2022-06-06 08:43:40 +02:00
dm-delay.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-dust.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-ebs-target.c - Add DM infrastructure for IMA-based remote attestion. These changes 2021-08-31 14:55:09 -07:00
dm-era-target.c dm era: commit metadata in postsuspend after worker stops 2022-06-29 09:03:20 +02:00
dm-exception-store.c
dm-exception-store.h
dm-flakey.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-ima.c integrity-v5.15 2021-09-02 12:51:41 -07:00
dm-ima.h dm ima: add version info to dm related events in ima log 2021-08-20 15:59:47 -04:00
dm-init.c dm init: Set file local variable static 2020-08-04 15:51:28 -04:00
dm-integrity.c dm integrity: fix error code in dm_integrity_ctr() 2022-06-06 08:43:40 +02:00
dm-io-tracker.h dm writecache: make writeback pause configurable 2021-06-28 16:30:13 -04:00
dm-io.c block: Add bio_max_segs 2021-02-26 15:49:51 -07:00
dm-ioctl.c dm ioctl: prevent potential spectre v1 gadget 2022-04-13 20:59:05 +02:00
dm-kcopyd.c dm writecache: have ssd writeback wait if the kcopyd workqueue is busy 2021-06-15 15:42:03 -04:00
dm-linear.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-log-userspace-base.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-log-userspace-transfer.c
dm-log-userspace-transfer.h
dm-log-writes.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-log.c dm mirror log: clear log bits up to BITS_PER_LONG boundary 2022-06-29 09:03:20 +02:00
dm-mpath.c dm ima: update dm target attributes for ima measurements 2021-08-20 16:07:36 -04:00
dm-mpath.h
dm-path-selector.c
dm-path-selector.h dm mpath: pass IO start time to path selector 2020-05-15 10:29:36 -04:00
dm-ps-historical-service-time.c dm mpath: only use ktime_get_ns() in historical selector 2022-04-20 09:34:13 +02:00
dm-ps-io-affinity.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-ps-queue-length.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-ps-round-robin.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-ps-service-time.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-raid.c dm raid: fix address sanitizer warning in raid_resume 2022-08-17 14:24:26 +02:00
dm-raid1.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-region-hash.c
dm-rq.c dm: requeue IO if mapping table not yet available 2022-04-13 20:59:06 +02:00
dm-rq.h
dm-snap-persistent.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-snap-transient.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-snap.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-stats.c dm stats: add cond_resched when looping over entries 2022-06-06 08:43:40 +02:00
dm-stats.h dm: fix double accounting of flush with data 2022-04-08 14:22:57 +02:00
dm-stripe.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-switch.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-sysfs.c
dm-table.c libnvdimm for v5.15 2021-09-09 11:39:57 -07:00
dm-target.c
dm-thin-metadata.c dm thin: fix use-after-free crash in dm_sm_register_threshold_callback 2022-08-17 14:24:23 +02:00
dm-thin-metadata.h dm thin metadata: Add support for a pre-commit callback 2019-12-05 17:05:24 -05:00
dm-thin.c dm thin: fix use-after-free crash in dm_sm_register_threshold_callback 2022-08-17 14:24:23 +02:00
dm-uevent.c
dm-uevent.h
dm-unstripe.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-verity-fec.c dm verity fec: fix misaligned RS roots IO 2021-04-14 14:28:29 -04:00
dm-verity-fec.h dm verity fec: fix misaligned RS roots IO 2021-04-14 14:28:29 -04:00
dm-verity-target.c dm verity: set DM_TARGET_IMMUTABLE feature flag 2022-06-06 08:43:40 +02:00
dm-verity-verify-sig.c dm verity: fix require_signatures module_param permissions 2021-05-25 16:14:05 -04:00
dm-verity-verify-sig.h dm verity: Fix compilation warning 2020-08-04 15:48:13 -04:00
dm-verity.h dm verity: add "panic_on_corruption" error handling mode 2020-07-13 11:47:33 -04:00
dm-writecache.c dm writecache: set a default MAX_WRITEBACK_JOBS 2022-08-17 14:24:23 +02:00
dm-zero.c dm: add support for REQ_NOWAIT to various targets 2020-12-04 18:04:35 -05:00
dm-zone.c dm zone: fix dm_revalidate_zones() memory allocation 2021-06-25 15:25:23 -04:00
dm-zoned-metadata.c dm zoned: check zone capacity 2021-06-04 12:07:28 -04:00
dm-zoned-reclaim.c dm kcopyd: avoid useless atomic operations 2021-06-04 12:07:24 -04:00
dm-zoned-target.c dm: update target status functions to support IMA measurement 2021-08-10 13:34:23 -04:00
dm-zoned.h dm zoned: select reclaim zone based on device index 2020-06-05 14:59:53 -04:00
dm.c dm: return early from dm_pr_call() if DM device is suspended 2022-08-17 14:23:15 +02:00
dm.h dm: introduce zone append emulation 2021-06-04 12:07:37 -04:00
md-autodetect.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
md-bitmap.c md/bitmap: don't set sb values if can't pass sanity check 2022-06-09 10:22:33 +02:00
md-bitmap.h
md-cluster.c for-5.11/drivers-2020-12-14 2020-12-16 13:09:32 -08:00
md-cluster.h
md-faulty.c md: mark some personalities as deprecated 2021-06-14 22:32:07 -07:00
md-linear.c md: mark some personalities as deprecated 2021-06-14 22:32:07 -07:00
md-linear.h md/raid1: Replace zero-length array with flexible-array 2020-05-13 12:02:23 -07:00
md-multipath.c md: mark some personalities as deprecated 2021-06-14 22:32:07 -07:00
md-multipath.h
md.c md: Flush workqueue md_rdev_misc_wq in md_alloc() 2022-09-15 11:30:01 +02:00
md.h md: Move alloc/free acct bioset in to personality 2022-01-27 11:05:08 +01:00
raid0.c md: Replace snprintf with scnprintf 2022-10-26 12:35:12 +02:00
raid0.h md/raid0: avoid RAID0 data corruption due to layout confusion. 2019-09-13 13:10:05 -07:00
raid1-10.c
raid1.c md/raid1: fix missing bitmap update w/o WriteMostly devices 2022-01-11 15:35:15 +01:00
raid1.h md/raid1: enable io accounting 2021-06-14 22:32:07 -07:00
raid5-cache.c block: rename BIO_MAX_PAGES to BIO_MAX_VECS 2021-03-11 07:47:48 -07:00
raid5-log.h
raid5-ppl.c block: rename BIO_MAX_PAGES to BIO_MAX_VECS 2021-03-11 07:47:48 -07:00
raid5.c md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d 2022-10-26 12:35:49 +02:00
raid5.h md/raid5: let multiple devices of stripe_head share page 2020-09-24 16:44:44 -07:00
raid10.c md-raid10: fix KASAN warning 2022-08-17 14:22:57 +02:00
raid10.h md/raid10: enable io accounting 2021-06-14 22:32:07 -07:00