WSL2-Linux-Kernel/fs
Naohiro Aota 522b39bd71 btrfs: fix adding block group to a reclaim list and the unused list during reclaim
commit 48f091fd50b2eb33ae5eaea9ed3c4f81603acf38 upstream.

There is a potential parallel list adding for retrying in
btrfs_reclaim_bgs_work and adding to the unused list. Since the block
group is removed from the reclaim list and it is on a relocation work,
it can be added into the unused list in parallel. When that happens,
adding it to the reclaim list will corrupt the list head and trigger
list corruption like below.

Fix it by taking fs_info->unused_bgs_lock.

  [177.504][T2585409] BTRFS error (device nullb1): error relocating ch= unk 2415919104
  [177.514][T2585409] list_del corruption. next->prev should be ff1100= 0344b119c0, but was ff11000377e87c70. (next=3Dff110002390cd9c0)
  [177.529][T2585409] ------------[ cut here ]------------
  [177.537][T2585409] kernel BUG at lib/list_debug.c:65!
  [177.545][T2585409] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
  [177.555][T2585409] CPU: 9 PID: 2585409 Comm: kworker/u128:2 Tainted: G        W          6.10.0-rc5-kts #1
  [177.568][T2585409] Hardware name: Supermicro SYS-520P-WTR/X12SPW-TF, BIOS 1.2 02/14/2022
  [177.579][T2585409] Workqueue: events_unbound btrfs_reclaim_bgs_work[btrfs]
  [177.589][T2585409] RIP: 0010:__list_del_entry_valid_or_report.cold+0x70/0x72
  [177.624][T2585409] RSP: 0018:ff11000377e87a70 EFLAGS: 00010286
  [177.633][T2585409] RAX: 000000000000006d RBX: ff11000344b119c0 RCX:0000000000000000
  [177.644][T2585409] RDX: 000000000000006d RSI: 0000000000000008 RDI:ffe21c006efd0f40
  [177.655][T2585409] RBP: ff110002e0509f78 R08: 0000000000000001 R09:ffe21c006efd0f08
  [177.665][T2585409] R10: ff11000377e87847 R11: 0000000000000000 R12:ff110002390cd9c0
  [177.676][T2585409] R13: ff11000344b119c0 R14: ff110002e0508000 R15:dffffc0000000000
  [177.687][T2585409] FS:  0000000000000000(0000) GS:ff11000fec880000(0000) knlGS:0000000000000000
  [177.700][T2585409] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [177.709][T2585409] CR2: 00007f06bc7b1978 CR3: 0000001021e86005 CR4:0000000000771ef0
  [177.720][T2585409] DR0: 0000000000000000 DR1: 0000000000000000 DR2:0000000000000000
  [177.731][T2585409] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:0000000000000400
  [177.742][T2585409] PKRU: 55555554
  [177.748][T2585409] Call Trace:
  [177.753][T2585409]  <TASK>
  [177.759][T2585409]  ? __die_body.cold+0x19/0x27
  [177.766][T2585409]  ? die+0x2e/0x50
  [177.772][T2585409]  ? do_trap+0x1ea/0x2d0
  [177.779][T2585409]  ? __list_del_entry_valid_or_report.cold+0x70/0x72
  [177.788][T2585409]  ? do_error_trap+0xa3/0x160
  [177.795][T2585409]  ? __list_del_entry_valid_or_report.cold+0x70/0x72
  [177.805][T2585409]  ? handle_invalid_op+0x2c/0x40
  [177.812][T2585409]  ? __list_del_entry_valid_or_report.cold+0x70/0x72
  [177.820][T2585409]  ? exc_invalid_op+0x2d/0x40
  [177.827][T2585409]  ? asm_exc_invalid_op+0x1a/0x20
  [177.834][T2585409]  ? __list_del_entry_valid_or_report.cold+0x70/0x72
  [177.843][T2585409]  btrfs_delete_unused_bgs+0x3d9/0x14c0 [btrfs]

There is a similar retry_list code in btrfs_delete_unused_bgs(), but it is
safe, AFAICS. Since the block group was in the unused list, the used bytes
should be 0 when it was added to the unused list. Then, it checks
block_group->{used,reserved,pinned} are still 0 under the
block_group->lock. So, they should be still eligible for the unused list,
not the reclaim list.

The reason it is safe there it's because because we're holding
space_info->groups_sem in write mode.

That means no other task can allocate from the block group, so while we
are at deleted_unused_bgs() it's not possible for other tasks to
allocate and deallocate extents from the block group, so it can't be
added to the unused list or the reclaim list by anyone else.

The bug can be reproduced by btrfs/166 after a few rounds. In practice
this can be hit when relocation cannot find more chunk space and ends
with ENOSPC.

Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Suggested-by: Johannes Thumshirn <Johannes.Thumshirn@wdc.com>
Fixes: 4eb4e85c4f81 ("btrfs: retry block group reclaim without infinite loop")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-18 13:07:32 +02:00
..
9p fs/9p: drop inodes immediately on non-.L too 2024-05-17 11:50:55 +02:00
adfs
affs affs: initialize fsdata in affs_truncate() 2023-02-01 08:27:06 +01:00
afs afs: Don't cross .backup mountpoint from backup volume 2024-06-16 13:39:53 +02:00
autofs autofs: fix memory leak of waitqueues in autofs_catatonic_mode 2023-09-23 11:09:54 +02:00
befs
bfs
btrfs btrfs: fix adding block group to a reclaim list and the unused list during reclaim 2024-07-18 13:07:32 +02:00
cachefiles cachefiles: fix memory leak in cachefiles_add_cache() 2024-03-06 14:38:50 +00:00
ceph ceph: prevent use-after-free in encode_cap_msg() 2024-02-23 08:55:09 +01:00
cifs cifs: fix typo in module parameter enable_gcm_256 2024-07-05 09:14:40 +02:00
coda coda: Avoid partial allocation of sig_inputArgs 2023-03-10 09:39:50 +01:00
configfs configfs: fix possible memory leak in configfs_create_dir() 2022-12-31 13:14:15 +01:00
cramfs
crypto fscrypt: fix keyring memory leak on mount failure 2022-11-10 18:15:37 +01:00
debugfs debugfs: fix automount d_fsdata usage 2024-01-25 14:52:27 -08:00
devpts fsnotify: fix fsnotify hooks in pseudo filesystems 2022-02-01 17:27:01 +01:00
dlm dlm: fix plock lookup when using multiple lockspaces 2023-09-19 12:22:52 +02:00
ecryptfs ecryptfs: Fix buffer size for tag 66 packet 2024-06-16 13:39:16 +02:00
efivarfs efivarfs: force RO when remounting if SetVariable is not supported 2024-01-25 14:52:33 -08:00
efs
erofs erofs: apply proper VMA alignment for memory mapped files on THP 2024-03-15 10:48:15 -04:00
exfat exfat: support dynamic allocate bh for exfat_entry_set_cache 2024-03-01 13:21:56 +01:00
exportfs exportfs: use pr_debug for unreachable debug statements 2024-04-10 16:19:21 +02:00
ext2 ext2: fix datatype of block number in ext2_xattr_set2() 2023-09-23 11:09:57 +02:00
ext4 ext4: fix mb_cache_entry's e_refcnt leak in ext4_xattr_block_cache_find() 2024-06-16 13:40:00 +02:00
f2fs f2fs: remove clear SB_INLINECRYPT flag in default_options 2024-07-05 09:14:28 +02:00
fat fat: fix uninitialized field in nostale filehandles 2024-04-10 16:18:35 +02:00
freevxfs
fscache fscache: Remove an unused static variable 2021-10-04 22:13:12 +01:00
fuse fuse: don't unhash root 2024-04-10 16:18:38 +02:00
gfs2 gfs2: Fix "ignore unlock failures after withdraw" 2024-06-16 13:39:20 +02:00
hfs hfs: fix missing hfs_bnode_get() in __hfs_bnode_create 2023-03-10 09:39:57 +01:00
hfsplus fs: hfsplus: remove WARN_ON() from hfsplus_cat_{read,write}_inode() 2023-05-24 17:36:43 +01:00
hostfs hostfs: support splice_write 2021-08-26 22:28:02 +02:00
hpfs
hugetlbfs fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_super 2024-03-06 14:38:50 +00:00
iomap iomap: update ki_pos a little later in iomap_dio_complete 2023-12-08 08:48:05 +01:00
isofs isofs: handle CDs with bad root inode but good Joliet root directory 2024-04-13 13:01:44 +02:00
jbd2 jbd2: fix soft lockup in journal_finish_inode_data_buffers() 2024-01-25 14:52:29 -08:00
jffs2 jffs2: Fix potential illegal address access in jffs2_free_inode 2024-07-18 13:07:29 +02:00
jfs jfs: xattr: fix buffer overflow for invalid xattr 2024-07-05 09:14:14 +02:00
kernfs fs/kernfs/dir: obey S_ISGID 2024-02-23 08:54:51 +01:00
ksmbd ksmbd: ignore trailing slashes in share paths 2024-07-05 09:14:37 +02:00
lockd Revert "lockd: introduce safe async lock op" 2024-04-27 17:05:23 +02:00
minix minix: fix bug when opening a file with O_DIRECT 2022-04-13 20:59:10 +02:00
netfs netfs: fix parameter of cleanup() 2021-12-29 12:28:59 +01:00
nfs nfs: Leave pages in the pagecache if readpage failed 2024-07-05 09:14:50 +02:00
nfs_common nfs: Fix kerneldoc warning shown up by W=1 2021-10-04 22:02:17 +01:00
nfsd knfsd: LOOKUP can return an illegal error value 2024-07-05 09:14:21 +02:00
nilfs2 nilfs2: add missing check for inode numbers on directory entries 2024-07-18 13:07:32 +02:00
nls fs/nls: make load_nls() take a const parameter 2023-09-19 12:22:27 +02:00
notify fanotify: Remove obsoleted fanotify_event_has_path() 2024-04-10 16:19:19 +02:00
ntfs ntfs: check overflow when iterating ATTR_RECORDs 2022-11-26 09:24:52 +01:00
ntfs3 fs/ntfs3: Use variable length array instead of fixed size 2024-06-16 13:39:43 +02:00
ocfs2 ocfs2: fix DIO failure due to insufficient transaction credits 2024-07-05 09:14:45 +02:00
omfs
openpromfs openpromfs: finish conversion to the new mount API 2024-06-16 13:39:16 +02:00
orangefs orangefs: fix out-of-bounds fsid access 2024-07-18 13:07:29 +02:00
overlayfs ima: detect changes to the backing overlay file 2023-11-28 16:56:29 +00:00
proc fs/proc: fix softlockup in __read_vmcore 2024-07-05 09:14:21 +02:00
pstore pstore/zone: Add a null pointer check to the psz_kmsg_read 2024-04-13 13:01:43 +02:00
qnx4 qnx4: work around gcc false positive warning bug 2021-09-21 08:36:48 -07:00
qnx6
quota quota: Fix rcu annotations of inode dquot pointers 2024-03-26 18:21:27 -04:00
ramfs shmem: use ramfs_kill_sb() for kill_sb method of ramfs-based tmpfs 2023-07-23 13:47:33 +02:00
reiserfs reiserfs: Check the return value from __getblk() 2023-09-19 12:22:30 +02:00
romfs
smbfs_common cifs: Fix crash on unload of cifs_arc4.ko 2021-12-14 10:57:12 +01:00
squashfs revert "squashfs: harden sanity check in squashfs_read_xattr_id_table" 2023-02-22 12:57:07 +01:00
sysfs fs: sysfs: Fix reference leak in sysfs_break_active_protection() 2024-04-27 17:05:28 +02:00
sysv sysv: don't call sb_bread() with pointers_lock held 2024-04-13 13:01:44 +02:00
tracefs tracefs: Add missing lockdown check to tracefs_create_dir() 2023-09-23 11:10:02 +02:00
ubifs ubifs: Set page uptodate in the correct place 2024-04-10 16:18:35 +02:00
udf udf: udftime: prevent overflow in udf_disk_stamp_to_time() 2024-07-05 09:14:28 +02:00
ufs
unicode
vboxsf vboxsf: Avoid an spurious warning if load_nls_xxx() fails 2024-04-10 16:19:38 +02:00
verity fsverity: skip PKCS#7 parser when keyring is empty 2023-09-19 12:22:52 +02:00
xfs xfs: read only mounts with fsopen mount API are busted 2024-02-23 08:54:32 +01:00
zonefs zonefs: Improve error handling 2024-03-01 13:21:43 +01:00
Kconfig NFSD: Remove CONFIG_NFSD_V3 2024-04-10 16:19:01 +02:00
Kconfig.binfmt
Makefile io_uring: move to separate directory 2022-12-14 11:37:31 +01:00
aio.c fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion 2024-04-10 16:18:46 +02:00
anon_inodes.c
attr.c attr: block mode changes of symlinks 2023-09-23 11:10:01 +02:00
bad_inode.c
binfmt_aout.c binfmt: a.out: Fix bogus semicolon 2021-09-05 10:15:05 -07:00
binfmt_elf.c fs/binfmt_elf: Fix memory leak in load_elf_binary() 2022-11-03 23:59:12 +09:00
binfmt_elf_fdpic.c fs: binfmt_elf_efpic: fix personality for ELF-FDPIC 2023-10-06 13:18:24 +02:00
binfmt_flat.c binfmt_flat: do not stop relocating GOT entries prematurely on riscv 2022-06-09 10:22:26 +02:00
binfmt_misc.c binfmt_misc: fix shift-out-of-bounds in check_special_flags 2022-12-31 13:14:39 +01:00
binfmt_script.c
buffer.c mm: fs: initialize fsdata passed to write_begin/write_end interface 2022-11-26 09:24:51 +01:00
char_dev.c chardev: fix error handling in cdev_device_add() 2022-12-31 13:14:30 +01:00
compat_binfmt_elf.c
coredump.c coredump: Use the vma snapshot in fill_files_note 2022-04-08 14:24:18 +02:00
d_path.c d_path: make 'prepend()' fill up the buffer exactly on overflow 2021-09-02 10:07:29 -07:00
dax.c fsdax: Fix infinite loop in dax_iomap_rw() 2022-09-28 11:11:56 +02:00
dcache.c fast_dput(): handle underflows gracefully 2024-02-23 08:54:46 +01:00
direct-io.c
drop_caches.c fs: drop_caches: fix skipping over shadow cache inodes 2021-09-03 09:58:10 -07:00
eventfd.c eventfd: prevent underflow for eventfd semaphores 2023-09-19 12:22:30 +02:00
eventpoll.c epoll: be better about file lifetimes 2024-06-16 13:39:15 +02:00
exec.c exec: Fix NOMMU linux_binprm::exec in transfer_args_to_stack() 2024-04-10 16:19:31 +02:00
fcntl.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
fhandle.c do_sys_name_to_handle(): use kzalloc() to fix kernel-infoleak 2024-03-26 18:21:14 -04:00
file.c file: reinstate f_pos locking optimization for regular files 2023-08-11 15:13:58 +02:00
file_table.c locks: fix TOCTOU race when granting write lease 2022-10-26 12:34:58 +02:00
filesystems.c
fs-writeback.c writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs 2023-11-20 11:08:13 +01:00
fs_context.c fs: avoid empty option when generating legacy mount string 2023-07-23 13:47:34 +02:00
fs_parser.c namei: Standardize callers of filename_lookup() 2021-09-07 16:07:47 -04:00
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
init.c
inode.c fs: add ctime accessors infrastructure 2023-12-08 08:48:04 +01:00
internal.h nfs: use vfs setgid helper 2023-08-30 16:18:19 +02:00
ioctl.c lsm: new security_file_ioctl_compat() hook 2024-02-23 08:54:25 +01:00
kernel_read_file.c vfs: check fd has read access in kernel_read_file_from_fd() 2021-10-18 20:22:03 -10:00
libfs.c libfs: add DEFINE_SIMPLE_ATTRIBUTE_SIGNED for signed value 2022-12-31 13:14:03 +01:00
locks.c filelock: add a new locks_inode_context accessor function 2024-04-10 16:19:23 +02:00
mbcache.c mbcache: Avoid nesting of cache->c_list_lock under bit locks 2023-01-12 11:59:20 +01:00
mount.h
mpage.c
namei.c rename(): fix the locking of subdirectories 2024-02-23 08:54:26 +01:00
namespace.c fs: indicate request originates from old mount API 2024-01-25 14:52:35 -08:00
no-block.c
nsfs.c
open.c ftruncate: pass a signed offset 2024-07-05 09:14:50 +02:00
pipe.c fs/pipe: Fix lockdep false-positive in watchqueue pipe_write() 2024-04-10 16:19:42 +02:00
pnode.c pnode: terminate at peers of source 2023-01-12 11:58:47 +01:00
pnode.h
posix_acl.c fs: fix acl translation 2022-07-02 16:41:17 +02:00
proc_namespace.c fs: add is_idmapped_mnt() helper 2022-07-02 16:41:14 +02:00
read_write.c vfs: fix copy_file_range() averts filesystem freeze protection 2022-12-19 12:36:39 +01:00
readdir.c
remap_range.c fs/remap: constrain dedupe of EOF blocks 2022-07-21 21:24:14 +02:00
select.c fs/select: rework stack allocation hack for clang 2024-03-26 18:21:15 -04:00
seq_file.c rxrpc: Fix locking issue 2022-07-12 16:35:08 +02:00
signalfd.c signalfd: use wake_up_pollfree() 2021-12-14 10:57:15 +01:00
splice.c Revert "fs: check FMODE_LSEEK to control internal pipe splicing" 2022-10-26 12:34:17 +02:00
stack.c
stat.c stat: fix inconsistency between struct stat and struct compat_stat 2022-04-27 14:38:57 +02:00
statfs.c statfs: enforce statfs[64] structure initialization 2023-05-24 17:36:54 +01:00
super.c fs: Protect reconfiguration of sb read-write from racing writes 2023-08-11 15:13:58 +02:00
sync.c vfs: make sync_filesystem return errors from ->sync_fs 2022-04-27 14:38:50 +02:00
timerfd.c
userfaultfd.c userfaultfd: open userfaultfds with O_RDONLY 2022-10-26 12:34:36 +02:00
utimes.c
xattr.c fs: don't audit the capability check in simple_xattr_list() 2022-12-31 13:14:01 +01:00