WSL2-Linux-Kernel/fs
Lin Ma df4b177b48 io_uring/poll: fix poll_refs race with cancelation
[ upstream commit 12ad3d2d6c ]

There is an interesting race condition of poll_refs which could result
in a NULL pointer dereference. The crash trace is like:

KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 30781 Comm: syz-executor.2 Not tainted 6.0.0-g493ffd6605b2 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:io_poll_remove_entry io_uring/poll.c:154 [inline]
RIP: 0010:io_poll_remove_entries+0x171/0x5b4 io_uring/poll.c:190
Code: ...
RSP: 0018:ffff88810dfefba0 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000040000
RDX: ffffc900030c4000 RSI: 000000000003ffff RDI: 0000000000040000
RBP: 0000000000000008 R08: ffffffff9764d3dd R09: fffffbfff3836781
R10: fffffbfff3836781 R11: 0000000000000000 R12: 1ffff11003422d60
R13: ffff88801a116b04 R14: ffff88801a116ac0 R15: dffffc0000000000
FS:  00007f9c07497700(0000) GS:ffff88811a600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffb5c00ea98 CR3: 0000000105680005 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 io_apoll_task_func+0x3f/0xa0 io_uring/poll.c:299
 handle_tw_list io_uring/io_uring.c:1037 [inline]
 tctx_task_work+0x37e/0x4f0 io_uring/io_uring.c:1090
 task_work_run+0x13a/0x1b0 kernel/task_work.c:177
 get_signal+0x2402/0x25a0 kernel/signal.c:2635
 arch_do_signal_or_restart+0x3b/0x660 arch/x86/kernel/signal.c:869
 exit_to_user_mode_loop kernel/entry/common.c:166 [inline]
 exit_to_user_mode_prepare+0xc2/0x160 kernel/entry/common.c:201
 __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
 syscall_exit_to_user_mode+0x58/0x160 kernel/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

The root cause for this is a tiny overlooking in
io_poll_check_events() when cocurrently run with poll cancel routine
io_poll_cancel_req().

The interleaving to trigger use-after-free:

CPU0                                       |  CPU1
                                           |
io_apoll_task_func()                       |  io_poll_cancel_req()
 io_poll_check_events()                    |
  // do while first loop                   |
  v = atomic_read(...)                     |
  // v = poll_refs = 1                     |
  ...                                      |  io_poll_mark_cancelled()
                                           |   atomic_or()
                                           |   // poll_refs =
IO_POLL_CANCEL_FLAG | 1
                                           |
  atomic_sub_return(...)                   |
  // poll_refs = IO_POLL_CANCEL_FLAG       |
  // loop continue                         |
                                           |
                                           |  io_poll_execute()
                                           |   io_poll_get_ownership()
                                           |   // poll_refs =
IO_POLL_CANCEL_FLAG | 1
                                           |   // gets the ownership
  v = atomic_read(...)                     |
  // poll_refs not change                  |
                                           |
  if (v & IO_POLL_CANCEL_FLAG)             |
   return -ECANCELED;                      |
  // io_poll_check_events return           |
  // will go into                          |
  // io_req_complete_failed() free req     |
                                           |
                                           |  io_apoll_task_func()
                                           |  // also go into
io_req_complete_failed()

And the interleaving to trigger the kernel WARNING:

CPU0                                       |  CPU1
                                           |
io_apoll_task_func()                       |  io_poll_cancel_req()
 io_poll_check_events()                    |
  // do while first loop                   |
  v = atomic_read(...)                     |
  // v = poll_refs = 1                     |
  ...                                      |  io_poll_mark_cancelled()
                                           |   atomic_or()
                                           |   // poll_refs =
IO_POLL_CANCEL_FLAG | 1
                                           |
  atomic_sub_return(...)                   |
  // poll_refs = IO_POLL_CANCEL_FLAG       |
  // loop continue                         |
                                           |
  v = atomic_read(...)                     |
  // v = IO_POLL_CANCEL_FLAG               |
                                           |  io_poll_execute()
                                           |   io_poll_get_ownership()
                                           |   // poll_refs =
IO_POLL_CANCEL_FLAG | 1
                                           |   // gets the ownership
                                           |
  WARN_ON_ONCE(!(v & IO_POLL_REF_MASK)))   |
  // v & IO_POLL_REF_MASK = 0 WARN         |
                                           |
                                           |  io_apoll_task_func()
                                           |  // also go into
io_req_complete_failed()

By looking up the source code and communicating with Pavel, the
implementation of this atomic poll refs should continue the loop of
io_poll_check_events() just to avoid somewhere else to grab the
ownership. Therefore, this patch simply adds another AND operation to
make sure the loop will stop if it finds the poll_refs is exactly equal
to IO_POLL_CANCEL_FLAG. Since io_poll_cancel_req() grabs ownership and
will finally make its way to io_req_complete_failed(), the req will
be reclaimed as expected.

Fixes: aa43477b04 ("io_uring: poll rework")
Signed-off-by: Lin Ma <linma@zju.edu.cn>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
[axboe: tweak description and code style]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-12-08 11:28:43 +01:00
..
9p 9p: fix a bunch of checkpatch warnings 2022-08-17 14:24:07 +02:00
adfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
affs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
afs afs: Fix fileserver probe RTT handling 2022-12-08 11:28:41 +01:00
autofs autofs: fix wait name hash calculation in autofs_wait() 2021-10-20 21:09:02 -04:00
befs isystem: ship and use stdarg.h 2021-08-19 09:02:55 +09:00
bfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
btrfs btrfs: qgroup: fix sleep from invalid context bug in btrfs_qgroup_inherit() 2022-12-08 11:28:38 +01:00
cachefiles fs: add is_idmapped_mnt() helper 2022-07-02 16:41:14 +02:00
ceph ceph: fix NULL pointer dereference for req->r_session 2022-12-02 17:41:01 +01:00
cifs cifs: fix missed refcounting of ipc tcon 2022-12-02 17:41:12 +01:00
coda
configfs configfs: fix a race in configfs_{,un}register_subsystem() 2022-03-02 11:48:02 +01:00
cramfs
crypto fscrypt: fix keyring memory leak on mount failure 2022-11-10 18:15:37 +01:00
debugfs debugfs: add debugfs_lookup_and_remove() 2022-09-15 11:30:02 +02:00
devpts fsnotify: fix fsnotify hooks in pseudo filesystems 2022-02-01 17:27:01 +01:00
dlm fs: dlm: fix invalid derefence of sb_lvbptr 2022-10-29 10:12:57 +02:00
ecryptfs fs: add is_idmapped_mnt() helper 2022-07-02 16:41:14 +02:00
efivarfs
efs
erofs erofs: fix order >= MAX_ORDER warning due to crafted negative i_size 2022-12-08 11:28:37 +01:00
exfat exfat: use updated exfat_chain directly during renaming 2022-07-29 17:25:30 +02:00
exportfs exportfs: support idmapped mounts 2022-06-09 10:23:32 +02:00
ext2 ext2: Use kvmalloc() for group descriptor array 2022-10-26 12:35:51 +02:00
ext4 ext4: fix use-after-free in ext4_ext_shift_extents 2022-12-02 17:41:08 +01:00
f2fs ext4,f2fs: fix readahead of verity data 2022-11-10 18:15:42 +01:00
fat fat: add ratelimit to fat*_ent_bread() 2022-06-09 10:22:42 +02:00
freevxfs
fscache fscache: Remove an unused static variable 2021-10-04 22:13:12 +01:00
fuse fuse: lock inode unconditionally in fuse_fallocate() 2022-12-02 17:41:11 +01:00
gfs2 gfs2: Switch from strlcpy to strscpy 2022-11-26 09:24:51 +01:00
hfs hfs: add lock nesting notation to hfs_find_init 2021-07-15 10:13:49 -07:00
hfsplus hfsplus: report create_date to kstat.btime 2021-07-01 11:06:06 -07:00
hostfs hostfs: support splice_write 2021-08-26 22:28:02 +02:00
hpfs hpfs: use iomap_fiemap to implement ->fiemap 2021-07-27 11:00:36 +02:00
hugetlbfs hugetlbfs: don't delete error page from pagecache 2022-11-26 09:24:33 +01:00
iomap iomap: iomap_write_failed fix 2022-06-09 10:22:55 +02:00
isofs isofs: Fix out of bound access for corrupted isofs image 2021-11-12 15:05:50 +01:00
jbd2 jbd2: add miss release buffer head in fc_do_one_pass() 2022-10-26 12:34:28 +02:00
jffs2 jffs2: fix memory leak in jffs2_do_fill_super 2022-06-14 18:36:10 +02:00
jfs fs: jfs: fix possible NULL pointer dereference in dbFree() 2022-06-09 10:22:41 +02:00
kernfs kernfs: fix use-after-free in __kernfs_remove 2022-11-03 23:59:13 +09:00
ksmbd ksmbd: fix incorrect handling of iterate_dir 2022-10-29 10:12:57 +02:00
lockd lockd: detect and reject lock arguments that overflow 2022-08-17 14:22:47 +02:00
minix minix: fix bug when opening a file with O_DIRECT 2022-04-13 20:59:10 +02:00
netfs netfs: fix parameter of cleanup() 2021-12-29 12:28:59 +01:00
nfs NFSv4: Retry LOCK on OLD_STATEID during delegation return 2022-11-26 09:24:31 +01:00
nfs_common nfs: Fix kerneldoc warning shown up by W=1 2021-10-04 22:02:17 +01:00
nfsd NFSD: fix use-after-free on source server when doing inter-server copy 2022-10-26 12:35:32 +02:00
nilfs2 nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry() 2022-12-08 11:28:42 +01:00
nls
notify fsnotify: fix wrong lockdep annotations 2022-06-09 10:22:50 +02:00
ntfs ntfs: check overflow when iterating ATTR_RECORDs 2022-11-26 09:24:52 +01:00
ntfs3 ntfs3: rework xattr handlers and switch to POSIX ACL VFS helpers 2022-10-26 12:34:36 +02:00
ocfs2 ocfs2: fix BUG when iput after ocfs2_mknod fails 2022-10-29 10:12:53 +02:00
omfs mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
openpromfs
orangefs orangefs: Fix the size of a memory allocation in orangefs_bufmap_alloc() 2022-01-20 09:13:13 +01:00
overlayfs ovl: warn if trusted xattr creation fails 2022-08-25 11:40:43 +02:00
proc mm: /proc/pid/smaps_rollup: fix no vma's null-deref 2022-10-29 10:12:58 +02:00
pstore pstore: Don't use semaphores in always-atomic-context code 2022-04-08 14:23:01 +02:00
qnx4 qnx4: work around gcc false positive warning bug 2021-09-21 08:36:48 -07:00
qnx6
quota quota: Check next/prev free block number after reading from quota file 2022-10-26 12:34:21 +02:00
ramfs fs: move ramfs_aops to libfs 2021-06-29 10:53:48 -07:00
reiserfs Kbuild updates for v5.15 2021-09-03 15:33:47 -07:00
romfs
smbfs_common cifs: Fix crash on unload of cifs_arc4.ko 2021-12-14 10:57:12 +01:00
squashfs squashfs: use bvec_virt 2021-08-16 10:50:32 -06:00
sysfs sysfs: Allow deferred execution of iomem_get_mapping() 2021-08-06 13:05:28 +02:00
sysv mm: require ->set_page_dirty to be explicitly wired up 2021-06-29 10:53:48 -07:00
tracefs tracefs: Only clobber mode/uid/gid on remount if asked 2022-09-20 12:39:43 +02:00
ubifs ubifs: rename_whiteout: correct old_dir size computing 2022-04-08 14:24:08 +02:00
udf udf: Fix a slab-out-of-bounds write bug in udf_find_entry() 2022-11-16 09:58:27 +01:00
ufs isystem: ship and use stdarg.h 2021-08-19 09:02:55 +09:00
unicode .gitignore: prefix local generated files with a slash 2021-05-02 00:43:35 +09:00
vboxsf vboxfs: fix broken legacy mount signature checking 2021-09-27 11:26:21 -07:00
verity fs-verity: fix signed integer overflow with i_size near S64_MAX 2021-09-22 10:56:34 -07:00
xfs fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE 2022-10-26 12:34:27 +02:00
zonefs zonefs: fix zone report size in __zonefs_io_error() 2022-12-02 17:41:10 +01:00
Kconfig 4 cifs/smb3 fixes, one for DFS reconnect, and one to begin creating common headers for server and client and the other two to rename the cifs_common directory to smbfs_common to be more consistent ie change use of the name cifs to smb which is more accurate 2021-09-12 10:10:21 -07:00
Kconfig.binfmt binfmt: remove support for em86 (alpha only) 2021-07-25 22:33:03 -07:00
Makefile 4 cifs/smb3 fixes, one for DFS reconnect, and one to begin creating common headers for server and client and the other two to rename the cifs_common directory to smbfs_common to be more consistent ie change use of the name cifs to smb which is more accurate 2021-09-12 10:10:21 -07:00
aio.c aio: Fix incorrect usage of eventfd_signal_allowed() 2021-12-14 10:57:22 +01:00
anon_inodes.c
attr.c vfs: Check the truncate maximum size in inode_newsize_ok() 2022-08-17 14:22:50 +02:00
bad_inode.c vfs: add rcu argument to ->get_acl() callback 2021-08-18 22:08:24 +02:00
binfmt_aout.c binfmt: a.out: Fix bogus semicolon 2021-09-05 10:15:05 -07:00
binfmt_elf.c fs/binfmt_elf: Fix memory leak in load_elf_binary() 2022-11-03 23:59:12 +09:00
binfmt_elf_fdpic.c coredump: Snapshot the vmas in do_coredump 2022-04-08 14:24:17 +02:00
binfmt_flat.c binfmt_flat: do not stop relocating GOT entries prematurely on riscv 2022-06-09 10:22:26 +02:00
binfmt_misc.c
binfmt_script.c
buffer.c mm: fs: initialize fsdata passed to write_begin/write_end interface 2022-11-26 09:24:51 +01:00
char_dev.c
compat_binfmt_elf.c
coredump.c coredump: Use the vma snapshot in fill_files_note 2022-04-08 14:24:18 +02:00
d_path.c d_path: make 'prepend()' fill up the buffer exactly on overflow 2021-09-02 10:07:29 -07:00
dax.c fsdax: Fix infinite loop in dax_iomap_rw() 2022-09-28 11:11:56 +02:00
dcache.c
direct-io.c
drop_caches.c fs: drop_caches: fix skipping over shadow cache inodes 2021-09-03 09:58:10 -07:00
eventfd.c eventfd: guard wake_up in eventfd fs calls as well 2022-10-26 12:35:49 +02:00
eventpoll.c epoll: autoremove wakers even more aggressively 2022-08-17 14:22:59 +02:00
exec.c exec: Copy oldsighand->action under spin-lock 2022-11-03 23:59:12 +09:00
fcntl.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
fhandle.c
file.c fs: fix fd table size alignment properly 2022-04-08 14:23:54 +02:00
file_table.c locks: fix TOCTOU race when granting write lease 2022-10-26 12:34:58 +02:00
filesystems.c fs: simplify get_filesystem_list / get_all_fs_names 2021-08-23 01:25:40 -04:00
fs-writeback.c fs: do not update freeing inode i_io_list 2022-12-02 17:41:07 +01:00
fs_context.c vfs: fs_context: fix up param length parsing in legacy_parse_param 2022-01-20 09:13:14 +01:00
fs_parser.c namei: Standardize callers of filename_lookup() 2021-09-07 16:07:47 -04:00
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
init.c
inode.c fs: fix UAF/GPF bug in nilfs_mdt_destroy 2022-10-12 09:53:26 +02:00
internal.h locks: fix TOCTOU race when granting write lease 2022-10-26 12:34:58 +02:00
io-wq.c io-wq: Fix memory leak in worker creation 2022-10-26 12:35:56 +02:00
io-wq.h io-wq: provide a way to limit max number of workers 2021-08-29 07:55:55 -06:00
io_uring.c io_uring/poll: fix poll_refs race with cancelation 2022-12-08 11:28:43 +01:00
ioctl.c fs: fix an infinite loop in iomap_fiemap 2022-05-25 09:57:26 +02:00
kernel_read_file.c vfs: check fd has read access in kernel_read_file_from_fd() 2021-10-18 20:22:03 -10:00
libfs.c fs: remove noop_set_page_dirty() 2021-06-29 10:53:48 -07:00
locks.c Revert "memcg: enable accounting for file lock caches" 2021-09-07 11:21:48 -07:00
mbcache.c mbcache: add functions to delete entry if unused 2022-08-17 14:22:57 +02:00
mount.h
mpage.c
namei.c mm: fs: initialize fsdata passed to write_begin/write_end interface 2022-11-26 09:24:51 +01:00
namespace.c fs: require CAP_SYS_ADMIN in target namespace for idmapped mounts 2022-08-31 17:16:37 +02:00
no-block.c
nsfs.c
open.c locks: fix TOCTOU race when granting write lease 2022-10-26 12:34:58 +02:00
pipe.c pipe: Fix missing lock in pipe_resize_ring() 2022-06-06 08:43:37 +02:00
pnode.c
pnode.h
posix_acl.c fs: fix acl translation 2022-07-02 16:41:17 +02:00
proc_namespace.c fs: add is_idmapped_mnt() helper 2022-07-02 16:41:14 +02:00
read_write.c fs: sendfile handles O_NONBLOCK of out_fd 2022-08-03 12:03:41 +02:00
readdir.c
remap_range.c fs/remap: constrain dedupe of EOF blocks 2022-07-21 21:24:14 +02:00
select.c select: Fix indefinitely sleeping task in poll_schedule_timeout() 2022-01-29 10:58:25 +01:00
seq_file.c rxrpc: Fix locking issue 2022-07-12 16:35:08 +02:00
signalfd.c signalfd: use wake_up_pollfree() 2021-12-14 10:57:15 +01:00
splice.c Revert "fs: check FMODE_LSEEK to control internal pipe splicing" 2022-10-26 12:34:17 +02:00
stack.c
stat.c stat: fix inconsistency between struct stat and struct compat_stat 2022-04-27 14:38:57 +02:00
statfs.c
super.c fscrypt: fix keyring memory leak on mount failure 2022-11-10 18:15:37 +01:00
sync.c vfs: make sync_filesystem return errors from ->sync_fs 2022-04-27 14:38:50 +02:00
timerfd.c timerfd: Provide timerfd_resume() 2021-08-10 17:57:22 +02:00
userfaultfd.c userfaultfd: open userfaultfds with O_RDONLY 2022-10-26 12:34:36 +02:00
utimes.c
xattr.c fs: split off setxattr_copy and do_setxattr function from setxattr 2022-10-05 10:39:44 +02:00