WSL2-Linux-Kernel/fs
Aleksa Sarai ab87f9a56c namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
Allow LOOKUP_BENEATH and LOOKUP_IN_ROOT to safely permit ".." resolution
(in the case of LOOKUP_BENEATH the resolution will still fail if ".."
resolution would resolve a path outside of the root -- while
LOOKUP_IN_ROOT will chroot(2)-style scope it). Magic-link jumps are
still disallowed entirely[*].

As Jann explains[1,2], the need for this patch (and the original no-".."
restriction) is explained by observing there is a fairly easy-to-exploit
race condition with chroot(2) (and thus by extension LOOKUP_IN_ROOT and
LOOKUP_BENEATH if ".." is allowed) where a rename(2) of a path can be
used to "skip over" nd->root and thus escape to the filesystem above
nd->root.

  thread1 [attacker]:
    for (;;)
      renameat2(AT_FDCWD, "/a/b/c", AT_FDCWD, "/a/d", RENAME_EXCHANGE);
  thread2 [victim]:
    for (;;)
      openat2(dirb, "b/c/../../etc/shadow",
              { .flags = O_PATH, .resolve = RESOLVE_IN_ROOT } );

With fairly significant regularity, thread2 will resolve to
"/etc/shadow" rather than "/a/b/etc/shadow". There is also a similar
(though somewhat more privileged) attack using MS_MOVE.

With this patch, such cases will be detected *during* ".." resolution
and will return -EAGAIN for userspace to decide to either retry or abort
the lookup. It should be noted that ".." is the weak point of chroot(2)
-- walking *into* a subdirectory tautologically cannot result in you
walking *outside* nd->root (except through a bind-mount or magic-link).
There is also no other way for a directory's parent to change (which is
the primary worry with ".." resolution here) other than a rename or
MS_MOVE.

The primary reason for deferring to userspace with -EAGAIN is that an
in-kernel retry loop (or doing a path_is_under() check after re-taking
the relevant seqlocks) can become unreasonably expensive on machines
with lots of VFS activity (nfsd can cause lots of rename_lock updates).
Thus it should be up to userspace how many times they wish to retry the
lookup -- the selftests for this attack indicate that there is a ~35%
chance of the lookup succeeding on the first try even with an attacker
thrashing rename_lock.

A variant of the above attack is included in the selftests for
openat2(2) later in this patch series. I've run this test on several
machines for several days and no instances of a breakout were detected.
While this is not concrete proof that this is safe, when combined with
the above argument it should lend some trustworthiness to this
construction.

[*] It may be acceptable in the future to do a path_is_under() check for
    magic-links after they are resolved. However this seems unlikely to
    be a feature that people *really* need -- it can be added later if
    it turns out a lot of people want it.

[1]: https://lore.kernel.org/lkml/CAG48ez1jzNvxB+bfOBnERFGp=oMM0vHWuLD6EULmne3R6xa53w@mail.gmail.com/
[2]: https://lore.kernel.org/lkml/CAG48ez30WJhbsro2HOc_DR7V91M+hNFzBP5ogRMZaxbAORvqzg@mail.gmail.com/

Cc: Christian Brauner <christian.brauner@ubuntu.com>
Suggested-by: Jann Horn <jannh@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-12-08 19:09:44 -05:00
..
9p
adfs
affs affs: fix a memory leak in affs_remount 2019-11-18 14:26:43 +01:00
afs AFS development 2019-11-30 10:57:22 -08:00
autofs Merge branch 'next.autofs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-12-05 17:11:48 -08:00
befs
bfs
btrfs compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
cachefiles
ceph The two highlights are a set of improvements to how rbd read-only 2019-12-05 13:06:51 -08:00
cifs 9 cifs/smb3 fixes: two timestamp fixes, one oops fix (during oplock break) for stable, two fixes found in multichannel testing, two fixes for file create when using modeforsid mount parm 2019-12-08 12:12:18 -08:00
coda
configfs configfs: calculate the depth of parent item 2019-11-06 18:36:01 +01:00
cramfs cramfs: fix usage on non-MTD device 2019-11-23 21:44:49 -05:00
crypto fscrypt: add support for IV_INO_LBLK_64 policies 2019-11-06 12:34:36 -08:00
debugfs Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-12-06 09:06:58 -08:00
devpts
dlm
ecryptfs compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
efivarfs
efs
erofs erofs: remove unnecessary output in erofs_show_options() 2019-11-24 11:02:41 +08:00
exportfs race in exportfs_decode_fh() 2019-11-11 09:21:59 -05:00
ext2 \n 2019-11-30 11:16:07 -08:00
ext4 compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
f2fs compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
fat compat_ioctl: move drivers to compat_ptr_ioctl 2019-10-23 17:23:43 +02:00
freevxfs
fscache
fuse pipe: Fix iteration end check in fuse_dev_splice_write() 2019-12-06 13:57:04 -08:00
gfs2 GFS2 changes for this merge window: 2019-12-05 13:20:11 -08:00
hfs
hfsplus
hostfs
hpfs fs: compat_ioctl: move FITRIM emulation into file systems 2019-10-23 17:23:46 +02:00
hugetlbfs hugetlb: remove unused hstate in hugetlb_fault_mutex_hash() 2019-12-01 12:59:08 -08:00
iomap iomap: stop using ioend after it's been freed in iomap_finish_ioend() 2019-12-05 07:41:16 -08:00
isofs
jbd2 This merge window saw the the following new featuers added to ext4: 2019-11-30 10:53:02 -08:00
jffs2 Revert "jffs2: Fix possible null-pointer dereferences in jffs2_add_frag_to_fragtree()" 2019-11-29 11:29:58 +01:00
jfs
kernfs Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-12-06 09:06:58 -08:00
lockd NFSv4.1: Don't rebind to the same source port when reconnecting to the server 2019-11-03 21:28:45 -05:00
minix
nfs NFS4: Trace lock reclaims 2019-11-18 11:04:32 +01:00
nfs_common
nfsd This is a relatively quiet cycle for nfsd, mainly various bugfixes. 2019-12-07 16:56:00 -08:00
nilfs2 fs: compat_ioctl: move FITRIM emulation into file systems 2019-10-23 17:23:46 +02:00
nls
notify compat_ioctl: remove most of fs/compat_ioctl.c 2019-12-01 13:46:15 -08:00
ntfs
ocfs2 Merge branch 'akpm' (patches from Andrew) 2019-12-01 20:36:41 -08:00
omfs
openpromfs
orangefs orangefs: posix open permission checking... 2019-12-04 08:52:55 -05:00
overlayfs new helper: lookup_positive_unlocked() 2019-11-15 13:49:04 -05:00
proc namei: allow nd_jump_link() to produce errors 2019-12-08 19:09:38 -05:00
pstore pstore: Make pstore_choose_compression() static 2019-10-29 09:43:03 -07:00
qnx4
qnx6
quota Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-12-06 09:06:58 -08:00
ramfs
reiserfs reiserfs: replace open-coded atomic_dec_and_mutex_lock() 2019-11-05 12:25:22 +01:00
romfs
squashfs
sysfs
sysv
tracefs tracing: Do not create tracefs files if tracefs lockdown is in effect 2019-10-12 20:49:07 -04:00
ubifs ubifs: ubifs_tnc_start_commit: Fix OOB in layout_in_gaps 2019-11-17 22:22:54 +01:00
udf
ufs
unicode
verity
xfs Fixes for 5.5-rc1: 2019-12-07 17:05:33 -08:00
Kconfig io-wq: small threadpool implementation for io_uring 2019-10-29 12:43:00 -06:00
Kconfig.binfmt
Makefile io-wq: small threadpool implementation for io_uring 2019-10-29 12:43:00 -06:00
aio.c y2038: syscall implementation cleanups 2019-12-01 14:00:59 -08:00
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf.c fs/binfmt_elf.c: extract elf_read() function 2019-12-04 19:44:13 -08:00
binfmt_elf_fdpic.c y2038: elfcore: Use __kernel_old_timeval for process times 2019-11-15 14:38:29 +01:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
block_dev.c block: don't send uevent for empty disk when not invalidating 2019-12-02 18:49:30 -07:00
buffer.c fs/buffer.c: include internal.h for missing declarations 2019-12-01 06:29:17 -08:00
char_dev.c
compat.c
compat_binfmt_elf.c y2038: elfcore: Use __kernel_old_timeval for process times 2019-11-15 14:38:29 +01:00
compat_ioctl.c New code for 5.5: 2019-12-02 14:46:22 -08:00
coredump.c
d_path.c
dax.c New code for 5.5: 2019-11-30 10:44:49 -08:00
dcache.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-12-08 11:08:28 -08:00
dcookies.c
direct-io.c fs/direct-io.c: keep dio_warn_stale_pagecache() when CONFIG_BLOCK=n 2019-12-01 06:29:18 -08:00
drop_caches.c
eventfd.c
eventpoll.c fs/epoll: remove unnecessary wakeups of nested epoll 2019-12-04 19:44:13 -08:00
exec.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2019-12-03 12:20:25 -08:00
fcntl.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-12-08 11:08:28 -08:00
fhandle.c
file.c Revert "vfs: properly and reliably lock f_pos in fdget_pos()" 2019-11-26 11:34:06 -08:00
file_table.c
filesystems.c
fs-writeback.c cgroup,writeback: don't switch wbs immediately on dead wbs if the memcg is dead 2019-11-08 13:37:24 -07:00
fs_context.c
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
inode.c
internal.h make __d_alloc() static 2019-10-25 14:08:24 -04:00
io-wq.c io_uring: use current task creds instead of allocating a new one 2019-12-02 08:50:00 -07:00
io-wq.h io-wq: clear node->next on list deletion 2019-12-04 17:26:57 -07:00
io_uring.c io_uring: fix a typo in a comment 2019-12-05 07:59:37 -07:00
ioctl.c New code for 5.5: 2019-12-02 14:46:22 -08:00
libfs.c fs/libfs.c: fix kernel-doc warning 2019-10-14 15:04:01 -07:00
locks.c
mbcache.c
mount.h
mpage.c
namei.c namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution 2019-12-08 19:09:44 -05:00
namespace.c Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-12-08 11:08:28 -08:00
no-block.c
nsfs.c nsfs: clean-up ns_get_path() signature to return int 2019-12-08 19:09:37 -05:00
open.c Revert "vfs: properly and reliably lock f_pos in fdget_pos()" 2019-11-26 11:34:06 -08:00
pipe.c pipe: don't use 'pipe_wait() for basic pipe IO 2019-12-07 13:53:09 -08:00
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c
readdir.c filldir[64]: remove WARN_ON_ONCE() for bad directory entries 2019-10-18 18:41:16 -04:00
select.c y2038: syscalls: change remaining timeval to __kernel_old_timeval 2019-11-15 14:38:29 +01:00
seq_file.c
signalfd.c
splice.c pipe: remove 'waiting_writers' merging logic 2019-12-07 13:21:01 -08:00
stack.c
stat.c
statfs.c vfs: Fix EOVERFLOW testing in put_compat_statfs64 2019-10-03 14:21:35 -07:00
super.c Merge branch 'work.mount3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2019-10-10 08:16:44 -07:00
sync.c
timerfd.c y2038: timerfd: Use timespec64 internally 2019-11-15 14:38:30 +01:00
userfaultfd.c Merge branch 'akpm' (patches from Andrew) 2019-12-01 20:36:41 -08:00
utimes.c y2038: syscalls: change remaining timeval to __kernel_old_timeval 2019-11-15 14:38:29 +01:00
xattr.c