WSL2-Linux-Kernel/kernel
Kumar Kartikeya Dwivedi 51b82141ff bpf: Fix UAF due to race between btf_try_get_module and load_module
[ Upstream commit 18688de203 ]

While working on code to populate kfunc BTF ID sets for module BTF from
its initcall, I noticed that by the time the initcall is invoked, the
module BTF can already be seen by userspace (and the BPF verifier). The
existing btf_try_get_module calls try_module_get which only fails if
mod->state == MODULE_STATE_GOING, i.e. it can increment module reference
when module initcall is happening in parallel.

Currently, BTF parsing happens from MODULE_STATE_COMING notifier
callback. At this point, the module initcalls have not been invoked.
The notifier callback parses and prepares the module BTF, allocates an
ID, which publishes it to userspace, and then adds it to the btf_modules
list allowing the kernel to invoke btf_try_get_module for the BTF.

However, at this point, the module has not been fully initialized (i.e.
its initcalls have not finished). The code in module.c can still fail
and free the module, without caring for other users. However, nothing
stops btf_try_get_module from succeeding between the state transition
from MODULE_STATE_COMING to MODULE_STATE_LIVE.

This leads to a use-after-free issue when BPF program loads
successfully in the state transition, load_module's do_init_module call
fails and frees the module, and BPF program fd on close calls module_put
for the freed module. Future patch has test case to verify we don't
regress in this area in future.

There are multiple points after prepare_coming_module (in load_module)
where failure can occur and module loading can return error. We
illustrate and test for the race using the last point where it can
practically occur (in module __init function).

An illustration of the race:

CPU 0                           CPU 1
			  load_module
			    notifier_call(MODULE_STATE_COMING)
			      btf_parse_module
			      btf_alloc_id	// Published to userspace
			      list_add(&btf_mod->list, btf_modules)
			    mod->init(...)
...				^
bpf_check		        |
check_pseudo_btf_id             |
  btf_try_get_module            |
    returns true                |  ...
...                             |  module __init in progress
return prog_fd                  |  ...
...                             V
			    if (ret < 0)
			      free_module(mod)
			    ...
close(prog_fd)
 ...
 bpf_prog_free_deferred
  module_put(used_btf.mod) // use-after-free

We fix this issue by setting a flag BTF_MODULE_F_LIVE, from the notifier
callback when MODULE_STATE_LIVE state is reached for the module, so that
we return NULL from btf_try_get_module for modules that are not fully
formed. Since try_module_get already checks that module is not in
MODULE_STATE_GOING state, and that is the only transition a live module
can make before being removed from btf_modules list, this is enough to
close the race and prevent the bug.

A later selftest patch crafts the race condition artifically to verify
that it has been fixed, and that verifier fails to load program (with
ENXIO).

Lastly, a couple of comments:

 1. Even if this race didn't exist, it seems more appropriate to only
    access resources (ksyms and kfuncs) of a fully formed module which
    has been initialized completely.

 2. This patch was born out of need for synchronization against module
    initcall for the next patch, so it is needed for correctness even
    without the aforementioned race condition. The BTF resources
    initialized by module initcall are set up once and then only looked
    up, so just waiting until the initcall has finished ensures correct
    behavior.

Fixes: 541c3bad8d ("bpf: Support BPF ksym variables in kernel modules")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220114163953.1455836-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:24 +02:00
..
bpf bpf: Fix UAF due to race between btf_try_get_module and load_module 2022-04-08 14:23:24 +02:00
cgroup cgroup-v1: Correct privileges check in release_agent writes 2022-03-02 11:47:47 +01:00
configs drivers/char: remove /dev/kmem for good 2021-05-07 00:26:34 -07:00
debug kdb: Adopt scheduler's task classification 2021-11-18 19:17:06 +01:00
dma Revert "swiotlb: rework "fix info leak with DMA_FROM_DEVICE"" 2022-04-08 14:22:45 +02:00
entry signal: Replace force_fatal_sig with force_exit_sig when in doubt 2021-11-25 09:49:07 +01:00
events perf/core: Fix address filter parser for multiple filters 2022-04-08 14:23:10 +02:00
gcov Kconfig: Introduce ARCH_WANTS_NO_INSTR and CC_HAS_NO_PROFILE_FN_ATTR 2021-06-22 11:07:18 -07:00
irq PCI/MSI: Move non-mask check back into low level accessors 2021-11-18 19:17:14 +01:00
kcsan LKMM updates: 2021-09-02 13:00:15 -07:00
livepatch livepatch: Replace deprecated CPU-hotplug functions. 2021-08-19 12:00:24 +02:00
locking locking/lockdep: Avoid potential access of invalid memory in lock_class 2022-04-08 14:22:48 +02:00
power PM: suspend: fix return value of __setup handler 2022-04-08 14:23:07 +02:00
printk printk: fix return value of printk.devkmsg __setup handler 2022-04-08 14:23:19 +02:00
rcu rcu: Don't deboost before reporting expedited quiescent state 2022-03-28 09:58:45 +02:00
sched sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race 2022-04-08 14:23:11 +02:00
time clocksource: Avoid accidental unstable marking of clocksources 2022-01-27 11:04:08 +01:00
trace tracing: Have trace event string test handle zero length strings 2022-04-08 14:22:57 +02:00
.gitignore .gitignore: prefix local generated files with a slash 2021-05-02 00:43:35 +09:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/rwlock: Provide RT variant 2021-08-17 17:50:51 +02:00
Kconfig.preempt sched/core: Disable CONFIG_SCHED_CORE by default 2021-06-28 22:43:05 +02:00
Makefile kbuild: update config_data.gz only when the content of .config is changed 2021-05-02 00:43:35 +09:00
acct.c kernel/acct.c: use dedicated helper to access rlimit values 2021-09-08 11:50:26 -07:00
async.c Revert "module, async: async_synchronize_full() on module init iff async is used" 2022-02-23 12:03:07 +01:00
audit.c audit: improve audit queue handling when "audit=1" on cmdline 2022-02-08 18:34:03 +01:00
audit.h audit: log AUDIT_TIME_* records only from rules 2022-04-08 14:23:06 +02:00
audit_fsnotify.c audit_alloc_mark(): don't open-code ERR_CAST() 2021-02-23 10:25:27 -05:00
audit_tree.c audit: move put_tree() to avoid trim_trees refcount underflow and UAF 2021-08-24 18:52:36 -04:00
audit_watch.c
auditfilter.c lsm: separate security_task_getsecid() into subjective and objective variants 2021-03-22 15:23:32 -04:00
auditsc.c audit: log AUDIT_TIME_* records only from rules 2022-04-08 14:23:06 +02:00
backtracetest.c
bounds.c
capability.c capability: handle idmapped mounts 2021-01-24 14:27:16 +01:00
cfi.c cfi: Use rcu_read_{un}lock_sched_notrace 2021-08-11 13:11:12 -07:00
compat.c arch: remove compat_alloc_user_space 2021-09-08 15:32:35 -07:00
configs.c
context_tracking.c
cpu.c sched/scs: Reset task stack state in bringup_cpu() 2021-12-01 09:04:54 +01:00
cpu_pm.c PM: cpu: Make notifier chain use a raw_spinlock_t 2021-08-16 18:55:32 +02:00
crash_core.c kernel/crash_core: suppress unknown crashkernel parameter warning 2021-12-29 12:28:49 +01:00
crash_dump.c
cred.c ucounts: Base set_cred_ucounts changes on the real user 2022-02-23 12:03:20 +01:00
delayacct.c delayacct: Add sysctl to enable at runtime 2021-05-12 11:43:25 +02:00
dma.c
exec_domain.c
exit.c io_uring: remove files pointer in cancellation functions 2021-08-23 13:10:37 -06:00
extable.c
fail_function.c fault-injection: handle EI_ETYPE_TRUE 2020-12-15 22:46:19 -08:00
fork.c sched: Fix yet more sched_fork() races 2022-03-08 19:12:49 +01:00
freezer.c sched: Add get_current_state() 2021-06-18 11:43:08 +02:00
futex.c futex: Remove unused variable 'vpid' in futex_proxy_trylock_atomic() 2021-09-03 23:00:22 +02:00
gen_kheaders.sh kbuild: clean up ${quiet} checks in shell scripts 2021-05-27 04:01:50 +09:00
groups.c groups: simplify struct group_info allocation 2021-02-26 09:41:03 -08:00
hung_task.c Merge branch 'akpm' (patches from Andrew) 2021-07-02 12:08:10 -07:00
iomem.c
irq_work.c irq_work: Make irq_work_queue() NMI-safe again 2021-06-10 10:00:08 +02:00
jump_label.c jump_label: Fix jump_label_text_reserved() vs __init 2021-07-05 10:46:20 +02:00
kallsyms.c module: add printk formats to add module build ID to stacktraces 2021-07-08 11:48:22 -07:00
kcmp.c Merge branch 'exec-update-lock-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:36:48 -08:00
kcov.c
kexec.c kexec: avoid compat_alloc_user_space 2021-09-08 15:32:34 -07:00
kexec_core.c Merge branch 'rework/printk_safe-removal' into for-linus 2021-08-30 16:36:10 +02:00
kexec_elf.c
kexec_file.c kernel: kexec_file: fix error return code of kexec_calculate_store_digests() 2021-05-07 00:26:32 -07:00
kexec_internal.h kexec: move machine_kexec_post_load() to public interface 2021-02-22 12:33:26 +00:00
kheaders.c
kmod.c modules: add CONFIG_MODPROBE_PATH 2021-05-07 00:26:33 -07:00
kprobes.c kprobes: Limit max data_size of the kretprobe instances 2021-12-08 09:04:41 +01:00
ksysfs.c
kthread.c Merge branch 'akpm' (patches from Andrew) 2021-06-29 17:29:11 -07:00
latencytop.c
module-internal.h
module.c Revert "module, async: async_synchronize_full() on module init iff async is used" 2022-02-23 12:03:07 +01:00
module_signature.c module: harden ELF info handling 2021-01-19 10:24:45 +01:00
module_signing.c module: harden ELF info handling 2021-01-19 10:24:45 +01:00
notifier.c notifier: Remove atomic_notifier_call_chain_robust() 2021-08-16 18:55:32 +02:00
nsproxy.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
padata.c padata: Remove repeated verbose license text 2021-08-27 16:30:18 +08:00
panic.c Merge branch 'rework/printk_safe-removal' into for-linus 2021-08-30 16:36:10 +02:00
params.c params: lift param_set_uint_minmax to common code 2021-08-16 14:42:22 +02:00
pid.c kernel/pid.c: implement additional checks upon pidfd_create() parameters 2021-08-10 12:53:07 +02:00
pid_namespace.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
profile.c profiling: fix shift-out-of-bounds bugs 2021-09-08 11:50:26 -07:00
ptrace.c ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE 2022-04-08 14:22:50 +02:00
range.c
reboot.c reboot: Add hardware protection power-off 2021-06-21 13:08:36 +01:00
regset.c
relay.c relay: allow the use of const callback structs 2020-12-15 22:46:18 -08:00
resource.c kernel/resource: fix return code check in __request_free_mem_region 2021-05-14 19:41:32 -07:00
resource_kunit.c
rseq.c rseq: Remove broken uapi field layout on 32-bit little endian 2022-04-08 14:23:10 +02:00
scftorture.c scftorture: Avoid NULL pointer exception on early exit 2021-07-27 11:39:30 -07:00
scs.c scs: Release kasan vmalloc poison in scs_free process 2021-11-18 19:16:29 +01:00
seccomp.c seccomp: Invalidate seccomp mode to catch death failures 2022-02-16 12:56:38 +01:00
signal.c signal: In get_signal test for signal_group_exit every time through the loop 2022-03-08 19:12:34 +01:00
smp.c smp: Fix all kernel-doc warnings 2021-08-11 14:47:16 +02:00
smpboot.c smpboot: Replace deprecated CPU-hotplug functions. 2021-08-10 14:57:42 +02:00
smpboot.h
softirq.c genirq: Change force_irqthreads to a static key 2021-08-10 22:50:07 +02:00
stackleak.c gcc-plugins/stackleak: Use noinstr in favor of notrace 2022-02-23 12:03:07 +01:00
stacktrace.c
static_call.c static_call: Fix static_call_text_reserved() vs __init 2021-07-05 10:46:33 +02:00
stop_machine.c stop_machine: Add caller debug info to queue_stop_cpus_work 2021-03-23 16:01:58 +01:00
sys.c ucounts: Move RLIMIT_NPROC handling after set_user 2022-02-23 12:03:20 +01:00
sys_ni.c compat: remove some compat entry points 2021-09-08 15:32:35 -07:00
sysctl-test.c kernel/sysctl-test: Remove some casts which are no-longer required 2021-06-23 16:41:24 -06:00
sysctl.c x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting 2022-03-11 12:22:31 +01:00
task_work.c kasan: record task_work_add() call stack 2021-04-30 11:20:42 -07:00
taskstats.c
test_kprobes.c
torture.c torture: Replace deprecated CPU-hotplug functions. 2021-08-10 10:48:07 -07:00
tracepoint.c tracepoint: Fix kerneldoc comments 2021-08-16 11:39:51 -04:00
tsacct.c taskstats: Cleanup the use of task->exit_code 2022-01-27 11:05:35 +01:00
ucount.c ucounts: Handle wrapping in is_ucounts_overlimit 2022-02-23 12:03:20 +01:00
uid16.c
uid16.h
umh.c kernel/umh.c: fix some spelling mistakes 2021-05-07 00:26:34 -07:00
up.c A set of locking related fixes and updates: 2021-05-09 13:07:03 -07:00
user-return-notifier.c
user.c fs/epoll: use a per-cpu counter for user's watches count 2021-09-08 11:50:27 -07:00
user_namespace.c ucounts: Fix systemd LimitNPROC with private users regression 2022-03-08 19:12:42 +01:00
usermode_driver.c Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-07-03 11:41:14 -07:00
utsname.c
utsname_sysctl.c
watch_queue.c watch_queue: Actually free the watch 2022-04-08 14:23:10 +02:00
watchdog.c kernel: watchdog: modify the explanation related to watchdog thread 2021-06-29 10:53:46 -07:00
watchdog_hld.c
workqueue.c workqueue: Fix unbind_workers() VS wq_worker_running() race 2022-01-16 09:12:41 +01:00
workqueue_internal.h workqueue: Assign a color to barrier work items 2021-08-17 07:49:10 -10:00