WSL2-Linux-Kernel/kernel
Daniel Borkmann 0f98f40eb1 bpf: Fix overrunning reservations in ringbuf
commit cfa1a2329a691ffd991fcf7248a57d752e712881 upstream.

The BPF ring buffer internally is implemented as a power-of-2 sized circular
buffer, with two logical and ever-increasing counters: consumer_pos is the
consumer counter to show which logical position the consumer consumed the
data, and producer_pos which is the producer counter denoting the amount of
data reserved by all producers.

Each time a record is reserved, the producer that "owns" the record will
successfully advance producer counter. In user space each time a record is
read, the consumer of the data advanced the consumer counter once it finished
processing. Both counters are stored in separate pages so that from user
space, the producer counter is read-only and the consumer counter is read-write.

One aspect that simplifies and thus speeds up the implementation of both
producers and consumers is how the data area is mapped twice contiguously
back-to-back in the virtual memory, allowing to not take any special measures
for samples that have to wrap around at the end of the circular buffer data
area, because the next page after the last data page would be first data page
again, and thus the sample will still appear completely contiguous in virtual
memory.

Each record has a struct bpf_ringbuf_hdr { u32 len; u32 pg_off; } header for
book-keeping the length and offset, and is inaccessible to the BPF program.
Helpers like bpf_ringbuf_reserve() return `(void *)hdr + BPF_RINGBUF_HDR_SZ`
for the BPF program to use. Bing-Jhong and Muhammad reported that it is however
possible to make a second allocated memory chunk overlapping with the first
chunk and as a result, the BPF program is now able to edit first chunk's
header.

For example, consider the creation of a BPF_MAP_TYPE_RINGBUF map with size
of 0x4000. Next, the consumer_pos is modified to 0x3000 /before/ a call to
bpf_ringbuf_reserve() is made. This will allocate a chunk A, which is in
[0x0,0x3008], and the BPF program is able to edit [0x8,0x3008]. Now, lets
allocate a chunk B with size 0x3000. This will succeed because consumer_pos
was edited ahead of time to pass the `new_prod_pos - cons_pos > rb->mask`
check. Chunk B will be in range [0x3008,0x6010], and the BPF program is able
to edit [0x3010,0x6010]. Due to the ring buffer memory layout mentioned
earlier, the ranges [0x0,0x4000] and [0x4000,0x8000] point to the same data
pages. This means that chunk B at [0x4000,0x4008] is chunk A's header.
bpf_ringbuf_submit() / bpf_ringbuf_discard() use the header's pg_off to then
locate the bpf_ringbuf itself via bpf_ringbuf_restore_from_rec(). Once chunk
B modified chunk A's header, then bpf_ringbuf_commit() refers to the wrong
page and could cause a crash.

Fix it by calculating the oldest pending_pos and check whether the range
from the oldest outstanding record to the newest would span beyond the ring
buffer size. If that is the case, then reject the request. We've tested with
the ring buffer benchmark in BPF selftests (./benchs/run_bench_ringbufs.sh)
before/after the fix and while it seems a bit slower on some benchmarks, it
is still not significantly enough to matter.

Fixes: 457f44363a ("bpf: Implement BPF ring buffer and verifier support for it")
Reported-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Reported-by: Muhammad Ramdhan <ramdhan@starlabs.sg>
Co-developed-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Co-developed-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Bing-Jhong Billy Jheng <billy@starlabs.sg>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240621140828.18238-1-daniel@iogearbox.net
Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-27 10:46:06 +02:00
..
bpf bpf: Fix overrunning reservations in ringbuf 2024-07-27 10:46:06 +02:00
cgroup sched/fair: Allow disabling sched_balance_newidle with sched_relax_domain_level 2024-06-16 13:39:34 +02:00
configs drivers/char: remove /dev/kmem for good 2021-05-07 00:26:34 -07:00
debug kdb: Use format-specifiers rather than memset() for padding in kdb_read() 2024-06-16 13:39:59 +02:00
dma dma-mapping: benchmark: avoid needless copy_to_user if benchmark fails 2024-07-18 13:07:35 +02:00
entry entry: Respect changes to system call number by trace_sys_enter() 2024-04-10 16:18:46 +02:00
events perf/core: Fix missing wakeup when waiting for context reference 2024-07-05 09:14:38 +02:00
futex futex: Don't include process MM in futex key on no-MMU 2023-11-20 11:08:13 +01:00
gcov gcov: add support for GCC 14 2024-07-05 09:14:34 +02:00
irq genirq/cpuhotplug, x86/vector: Prevent vector leak during CPU offline 2024-06-16 13:39:52 +02:00
kcsan kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures 2023-07-23 13:47:12 +02:00
livepatch livepatch: Fix missing newline character in klp_resolve_symbols() 2023-11-20 11:08:25 +01:00
locking locking/mutex: Introduce devm_mutex_init() 2024-07-18 13:07:26 +02:00
power PM: suspend: Set mem_sleep_current during kernel command line setup 2024-04-10 16:18:36 +02:00
printk printk: Update @console_may_schedule in console_trylock_spinning() 2024-04-10 16:18:47 +02:00
rcu rcutorture: Fix invalid context warning when enable srcu barrier testing 2024-07-05 09:14:25 +02:00
sched sched/core: Fix incorrect initialization of the 'burst' parameter in cpu_max_write() 2024-06-16 13:39:34 +02:00
time tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device() 2024-07-05 09:14:22 +02:00
trace tracing: Add MODULE_DESCRIPTION() to preemptirq_delay_test 2024-07-05 09:14:36 +02:00
.gitignore .gitignore: prefix local generated files with a slash 2021-05-02 00:43:35 +09:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/rwlock: Provide RT variant 2021-08-17 17:50:51 +02:00
Kconfig.preempt sched/core: Disable CONFIG_SCHED_CORE by default 2021-06-28 22:43:05 +02:00
Makefile futex: Move to kernel/futex/ 2022-12-31 13:14:04 +01:00
acct.c acct: fix potential integer overflow in encode_comp_t() 2022-12-31 13:14:40 +01:00
async.c async: Introduce async_schedule_dev_nocall() 2024-02-23 08:54:25 +01:00
audit.c audit: Send netlink ACK before setting connection in auditd_set 2024-02-23 08:54:37 +01:00
audit.h audit: log AUDIT_TIME_* records only from rules 2022-04-08 14:23:06 +02:00
audit_fsnotify.c fsnotify: make allow_dups a property of the group 2024-04-10 16:19:02 +02:00
audit_tree.c fsnotify: pass flags argument to fsnotify_alloc_group() 2024-04-10 16:19:02 +02:00
audit_watch.c fsnotify: pass flags argument to fsnotify_alloc_group() 2024-04-10 16:19:02 +02:00
auditfilter.c ima: Avoid blocking in RCU read-side critical section 2024-07-18 13:07:34 +02:00
auditsc.c audit: fix possible soft lockup in __audit_inode_child() 2023-09-19 12:22:39 +02:00
backtracetest.c
bounds.c bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS 2024-05-02 16:24:50 +02:00
capability.c capability: handle idmapped mounts 2021-01-24 14:27:16 +01:00
cfi.c cfi: Fix __cfi_slowpath_diag RCU usage with cpuidle 2022-06-22 14:22:04 +02:00
compat.c sched_getaffinity: don't assume 'cpumask_size()' is fully initialized 2023-04-05 11:24:53 +02:00
configs.c
context_tracking.c
cpu.c cpu/hotplug: Fix dynstate assignment in __cpuhp_setup_state_cpuslocked() 2024-07-05 09:14:48 +02:00
cpu_pm.c PM: cpu: Make notifier chain use a raw_spinlock_t 2021-08-16 18:55:32 +02:00
crash_core.c kernel/crash_core: suppress unknown crashkernel parameter warning 2021-12-29 12:28:49 +01:00
crash_dump.c
cred.c cred: switch to using atomic_long_t 2023-12-20 15:17:37 +01:00
delayacct.c delayacct: Add sysctl to enable at runtime 2021-05-12 11:43:25 +02:00
dma.c
exec_domain.c
exit.c mm: optimize the redundant loop of mm_update_owner_next() 2024-07-18 13:07:32 +02:00
extable.c
fail_function.c kernel/fail_function: fix memory leak with using debugfs_lookup() 2023-03-11 13:57:38 +01:00
fork.c kernel/fork: beware of __put_task_struct() calling context 2023-09-23 11:09:55 +02:00
freezer.c sched: Add get_current_state() 2021-06-18 11:43:08 +02:00
gen_kheaders.sh kheaders: explicitly define file modes for archived headers 2024-07-05 09:14:37 +02:00
groups.c groups: simplify struct group_info allocation 2021-02-26 09:41:03 -08:00
hung_task.c Merge branch 'akpm' (patches from Andrew) 2021-07-02 12:08:10 -07:00
iomem.c
irq_work.c irq_work: Make irq_work_queue() NMI-safe again 2021-06-10 10:00:08 +02:00
jump_label.c jump_label: Fix jump_label_text_reserved() vs __init 2021-07-05 10:46:20 +02:00
kallsyms.c kallsyms: Make kallsyms_on_each_symbol generally available 2023-12-13 18:36:45 +01:00
kcmp.c
kcov.c kcov: don't lose track of remote references during softirqs 2024-07-05 09:14:34 +02:00
kexec.c kernel: kexec: copy user-array safely 2023-11-28 16:56:16 +00:00
kexec_core.c kexec: fix a memory leak in crash_shrink_memory() 2023-07-23 13:46:52 +02:00
kexec_elf.c
kexec_file.c kexec: support purgatories with .text.hot sections 2023-06-21 15:59:14 +02:00
kexec_internal.h panic, kexec: make __crash_kexec() NMI safe 2023-04-20 12:13:57 +02:00
kheaders.c kheaders: Use array declaration instead of char 2023-05-11 23:00:17 +09:00
kmod.c modules: add CONFIG_MODPROBE_PATH 2021-05-07 00:26:33 -07:00
kprobes.c x86/ibt,ftrace: Search for __fentry__ location 2024-07-05 09:14:12 +02:00
ksysfs.c kexec: turn all kexec_mutex acquisitions into trylocks 2023-04-20 12:13:57 +02:00
kthread.c exit: Implement kthread_exit 2024-04-10 16:18:55 +02:00
latencytop.c
module-internal.h
module.c NFSD: Remove svc_serv_ops::svo_module 2024-04-10 16:19:01 +02:00
module_signature.c module: harden ELF info handling 2021-01-19 10:24:45 +01:00
module_signing.c module: harden ELF info handling 2021-01-19 10:24:45 +01:00
notifier.c notifier: Remove atomic_notifier_call_chain_robust() 2021-08-16 18:55:32 +02:00
nsproxy.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
padata.c padata: Disable BH when taking works lock on MT path 2024-07-05 09:14:24 +02:00
panic.c panic: Flush kernel log buffer at the end 2024-04-13 13:01:43 +02:00
params.c params: lift param_set_uint_minmax to common code 2021-08-16 14:42:22 +02:00
pid.c kernel/pid.c: implement additional checks upon pidfd_create() parameters 2021-08-10 12:53:07 +02:00
pid_namespace.c zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING 2024-07-05 09:14:24 +02:00
profile.c profiling: fix shift too large makes kernel panic 2022-08-17 14:24:04 +02:00
ptrace.c ptrace: Reimplement PTRACE_KILL by always sending SIGKILL 2022-06-09 10:22:29 +02:00
range.c
reboot.c kernel/reboot: emergency_restart: Set correct system_state 2023-11-28 16:56:31 +00:00
regset.c
relay.c relayfs: fix out-of-bounds access in relay_file_read 2023-05-11 23:00:18 +09:00
resource.c dax/kmem: Fix leak of memory-hotplug resources 2023-03-10 09:40:08 +01:00
resource_kunit.c
rseq.c rseq: Remove broken uapi field layout on 32-bit little endian 2022-04-08 14:23:10 +02:00
scftorture.c scftorture: Forgive memory-allocation failure if KASAN 2023-09-23 11:09:55 +02:00
scs.c scs: Release kasan vmalloc poison in scs_free process 2021-11-18 19:16:29 +01:00
seccomp.c seccomp: Invalidate seccomp mode to catch death failures 2022-02-16 12:56:38 +01:00
signal.c signal handling: don't use BUG_ON() for debugging 2022-07-21 21:24:42 +02:00
smp.c locking/csd_lock: Change csdlock_debug from early_param to __setup 2022-08-17 14:24:24 +02:00
smpboot.c smpboot: Replace deprecated CPU-hotplug functions. 2021-08-10 14:57:42 +02:00
smpboot.h
softirq.c softirq: Fix suspicious RCU usage in __do_softirq() 2024-06-16 13:39:15 +02:00
stackleak.c gcc-plugins/stackleak: Use noinstr in favor of notrace 2022-02-23 12:03:07 +01:00
stacktrace.c stacktrace: move filter_irq_stacks() to kernel/stacktrace.c 2022-04-13 20:59:28 +02:00
static_call.c static_call: Don't make __static_call_return0 static 2022-04-13 20:59:28 +02:00
static_call_inline.c static_call: Don't make __static_call_return0 static 2022-04-13 20:59:28 +02:00
stop_machine.c stop_machine: Add caller debug info to queue_stop_cpus_work 2021-03-23 16:01:58 +01:00
sys.c getrusage: use sig->stats_lock rather than lock_task_sighand() 2024-03-15 10:48:22 -04:00
sys_ni.c syscalls: fix compat_sys_io_pgetevents_time64 usage 2024-07-05 09:14:50 +02:00
sysctl-test.c kernel/sysctl-test: Remove some casts which are no-longer required 2021-06-23 16:41:24 -06:00
sysctl.c sched/rt: Disallow writing invalid values to sched_rt_period_us 2024-03-01 13:21:43 +01:00
task_work.c kasan: record task_work_add() call stack 2021-04-30 11:20:42 -07:00
taskstats.c
test_kprobes.c
torture.c torture: Fix hang during kthread shutdown phase 2023-08-30 16:18:19 +02:00
tracepoint.c tracepoint: Fix kerneldoc comments 2021-08-16 11:39:51 -04:00
tsacct.c taskstats: Cleanup the use of task->exit_code 2022-01-27 11:05:35 +01:00
ucount.c ucounts: Handle wrapping in is_ucounts_overlimit 2022-02-23 12:03:20 +01:00
uid16.c
uid16.h
umh.c kernel/umh.c: fix some spelling mistakes 2021-05-07 00:26:34 -07:00
up.c A set of locking related fixes and updates: 2021-05-09 13:07:03 -07:00
user-return-notifier.c
user.c fs/epoll: use a per-cpu counter for user's watches count 2021-09-08 11:50:27 -07:00
user_namespace.c ucounts: Fix systemd LimitNPROC with private users regression 2022-03-08 19:12:42 +01:00
usermode_driver.c Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-07-03 11:41:14 -07:00
utsname.c
utsname_sysctl.c
watch_queue.c kernel: watch_queue: copy user-array safely 2023-11-28 16:56:16 +00:00
watchdog.c watchdog: move softlockup_panic back to early_param 2023-11-28 16:56:28 +00:00
watchdog_hld.c watchdog/perf: more properly prevent false positives with turbo modes 2023-07-23 13:46:52 +02:00
workqueue.c Revert "workqueue: remove unused cancel_work()" 2023-12-08 08:48:03 +01:00
workqueue_internal.h workqueue: Assign a color to barrier work items 2021-08-17 07:49:10 -10:00