WSL2-Linux-Kernel/kernel
Daniel Borkmann 3b1efb196e bpf, maps: flush own entries on perf map release
The behavior of perf event arrays are quite different from all
others as they are tightly coupled to perf event fds, f.e. shown
recently by commit e03e7ee34f ("perf/bpf: Convert perf_event_array
to use struct file") to make refcounting on perf event more robust.
A remaining issue that the current code still has is that since
additions to the perf event array take a reference on the struct
file via perf_event_get() and are only released via fput() (that
cleans up the perf event eventually via perf_event_release_kernel())
when the element is either manually removed from the map from user
space or automatically when the last reference on the perf event
map is dropped. However, this leads us to dangling struct file's
when the map gets pinned after the application owning the perf
event descriptor exits, and since the struct file reference will
in such case only be manually dropped or via pinned file removal,
it leads to the perf event living longer than necessary, consuming
needlessly resources for that time.

Relations between perf event fds and bpf perf event map fds can be
rather complex. F.e. maps can act as demuxers among different perf
event fds that can possibly be owned by different threads and based
on the index selection from the program, events get dispatched to
one of the per-cpu fd endpoints. One perf event fd (or, rather a
per-cpu set of them) can also live in multiple perf event maps at
the same time, listening for events. Also, another requirement is
that perf event fds can get closed from application side after they
have been attached to the perf event map, so that on exit perf event
map will take care of dropping their references eventually. Likewise,
when such maps are pinned, the intended behavior is that a user
application does bpf_obj_get(), puts its fds in there and on exit
when fd is released, they are dropped from the map again, so the map
acts rather as connector endpoint. This also makes perf event maps
inherently different from program arrays as described in more detail
in commit c9da161c65 ("bpf: fix clearing on persistent program
array maps").

To tackle this, map entries are marked by the map struct file that
added the element to the map. And when the last reference to that map
struct file is released from user space, then the tracked entries
are purged from the map. This is okay, because new map struct files
instances resp. frontends to the anon inode are provided via
bpf_map_new_fd() that is called when we invoke bpf_obj_get_user()
for retrieving a pinned map, but also when an initial instance is
created via map_create(). The rest is resolved by the vfs layer
automatically for us by keeping reference count on the map's struct
file. Any concurrent updates on the map slot are fine as well, it
just means that perf_event_fd_array_release() needs to delete less
of its own entires.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-06-15 23:42:57 -07:00
..
bpf bpf, maps: flush own entries on perf map release 2016-06-15 23:42:57 -07:00
configs
debug
events Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-05-25 17:05:40 -07:00
gcov
irq irqchip updates for 4.7-rc1: 2016-06-03 15:05:51 +02:00
livepatch
locking add down_write_killable_nested() 2016-05-26 00:04:58 -04:00
power
printk printk/nmi: flush NMI messages on the system panic 2016-05-20 17:58:30 -07:00
rcu debugobjects: insulate non-fixup logic related to static obj from fixup callbacks 2016-05-19 19:12:14 -07:00
sched Merge branches 'pm-cpufreq-fixes' and 'pm-cpuidle' 2016-06-09 23:49:16 +02:00
time timer: Export destroy_hrtimer_on_stack() 2016-05-31 11:44:08 -07:00
trace bpf, maps: flush own entries on perf map release 2016-06-15 23:42:57 -07:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
Makefile ELF/MIPS build fix 2016-05-23 17:04:14 -07:00
acct.c
async.c
audit.c
audit.h
audit_fsnotify.c
audit_tree.c
audit_watch.c
auditfilter.c
auditsc.c
backtracetest.c
bounds.c
capability.c
cgroup.c
cgroup_freezer.c
cgroup_pids.c
compat.c
configs.c
context_tracking.c
cpu.c
cpu_pm.c
cpuset.c cpuset: use static key better and convert to new API 2016-05-19 19:12:14 -07:00
crash_dump.c
cred.c
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c wait: allow sys_waitid() to accept __WNOTHREAD/__WCLONE/__WALL 2016-05-23 17:04:14 -07:00
extable.c
fork.c mm: oom_reaper: remove some bloat 2016-05-26 15:35:44 -07:00
freezer.c
futex.c x86: remove more uaccess_32.h complexity 2016-05-22 17:21:27 -07:00
futex_compat.c
groups.c
hung_task.c
irq_work.c
jump_label.c
kallsyms.c
kcmp.c
kcov.c
kexec.c s390/kexec: consolidate crash_map/unmap_reserved_pages() and arch_kexec_protect(unprotect)_crashkres() 2016-05-23 17:04:14 -07:00
kexec_core.c s390/kexec: consolidate crash_map/unmap_reserved_pages() and arch_kexec_protect(unprotect)_crashkres() 2016-05-23 17:04:14 -07:00
kexec_file.c kexec: introduce a protection mechanism for the crashkernel reserved memory 2016-05-23 17:04:14 -07:00
kexec_internal.h
kmod.c
kprobes.c
ksysfs.c
kthread.c
latencytop.c
membarrier.c
memremap.c
module-internal.h
module.c
module_signing.c
notifier.c
nsproxy.c
padata.c kernel/padata.c: hide unused functions 2016-05-19 19:12:14 -07:00
panic.c printk/nmi: flush NMI messages on the system panic 2016-05-20 17:58:30 -07:00
params.c
pid.c remove lots of IS_ERR_VALUE abuses 2016-05-27 15:26:11 -07:00
pid_namespace.c
profile.c
ptrace.c
range.c
reboot.c
relay.c kernel/relay.c: fix potential memory leak 2016-06-09 14:23:11 -07:00
resource.c
seccomp.c
signal.c kernel/signal.c: convert printk(KERN_<LEVEL> ...) to pr_<level>(...) 2016-05-23 17:04:14 -07:00
smp.c
smpboot.c
smpboot.h
softirq.c
stacktrace.c
stop_machine.c
sys.c prctl: make PR_SET_THP_DISABLE wait for mmap_sem killable 2016-05-23 17:04:14 -07:00
sys_ni.c
sysctl.c Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2016-05-25 17:05:40 -07:00
sysctl_binary.c kernel/sysctl_binary.c: use generic UUID library 2016-05-20 17:58:30 -07:00
task_work.c
taskstats.c
test_kprobes.c
torture.c
tracepoint.c
tsacct.c
uid16.c
up.c
user-return-notifier.c
user.c
user_namespace.c
utsname.c
utsname_sysctl.c
watchdog.c
workqueue.c debugobjects: insulate non-fixup logic related to static obj from fixup callbacks 2016-05-19 19:12:14 -07:00
workqueue_internal.h