Syscall raw tracepoints have struct pt_regs pointer as tracepoint's first
argument. After that, reading any of pt_regs fields requires bpf_probe_read(),
even for tp_btf programs. Due to that, PT_REGS_PARMx macros are not usable as
is. This patch adds CO-RE variants of those macros that use BPF_CORE_READ() to
read necessary fields. This provides relocatable architecture-agnostic pt_regs
field accesses.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200313172336.1879637-4-andriin@fb.com
When finding target type candidates, ignore forward declarations, functions,
and other named types of incompatible kind. Not doing this can cause false
errors. See [0] for one such case (due to struct pt_regs forward
declaration).
[0] https://github.com/iovisor/bcc/pull/2806#issuecomment-598543645
Fixes: ddc7c30426 ("libbpf: implement BPF CO-RE offset relocation algorithm")
Reported-by: Wenbo Zhang <ethercflow@gmail.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200313172336.1879637-3-andriin@fb.com
printf() doesn't seem to honor using overwritten stdout/stderr (as part of
stdio hijacking), so ensure all "standard" invocations of printf() do
fprintf(stdout, ...) instead.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200313172336.1879637-2-andriin@fb.com
Andrii Nakryiko reports that sockmap_listen test suite is frequently
failing due to accept() calls erroring out with EAGAIN:
./test_progs:connect_accept_thread:733: accept: Resource temporarily unavailable
connect_accept_thread:FAIL:733
This is because we are using a non-blocking listening TCP socket to
accept() connections without polling on the socket.
While at first switching to blocking mode seems like the right thing to do,
this could lead to test process blocking indefinitely in face of a network
issue, like loopback interface being down, as Andrii pointed out.
Hence, stick to non-blocking mode for TCP listening sockets but with
polling for incoming connection for a limited time before giving up.
Apply this approach to all socket I/O calls in the test suite that we
expect to block indefinitely, that is accept() for TCP and recv() for UDP.
Fixes: 44d28be2b8 ("selftests/bpf: Tests for sockmap/sockhash holding listening sockets")
Reported-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200313161049.677700-1-jakub@cloudflare.com
Commit fe4eb069ed ("bpftool: Use linux/types.h from source tree for
profiler build") added a build dependency on tools/testing/selftests/bpf
to tools/bpf/bpftool. This is suboptimal with respect to a possible
stand-alone build of bpftool.
Fix this by moving tools/testing/selftests/bpf/include/uapi/linux/types.h
to tools/include/uapi/linux/types.h.
This requires an adjustment in the include search path order for the
tests in tools/testing/selftests/bpf so that tools/include/linux/types.h
is selected when building host binaries and
tools/include/uapi/linux/types.h is selected when building bpf binaries.
Verified by compiling bpftool and the bpf selftests on x86_64 with this
change.
Fixes: fe4eb069ed ("bpftool: Use linux/types.h from source tree for profiler build")
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200313113105.6918-1-tklauser@distanz.ch
Sparse reports a warning at __bpf_prog_enter() and __bpf_prog_exit()
warning: context imbalance in __bpf_prog_enter() - wrong count at exit
warning: context imbalance in __bpf_prog_exit() - unexpected unlock
The root cause is the missing annotation at __bpf_prog_enter()
and __bpf_prog_exit()
Add the missing __acquires(RCU) annotation
Add the missing __releases(RCU) annotation
Signed-off-by: Jules Irenge <jbi.octave@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200311010908.42366-2-jbi.octave@gmail.com
When compiling bpftool the following warning is found: "declaration of
'struct bpf_pidns_info' will not be visible outside of this function."
This patch adds struct bpf_pidns_info to type_fwds array to fix this.
Fixes: b4490c5c4e ("bpf: Added new helper bpf_get_ns_current_pid_tgid")
Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200313154650.13366-1-cneirabustos@gmail.com
nanosleep syscall expects pointer to struct timespec, not nanoseconds
directly. Current implementation fulfills its purpose of invoking nanosleep
syscall, but doesn't really provide sleeping capabilities, which can cause
flakiness for tests relying on usleep() to wait for something.
Fixes: ec12a57b822c ("selftests/bpf: Guarantee that useep() calls nanosleep() syscall")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200313061837.3685572-1-andriin@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jiri Olsa says:
====================
this patchset adds trampoline and dispatcher objects
to be visible in /proc/kallsyms.
$ sudo cat /proc/kallsyms | tail -20
...
ffffffffa050f000 t bpf_prog_5a2b06eab81b8f51 [bpf]
ffffffffa0511000 t bpf_prog_6deef7357e7b4530 [bpf]
ffffffffa0542000 t bpf_trampoline_13832 [bpf]
ffffffffa0548000 t bpf_prog_96f1b5bf4e4cc6dc_mutex_lock [bpf]
ffffffffa0572000 t bpf_prog_d1c63e29ad82c4ab_bpf_prog1 [bpf]
ffffffffa0585000 t bpf_prog_e314084d332a5338__dissect [bpf]
ffffffffa0587000 t bpf_prog_59785a79eac7e5d2_mutex_unlock [bpf]
ffffffffa0589000 t bpf_prog_d0db6e0cac050163_mutex_lock [bpf]
ffffffffa058d000 t bpf_prog_d8f047721e4d8321_bpf_prog2 [bpf]
ffffffffa05df000 t bpf_trampoline_25637 [bpf]
ffffffffa05e3000 t bpf_prog_d8f047721e4d8321_bpf_prog2 [bpf]
ffffffffa05e5000 t bpf_prog_3b185187f1855c4c [bpf]
ffffffffa05e7000 t bpf_prog_d8f047721e4d8321_bpf_prog2 [bpf]
ffffffffa05eb000 t bpf_prog_93cebb259dd5c4b2_do_sys_open [bpf]
ffffffffa0677000 t bpf_dispatcher_xdp [bpf]
v5 changes:
- keeping just 1 bpf_tree for all the objects and adding flag
to recognize bpf_objects when searching for exception tables [Alexei]
- no need for is_bpf_image_address call in kernel_text_address [Alexei]
- removed the bpf_image tree, because it's no longer needed
v4 changes:
- add trampoline and dispatcher to kallsyms once the it's allocated [Alexei]
- omit the symbols sorting for kallsyms [Alexei]
- small title change in one patch [Song]
- some function renames:
bpf_get_prog_name to bpf_prog_ksym_set_name
bpf_get_prog_addr_region to bpf_prog_ksym_set_addr
- added acks to changelogs
- I checked and there'll be conflict on perftool side with
upcoming changes from Adrian Hunter (text poke events),
so I think it's better if Arnaldo takes the perf changes
via perf tree and we will solve all conflicts there
v3 changes:
- use container_of directly in bpf_get_ksym_start [Daniel]
- add more changelog explanations for ksym addresses [Daniel]
v2 changes:
- omit extra condition in __bpf_ksym_add for sorting code (Andrii)
- rename bpf_kallsyms_tree_ops to bpf_ksym_tree (Andrii)
- expose only executable code in kallsyms (Andrii)
- use full trampoline key as its kallsyms id (Andrii)
- explained the BPF_TRAMP_REPLACE case (Andrii)
- small format changes in bpf_trampoline_link_prog/bpf_trampoline_unlink_prog (Andrii)
- propagate error value in bpf_dispatcher_update and update kallsym if it's successful (Andrii)
- get rid of __always_inline for bpf_ksym_tree callbacks (Andrii)
- added KSYMBOL notification for bpf_image add/removal
- added perf tools changes to properly display trampoline/dispatcher
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Now that we have all the objects (bpf_prog, bpf_trampoline,
bpf_dispatcher) linked in bpf_tree, there's no need to have
separate bpf_image tree for images.
Reverting the bpf_image tree together with struct bpf_image,
because it's no longer needed.
Also removing bpf_image_alloc function and adding the original
bpf_jit_alloc_exec_page interface instead.
The kernel_text_address function can now rely only on is_bpf_text_address,
because it checks the bpf_tree that contains all the objects.
Keeping bpf_image_ksym_add and bpf_image_ksym_del because they are
useful wrappers with perf's ksymbol interface calls.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200312195610.346362-13-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding dispatchers to kallsyms. It's displayed as
bpf_dispatcher_<NAME>
where NAME is the name of dispatcher.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200312195610.346362-12-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding trampolines to kallsyms. It's displayed as
bpf_trampoline_<ID> [bpf]
where ID is the BTF id of the trampoline function.
Adding bpf_image_ksym_add/del functions that setup
the start/end values and call KSYMBOL perf events
handlers.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200312195610.346362-11-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Separating /proc/kallsyms add/del code and adding bpf_ksym_add/del
functions for that.
Moving bpf_prog_ksym_node_add/del functions to __bpf_ksym_add/del
and changing their argument to 'struct bpf_ksym' object. This way
we can call them for other bpf objects types like trampoline and
dispatcher.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200312195610.346362-10-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding 'prog' bool flag to 'struct bpf_ksym' to mark that
this object belongs to bpf_prog object.
This change allows having bpf_prog objects together with
other types (trampolines and dispatchers) in the single
bpf_tree. It's used when searching for bpf_prog exception
tables by the bpf_prog_ksym_find function, where we need
to get the bpf_prog pointer.
>From now we can safely add bpf_ksym support for trampoline
or dispatcher objects, because we can differentiate them
from bpf_prog objects.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200312195610.346362-9-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Instead of requiring users to do three steps for cleaning up bpf_link, its
anon_inode file, and unused fd, abstract that away into bpf_link_cleanup()
helper. bpf_link_defunct() is removed, as it shouldn't be needed as an
individual operation anymore.
v1->v2:
- keep bpf_link_cleanup() static for now (Daniel).
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200313002128.2028680-1-andriin@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding bpf_ksym_find function that is used bpf bpf address
lookup functions:
__bpf_address_lookup
is_bpf_text_address
while keeping bpf_prog_kallsyms_find to be used only for lookup
of bpf_prog objects (will happen in following changes).
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200312195610.346362-8-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Switch to non-blocking accept and wait for server thread to exit before
proceeding. I noticed that sometimes tcp_rtt server thread failure would
"spill over" into other tests (that would run after tcp_rtt), probably just
because server thread exits much later and tcp_rtt doesn't wait for it.
v1->v2:
- add usleep() while waiting on initial non-blocking accept() (Stanislav);
Fixes: 8a03222f50 ("selftests/bpf: test_progs: fix client/server race in tcp_rtt")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20200311222749.458015-1-andriin@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Moving ksym_tnode list node to 'struct bpf_ksym' object,
so the symbol itself can be chained and used in other
objects like bpf_trampoline and bpf_dispatcher.
We need bpf_ksym object to be linked both in bpf_kallsyms
via lnode for /proc/kallsyms and in bpf_tree via tnode for
bpf address lookup functions like __bpf_address_lookup or
bpf_prog_kallsyms_find.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200312195610.346362-7-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Some implementations of C runtime library won't call nanosleep() syscall from
usleep(). But a bunch of kprobe/tracepoint selftests rely on nanosleep being
called to trigger them. To make this more reliable, "override" usleep
implementation and call nanosleep explicitly.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: Julia Kartseva <hex@fb.com>
Link: https://lore.kernel.org/bpf/20200311185345.3874602-1-andriin@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding lnode list node to 'struct bpf_ksym' object,
so the struct bpf_ksym itself can be chained and used
in other objects like bpf_trampoline and bpf_dispatcher.
Changing iterator to bpf_ksym in bpf_get_kallsym function.
The ksym->start is holding the prog->bpf_func value,
so it's ok to use it as value in bpf_get_kallsym.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200312195610.346362-6-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In commit 4a3d6c6a6e ("libbpf: Reduce log level for custom section
names"), log level for messages for libbpf_attach_type_by_name() and
libbpf_prog_type_by_name() was downgraded from "info" to "debug". The
latter function, in particular, is used by bpftool when attempting to
load programs, and this change caused bpftool to exit with no hint or
error message when it fails to detect the type of the program to load
(unless "-d" option was provided).
To help users understand why bpftool fails to load the program, let's do
a second run of the function with log level in "debug" mode in case of
failure.
Before:
# bpftool prog load sample_ret0.o /sys/fs/bpf/sample_ret0
# echo $?
255
Or really verbose with -d flag:
# bpftool -d prog load sample_ret0.o /sys/fs/bpf/sample_ret0
libbpf: loading sample_ret0.o
libbpf: section(1) .strtab, size 134, link 0, flags 0, type=3
libbpf: skip section(1) .strtab
libbpf: section(2) .text, size 16, link 0, flags 6, type=1
libbpf: found program .text
libbpf: section(3) .debug_abbrev, size 55, link 0, flags 0, type=1
libbpf: skip section(3) .debug_abbrev
libbpf: section(4) .debug_info, size 75, link 0, flags 0, type=1
libbpf: skip section(4) .debug_info
libbpf: section(5) .rel.debug_info, size 32, link 14, flags 0, type=9
libbpf: skip relo .rel.debug_info(5) for section(4)
libbpf: section(6) .debug_str, size 150, link 0, flags 30, type=1
libbpf: skip section(6) .debug_str
libbpf: section(7) .BTF, size 155, link 0, flags 0, type=1
libbpf: section(8) .BTF.ext, size 80, link 0, flags 0, type=1
libbpf: section(9) .rel.BTF.ext, size 32, link 14, flags 0, type=9
libbpf: skip relo .rel.BTF.ext(9) for section(8)
libbpf: section(10) .debug_frame, size 40, link 0, flags 0, type=1
libbpf: skip section(10) .debug_frame
libbpf: section(11) .rel.debug_frame, size 16, link 14, flags 0, type=9
libbpf: skip relo .rel.debug_frame(11) for section(10)
libbpf: section(12) .debug_line, size 74, link 0, flags 0, type=1
libbpf: skip section(12) .debug_line
libbpf: section(13) .rel.debug_line, size 16, link 14, flags 0, type=9
libbpf: skip relo .rel.debug_line(13) for section(12)
libbpf: section(14) .symtab, size 96, link 1, flags 0, type=2
libbpf: looking for externs among 4 symbols...
libbpf: collected 0 externs total
libbpf: failed to guess program type from ELF section '.text'
libbpf: supported section(type) names are: socket sk_reuseport kprobe/ [...]
After:
# bpftool prog load sample_ret0.o /sys/fs/bpf/sample_ret0
libbpf: failed to guess program type from ELF section '.text'
libbpf: supported section(type) names are: socket sk_reuseport kprobe/ [...]
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200311021205.9755-1-quentin@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding name to 'struct bpf_ksym' object to carry the name
of the symbol for bpf_prog, bpf_trampoline, bpf_dispatcher
objects.
The current benefit is that name is now generated only when
the symbol is added to the list, so we don't need to generate
it every time it's accessed.
The future benefit is that we will have all the bpf objects
symbols represented by struct bpf_ksym.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200312195610.346362-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding 'struct bpf_ksym' object that will carry the
kallsym information for bpf symbol. Adding the start
and end address to begin with. It will be used by
bpf_prog, bpf_trampoline, bpf_dispatcher objects.
The symbol_start/symbol_end values were originally used
to sort bpf_prog objects. For the address displayed in
/proc/kallsyms we are using prog->bpf_func value.
I'm using the bpf_func value for program symbol start
instead of the symbol_start, because it makes no difference
for sorting bpf_prog objects and we can use it directly as
an address to display it in /proc/kallsyms.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200312195610.346362-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Adding bpf_trampoline_ name prefix for DECLARE_BPF_DISPATCHER,
so all the dispatchers have the common name prefix.
And also a small '_' cleanup for bpf_dispatcher_nopfunc function
name.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200312195610.346362-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The kbuild test robot reported compile issue on x86 in one of
the following patches that adds <linux/kallsyms.h> include into
<linux/bpf.h>, which is picked up by init_32.c object.
The problem is that <linux/kallsyms.h> defines global function
is_kernel_text which colides with the static function of the
same name defined in init_32.c:
$ make ARCH=i386
...
>> arch/x86/mm/init_32.c:241:19: error: redefinition of 'is_kernel_text'
static inline int is_kernel_text(unsigned long addr)
^~~~~~~~~~~~~~
In file included from include/linux/bpf.h:21:0,
from include/linux/bpf-cgroup.h:5,
from include/linux/cgroup-defs.h:22,
from include/linux/cgroup.h:28,
from include/linux/hugetlb.h:9,
from arch/x86/mm/init_32.c:18:
include/linux/kallsyms.h:31:19: note: previous definition of 'is_kernel_text' was here
static inline int is_kernel_text(unsigned long addr)
Renaming the init_32.c is_kernel_text function to __is_kernel_text.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200312195610.346362-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Introduce new helper that reuses existing xdp perf_event output
implementation, but can be called from raw_tracepoint programs
that receive 'struct xdp_buff *' as a tracepoint argument.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/158348514556.2239.11050972434793741444.stgit@xdp-tutorial
Carlos Neira says:
====================
Currently bpf_get_current_pid_tgid(), is used to do pid filtering in bcc's
scripts but this helper returns the pid as seen by the root namespace which is
fine when a bcc script is not executed inside a container.
When the process of interest is inside a container, pid filtering will not work
if bpf_get_current_pid_tgid() is used.
This helper addresses this limitation returning the pid as it's seen by the current
namespace where the script is executing.
In the future different pid_ns files may belong to different devices, according to the
discussion between Eric Biederman and Yonghong in 2017 Linux plumbers conference.
To address that situation the helper requires inum and dev_t from /proc/self/ns/pid.
This helper has the same use cases as bpf_get_current_pid_tgid() as it can be
used to do pid filtering even inside a container.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
New bpf helper bpf_get_ns_current_pid_tgid,
This helper will return pid and tgid from current task
which namespace matches dev_t and inode number provided,
this will allows us to instrument a process inside a container.
Signed-off-by: Carlos Neira <cneirabustos@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200304204157.58695-3-cneirabustos@gmail.com
Minor fixes for bash completion: addition of program name completion for
two subcommands, and correction for program test-runs and map pinning.
The completion for the following commands is fixed or improved:
# bpftool prog run [TAB]
# bpftool prog pin [TAB]
# bpftool map pin [TAB]
# bpftool net attach xdp name [TAB]
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200312184608.12050-3-quentin@isovalent.com
Documentation and interactive help for bpftool have always explained
that the regular handles for programs (id|name|tag|pinned) and maps
(id|name|pinned) can be passed to the utility when attempting to pin
objects (bpftool prog pin PROG / bpftool map pin MAP).
THIS IS A LIE!! The tool actually accepts only ids, as the parsing is
done in do_pin_any() in common.c instead of reusing the parsing
functions that have long been generic for program and map handles.
Instead of fixing the doc, fix the code. It is trivial to reuse the
generic parsing, and to simplify do_pin_any() in the process.
Do not accept to pin multiple objects at the same time with
prog_parse_fds() or map_parse_fds() (this would require a more complex
syntax for passing multiple sysfs paths and validating that they
correspond to the number of e.g. programs we find for a given name or
tag).
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200312184608.12050-2-quentin@isovalent.com
Needs for application BTF being present differs between user-space libbpf needs and kernel
needs. Currently, BTF is mandatory only in kernel only when BPF application is
using STRUCT_OPS. While libbpf itself relies more heavily on presense of BTF:
- for BTF-defined maps;
- for Kconfig externs;
- for STRUCT_OPS as well.
Thus, checks for presence and validness of bpf_object's BPF needs to be
performed separately, which is patch does.
Fixes: 5327644614 ("libbpf: Relax check whether BTF is mandatory")
Reported-by: Michal Rostecki <mrostecki@opensuse.org>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200312185033.736911-1-andriin@fb.com
bpftool-prog-profile requires clang to generate BTF for global variables.
When compared with older clang, skip this command. This is achieved by
adding a new feature, clang-bpf-global-var, to tools/build/feature.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200312182332.3953408-2-songliubraving@fb.com
When compiling bpftool on a system where the /usr/include/asm symlink
doesn't exist (e.g. on an Ubuntu system without gcc-multilib installed),
the build fails with:
CLANG skeleton/profiler.bpf.o
In file included from skeleton/profiler.bpf.c:4:
In file included from /usr/include/linux/bpf.h:11:
/usr/include/linux/types.h:5:10: fatal error: 'asm/types.h' file not found
#include <asm/types.h>
^~~~~~~~~~~~~
1 error generated.
make: *** [Makefile:123: skeleton/profiler.bpf.o] Error 1
This indicates that the build is using linux/types.h from system headers
instead of source tree headers.
To fix this, adjust the clang search path to include the necessary
headers from tools/testing/selftests/bpf/include/uapi and
tools/include/uapi. Also use __bitwise__ instead of __bitwise in
skeleton/profiler.h to avoid clashing with the definition in
tools/testing/selftests/bpf/include/uapi/linux/types.h.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200312130330.32239-1-tklauser@distanz.ch
Libbpf compiles and runs subset of selftests on each PR in its Github mirror
repository. To allow still building up-to-date selftests against outdated
kernel images, add back BPF_F_CURRENT_CPU definitions back.
N.B. BCC's runqslower version ([0]) doesn't need BPF_F_CURRENT_CPU due to use of
locally checked in vmlinux.h, generated against kernel with 1aae4bdd78 ("bpf:
Switch BPF UAPI #define constants used from BPF program side to enums")
applied.
[0] https://github.com/iovisor/bcc/pull/2809
Fixes: 367d82f17e (" tools/runqslower: Drop copy/pasted BPF_F_CURRENT_CPU definiton")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200311043010.530620-1-andriin@fb.com
fmod_ret progs are emitted as:
start = __bpf_prog_enter();
call fmod_ret
*(u64 *)(rbp - 8) = rax
__bpf_prog_exit(, start);
test eax, eax
jne do_fexit
That 'test eax, eax' is working by accident. The compiler is free to use rax
inside __bpf_prog_exit() or inside functions that __bpf_prog_exit() is calling.
Which caused "test_progs -t modify_return" to sporadically fail depending on
compiler version and kconfig. Fix it by using 'cmp [rbp - 8], 0' instead of
'test eax, eax'.
Fixes: ae24082331 ("bpf: Introduce BPF_MODIFY_RETURN")
Reported-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20200311003906.3643037-1-ast@kernel.org
Add bpf_link_new_file() API for cases when we need to ensure anon_inode is
successfully created before we proceed with expensive BPF program attachment
procedure, which will require equally (if not more so) expensive and
potentially failing compensation detachment procedure just because anon_inode
creation failed. This API allows to simplify code by ensuring first that
anon_inode is created and after BPF program is attached proceed with
fd_install() that can't fail.
After anon_inode file is created, link can't be just kfree()'d anymore,
because its destruction will be performed by deferred file_operations->release
call. For this, bpf_link API required specifying two separate operations:
release() and dealloc(), former performing detachment only, while the latter
frees memory used by bpf_link itself. dealloc() needs to be specified, because
struct bpf_link is frequently embedded into link type-specific container
struct (e.g., struct bpf_raw_tp_link), so bpf_link itself doesn't know how to
properly free the memory. In case when anon_inode file was successfully
created, but subsequent BPF attachment failed, bpf_link needs to be marked as
"defunct", so that file's release() callback will perform only memory
deallocation, but no detachment.
Convert raw tracepoint and tracing attachment to new API and eliminate
detachment from error handling path.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200309231051.1270337-1-andriin@fb.com
_bpftool_get_map_names => _bpftool_get_prog_names for prog-attach|detach.
Fixes: 99f9863a0c ("bpftool: Match maps by name")
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200309173218.2739965-5-songliubraving@fb.com
With fentry/fexit programs, it is possible to profile BPF program with
hardware counters. Introduce bpftool "prog profile", which measures key
metrics of a BPF program.
bpftool prog profile command creates per-cpu perf events. Then it attaches
fentry/fexit programs to the target BPF program. The fentry program saves
perf event value to a map. The fexit program reads the perf event again,
and calculates the difference, which is the instructions/cycles used by
the target program.
Example input and output:
./bpftool prog profile id 337 duration 3 cycles instructions llc_misses
4228 run_cnt
3403698 cycles (84.08%)
3525294 instructions # 1.04 insn per cycle (84.05%)
13 llc_misses # 3.69 LLC misses per million isns (83.50%)
This command measures cycles and instructions for BPF program with id
337 for 3 seconds. The program has triggered 4228 times. The rest of the
output is similar to perf-stat. In this example, the counters were only
counting ~84% of the time because of time multiplexing of perf counters.
Note that, this approach measures cycles and instructions in very small
increments. So the fentry/fexit programs introduce noticeable errors to
the measurement results.
The fentry/fexit programs are generated with BPF skeletons. Therefore, we
build bpftool twice. The first time _bpftool is built without skeletons.
Then, _bpftool is used to generate the skeletons. The second time, bpftool
is built with skeletons.
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200309173218.2739965-2-songliubraving@fb.com
Add Jakub and myself as maintainers for sockmap related code.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200309111243.6982-13-lmb@cloudflare.com
Remove the guard that disables UDP tests now that sockmap
has support for them.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200309111243.6982-12-lmb@cloudflare.com
Expand the TCP sockmap test suite to also check UDP sockets.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200309111243.6982-11-lmb@cloudflare.com
Most tests for TCP sockmap can be adapted to UDP sockmap if the
listen call is skipped. Rename listen_loopback, etc. to socket_loopback
and skip listen() for SOCK_DGRAM.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200309111243.6982-10-lmb@cloudflare.com
Add basic psock hooks for UDP sockets. This allows adding and
removing sockets, as well as automatic removal on unhash and close.
Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200309111243.6982-8-lmb@cloudflare.com