Граф коммитов

21990 Коммитов

Автор SHA1 Сообщение Дата
Song Liu fa28dcb82a bpf: Introduce helper bpf_get_task_stack()
Introduce helper bpf_get_task_stack(), which dumps stack trace of given
task. This is different to bpf_get_stack(), which gets stack track of
current task. One potential use case of bpf_get_task_stack() is to call
it from bpf_iter__task and dump all /proc/<pid>/stack to a seq_file.

bpf_get_task_stack() uses stack_trace_save_tsk() instead of
get_perf_callchain() for kernel stack. The benefit of this choice is that
stack_trace_save_tsk() doesn't require changes in arch/. The downside of
using stack_trace_save_tsk() is that stack_trace_save_tsk() dumps the
stack trace to unsigned long array. For 32-bit systems, we need to
translate it to u64 array.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200630062846.664389-3-songliubraving@fb.com
2020-07-01 08:23:19 -07:00
Andrii Nakryiko 8c18311067 selftests/bpf: Add byte swapping selftest
Add simple selftest validating byte swap built-ins and compile-time macros.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200630152125.3631920-3-andriin@fb.com
2020-07-01 09:06:12 +02:00
Andrii Nakryiko 30ad688094 libbpf: Make bpf_endian co-exist with vmlinux.h
Make bpf_endian.h compatible with vmlinux.h. It is a frequent request from
users wanting to use bpf_endian.h in their BPF applications using CO-RE and
vmlinux.h.

To achieve that, re-implement byte swap macros and drop all the header
includes. This way it can be used both with linux header includes, as well as
with a vmlinux.h.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200630152125.3631920-2-andriin@fb.com
2020-07-01 09:06:12 +02:00
Andrii Nakryiko ca4db6389d selftests/bpf: Allow substituting custom vmlinux.h for selftests build
Similarly to bpftool Makefile, allow to specify custom location of vmlinux.h
to be used during the build. This allows simpler testing setups with
checked-in pre-generated vmlinux.h.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200630004759.521530-2-andriin@fb.com
2020-06-30 15:50:11 -07:00
Andrii Nakryiko ec23eb7056 tools/bpftool: Allow substituting custom vmlinux.h for the build
In some build contexts (e.g., Travis CI build for outdated kernel), vmlinux.h,
generated from available kernel, doesn't contain all the types necessary for
BPF program compilation. For such set up, the most maintainable way to deal
with this problem is to keep pre-generated (almost up-to-date) vmlinux.h
checked in and use it for compilation purposes. bpftool after that can deal
with kernel missing some of the features in runtime with no problems.

To that effect, allow to specify path to custom vmlinux.h to bpftool's
Makefile with VMLINUX_H variable.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200630004759.521530-1-andriin@fb.com
2020-06-30 15:50:11 -07:00
Andrii Nakryiko 5712174c5c selftests/bpf: Test auto-load disabling logic for BPF programs
Validate that BPF object with broken (in multiple ways) BPF program can still
be successfully loaded, if that broken BPF program is disabled.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200625232629.3444003-3-andriin@fb.com
2020-06-28 10:06:53 -07:00
Andrii Nakryiko d929758101 libbpf: Support disabling auto-loading BPF programs
Currently, bpf_object__load() (and by induction skeleton's load), will always
attempt to prepare, relocate, and load into kernel every single BPF program
found inside the BPF object file. This is often convenient and the right thing
to do and what users expect.

But there are plenty of cases (especially with BPF development constantly
picking up the pace), where BPF application is intended to work with old
kernels, with potentially reduced set of features. But on kernels supporting
extra features, it would like to take a full advantage of them, by employing
extra BPF program. This could be a choice of using fentry/fexit over
kprobe/kretprobe, if kernel is recent enough and is built with BTF. Or BPF
program might be providing optimized bpf_iter-based solution that user-space
might want to use, whenever available. And so on.

With libbpf and BPF CO-RE in particular, it's advantageous to not have to
maintain two separate BPF object files to achieve this. So to enable such use
cases, this patch adds ability to request not auto-loading chosen BPF
programs. In such case, libbpf won't attempt to perform relocations (which
might fail due to old kernel), won't try to resolve BTF types for
BTF-aware (tp_btf/fentry/fexit/etc) program types, because BTF might not be
present, and so on. Skeleton will also automatically skip auto-attachment step
for such not loaded BPF programs.

Overall, this feature allows to simplify development and deployment of
real-world BPF applications with complicated compatibility requirements.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200625232629.3444003-2-andriin@fb.com
2020-06-28 10:06:53 -07:00
Tobias Klauser 16d37ee3d2 tools, bpftool: Define attach_type_name array only once
Define attach_type_name in common.c instead of main.h so it is only
defined once. This leads to a slight decrease in the binary size of
bpftool.

Before:

   text	   data	    bss	    dec	    hex	filename
 399024	  11168	1573160	1983352	 1e4378	bpftool

After:

   text	   data	    bss	    dec	    hex	filename
 398256	  10880	1573160	1982296	 1e3f58	bpftool

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200624143154.13145-1-tklauser@distanz.ch
2020-06-25 16:06:01 +02:00
Tobias Klauser 9023497d87 tools, bpftool: Define prog_type_name array only once
Define prog_type_name in prog.c instead of main.h so it is only defined
once. This leads to a slight decrease in the binary size of bpftool.

Before:

   text	   data	    bss	    dec	    hex	filename
 401032	  11936	1573160	1986128	 1e4e50	bpftool

After:

   text	   data	    bss	    dec	    hex	filename
 399024	  11168	1573160	1983352	 1e4378	bpftool

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200624143124.12914-1-tklauser@distanz.ch
2020-06-25 16:06:01 +02:00
Yonghong Song cfcd75f9bf selftests/bpf: Add tcp/udp iterator programs to selftests
Added tcp{4,6} and udp{4,6} bpf programs into test_progs
selftest so that they at least can load successfully.
  $ ./test_progs -n 3
  ...
  #3/7 tcp4:OK
  #3/8 tcp6:OK
  #3/9 udp4:OK
  #3/10 udp6:OK
  ...
  #3 bpf_iter:OK
  Summary: 1/16 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230823.3989372-1-yhs@fb.com
2020-06-24 18:38:00 -07:00
Yonghong Song ace6d6ec9e selftests/bpf: Implement sample udp/udp6 bpf_iter programs
On my VM, I got identical results between /proc/net/udp[6] and
the udp{4,6} bpf iterator.

For udp6:
  $ cat /sys/fs/bpf/p1
    sl  local_address                         remote_address                        st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops
   1405: 000080FE00000000FF7CC4D0D9EFE4FE:0222 00000000000000000000000000000000:0000 07 00000000:00000000 00:00000000 00000000   193        0 19183 2 0000000029eab111 0
  $ cat /proc/net/udp6
    sl  local_address                         remote_address                        st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops
   1405: 000080FE00000000FF7CC4D0D9EFE4FE:0222 00000000000000000000000000000000:0000 07 00000000:00000000 00:00000000 00000000   193        0 19183 2 0000000029eab111 0

For udp4:
  $ cat /sys/fs/bpf/p4
    sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops
   2007: 00000000:1F90 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 72540 2 000000004ede477a 0
  $ cat /proc/net/udp
    sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode ref pointer drops
   2007: 00000000:1F90 00000000:0000 07 00000000:00000000 00:00000000 00000000     0        0 72540 2 000000004ede477a 0

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230822.3989299-1-yhs@fb.com
2020-06-24 18:37:59 -07:00
Yonghong Song 2767c97765 selftests/bpf: Implement sample tcp/tcp6 bpf_iter programs
In my VM, I got identical result compared to /proc/net/{tcp,tcp6}.
For tcp6:
  $ cat /proc/net/tcp6
    sl  local_address                         remote_address                        st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
     0: 00000000000000000000000000000000:0016 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000001 00000000     0        0 17955 1 000000003eb3102e 100 0 0 10 0

  $ cat /sys/fs/bpf/p1
    sl  local_address                         remote_address                        st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
     0: 00000000000000000000000000000000:0016 00000000000000000000000000000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 17955 1 000000003eb3102e 100 0 0 10 0

For tcp:
  $ cat /proc/net/tcp
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 2666 1 000000007152e43f 100 0 0 10 0
  $ cat /sys/fs/bpf/p2
  sl  local_address                         remote_address                        st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   1: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 2666 1 000000007152e43f 100 0 0 10 0

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230820.3989165-1-yhs@fb.com
2020-06-24 18:37:59 -07:00
Yonghong Song 3982bfaaef selftests/bpf: Add more common macros to bpf_tracing_net.h
These newly added macros will be used in subsequent bpf iterator
tcp{4,6} and udp{4,6} programs.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230819.3989050-1-yhs@fb.com
2020-06-24 18:37:59 -07:00
Yonghong Song 647b502e3d selftests/bpf: Refactor some net macros to bpf_tracing_net.h
Refactor bpf_iter_ipv6_route.c and bpf_iter_netlink.c
so net macros, originally from various include/linux header
files, are moved to a new header file
bpf_tracing_net.h. The goal is to improve reuse so
networking tracing programs do not need to
copy these macros every time they use them.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230817.3988962-1-yhs@fb.com
2020-06-24 18:37:59 -07:00
Yonghong Song 84544f5637 selftests/bpf: Move newer bpf_iter_* type redefining to a new header file
Commit b9f4c01f3e ("selftest/bpf: Make bpf_iter selftest
compilable against old vmlinux.h") and Commit dda18a5c0b
("selftests/bpf: Convert bpf_iter_test_kern{3, 4}.c to define
own bpf_iter_meta") redefined newly introduced types
in bpf programs so the bpf program can still compile
properly with old kernels although loading may fail.

Since this patch set introduced new types and the same
workaround is needed, so let us move the workaround
to a separate header file so they do not clutter
bpf programs.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230816.3988656-1-yhs@fb.com
2020-06-24 18:37:59 -07:00
Yonghong Song 0d4fad3e57 bpf: Add bpf_skc_to_udp6_sock() helper
The helper is used in tracing programs to cast a socket
pointer to a udp6_sock pointer.
The return value could be NULL if the casting is illegal.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/bpf/20200623230815.3988481-1-yhs@fb.com
2020-06-24 18:37:59 -07:00
Yonghong Song 478cfbdf5f bpf: Add bpf_skc_to_{tcp, tcp_timewait, tcp_request}_sock() helpers
Three more helpers are added to cast a sock_common pointer to
an tcp_sock, tcp_timewait_sock or a tcp_request_sock for
tracing programs.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230811.3988277-1-yhs@fb.com
2020-06-24 18:37:59 -07:00
Yonghong Song af7ec13833 bpf: Add bpf_skc_to_tcp6_sock() helper
The helper is used in tracing programs to cast a socket
pointer to a tcp6_sock pointer.
The return value could be NULL if the casting is illegal.

A new helper return type RET_PTR_TO_BTF_ID_OR_NULL is added
so the verifier is able to deduce proper return types for the helper.

Different from the previous BTF_ID based helpers,
the bpf_skc_to_tcp6_sock() argument can be several possible
btf_ids. More specifically, all possible socket data structures
with sock_common appearing in the first in the memory layout.
This patch only added socket types related to tcp and udp.

All possible argument btf_id and return value btf_id
for helper bpf_skc_to_tcp6_sock() are pre-calculcated and
cached. In the future, it is even possible to precompute
these btf_id's at kernel build time.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200623230809.3988195-1-yhs@fb.com
2020-06-24 18:37:59 -07:00
Dmitry Yakunin f9bcf96837 bpf: Add SO_KEEPALIVE and related options to bpf_setsockopt
This patch adds support of SO_KEEPALIVE flag and TCP related options
to bpf_setsockopt() routine. This is helpful if we want to enable or tune
TCP keepalive for applications which don't do it in the userspace code.

v3:
  - update kernel-doc in uapi (Nikita Vetoshkin <nekto0n@yandex-team.ru>)

v4:
  - update kernel-doc in tools too (Alexei Starovoitov)
  - add test to selftests (Alexei Starovoitov)

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200620153052.9439-3-zeil@yandex-team.ru
2020-06-24 11:21:03 -07:00
Alexei Starovoitov fea549b030 selftests/bpf: Workaround for get_stack_rawtp test.
./test_progs-no_alu32 -t get_stack_raw_tp
fails due to:

52: (85) call bpf_get_stack#67
53: (bf) r8 = r0
54: (bf) r1 = r8
55: (67) r1 <<= 32
56: (c7) r1 s>>= 32
; if (usize < 0)
57: (c5) if r1 s< 0x0 goto pc+26
 R0=inv(id=0,smax_value=800) R1_w=inv(id=0,umax_value=800,var_off=(0x0; 0x3ff)) R6=ctx(id=0,off=0,imm=0) R7=map_value(id=0,off=0,ks=4,vs=1600,imm=0) R8_w=inv(id=0,smax_value=800) R9=inv800
; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
58: (1f) r9 -= r8
; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
59: (bf) r2 = r7
60: (0f) r2 += r1
regs=1 stack=0 before 52: (85) call bpf_get_stack#67
; ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
61: (bf) r1 = r6
62: (bf) r3 = r9
63: (b7) r4 = 0
64: (85) call bpf_get_stack#67
 R0=inv(id=0,smax_value=800) R1_w=ctx(id=0,off=0,imm=0) R2_w=map_value(id=0,off=0,ks=4,vs=1600,umax_value=800,var_off=(0x0; 0x3ff),s32_max_value=1023,u32_max_value=1023) R3_w=inv(id=0,umax_value=9223372036854776608)
R3 unbounded memory access, use 'var &= const' or 'if (var < const)'

In the C code:
  usize = bpf_get_stack(ctx, raw_data, max_len, BPF_F_USER_STACK);
  if (usize < 0)
          return 0;

  ksize = bpf_get_stack(ctx, raw_data + usize, max_len - usize, 0);
  if (ksize < 0)
          return 0;

We used to have problem with pointer arith in R2.
Now it's a problem with two integers in R3.
'if (usize < 0)' is comparing R1 and makes it [0,800], but R8 stays [-inf,800].
Both registers represent the same 'usize' variable.
Then R9 -= R8 is doing 800 - [-inf, 800]
so the result of "max_len - usize" looks unbounded to the verifier while
it's obvious in C code that "max_len - usize" should be [0, 800].

To workaround the problem convert ksize and usize variables from int to long.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-24 11:10:59 -07:00
Andrii Nakryiko 192b6638ee libbpf: Prevent loading vmlinux BTF twice
Prevent loading/parsing vmlinux BTF twice in some cases: for CO-RE relocations
and for BTF-aware hooks (tp_btf, fentry/fexit, etc).

Fixes: a6ed02cac6 ("libbpf: Load btf_vmlinux only once per object.")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200624043805.1794620-1-andriin@fb.com
2020-06-24 16:08:17 +02:00
Colin Ian King 135c783f47 libbpf: Fix spelling mistake "kallasyms" -> "kallsyms"
There is a spelling mistake in a pr_warn message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200623084207.149253-1-colin.king@canonical.com
2020-06-24 15:53:53 +02:00
Quentin Monnet 54b66c2255 tools, bpftool: Fix variable shadowing in emit_obj_refs_json()
Building bpftool yields the following complaint:

    pids.c: In function 'emit_obj_refs_json':
    pids.c:175:80: warning: declaration of 'json_wtr' shadows a global declaration [-Wshadow]
      175 | void emit_obj_refs_json(struct obj_refs_table *table, __u32 id, json_writer_t *json_wtr)
          |                                                                 ~~~~~~~~~~~~~~~^~~~~~~~
    In file included from pids.c:11:
    main.h:141:23: note: shadowed declaration is here
      141 | extern json_writer_t *json_wtr;
          |                       ^~~~~~~~

Let's rename the variable.

v2:
- Rename the variable instead of calling the global json_wtr directly.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200623213600.16643-1-quentin@isovalent.com
2020-06-24 15:46:28 +02:00
Alexei Starovoitov 4e608675e7 Merge up to bpf_probe_read_kernel_str() fix into bpf-next 2020-06-23 15:33:41 -07:00
Tobias Klauser 9d9d8cc21e tools, bpftool: Correctly evaluate $(BUILD_BPF_SKELS) in Makefile
Currently, if the clang-bpf-co-re feature is not available, the build
fails with e.g.

  CC       prog.o
prog.c:1462:10: fatal error: profiler.skel.h: No such file or directory
 1462 | #include "profiler.skel.h"
      |          ^~~~~~~~~~~~~~~~~

This is due to the fact that the BPFTOOL_WITHOUT_SKELETONS macro is not
defined, despite BUILD_BPF_SKELS not being set. Fix this by correctly
evaluating $(BUILD_BPF_SKELS) when deciding on whether to add
-DBPFTOOL_WITHOUT_SKELETONS to CFLAGS.

Fixes: 05aca6da3b ("tools/bpftool: Generalize BPF skeleton support and generate vmlinux.h")
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200623103710.10370-1-tklauser@distanz.ch
2020-06-24 00:06:46 +02:00
John Fastabend 2fde1747c9 selftests/bpf: Add variable-length data concat pattern less than test
Extend original variable-length tests with a case to catch a common
existing pattern of testing for < 0 for errors. Note because
verifier also tracks upper bounds and we know it can not be greater
than MAX_LEN here we can skip upper bound check.

In ALU64 enabled compilation converting from long->int return types
in probe helpers results in extra instruction pattern, <<= 32, s >>= 32.
The trade-off is the non-ALU64 case works. If you really care about
every extra insn (XDP case?) then you probably should be using original
int type.

In addition adding a sext insn to bpf might help the verifier in the
general case to avoid these types of tricks.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200623032224.4020118-3-andriin@fb.com
2020-06-24 00:04:36 +02:00
Andrii Nakryiko 5e85c6bb8e selftests/bpf: Add variable-length data concatenation pattern test
Add selftest that validates variable-length data reading and concatentation
with one big shared data array. This is a common pattern in production use for
monitoring and tracing applications, that potentially can read a lot of data,
but overall read much less. Such pattern allows to determine precisely what
amount of data needs to be sent over perfbuf/ringbuf and maximize efficiency.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200623032224.4020118-2-andriin@fb.com
2020-06-24 00:04:36 +02:00
Andrii Nakryiko bdb7b79b4c bpf: Switch most helper return values from 32-bit int to 64-bit long
Switch most of BPF helper definitions from returning int to long. These
definitions are coming from comments in BPF UAPI header and are used to
generate bpf_helper_defs.h (under libbpf) to be later included and used from
BPF programs.

In actual in-kernel implementation, all the helpers are defined as returning
u64, but due to some historical reasons, most of them are actually defined as
returning int in UAPI (usually, to return 0 on success, and negative value on
error).

This actually causes Clang to quite often generate sub-optimal code, because
compiler believes that return value is 32-bit, and in a lot of cases has to be
up-converted (usually with a pair of 32-bit bit shifts) to 64-bit values,
before they can be used further in BPF code.

Besides just "polluting" the code, these 32-bit shifts quite often cause
problems for cases in which return value matters. This is especially the case
for the family of bpf_probe_read_str() functions. There are few other similar
helpers (e.g., bpf_read_branch_records()), in which return value is used by
BPF program logic to record variable-length data and process it. For such
cases, BPF program logic carefully manages offsets within some array or map to
read variable-length data. For such uses, it's crucial for BPF verifier to
track possible range of register values to prove that all the accesses happen
within given memory bounds. Those extraneous zero-extending bit shifts,
inserted by Clang (and quite often interleaved with other code, which makes
the issues even more challenging and sometimes requires employing extra
per-variable compiler barriers), throws off verifier logic and makes it mark
registers as having unknown variable offset. We'll study this pattern a bit
later below.

Another common pattern is to check return of BPF helper for non-zero state to
detect error conditions and attempt alternative actions in such case. Even in
this simple and straightforward case, this 32-bit vs BPF's native 64-bit mode
quite often leads to sub-optimal and unnecessary extra code. We'll look at
this pattern as well.

Clang's BPF target supports two modes of code generation: ALU32, in which it
is capable of using lower 32-bit parts of registers, and no-ALU32, in which
only full 64-bit registers are being used. ALU32 mode somewhat mitigates the
above described problems, but not in all cases.

This patch switches all the cases in which BPF helpers return 0 or negative
error from returning int to returning long. It is shown below that such change
in definition leads to equivalent or better code. No-ALU32 mode benefits more,
but ALU32 mode doesn't degrade or still gets improved code generation.

Another class of cases switched from int to long are bpf_probe_read_str()-like
helpers, which encode successful case as non-negative values, while still
returning negative value for errors.

In all of such cases, correctness is preserved due to two's complement
encoding of negative values and the fact that all helpers return values with
32-bit absolute value. Two's complement ensures that for negative values
higher 32 bits are all ones and when truncated, leave valid negative 32-bit
value with the same value. Non-negative values have upper 32 bits set to zero
and similarly preserve value when high 32 bits are truncated. This means that
just casting to int/u32 is correct and efficient (and in ALU32 mode doesn't
require any extra shifts).

To minimize the chances of regressions, two code patterns were investigated,
as mentioned above. For both patterns, BPF assembly was analyzed in
ALU32/NO-ALU32 compiler modes, both with current 32-bit int return type and
new 64-bit long return type.

Case 1. Variable-length data reading and concatenation. This is quite
ubiquitous pattern in tracing/monitoring applications, reading data like
process's environment variables, file path, etc. In such case, many pieces of
string-like variable-length data are read into a single big buffer, and at the
end of the process, only a part of array containing actual data is sent to
user-space for further processing. This case is tested in test_varlen.c
selftest (in the next patch). Code flow is roughly as follows:

  void *payload = &sample->payload;
  u64 len;

  len = bpf_probe_read_kernel_str(payload, MAX_SZ1, &source_data1);
  if (len <= MAX_SZ1) {
      payload += len;
      sample->len1 = len;
  }
  len = bpf_probe_read_kernel_str(payload, MAX_SZ2, &source_data2);
  if (len <= MAX_SZ2) {
      payload += len;
      sample->len2 = len;
  }
  /* and so on */
  sample->total_len = payload - &sample->payload;
  /* send over, e.g., perf buffer */

There could be two variations with slightly different code generated: when len
is 64-bit integer and when it is 32-bit integer. Both variations were analysed.
BPF assembly instructions between two successive invocations of
bpf_probe_read_kernel_str() were used to check code regressions. Results are
below, followed by short analysis. Left side is using helpers with int return
type, the right one is after the switch to long.

ALU32 + INT                                ALU32 + LONG
===========                                ============

64-BIT (13 insns):                         64-BIT (10 insns):
------------------------------------       ------------------------------------
  17:   call 115                             17:   call 115
  18:   if w0 > 256 goto +9 <LBB0_4>         18:   if r0 > 256 goto +6 <LBB0_4>
  19:   w1 = w0                              19:   r1 = 0 ll
  20:   r1 <<= 32                            21:   *(u64 *)(r1 + 0) = r0
  21:   r1 s>>= 32                           22:   r6 = 0 ll
  22:   r2 = 0 ll                            24:   r6 += r0
  24:   *(u64 *)(r2 + 0) = r1              00000000000000c8 <LBB0_4>:
  25:   r6 = 0 ll                            25:   r1 = r6
  27:   r6 += r1                             26:   w2 = 256
00000000000000e0 <LBB0_4>:                   27:   r3 = 0 ll
  28:   r1 = r6                              29:   call 115
  29:   w2 = 256
  30:   r3 = 0 ll
  32:   call 115

32-BIT (11 insns):                         32-BIT (12 insns):
------------------------------------       ------------------------------------
  17:   call 115                             17:   call 115
  18:   if w0 > 256 goto +7 <LBB1_4>         18:   if w0 > 256 goto +8 <LBB1_4>
  19:   r1 = 0 ll                            19:   r1 = 0 ll
  21:   *(u32 *)(r1 + 0) = r0                21:   *(u32 *)(r1 + 0) = r0
  22:   w1 = w0                              22:   r0 <<= 32
  23:   r6 = 0 ll                            23:   r0 >>= 32
  25:   r6 += r1                             24:   r6 = 0 ll
00000000000000d0 <LBB1_4>:                   26:   r6 += r0
  26:   r1 = r6                            00000000000000d8 <LBB1_4>:
  27:   w2 = 256                             27:   r1 = r6
  28:   r3 = 0 ll                            28:   w2 = 256
  30:   call 115                             29:   r3 = 0 ll
                                             31:   call 115

In ALU32 mode, the variant using 64-bit length variable clearly wins and
avoids unnecessary zero-extension bit shifts. In practice, this is even more
important and good, because BPF code won't need to do extra checks to "prove"
that payload/len are within good bounds.

32-bit len is one instruction longer. Clang decided to do 64-to-32 casting
with two bit shifts, instead of equivalent `w1 = w0` assignment. The former
uses extra register. The latter might potentially lose some range information,
but not for 32-bit value. So in this case, verifier infers that r0 is [0, 256]
after check at 18:, and shifting 32 bits left/right keeps that range intact.
We should probably look into Clang's logic and see why it chooses bitshifts
over sub-register assignments for this.

NO-ALU32 + INT                             NO-ALU32 + LONG
==============                             ===============

64-BIT (14 insns):                         64-BIT (10 insns):
------------------------------------       ------------------------------------
  17:   call 115                             17:   call 115
  18:   r0 <<= 32                            18:   if r0 > 256 goto +6 <LBB0_4>
  19:   r1 = r0                              19:   r1 = 0 ll
  20:   r1 >>= 32                            21:   *(u64 *)(r1 + 0) = r0
  21:   if r1 > 256 goto +7 <LBB0_4>         22:   r6 = 0 ll
  22:   r0 s>>= 32                           24:   r6 += r0
  23:   r1 = 0 ll                          00000000000000c8 <LBB0_4>:
  25:   *(u64 *)(r1 + 0) = r0                25:   r1 = r6
  26:   r6 = 0 ll                            26:   r2 = 256
  28:   r6 += r0                             27:   r3 = 0 ll
00000000000000e8 <LBB0_4>:                   29:   call 115
  29:   r1 = r6
  30:   r2 = 256
  31:   r3 = 0 ll
  33:   call 115

32-BIT (13 insns):                         32-BIT (13 insns):
------------------------------------       ------------------------------------
  17:   call 115                             17:   call 115
  18:   r1 = r0                              18:   r1 = r0
  19:   r1 <<= 32                            19:   r1 <<= 32
  20:   r1 >>= 32                            20:   r1 >>= 32
  21:   if r1 > 256 goto +6 <LBB1_4>         21:   if r1 > 256 goto +6 <LBB1_4>
  22:   r2 = 0 ll                            22:   r2 = 0 ll
  24:   *(u32 *)(r2 + 0) = r0                24:   *(u32 *)(r2 + 0) = r0
  25:   r6 = 0 ll                            25:   r6 = 0 ll
  27:   r6 += r1                             27:   r6 += r1
00000000000000e0 <LBB1_4>:                 00000000000000e0 <LBB1_4>:
  28:   r1 = r6                              28:   r1 = r6
  29:   r2 = 256                             29:   r2 = 256
  30:   r3 = 0 ll                            30:   r3 = 0 ll
  32:   call 115                             32:   call 115

In NO-ALU32 mode, for the case of 64-bit len variable, Clang generates much
superior code, as expected, eliminating unnecessary bit shifts. For 32-bit
len, code is identical.

So overall, only ALU-32 32-bit len case is more-or-less equivalent and the
difference stems from internal Clang decision, rather than compiler lacking
enough information about types.

Case 2. Let's look at the simpler case of checking return result of BPF helper
for errors. The code is very simple:

  long bla;
  if (bpf_probe_read_kenerl(&bla, sizeof(bla), 0))
      return 1;
  else
      return 0;

ALU32 + CHECK (9 insns)                    ALU32 + CHECK (9 insns)
====================================       ====================================
  0:    r1 = r10                             0:    r1 = r10
  1:    r1 += -8                             1:    r1 += -8
  2:    w2 = 8                               2:    w2 = 8
  3:    r3 = 0                               3:    r3 = 0
  4:    call 113                             4:    call 113
  5:    w1 = w0                              5:    r1 = r0
  6:    w0 = 1                               6:    w0 = 1
  7:    if w1 != 0 goto +1 <LBB2_2>          7:    if r1 != 0 goto +1 <LBB2_2>
  8:    w0 = 0                               8:    w0 = 0
0000000000000048 <LBB2_2>:                 0000000000000048 <LBB2_2>:
  9:    exit                                 9:    exit

Almost identical code, the only difference is the use of full register
assignment (r1 = r0) vs half-registers (w1 = w0) in instruction #5. On 32-bit
architectures, new BPF assembly might be slightly less optimal, in theory. But
one can argue that's not a big issue, given that use of full registers is
still prevalent (e.g., for parameter passing).

NO-ALU32 + CHECK (11 insns)                NO-ALU32 + CHECK (9 insns)
====================================       ====================================
  0:    r1 = r10                             0:    r1 = r10
  1:    r1 += -8                             1:    r1 += -8
  2:    r2 = 8                               2:    r2 = 8
  3:    r3 = 0                               3:    r3 = 0
  4:    call 113                             4:    call 113
  5:    r1 = r0                              5:    r1 = r0
  6:    r1 <<= 32                            6:    r0 = 1
  7:    r1 >>= 32                            7:    if r1 != 0 goto +1 <LBB2_2>
  8:    r0 = 1                               8:    r0 = 0
  9:    if r1 != 0 goto +1 <LBB2_2>        0000000000000048 <LBB2_2>:
 10:    r0 = 0                               9:    exit
0000000000000058 <LBB2_2>:
 11:    exit

NO-ALU32 is a clear improvement, getting rid of unnecessary zero-extension bit
shifts.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200623032224.4020118-1-andriin@fb.com
2020-06-24 00:04:36 +02:00
Andrii Nakryiko 075c776658 tools/bpftool: Add documentation and sample output for process info
Add statements about bpftool being able to discover process info, holding
reference to BPF map, prog, link, or BTF. Show example output as well.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200619231703.738941-10-andriin@fb.com
2020-06-22 17:01:49 -07:00
Andrii Nakryiko d53dee3fe0 tools/bpftool: Show info for processes holding BPF map/prog/link/btf FDs
Add bpf_iter-based way to find all the processes that hold open FDs against
BPF object (map, prog, link, btf). bpftool always attempts to discover this,
but will silently give up if kernel doesn't yet support bpf_iter BPF programs.
Process name and PID are emitted for each process (task group).

Sample output for each of 4 BPF objects:

$ sudo ./bpftool prog show
2694: cgroup_device  tag 8c42dee26e8cd4c2  gpl
        loaded_at 2020-06-16T15:34:32-0700  uid 0
        xlated 648B  jited 409B  memlock 4096B
        pids systemd(1)
2907: cgroup_skb  name egress  tag 9ad187367cf2b9e8  gpl
        loaded_at 2020-06-16T18:06:54-0700  uid 0
        xlated 48B  jited 59B  memlock 4096B  map_ids 2436
        btf_id 1202
        pids test_progs(2238417), test_progs(2238445)

$ sudo ./bpftool map show
2436: array  name test_cgr.bss  flags 0x400
        key 4B  value 8B  max_entries 1  memlock 8192B
        btf_id 1202
        pids test_progs(2238417), test_progs(2238445)
2445: array  name pid_iter.rodata  flags 0x480
        key 4B  value 4B  max_entries 1  memlock 8192B
        btf_id 1214  frozen
        pids bpftool(2239612)

$ sudo ./bpftool link show
61: cgroup  prog 2908
        cgroup_id 375301  attach_type egress
        pids test_progs(2238417), test_progs(2238445)
62: cgroup  prog 2908
        cgroup_id 375344  attach_type egress
        pids test_progs(2238417), test_progs(2238445)

$ sudo ./bpftool btf show
1202: size 1527B  prog_ids 2908,2907  map_ids 2436
        pids test_progs(2238417), test_progs(2238445)
1242: size 34684B
        pids bpftool(2258892)

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200619231703.738941-9-andriin@fb.com
2020-06-22 17:01:49 -07:00
Andrii Nakryiko bd9bedf84b libbpf: Wrap source argument of BPF_CORE_READ macro in parentheses
Wrap source argument of BPF_CORE_READ family of macros into parentheses to
allow uses like this:

BPF_CORE_READ((struct cast_struct *)src, a, b, c);

Fixes: 7db3822ab9 ("libbpf: Add BPF_CORE_READ/BPF_CORE_READ_INTO helpers")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200619231703.738941-8-andriin@fb.com
2020-06-22 17:01:48 -07:00
Andrii Nakryiko 05aca6da3b tools/bpftool: Generalize BPF skeleton support and generate vmlinux.h
Adapt Makefile to support BPF skeleton generation beyond single profiler.bpf.c
case. Also add vmlinux.h generation and switch profiler.bpf.c to use it.

clang-bpf-global-var feature is extended and renamed to clang-bpf-co-re to
check for support of preserve_access_index attribute, which, together with BTF
for global variables, is the minimum requirement for modern BPF programs.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200619231703.738941-7-andriin@fb.com
2020-06-22 17:01:48 -07:00
Andrii Nakryiko 16e9b187ab tools/bpftool: Minimize bootstrap bpftool
Build minimal "bootstrap mode" bpftool to enable skeleton (and, later,
vmlinux.h generation), instead of building almost complete, but slightly
different (w/o skeletons, etc) bpftool to bootstrap complete bpftool build.

Current approach doesn't scale well (engineering-wise) when adding more BPF
programs to bpftool and other complicated functionality, as it requires
constant adjusting of the code to work in both bootstrapped mode and normal
mode.

So it's better to build only minimal bpftool version that supports only BPF
skeleton code generation and BTF-to-C conversion. Thankfully, this is quite
easy to accomplish due to internal modularity of bpftool commands. This will
also allow to keep adding new functionality to bpftool in general, without the
need to care about bootstrap mode for those new parts of bpftool.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200619231703.738941-6-andriin@fb.com
2020-06-22 17:01:48 -07:00
Andrii Nakryiko a479b8ce4e tools/bpftool: Move map/prog parsing logic into common
Move functions that parse map and prog by id/tag/name/etc outside of
map.c/prog.c, respectively. These functions are used outside of those files
and are generic enough to be in common. This also makes heavy-weight map.c and
prog.c more decoupled from the rest of bpftool files and facilitates more
lightweight bootstrap bpftool variant.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200619231703.738941-5-andriin@fb.com
2020-06-22 17:01:48 -07:00
Andrii Nakryiko b7ddfab20a selftests/bpf: Add __ksym extern selftest
Validate libbpf is able to handle weak and strong kernel symbol externs in BPF
code correctly.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20200619231703.738941-4-andriin@fb.com
2020-06-22 17:01:48 -07:00
Andrii Nakryiko 1c0c7074fe libbpf: Add support for extracting kernel symbol addresses
Add support for another (in addition to existing Kconfig) special kind of
externs in BPF code, kernel symbol externs. Such externs allow BPF code to
"know" kernel symbol address and either use it for comparisons with kernel
data structures (e.g., struct file's f_op pointer, to distinguish different
kinds of file), or, with the help of bpf_probe_user_kernel(), to follow
pointers and read data from global variables. Kernel symbol addresses are
found through /proc/kallsyms, which should be present in the system.

Currently, such kernel symbol variables are typeless: they have to be defined
as `extern const void <symbol>` and the only operation you can do (in C code)
with them is to take its address. Such extern should reside in a special
section '.ksyms'. bpf_helpers.h header provides __ksym macro for this. Strong
vs weak semantics stays the same as with Kconfig externs. If symbol is not
found in /proc/kallsyms, this will be a failure for strong (non-weak) extern,
but will be defaulted to 0 for weak externs.

If the same symbol is defined multiple times in /proc/kallsyms, then it will
be error if any of the associated addresses differs. In that case, address is
ambiguous, so libbpf falls on the side of caution, rather than confusing user
with randomly chosen address.

In the future, once kernel is extended with variables BTF information, such
ksym externs will be supported in a typed version, which will allow BPF
program to read variable's contents directly, similarly to how it's done for
fentry/fexit input arguments.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20200619231703.738941-3-andriin@fb.com
2020-06-22 17:01:48 -07:00
Andrii Nakryiko 2e33efe32e libbpf: Generalize libbpf externs support
Switch existing Kconfig externs to be just one of few possible kinds of more
generic externs. This refactoring is in preparation for ksymbol extern
support, added in the follow up patch. There are no functional changes
intended.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20200619231703.738941-2-andriin@fb.com
2020-06-22 17:01:48 -07:00
Andrii Nakryiko 1bdb6c9a1c libbpf: Add a bunch of attribute getters/setters for map definitions
Add a bunch of getter for various aspects of BPF map. Some of these attribute
(e.g., key_size, value_size, type, etc) are available right now in struct
bpf_map_def, but this patch adds getter allowing to fetch them individually.
bpf_map_def approach isn't very scalable, when ABI stability requirements are
taken into account. It's much easier to extend libbpf and add support for new
features, when each aspect of BPF map has separate getter/setter.

Getters follow the common naming convention of not explicitly having "get" in
its name: bpf_map__type() returns map type, bpf_map__key_size() returns
key_size. Setters, though, explicitly have set in their name:
bpf_map__set_type(), bpf_map__set_key_size().

This patch ensures we now have a getter and a setter for the following
map attributes:
  - type;
  - max_entries;
  - map_flags;
  - numa_node;
  - key_size;
  - value_size;
  - ifindex.

bpf_map__resize() enforces unnecessary restriction of max_entries > 0. It is
unnecessary, because libbpf actually supports zero max_entries for some cases
(e.g., for PERF_EVENT_ARRAY map) and treats it specially during map creation
time. To allow setting max_entries=0, new bpf_map__set_max_entries() setter is
added. bpf_map__resize()'s behavior is preserved for backwards compatibility
reasons.

Map ifindex getter is added as well. There is a setter already, but no
corresponding getter. Fix this assymetry as well. bpf_map__set_ifindex()
itself is converted from void function into error-returning one, similar to
other setters. The only error returned right now is -EBUSY, if BPF map is
already loaded and has corresponding FD.

One lacking attribute with no ability to get/set or even specify it
declaratively is numa_node. This patch fixes this gap and both adds
programmatic getter/setter, as well as adds support for numa_node field in
BTF-defined map.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200621062112.3006313-1-andriin@fb.com
2020-06-23 00:01:32 +02:00
Andrey Ignatov b1b53d413f selftests/bpf: Test access to bpf map pointer
Add selftests to test access to map pointers from bpf program for all
map types except struct_ops (that one would need additional work).

verifier test focuses mostly on scenarios that must be rejected.

prog_tests test focuses on accessing multiple fields both scalar and a
nested struct from bpf program and verifies that those fields have
expected values.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/139a6a17f8016491e39347849b951525335c6eb4.1592600985.git.rdna@fb.com
2020-06-22 22:22:59 +02:00
Andrey Ignatov 41c48f3a98 bpf: Support access to bpf map fields
There are multiple use-cases when it's convenient to have access to bpf
map fields, both `struct bpf_map` and map type specific struct-s such as
`struct bpf_array`, `struct bpf_htab`, etc.

For example while working with sock arrays it can be necessary to
calculate the key based on map->max_entries (some_hash % max_entries).
Currently this is solved by communicating max_entries via "out-of-band"
channel, e.g. via additional map with known key to get info about target
map. That works, but is not very convenient and error-prone while
working with many maps.

In other cases necessary data is dynamic (i.e. unknown at loading time)
and it's impossible to get it at all. For example while working with a
hash table it can be convenient to know how much capacity is already
used (bpf_htab.count.counter for BPF_F_NO_PREALLOC case).

At the same time kernel knows this info and can provide it to bpf
program.

Fill this gap by adding support to access bpf map fields from bpf
program for both `struct bpf_map` and map type specific fields.

Support is implemented via btf_struct_access() so that a user can define
their own `struct bpf_map` or map type specific struct in their program
with only necessary fields and preserve_access_index attribute, cast a
map to this struct and use a field.

For example:

	struct bpf_map {
		__u32 max_entries;
	} __attribute__((preserve_access_index));

	struct bpf_array {
		struct bpf_map map;
		__u32 elem_size;
	} __attribute__((preserve_access_index));

	struct {
		__uint(type, BPF_MAP_TYPE_ARRAY);
		__uint(max_entries, 4);
		__type(key, __u32);
		__type(value, __u32);
	} m_array SEC(".maps");

	SEC("cgroup_skb/egress")
	int cg_skb(void *ctx)
	{
		struct bpf_array *array = (struct bpf_array *)&m_array;
		struct bpf_map *map = (struct bpf_map *)&m_array;

		/* .. use map->max_entries or array->map.max_entries .. */
	}

Similarly to other btf_struct_access() use-cases (e.g. struct tcp_sock
in net/ipv4/bpf_tcp_ca.c) the patch allows access to any fields of
corresponding struct. Only reading from map fields is supported.

For btf_struct_access() to work there should be a way to know btf id of
a struct that corresponds to a map type. To get btf id there should be a
way to get a stringified name of map-specific struct, such as
"bpf_array", "bpf_htab", etc for a map type. Two new fields are added to
`struct bpf_map_ops` to handle it:
* .map_btf_name keeps a btf name of a struct returned by map_alloc();
* .map_btf_id is used to cache btf id of that struct.

To make btf ids calculation cheaper they're calculated once while
preparing btf_vmlinux and cached same way as it's done for btf_id field
of `struct bpf_func_proto`

While calculating btf ids, struct names are NOT checked for collision.
Collisions will be checked as a part of the work to prepare btf ids used
in verifier in compile time that should land soon. The only known
collision for `struct bpf_htab` (kernel/bpf/hashtab.c vs
net/core/sock_map.c) was fixed earlier.

Both new fields .map_btf_name and .map_btf_id must be set for a map type
for the feature to work. If neither is set for a map type, verifier will
return ENOTSUPP on a try to access map_ptr of corresponding type. If
just one of them set, it's verifier misconfiguration.

Only `struct bpf_array` for BPF_MAP_TYPE_ARRAY and `struct bpf_htab` for
BPF_MAP_TYPE_HASH are supported by this patch. Other map types will be
supported separately.

The feature is available only for CONFIG_DEBUG_INFO_BTF=y and gated by
perfmon_capable() so that unpriv programs won't have access to bpf map
fields.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/6479686a0cd1e9067993df57b4c3eef0e276fec9.1592600985.git.rdna@fb.com
2020-06-22 22:22:58 +02:00
Andrii Nakryiko bb8dc2695a tools/bpftool: Relicense bpftool's BPF profiler prog as dual-license GPL/BSD
Relicense it to be compatible with the rest of bpftool files.

Suggested-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200619222024.519774-1-andriin@fb.com
2020-06-20 00:27:19 +02:00
Yonghong Song d56b74b9e1 tools/bpf: Add verifier tests for 32bit pointer/scalar arithmetic
Added two test_verifier subtests for 32bit pointer/scalar arithmetic
with BPF_SUB operator. They are passing verifier now.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200618234632.3321367-1-yhs@fb.com
2020-06-19 23:34:43 +02:00
Andrii Nakryiko 7bd3a33ae6 libbpf: Bump version to 0.1.0
Bump libbpf version to 0.1.0, as new development cycle starts.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200617183132.1970836-1-andriin@fb.com
2020-06-17 13:20:02 -07:00
Gustavo A. R. Silva a5290feb5a tools/testing/nvdimm: Replace zero-length array with flexible-array
There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-06-15 23:08:32 -05:00
Andrii Nakryiko c34a06c56d tools/bpftool: Add ringbuf map to a list of known map types
Add symbolic name "ringbuf" to map to BPF_MAP_TYPE_RINGBUF. Without this,
users will see "type 27" instead of "ringbuf" in `map show` output.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200615225355.366256-1-andriin@fb.com
2020-06-16 02:18:30 +02:00
Andrii Nakryiko b0659d8a95 bpf: Fix definition of bpf_ringbuf_output() helper in UAPI comments
Fix definition of bpf_ringbuf_output() in UAPI header comments, which is used
to generate libbpf's bpf_helper_defs.h header. Return value is a number (error
code), not a pointer.

Fixes: 457f44363a ("bpf: Implement BPF ring buffer and verifier support for it")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200615214926.3638836-1-andriin@fb.com
2020-06-16 02:17:01 +02:00
Linus Torvalds 96144c58ab Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix cfg80211 deadlock, from Johannes Berg.

 2) RXRPC fails to send norigications, from David Howells.

 3) MPTCP RM_ADDR parsing has an off by one pointer error, fix from
    Geliang Tang.

 4) Fix crash when using MSG_PEEK with sockmap, from Anny Hu.

 5) The ucc_geth driver needs __netdev_watchdog_up exported, from
    Valentin Longchamp.

 6) Fix hashtable memory leak in dccp, from Wang Hai.

 7) Fix how nexthops are marked as FDB nexthops, from David Ahern.

 8) Fix mptcp races between shutdown and recvmsg, from Paolo Abeni.

 9) Fix crashes in tipc_disc_rcv(), from Tuong Lien.

10) Fix link speed reporting in iavf driver, from Brett Creeley.

11) When a channel is used for XSK and then reused again later for XSK,
    we forget to clear out the relevant data structures in mlx5 which
    causes all kinds of problems. Fix from Maxim Mikityanskiy.

12) Fix memory leak in genetlink, from Cong Wang.

13) Disallow sockmap attachments to UDP sockets, it simply won't work.
    From Lorenz Bauer.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits)
  net: ethernet: ti: ale: fix allmulti for nu type ale
  net: ethernet: ti: am65-cpsw-nuss: fix ale parameters init
  net: atm: Remove the error message according to the atomic context
  bpf: Undo internal BPF_PROBE_MEM in BPF insns dump
  libbpf: Support pre-initializing .bss global variables
  tools/bpftool: Fix skeleton codegen
  bpf: Fix memlock accounting for sock_hash
  bpf: sockmap: Don't attach programs to UDP sockets
  bpf: tcp: Recv() should return 0 when the peer socket is closed
  ibmvnic: Flush existing work items before device removal
  genetlink: clean up family attributes allocations
  net: ipa: header pad field only valid for AP->modem endpoint
  net: ipa: program upper nibbles of sequencer type
  net: ipa: fix modem LAN RX endpoint id
  net: ipa: program metadata mask differently
  ionic: add pcie_print_link_status
  rxrpc: Fix race between incoming ACK parser and retransmitter
  net/mlx5: E-Switch, Fix some error pointer dereferences
  net/mlx5: Don't fail driver on failure to create debugfs
  net/mlx5e: CT: Fix ipv6 nat header rewrite actions
  ...
2020-06-13 16:27:13 -07:00
David S. Miller fa7566a0d6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2020-06-12

The following pull-request contains BPF updates for your *net* tree.

We've added 26 non-merge commits during the last 10 day(s) which contain
a total of 27 files changed, 348 insertions(+), 93 deletions(-).

The main changes are:

1) sock_hash accounting fix, from Andrey.

2) libbpf fix and probe_mem sanitizing, from Andrii.

3) sock_hash fixes, from Jakub.

4) devmap_val fix, from Jesper.

5) load_bytes_relative fix, from YiFei.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-13 15:28:08 -07:00
Andrii Nakryiko caf62492f4 libbpf: Support pre-initializing .bss global variables
Remove invalid assumption in libbpf that .bss map doesn't have to be updated
in kernel. With addition of skeleton and memory-mapped initialization image,
.bss doesn't have to be all zeroes when BPF map is created, because user-code
might have initialized those variables from user-space.

Fixes: eba9c5f498 ("libbpf: Refactor global data map initialization")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200612194504.557844-1-andriin@fb.com
2020-06-12 15:27:47 -07:00
Andrii Nakryiko 22eb78792e tools/bpftool: Fix skeleton codegen
Remove unnecessary check at the end of codegen() routine which makes codegen()
to always fail and exit bpftool with error code. Positive value of variable
n is not an indicator of a failure.

Fixes: 2c4779eff8 ("tools, bpftool: Exit on error in function codegen")
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Tobias Klauser <tklauser@distanz.ch>
Link: https://lore.kernel.org/bpf/20200612201603.680852-1-andriin@fb.com
2020-06-12 15:25:04 -07:00