Граф коммитов

3030 Коммитов

Автор SHA1 Сообщение Дата
Bobby Eshleman c7c605c982 selftests/bpf: add vsock to vmtest.sh
Add vsock loopback to the test kernel.

This allows sockmap for vsock to be tested.

Signed-off-by: Bobby Eshleman <bobby.eshleman@bytedance.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-03-29 08:19:38 +01:00
Eduard Zingerman c63a7d8bbb selftests/bpf: Remove verifier/xdp_direct_packet_access.c, converted to progs/verifier_xdp_direct_packet_access.c
Removing verifier/xdp_direct_packet_access.c.c as it was automatically converted to use
inline assembly in the previous commit. It is available in
progs/verifier_xdp_direct_packet_access.c.c.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230328020813.392560-3-eddyz87@gmail.com
2023-03-28 14:48:27 -07:00
Eduard Zingerman 6e9e141a7a selftests/bpf: Verifier/xdp_direct_packet_access.c converted to inline assembly
Test verifier/xdp_direct_packet_access.c automatically converted to use inline assembly.
Original test would be removed in the next patch.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230328020813.392560-2-eddyz87@gmail.com
2023-03-28 14:48:27 -07:00
Hengqi Chen 7283137a76 selftests/bpf: Don't assume page size is 4096
The verifier test creates BPF ringbuf maps using hard-coded
4096 as max_entries. Some tests will fail if the page size
of the running kernel is not 4096. Use getpagesize() instead.

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230326095341.816023-1-hengqi.chen@gmail.com
2023-03-27 11:41:16 -07:00
Martin KaFai Lau cbe9d93d58 selftests/bpf: Add bench for task storage creation
This patch adds a task storage benchmark to the existing
local-storage-create benchmark.

For task storage,
./bench --storage-type task --batch-size 32:
   bpf_ma: Summary: creates   30.456 ± 0.507k/s ( 30.456k/prod), 6.08 kmallocs/create
no bpf_ma: Summary: creates   31.962 ± 0.486k/s ( 31.962k/prod), 6.13 kmallocs/create

./bench --storage-type task --batch-size 64:
   bpf_ma: Summary: creates   30.197 ± 1.476k/s ( 30.197k/prod), 6.08 kmallocs/create
no bpf_ma: Summary: creates   31.103 ± 0.297k/s ( 31.103k/prod), 6.13 kmallocs/create

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230322215246.1675516-6-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 19:52:52 -07:00
Martin KaFai Lau d8db84d71c selftests/bpf: Test task storage when local_storage->smap is NULL
The current sk storage test ensures the memory free works when
the local_storage->smap is NULL.

This patch adds a task storage test to ensure the memory free
code path works when local_storage->smap is NULL.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230322215246.1675516-5-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 19:52:52 -07:00
Eduard Zingerman ffb515c933 selftests/bpf: verifier/xdp.c converted to inline assembly
Test verifier/xdp.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-43-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:06 -07:00
Eduard Zingerman a8036aea2d selftests/bpf: verifier/xadd.c converted to inline assembly
Test verifier/xadd.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-42-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:06 -07:00
Eduard Zingerman d15f5b68b6 selftests/bpf: verifier/var_off.c converted to inline assembly
Test verifier/var_off.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-41-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:06 -07:00
Eduard Zingerman d330528617 selftests/bpf: verifier/value_or_null.c converted to inline assembly
Test verifier/value_or_null.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-40-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:06 -07:00
Eduard Zingerman 8f59e87a3b selftests/bpf: verifier/value.c converted to inline assembly
Test verifier/value.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-39-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:06 -07:00
Eduard Zingerman 033914942d selftests/bpf: verifier/value_adj_spill.c converted to inline assembly
Test verifier/value_adj_spill.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-38-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:06 -07:00
Eduard Zingerman ab839a5819 selftests/bpf: verifier/uninit.c converted to inline assembly
Test verifier/uninit.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-37-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:05 -07:00
Eduard Zingerman edff37b2f2 selftests/bpf: verifier/stack_ptr.c converted to inline assembly
Test verifier/stack_ptr.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-36-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:05 -07:00
Eduard Zingerman f4fe3cfe6c selftests/bpf: verifier/spill_fill.c converted to inline assembly
Test verifier/spill_fill.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-35-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:05 -07:00
Eduard Zingerman b7e4203086 selftests/bpf: verifier/ringbuf.c converted to inline assembly
Test verifier/ringbuf.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-34-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:05 -07:00
Eduard Zingerman 18cdc2b531 selftests/bpf: verifier/raw_tp_writable.c converted to inline assembly
Test verifier/raw_tp_writable.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-33-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:05 -07:00
Eduard Zingerman 5a77a01f33 selftests/bpf: verifier/raw_stack.c converted to inline assembly
Test verifier/raw_stack.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-32-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:05 -07:00
Eduard Zingerman 65428312e3 selftests/bpf: verifier/meta_access.c converted to inline assembly
Test verifier/meta_access.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-31-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:05 -07:00
Eduard Zingerman ade3f08fc2 selftests/bpf: verifier/masking.c converted to inline assembly
Test verifier/masking.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-30-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:05 -07:00
Eduard Zingerman 05e474ecbb selftests/bpf: verifier/map_ret_val.c converted to inline assembly
Test verifier/map_ret_val.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-29-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:05 -07:00
Eduard Zingerman caf345cf12 selftests/bpf: verifier/map_ptr.c converted to inline assembly
Test verifier/map_ptr.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-28-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:04 -07:00
Eduard Zingerman 583c7ce5be selftests/bpf: verifier/leak_ptr.c converted to inline assembly
Test verifier/leak_ptr.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-27-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:04 -07:00
Eduard Zingerman e297875580 selftests/bpf: verifier/ld_ind.c converted to inline assembly
Test verifier/ld_ind.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-26-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:04 -07:00
Eduard Zingerman 01481e67dd selftests/bpf: verifier/int_ptr.c converted to inline assembly
Test verifier/int_ptr.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-25-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:04 -07:00
Eduard Zingerman ecc424827b selftests/bpf: verifier/helper_value_access.c converted to inline assembly
Test verifier/helper_value_access.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-24-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:04 -07:00
Eduard Zingerman 77aa2563cb selftests/bpf: verifier/helper_restricted.c converted to inline assembly
Test verifier/helper_restricted.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-23-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:04 -07:00
Eduard Zingerman fb179fe69e selftests/bpf: verifier/helper_packet_access.c converted to inline assembly
Test verifier/helper_packet_access.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-22-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:04 -07:00
Eduard Zingerman b37d776b43 selftests/bpf: verifier/helper_access_var_len.c converted to inline assembly
Test verifier/helper_access_var_len.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-21-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:04 -07:00
Eduard Zingerman 9553de70a8 selftests/bpf: verifier/div_overflow.c converted to inline assembly
Test verifier/div_overflow.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-20-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:04 -07:00
Eduard Zingerman 01a0925531 selftests/bpf: verifier/div0.c converted to inline assembly
Test verifier/div0.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-19-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:04 -07:00
Eduard Zingerman 84988478fb selftests/bpf: verifier/direct_stack_access_wraparound.c converted to inline assembly
Test verifier/direct_stack_access_wraparound.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-18-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:03 -07:00
Eduard Zingerman a58475a989 selftests/bpf: verifier/ctx_sk_msg.c converted to inline assembly
Test verifier/ctx_sk_msg.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-17-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:03 -07:00
Eduard Zingerman a2777eaad5 selftests/bpf: verifier/const_or.c converted to inline assembly
Test verifier/const_or.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-16-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:03 -07:00
Eduard Zingerman 8f16f3c07e selftests/bpf: verifier/cgroup_storage.c converted to inline assembly
Test verifier/cgroup_storage.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-15-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:03 -07:00
Eduard Zingerman b1b6372535 selftests/bpf: verifier/cgroup_skb.c converted to inline assembly
Test verifier/cgroup_skb.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-14-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:03 -07:00
Eduard Zingerman 047687a7f4 selftests/bpf: verifier/cgroup_inv_retcode.c converted to inline assembly
Test verifier/cgroup_inv_retcode.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-13-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:03 -07:00
Eduard Zingerman 2f2047c22c selftests/bpf: verifier/cfg.c converted to inline assembly
Test verifier/cfg.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-12-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:03 -07:00
Eduard Zingerman b14a702afd selftests/bpf: verifier/bounds_mix_sign_unsign.c converted to inline assembly
Test verifier/bounds_mix_sign_unsign.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-11-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:03 -07:00
Eduard Zingerman 7605f94b34 selftests/bpf: verifier/bounds_deduction.c converted to inline assembly
Test verifier/bounds_deduction.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-10-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:03 -07:00
Eduard Zingerman 0ccbe4956d selftests/bpf: verifier/basic_stack.c converted to inline assembly
Test verifier/basic_stack.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-9-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:02 -07:00
Eduard Zingerman a3c830ae02 selftests/bpf: verifier/array_access.c converted to inline assembly
Test verifier/array_access.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-8-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:02 -07:00
Eduard Zingerman 9d0f1568ad selftests/bpf: verifier/and.c converted to inline assembly
Test verifier/and.c automatically converted to use inline assembly.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-7-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:02 -07:00
Eduard Zingerman 55108621a3 selftests/bpf: prog_tests entry point for migrated test_verifier tests
prog_tests/verifier.c would be used as a host for verifier/*.c tests
migrated to use inline assembly and run from test_progs.

The run_test_aux() function mimics the test_verifier behavior
dropping CAP_SYS_ADMIN upon entry.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-6-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:02 -07:00
Eduard Zingerman 19a8e06f5f selftests/bpf: Tests execution support for test_loader.c
Extends test_loader.c:test_loader__run_subtests() by allowing to
execute BPF_PROG_TEST_RUN bpf command for selected programs.
This is similar to functionality provided by test_verifier.

Adds the following new attributes controlling test_loader behavior:

  __retval(...)
  __retval_unpriv(...)

* If any of these attributes is present, the annotated program would
  be executed using libbpf's bpf_prog_test_run_opts() function.
* If __retval is present, the test run would be done for program
  loaded in privileged mode.
* If __retval_unpriv is present, the test run would be done for
  program loaded in unprivileged mode.
* To mimic test_verifier behavior, the actual run is initiated in
  privileged mode.
* The value returned by a test run is compared against retval
  parameter.

The retval attribute takes one of the following parameters:
- a decimal number
- a hexadecimal number (must start from '0x')
- any of a three special literals (provided for compatibility with
  test_verifier):
  - INT_MIN
  - POINTER_VALUE
  - TEST_DATA_LEN

An example of the attribute usage:

  SEC("socket")
  __description("return 42")
  __success __success_unpriv __retval(42)
  __naked void the_42_test(void)
  {
          asm volatile ("                                 \
          r0 = 42;                                        \
          exit;                                           \
  "       ::: __clobber_all);
  }

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:02 -07:00
Eduard Zingerman 1d56ade032 selftests/bpf: Unprivileged tests for test_loader.c
Extends test_loader.c:test_loader__run_subtests() by allowing to
execute tests in unprivileged mode, similar to test_verifier.c.

Adds the following new attributes controlling test_loader behavior:

  __msg_unpriv
  __success_unpriv
  __failure_unpriv

* If any of these attributes is present the test would be loaded in
  unprivileged mode.
* If only "privileged" attributes are present the test would be loaded
  only in privileged mode.
* If both "privileged" and "unprivileged" attributes are present the
  test would be loaded in both modes.
* If test has to be executed in both modes, __msg(text) is specified
  and __msg_unpriv is not specified the behavior is the same as if
  __msg_unpriv(text) is specified.
* For test filtering purposes the name of the program loaded in
  unprivileged mode is derived from the usual program name by adding
  `@unpriv' suffix.

Also adds attribute '__description'. This attribute specifies text to
be used instead of a program name for display and filtering purposes.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:02 -07:00
Eduard Zingerman 207b1ba301 selftests/bpf: __imm_insn & __imm_const macro for bpf_misc.h
Add two convenience macro for BPF test cases,
allowing the following usage:

  #include <linux/filter.h>

  ...
  asm volatile (
  ...
  ".8byte %[raw_insn];"
  ...
  "r1 += %[st_foo_offset];"
  ...
  :
  : __imm_insn(raw_insn, BPF_RAW_INSN(...)),
    __imm_const(st_foo_offset, offsetof(struct st, foo))
  : __clobber_all);

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:02 -07:00
Eduard Zingerman 3e5329e193 selftests/bpf: Report program name on parse_test_spec error
Change test_loader.c:run_subtest() behavior to show BPF program name
when test spec for that program can't be parsed.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230325025524.144043-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 17:02:02 -07:00
David Vernet 6c831c4684 bpf: Treat KF_RELEASE kfuncs as KF_TRUSTED_ARGS
KF_RELEASE kfuncs are not currently treated as having KF_TRUSTED_ARGS,
even though they have a superset of the requirements of KF_TRUSTED_ARGS.
Like KF_TRUSTED_ARGS, KF_RELEASE kfuncs require a 0-offset argument, and
don't allow NULL-able arguments. Unlike KF_TRUSTED_ARGS which require
_either_ an argument with ref_obj_id > 0, _or_ (ref->type &
BPF_REG_TRUSTED_MODIFIERS) (and no unsafe modifiers allowed), KF_RELEASE
only allows for ref_obj_id > 0.  Because KF_RELEASE today doesn't
automatically imply KF_TRUSTED_ARGS, some of these requirements are
enforced in different ways that can make the behavior of the verifier
feel unpredictable. For example, a KF_RELEASE kfunc with a NULL-able
argument will currently fail in the verifier with a message like, "arg#0
is ptr_or_null_ expected ptr_ or socket" rather than "Possibly NULL
pointer passed to trusted arg0". Our intention is the same, but the
semantics are different due to implemenetation details that kfunc authors
and BPF program writers should not need to care about.

Let's make the behavior of the verifier more consistent and intuitive by
having KF_RELEASE kfuncs imply the presence of KF_TRUSTED_ARGS. Our
eventual goal is to have all kfuncs assume KF_TRUSTED_ARGS by default
anyways, so this takes us a step in that direction.

Note that it does not make sense to assume KF_TRUSTED_ARGS for all
KF_ACQUIRE kfuncs. KF_ACQUIRE kfuncs can have looser semantics than
KF_RELEASE, with e.g. KF_RCU | KF_RET_NULL. We may want to have
KF_ACQUIRE imply KF_TRUSTED_ARGS _unless_ KF_RCU is specified, but that
can be left to another patch set, and there are no such subtleties to
address for KF_RELEASE.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230325213144.486885-4-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-25 16:56:22 -07:00
Jakub Kicinski dc0a7b5200 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
  6e9d51b1a5 ("net/mlx5e: Initialize link speed to zero")
  1bffcea429 ("net/mlx5e: Add devlink hairpin queues parameters")
https://lore.kernel.org/all/20230324120623.4ebbc66f@canb.auug.org.au/
https://lore.kernel.org/all/20230321211135.47711-1-saeed@kernel.org/

Adjacent changes:

drivers/net/phy/phy.c
  323fe43cf9 ("net: phy: Improved PHY error reporting in state machine")
  4203d84032 ("net: phy: Ensure state transitions are processed from phy_stop()")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-24 10:10:20 -07:00
Kui-Feng Lee 06da9f3bd6 selftests/bpf: Test switching TCP Congestion Control algorithms.
Create a pair of sockets that utilize the congestion control algorithm
under a particular name. Then switch up this congestion control
algorithm to another implementation and check whether newly created
connections using the same cc name now run the new implementation.

Also, try to update a link with a struct_ops that is without
BPF_F_LINK or with a wrong or different name.  These cases should fail
due to the violation of assumptions.  To update a bpf_link of a
struct_ops, it must be replaced with another struct_ops that is
identical in type and name and has the BPF_F_LINK flag.

The other test case is to create links from the same struct_ops more
than once.  It makes sure a struct_ops can be used repeatly.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Link: https://lore.kernel.org/r/20230323032405.3735486-9-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-22 22:53:25 -07:00
Xu Kuohai 1a3148fc17 selftests/bpf: Check when bounds are not in the 32-bit range
Add cases to check if bound is updated correctly when 64-bit value is
not in the 32-bit range.

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20230322213056.2470-2-daniel@iogearbox.net
2023-03-22 16:49:25 -07:00
Daniel Borkmann 7be14c1c90 bpf: Fix __reg_bound_offset 64->32 var_off subreg propagation
Xu reports that after commit 3f50f132d8 ("bpf: Verifier, do explicit ALU32
bounds tracking"), the following BPF program is rejected by the verifier:

   0: (61) r2 = *(u32 *)(r1 +0)          ; R2_w=pkt(off=0,r=0,imm=0)
   1: (61) r3 = *(u32 *)(r1 +4)          ; R3_w=pkt_end(off=0,imm=0)
   2: (bf) r1 = r2
   3: (07) r1 += 1
   4: (2d) if r1 > r3 goto pc+8
   5: (71) r1 = *(u8 *)(r2 +0)           ; R1_w=scalar(umax=255,var_off=(0x0; 0xff))
   6: (18) r0 = 0x7fffffffffffff10
   8: (0f) r1 += r0                      ; R1_w=scalar(umin=0x7fffffffffffff10,umax=0x800000000000000f)
   9: (18) r0 = 0x8000000000000000
  11: (07) r0 += 1
  12: (ad) if r0 < r1 goto pc-2
  13: (b7) r0 = 0
  14: (95) exit

And the verifier log says:

  func#0 @0
  0: R1=ctx(off=0,imm=0) R10=fp0
  0: (61) r2 = *(u32 *)(r1 +0)          ; R1=ctx(off=0,imm=0) R2_w=pkt(off=0,r=0,imm=0)
  1: (61) r3 = *(u32 *)(r1 +4)          ; R1=ctx(off=0,imm=0) R3_w=pkt_end(off=0,imm=0)
  2: (bf) r1 = r2                       ; R1_w=pkt(off=0,r=0,imm=0) R2_w=pkt(off=0,r=0,imm=0)
  3: (07) r1 += 1                       ; R1_w=pkt(off=1,r=0,imm=0)
  4: (2d) if r1 > r3 goto pc+8          ; R1_w=pkt(off=1,r=1,imm=0) R3_w=pkt_end(off=0,imm=0)
  5: (71) r1 = *(u8 *)(r2 +0)           ; R1_w=scalar(umax=255,var_off=(0x0; 0xff)) R2_w=pkt(off=0,r=1,imm=0)
  6: (18) r0 = 0x7fffffffffffff10       ; R0_w=9223372036854775568
  8: (0f) r1 += r0                      ; R0_w=9223372036854775568 R1_w=scalar(umin=9223372036854775568,umax=9223372036854775823,s32_min=-240,s32_max=15)
  9: (18) r0 = 0x8000000000000000       ; R0_w=-9223372036854775808
  11: (07) r0 += 1                      ; R0_w=-9223372036854775807
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775807 R1_w=scalar(umin=9223372036854775568,umax=9223372036854775809)
  13: (b7) r0 = 0                       ; R0_w=0
  14: (95) exit

  from 12 to 11: R0_w=-9223372036854775807 R1_w=scalar(umin=9223372036854775810,umax=9223372036854775823,var_off=(0x8000000000000000; 0xffffffff)) R2_w=pkt(off=0,r=1,imm=0) R3_w=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775806
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775806 R1_w=scalar(umin=9223372036854775810,umax=9223372036854775810,var_off=(0x8000000000000000; 0xffffffff))
  13: safe

  [...]

  from 12 to 11: R0_w=-9223372036854775795 R1=scalar(umin=9223372036854775822,umax=9223372036854775823,var_off=(0x8000000000000000; 0xffffffff)) R2=pkt(off=0,r=1,imm=0) R3=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775794
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775794 R1=scalar(umin=9223372036854775822,umax=9223372036854775822,var_off=(0x8000000000000000; 0xffffffff))
  13: safe

  from 12 to 11: R0_w=-9223372036854775794 R1=scalar(umin=9223372036854775823,umax=9223372036854775823,var_off=(0x8000000000000000; 0xffffffff)) R2=pkt(off=0,r=1,imm=0) R3=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775793
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775793 R1=scalar(umin=9223372036854775823,umax=9223372036854775823,var_off=(0x8000000000000000; 0xffffffff))
  13: safe

  from 12 to 11: R0_w=-9223372036854775793 R1=scalar(umin=9223372036854775824,umax=9223372036854775823,var_off=(0x8000000000000000; 0xffffffff)) R2=pkt(off=0,r=1,imm=0) R3=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775792
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775792 R1=scalar(umin=9223372036854775824,umax=9223372036854775823,var_off=(0x8000000000000000; 0xffffffff))
  13: safe

  [...]

The 64bit umin=9223372036854775810 bound continuously bumps by +1 while
umax=9223372036854775823 stays as-is until the verifier complexity limit
is reached and the program gets finally rejected. During this simulation,
the umin also eventually surpasses umax. Looking at the first 'from 12
to 11' output line from the loop, R1 has the following state:

  R1_w=scalar(umin=0x8000000000000002 (9223372036854775810),
              umax=0x800000000000000f (9223372036854775823),
          var_off=(0x8000000000000000;
                           0xffffffff))

The var_off has technically not an inconsistent state but it's very
imprecise and far off surpassing 64bit umax bounds whereas the expected
output with refined known bits in var_off should have been like:

  R1_w=scalar(umin=0x8000000000000002 (9223372036854775810),
              umax=0x800000000000000f (9223372036854775823),
          var_off=(0x8000000000000000;
                                  0xf))

In the above log, var_off stays as var_off=(0x8000000000000000; 0xffffffff)
and does not converge into a narrower mask where more bits become known,
eventually transforming R1 into a constant upon umin=9223372036854775823,
umax=9223372036854775823 case where the verifier would have terminated and
let the program pass.

The __reg_combine_64_into_32() marks the subregister unknown and propagates
64bit {s,u}min/{s,u}max bounds to their 32bit equivalents iff they are within
the 32bit universe. The question came up whether __reg_combine_64_into_32()
should special case the situation that when 64bit {s,u}min bounds have
the same value as 64bit {s,u}max bounds to then assign the latter as
well to the 32bit reg->{s,u}32_{min,max}_value. As can be seen from the
above example however, that is just /one/ special case and not a /generic/
solution given above example would still not be addressed this way and
remain at an imprecise var_off=(0x8000000000000000; 0xffffffff).

The improvement is needed in __reg_bound_offset() to refine var32_off with
the updated var64_off instead of the prior reg->var_off. The reg_bounds_sync()
code first refines information about the register's min/max bounds via
__update_reg_bounds() from the current var_off, then in __reg_deduce_bounds()
from sign bit and with the potentially learned bits from bounds it'll
update the var_off tnum in __reg_bound_offset(). For example, intersecting
with the old var_off might have improved bounds slightly, e.g. if umax
was 0x7f...f and var_off was (0; 0xf...fc), then new var_off will then
result in (0; 0x7f...fc). The intersected var64_off holds then the
universe which is a superset of var32_off. The point for the latter is
not to broaden, but to further refine known bits based on the intersection
of var_off with 32 bit bounds, so that we later construct the final var_off
from upper and lower 32 bits. The final __update_reg_bounds() can then
potentially still slightly refine bounds if more bits became known from the
new var_off.

After the improvement, we can see R1 converging successively:

  func#0 @0
  0: R1=ctx(off=0,imm=0) R10=fp0
  0: (61) r2 = *(u32 *)(r1 +0)          ; R1=ctx(off=0,imm=0) R2_w=pkt(off=0,r=0,imm=0)
  1: (61) r3 = *(u32 *)(r1 +4)          ; R1=ctx(off=0,imm=0) R3_w=pkt_end(off=0,imm=0)
  2: (bf) r1 = r2                       ; R1_w=pkt(off=0,r=0,imm=0) R2_w=pkt(off=0,r=0,imm=0)
  3: (07) r1 += 1                       ; R1_w=pkt(off=1,r=0,imm=0)
  4: (2d) if r1 > r3 goto pc+8          ; R1_w=pkt(off=1,r=1,imm=0) R3_w=pkt_end(off=0,imm=0)
  5: (71) r1 = *(u8 *)(r2 +0)           ; R1_w=scalar(umax=255,var_off=(0x0; 0xff)) R2_w=pkt(off=0,r=1,imm=0)
  6: (18) r0 = 0x7fffffffffffff10       ; R0_w=9223372036854775568
  8: (0f) r1 += r0                      ; R0_w=9223372036854775568 R1_w=scalar(umin=9223372036854775568,umax=9223372036854775823,s32_min=-240,s32_max=15)
  9: (18) r0 = 0x8000000000000000       ; R0_w=-9223372036854775808
  11: (07) r0 += 1                      ; R0_w=-9223372036854775807
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775807 R1_w=scalar(umin=9223372036854775568,umax=9223372036854775809)
  13: (b7) r0 = 0                       ; R0_w=0
  14: (95) exit

  from 12 to 11: R0_w=-9223372036854775807 R1_w=scalar(umin=9223372036854775810,umax=9223372036854775823,var_off=(0x8000000000000000; 0xf),s32_min=0,s32_max=15,u32_max=15) R2_w=pkt(off=0,r=1,imm=0) R3_w=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775806
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775806 R1_w=-9223372036854775806
  13: safe

  from 12 to 11: R0_w=-9223372036854775806 R1_w=scalar(umin=9223372036854775811,umax=9223372036854775823,var_off=(0x8000000000000000; 0xf),s32_min=0,s32_max=15,u32_max=15) R2_w=pkt(off=0,r=1,imm=0) R3_w=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775805
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775805 R1_w=-9223372036854775805
  13: safe

  [...]

  from 12 to 11: R0_w=-9223372036854775798 R1=scalar(umin=9223372036854775819,umax=9223372036854775823,var_off=(0x8000000000000008; 0x7),s32_min=8,s32_max=15,u32_min=8,u32_max=15) R2=pkt(off=0,r=1,imm=0) R3=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775797
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775797 R1=-9223372036854775797
  13: safe

  from 12 to 11: R0_w=-9223372036854775797 R1=scalar(umin=9223372036854775820,umax=9223372036854775823,var_off=(0x800000000000000c; 0x3),s32_min=12,s32_max=15,u32_min=12,u32_max=15) R2=pkt(off=0,r=1,imm=0) R3=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775796
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775796 R1=-9223372036854775796
  13: safe

  from 12 to 11: R0_w=-9223372036854775796 R1=scalar(umin=9223372036854775821,umax=9223372036854775823,var_off=(0x800000000000000c; 0x3),s32_min=12,s32_max=15,u32_min=12,u32_max=15) R2=pkt(off=0,r=1,imm=0) R3=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775795
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775795 R1=-9223372036854775795
  13: safe

  from 12 to 11: R0_w=-9223372036854775795 R1=scalar(umin=9223372036854775822,umax=9223372036854775823,var_off=(0x800000000000000e; 0x1),s32_min=14,s32_max=15,u32_min=14,u32_max=15) R2=pkt(off=0,r=1,imm=0) R3=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775794
  12: (ad) if r0 < r1 goto pc-2         ; R0_w=-9223372036854775794 R1=-9223372036854775794
  13: safe

  from 12 to 11: R0_w=-9223372036854775794 R1=-9223372036854775793 R2=pkt(off=0,r=1,imm=0) R3=pkt_end(off=0,imm=0) R10=fp0
  11: (07) r0 += 1                      ; R0_w=-9223372036854775793
  12: (ad) if r0 < r1 goto pc-2
  last_idx 12 first_idx 12
  parent didn't have regs=1 stack=0 marks: R0_rw=P-9223372036854775801 R1_r=scalar(umin=9223372036854775815,umax=9223372036854775823,var_off=(0x8000000000000000; 0xf),s32_min=0,s32_max=15,u32_max=15) R2=pkt(off=0,r=1,imm=0) R3=pkt_end(off=0,imm=0) R10=fp0
  last_idx 11 first_idx 11
  regs=1 stack=0 before 11: (07) r0 += 1
  parent didn't have regs=1 stack=0 marks: R0_rw=P-9223372036854775805 R1_rw=scalar(umin=9223372036854775812,umax=9223372036854775823,var_off=(0x8000000000000000; 0xf),s32_min=0,s32_max=15,u32_max=15) R2_w=pkt(off=0,r=1,imm=0) R3_w=pkt_end(off=0,imm=0) R10=fp0
  last_idx 12 first_idx 0
  regs=1 stack=0 before 12: (ad) if r0 < r1 goto pc-2
  regs=1 stack=0 before 11: (07) r0 += 1
  regs=1 stack=0 before 12: (ad) if r0 < r1 goto pc-2
  regs=1 stack=0 before 11: (07) r0 += 1
  regs=1 stack=0 before 12: (ad) if r0 < r1 goto pc-2
  regs=1 stack=0 before 11: (07) r0 += 1
  regs=1 stack=0 before 9: (18) r0 = 0x8000000000000000
  last_idx 12 first_idx 12
  parent didn't have regs=2 stack=0 marks: R0_rw=P-9223372036854775801 R1_r=Pscalar(umin=9223372036854775815,umax=9223372036854775823,var_off=(0x8000000000000000; 0xf),s32_min=0,s32_max=15,u32_max=15) R2=pkt(off=0,r=1,imm=0) R3=pkt_end(off=0,imm=0) R10=fp0
  last_idx 11 first_idx 11
  regs=2 stack=0 before 11: (07) r0 += 1
  parent didn't have regs=2 stack=0 marks: R0_rw=P-9223372036854775805 R1_rw=Pscalar(umin=9223372036854775812,umax=9223372036854775823,var_off=(0x8000000000000000; 0xf),s32_min=0,s32_max=15,u32_max=15) R2_w=pkt(off=0,r=1,imm=0) R3_w=pkt_end(off=0,imm=0) R10=fp0
  last_idx 12 first_idx 0
  regs=2 stack=0 before 12: (ad) if r0 < r1 goto pc-2
  regs=2 stack=0 before 11: (07) r0 += 1
  regs=2 stack=0 before 12: (ad) if r0 < r1 goto pc-2
  regs=2 stack=0 before 11: (07) r0 += 1
  regs=2 stack=0 before 12: (ad) if r0 < r1 goto pc-2
  regs=2 stack=0 before 11: (07) r0 += 1
  regs=2 stack=0 before 9: (18) r0 = 0x8000000000000000
  regs=2 stack=0 before 8: (0f) r1 += r0
  regs=3 stack=0 before 6: (18) r0 = 0x7fffffffffffff10
  regs=2 stack=0 before 5: (71) r1 = *(u8 *)(r2 +0)
  13: safe

  from 4 to 13: safe
  verification time 322 usec
  stack depth 0
  processed 56 insns (limit 1000000) max_states_per_insn 1 total_states 3 peak_states 3 mark_read 1

This also fixes up a test case along with this improvement where we match
on the verifier log. The updated log now has a refined var_off, too.

Fixes: 3f50f132d8 ("bpf: Verifier, do explicit ALU32 bounds tracking")
Reported-by: Xu Kuohai <xukuohai@huaweicloud.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20230314203424.4015351-2-xukuohai@huaweicloud.com
Link: https://lore.kernel.org/bpf/20230322213056.2470-1-daniel@iogearbox.net
2023-03-22 16:49:25 -07:00
JP Kobryn 830154cdc5 bpf/selftests: coverage for bpf_map_ops errors
These tests expose the issue of being unable to properly check for errors
returned from inlined bpf map helpers that make calls to the bpf_map_ops
functions. At best, a check for zero or non-zero can be done but these
tests show it is not possible to check for a negative value or for a
specific error value.

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230322194754.185781-2-inwardvessel@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-22 15:11:06 -07:00
Alexei Starovoitov 3b2ec2140f selftests/bpf: Add light skeleton test for kfunc detection.
Add light skeleton test for kfunc detection and denylist it for s390.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230321203854.3035-5-alexei.starovoitov@gmail.com
2023-03-22 09:31:05 -07:00
Tushar Vyavahare 9a321fd330 selftests/xsk: add xdp populate metadata test
Add a new test in copy-mode for testing the copying of metadata from the
buffer in kernel-space to user-space. This is accomplished by adding a
new XDP program and using the bss map to store a counter that is written
to the metadata field. This counter is incremented for every packet so
that the number becomes unique and should be the same as the payload. It
is store in the bss so the value can be reset between runs.

The XDP program populates the metadata and the userspace program checks
the value stored in the metadata field against the payload using the new
is_metadata_correct() function. To turn this verification on or off, add
a new parameter (use_metadata) to the ifobject structure.

Signed-off-by: Tushar Vyavahare <tushar.vyavahare@intel.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20230320102705.306187-1-tushar.vyavahare@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-22 09:14:07 -07:00
Jakub Kicinski c0ba861117 net: skbuff: move the fields BPF cares about directly next to the offset marker
To avoid more possible BPF dependencies with moving bitfields
around keep the fields BPF cares about right next to the offset
marker.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230321014115.997841-4-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-20 21:57:49 -07:00
Jakub Kicinski 04aae213e7 net: skbuff: rename __pkt_vlan_present_offset to __mono_tc_offset
vlan_present is gone since
commit 354259fa73 ("net: remove skb->vlan_present")
rename the offset field to what BPF is currently looking
for in this byte - mono_delivery_time and tc_at_ingress.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230321014115.997841-2-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-20 21:57:49 -07:00
Alexei Starovoitov bb4a6a9237 selftest/bpf: Add a test case for ld_imm64 copy logic.
Add a test case to exercise {btf_id, btf_obj_fd} copy logic between ld_imm64 insns.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230319203014.55866-2-alexei.starovoitov@gmail.com
2023-03-20 09:26:41 -07:00
Manu Bretelle 2be7aa76cc selftests/bpf: Add --json-summary option to test_progs
Currently, test_progs outputs all stdout/stderr as it runs, and when it
is done, prints a summary.

It is non-trivial for tooling to parse that output and extract meaningful
information from it.

This change adds a new option, `--json-summary`/`-J` that let the caller
specify a file where `test_progs{,-no_alu32}` can write a summary of the
run in a json format that can later be parsed by tooling.

Currently, it creates a summary section with successes/skipped/failures
followed by a list of failed tests and subtests.

A test contains the following fields:
- name: the name of the test
- number: the number of the test
- message: the log message that was printed by the test.
- failed: A boolean indicating whether the test failed or not. Currently
we only output failed tests, but in the future, successful tests could
be added.
- subtests: A list of subtests associated with this test.

A subtest contains the following fields:
- name: same as above
- number: sanme as above
- message: the log message that was printed by the subtest.
- failed: same as above but for the subtest

An example run and json content below:
```
$ sudo ./test_progs -a $(grep -v '^#' ./DENYLIST.aarch64 | awk '{print
$1","}' | tr -d '\n') -j -J /tmp/test_progs.json
$ jq < /tmp/test_progs.json | head -n 30
{
  "success": 29,
  "success_subtest": 23,
  "skipped": 3,
  "failed": 28,
  "results": [
    {
      "name": "bpf_cookie",
      "number": 10,
      "message": "test_bpf_cookie:PASS:skel_open 0 nsec\n",
      "failed": true,
      "subtests": [
        {
          "name": "multi_kprobe_link_api",
          "number": 2,
          "message": "kprobe_multi_link_api_subtest:PASS:load_kallsyms 0 nsec\nlibbpf: extern 'bpf_testmod_fentry_test1' (strong): not resolved\nlibbpf: failed to load object 'kprobe_multi'\nlibbpf: failed to load BPF skeleton 'kprobe_multi': -3\nkprobe_multi_link_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3\n",
          "failed": true
        },
        {
          "name": "multi_kprobe_attach_api",
          "number": 3,
          "message": "libbpf: extern 'bpf_testmod_fentry_test1' (strong): not resolved\nlibbpf: failed to load object 'kprobe_multi'\nlibbpf: failed to load BPF skeleton 'kprobe_multi': -3\nkprobe_multi_attach_api_subtest:FAIL:fentry_raw_skel_load unexpected error: -3\n",
          "failed": true
        },
        {
          "name": "lsm",
          "number": 8,
          "message": "lsm_subtest:PASS:lsm.link_create 0 nsec\nlsm_subtest:FAIL:stack_mprotect unexpected stack_mprotect: actual 0 != expected -1\n",
          "failed": true
        }
```

The file can then be used to print a summary of the test run and list of
failing tests/subtests:

```
$ jq -r < /tmp/test_progs.json '"Success: \(.success)/\(.success_subtest), Skipped: \(.skipped), Failed: \(.failed)"'

Success: 29/23, Skipped: 3, Failed: 28
$ jq -r < /tmp/test_progs.json '.results | map([
    if .failed then "#\(.number) \(.name)" else empty end,
    (
        . as {name: $tname, number: $tnum} | .subtests | map(
            if .failed then "#\($tnum)/\(.number) \($tname)/\(.name)" else empty end
        )
    )
]) | flatten | .[]' | head -n 20
 #10 bpf_cookie
 #10/2 bpf_cookie/multi_kprobe_link_api
 #10/3 bpf_cookie/multi_kprobe_attach_api
 #10/8 bpf_cookie/lsm
 #15 bpf_mod_race
 #15/1 bpf_mod_race/ksym (used_btfs UAF)
 #15/2 bpf_mod_race/kfunc (kfunc_btf_tab UAF)
 #36 cgroup_hierarchical_stats
 #61 deny_namespace
 #61/1 deny_namespace/unpriv_userns_create_no_bpf
 #73 fexit_stress
 #83 get_func_ip_test
 #99 kfunc_dynptr_param
 #99/1 kfunc_dynptr_param/dynptr_data_null
 #99/4 kfunc_dynptr_param/dynptr_data_null
 #100 kprobe_multi_bench_attach
 #100/1 kprobe_multi_bench_attach/kernel
 #100/2 kprobe_multi_bench_attach/modules
 #101 kprobe_multi_test
 #101/1 kprobe_multi_test/skel_api
```

Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230317163256.3809328-1-chantr4@gmail.com
2023-03-17 16:00:33 -07:00
Alexei Starovoitov 95fdf6e313 selftests/bpf: Add test for bpf_ksym_exists().
Add load and run time test for bpf_ksym_exists() and check that the verifier
performs dead code elimination for non-existing kfunc.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20230317201920.62030-5-alexei.starovoitov@gmail.com
2023-03-17 15:46:02 -07:00
Alexander Lobakin 5640b6d894 selftests/bpf: fix "metadata marker" getting overwritten by the netstack
Alexei noticed xdp_do_redirect test on BPF CI started failing on
BE systems after skb PP recycling was enabled:

test_xdp_do_redirect:PASS:prog_run 0 nsec
test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
test_xdp_do_redirect:PASS:pkt_count_zero 0 nsec
test_xdp_do_redirect:FAIL:pkt_count_tc unexpected pkt_count_tc: actual
220 != expected 9998
test_max_pkt_size:PASS:prog_run_max_size 0 nsec
test_max_pkt_size:PASS:prog_run_too_big 0 nsec
close_netns:PASS:setns 0 nsec
 #289 xdp_do_redirect:FAIL
Summary: 270/1674 PASSED, 30 SKIPPED, 1 FAILED

and it doesn't happen on LE systems.
Ilya then hunted it down to:

 #0  0x0000000000aaeee6 in neigh_hh_output (hh=0x83258df0,
skb=0x88142200) at linux/include/net/neighbour.h:503
 #1  0x0000000000ab2cda in neigh_output (skip_cache=false,
skb=0x88142200, n=<optimized out>) at linux/include/net/neighbour.h:544
 #2  ip6_finish_output2 (net=net@entry=0x88edba00, sk=sk@entry=0x0,
skb=skb@entry=0x88142200) at linux/net/ipv6/ip6_output.c:134
 #3  0x0000000000ab4cbc in __ip6_finish_output (skb=0x88142200, sk=0x0,
net=0x88edba00) at linux/net/ipv6/ip6_output.c:195
 #4  ip6_finish_output (net=0x88edba00, sk=0x0, skb=0x88142200) at
linux/net/ipv6/ip6_output.c:206

xdp_do_redirect test places a u32 marker (0x42) right before the Ethernet
header to check it then in the XDP program and return %XDP_ABORTED if it's
not there. Neigh xmit code likes to round up hard header length to speed
up copying the header, so it overwrites two bytes in front of the Eth
header. On LE systems, 0x42 is one byte at `data - 4`, while on BE it's
`data - 1`, what explains why it happens only there.
It didn't happen previously due to that %XDP_PASS meant the page will be
discarded and replaced by a new one, but now it can be recycled as well,
while bpf_test_run code doesn't reinitialize the content of recycled
pages. This mark is limited to this particular test and its setup though,
so there's no need to predict 1000 different possible cases. Just move
it 4 bytes to the left, still keeping it 32 bit to match on more bytes.

Fixes: 9c94bbf9a8 ("xdp: recycle Page Pool backed skbs built from XDP frames")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/CAADnVQ+B_JOU+EpP=DKhbY9yXdN6GiRPnpTTXfEZ9sNkUeb-yQ@mail.gmail.com
Reported-by: Ilya Leoshkevich <iii@linux.ibm.com> # + debugging
Link: https://lore.kernel.org/bpf/8341c1d9f935f410438e79d3bd8a9cc50aefe105.camel@linux.ibm.com
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230316175051.922550-3-aleksander.lobakin@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-16 22:20:09 -07:00
David Vernet 1b403ce77d bpf: Remove bpf_cpumask_kptr_get() kfunc
Now that struct bpf_cpumask is RCU safe, there's no need for this kfunc.
Rather than doing the following:

private(MASK) static struct bpf_cpumask __kptr *global;

int BPF_PROG(prog, s32 cpu, ...)
{
	struct bpf_cpumask *cpumask;

	bpf_rcu_read_lock();
	cpumask = bpf_cpumask_kptr_get(&global);
	if (!cpumask) {
		bpf_rcu_read_unlock();
		return -1;
	}
	bpf_cpumask_setall(cpumask);
	...
	bpf_cpumask_release(cpumask);
	bpf_rcu_read_unlock();
}

Programs can instead simply do (assume same global cpumask):

int BPF_PROG(prog, ...)
{
	struct bpf_cpumask *cpumask;

	bpf_rcu_read_lock();
	cpumask = global;
	if (!cpumask) {
		bpf_rcu_read_unlock();
		return -1;
	}
	bpf_cpumask_setall(cpumask);
	...
	bpf_rcu_read_unlock();
}

In other words, no extra atomic acquire / release, and less boilerplate
code.

This patch removes both the kfunc, as well as its selftests and
documentation.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230316054028.88924-5-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-16 12:28:30 -07:00
David Vernet a5a197df58 bpf/selftests: Test using global cpumask kptr with RCU
Now that struct bpf_cpumask * is considered an RCU-safe type according
to the verifier, we should add tests that validate its common usages.
This patch adds those tests to the cpumask test suite. A subsequent
changes will remove bpf_cpumask_kptr_get(), and will adjust the selftest
and BPF documentation accordingly.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230316054028.88924-4-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-16 12:28:30 -07:00
Martin KaFai Lau 226efec2b0 selftests/bpf: Fix a fd leak in an error path in network_helpers.c
In __start_server, it leaks a fd when setsockopt(SO_REUSEPORT) fails.
This patch fixes it.

Fixes: eed92afdd1 ("bpf: selftest: Test batching and bpf_(get|set)sockopt in bpf tcp iter")
Reported-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20230316000726.1016773-2-martin.lau@linux.dev
2023-03-16 18:12:59 +01:00
Martin KaFai Lau ed01385c0d selftests/bpf: Use ASSERT_EQ instead ASSERT_OK for testing memcmp result
In tcp_hdr_options test, it ensures the received tcp hdr option
and the sk local storage have the expected values. It uses memcmp
to check that. Testing the memcmp result with ASSERT_OK is confusing
because ASSERT_OK will print out the errno which is not set.
This patch uses ASSERT_EQ to check for 0 instead.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20230316000726.1016773-1-martin.lau@linux.dev
2023-03-16 18:12:59 +01:00
Viktor Malik aa3d65de4b bpf/selftests: Test fentry attachment to shadowed functions
Adds a new test that tries to attach a program to fentry of two
functions of the same name, one located in vmlinux and the other in
bpf_testmod.

To avoid conflicts with existing tests, a new function
"bpf_fentry_shadow_test" was created both in vmlinux and in bpf_testmod.

The previous commit fixed a bug which caused this test to fail. The
verifier would always use the vmlinux function's address as the target
trampoline address, hence trying to create two trampolines for a single
address, which is forbidden.

The test (similarly to other fentry/fexit tests) is not working on arm64
at the moment.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/5fe2f364190b6f79b085066ed7c5989c5bc475fa.1678432753.git.vmalik@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-15 18:38:30 -07:00
Alexei Starovoitov 3c2611bac0 selftests/bpf: Fix trace_virtqueue_add_sgs test issue with LLVM 17.
LLVM commit https://reviews.llvm.org/D143726 introduced hoistMinMax optimization
that transformed
  (i < VIRTIO_MAX_SGS) && (i < out_sgs)
into
  i < MIN(VIRTIO_MAX_SGS, out_sgs)

and caused the verifier to stop recognizing such loop as bounded.

Which resulted in the following test failure:

libbpf: prog 'trace_virtqueue_add_sgs': BPF program load failed: Bad address
libbpf: prog 'trace_virtqueue_add_sgs': -- BEGIN PROG LOAD LOG --
The sequence of 8193 jumps is too complex.
verification time 789206 usec
stack depth 56
processed 156446 insns (limit 1000000) max_states_per_insn 7 total_states 1746 peak_states 1701 mark_read 12
-- END PROG LOAD LOG --
libbpf: prog 'trace_virtqueue_add_sgs': failed to load: -14
libbpf: failed to load object 'loop6.bpf.o'

Workaround the verifier limitation for now with inline asm that
prevents this particular optimization.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-14 15:28:11 -07:00
Alexander Lobakin 487deb3e33 selftests/bpf: robustify test_xdp_do_redirect with more payload magics
Currently, the test relies on that only dropped ("xmitted") frames will
be recycled and if a frame became an skb, it will be freed later by the
stack and never come back to its page_pool.
So, it easily gets broken by trying to recycle skbs[0]:

  test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
  test_xdp_do_redirect:FAIL:pkt_count_zero unexpected pkt_count_zero:
actual 9936 != expected 2
  test_xdp_do_redirect:PASS:pkt_count_tc 0 nsec

That huge mismatch happened because after the TC ingress hook zeroes the
magic, the page gets recycled when skb is freed, not returned to the MM
layer. "Live frames" mode initializes only new pages and keeps the
recycled ones as is by design, so they appear with zeroed magic on the
Rx path again.
Expand the possible magic values from two: 0 (was "xmitted"/dropped or
did hit the TC hook) and 0x42 (hit the input XDP prog) to three: the new
one will mark frames hit the TC hook, so that they will elide both
@pkt_count_zero and @pkt_count_xdp. They can then be recycled to their
page_pool or returned to the page allocator, this won't affect the
counters anyhow. Just make sure to mark them as "input" (0x42) when they
appear on the Rx path again.
Also make an enum from those magics, so that they will be always visible
and can be changed in just one place anytime. This also eases adding any
new marks later on.

Link: https://github.com/kernel-patches/bpf/actions/runs/4386538411/jobs/7681081789
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230313215553.1045175-2-aleksander.lobakin@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-14 15:20:04 -07:00
Alexei Starovoitov f25fd60882 selftests/bpf: Add various tests to check helper access into ptr_to_btf_id.
Add various tests to check helper access into ptr_to_btf_id.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230313235845.61029-4-alexei.starovoitov@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-13 23:08:21 -07:00
Ross Zwisler ab4c15feb2 selftests/bpf: use canonical ftrace path
The canonical location for the tracefs filesystem is at
/sys/kernel/tracing.

But, from Documentation/trace/ftrace.rst:

  Before 4.1, all ftrace tracing control files were within the debugfs
  file system, which is typically located at /sys/kernel/debug/tracing.
  For backward compatibility, when mounting the debugfs file system,
  the tracefs file system will be automatically mounted at:

  /sys/kernel/debug/tracing

Many tests in the bpf selftest code still refer to this older debugfs
path, so let's update them to avoid confusion.

Signed-off-by: Ross Zwisler <zwisler@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20230313205628.1058720-3-zwisler@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-13 21:51:30 -07:00
Alexei Starovoitov a33a6eaa19 Merge branch 'bpf: Allow reads from uninit stack'
Merge commit bf9bec4cb3 ("Merge branch 'bpf: Allow reads from uninit stack'")
from bpf-next to bpf tree to address verification issues in some programs
due to stack usage.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-13 13:21:22 -07:00
Dave Marchevsky 5d8d6634cc selftests/bpf: Add local kptr stashing test
Add a new selftest, local_kptr_stash, which uses bpf_kptr_xchg to stash
a bpf_obj_new-allocated object in a map. Test the following scenarios:

  * Stash two rb_nodes in an arraymap, don't unstash them, rely on map
    free to destruct them
  * Stash two rb_nodes in an arraymap, unstash the second one in a
    separate program, rely on map free to destruct first

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230310230743.2320707-4-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 16:38:05 -08:00
Alexei Starovoitov e8c8361cfd selftests/bpf: Fix progs/test_deny_namespace.c issues.
The following build error can be seen:
progs/test_deny_namespace.c:22:19: error: call to undeclared function 'BIT_LL'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
        __u64 cap_mask = BIT_LL(CAP_SYS_ADMIN);

The struct kernel_cap_struct no longer exists in the kernel as well.
Adjust bpf prog to fix both issues.

Fixes: f122a08b19 ("capability: just use a 'u64' instead of a 'u32[2]' array")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 12:54:12 -08:00
Alexei Starovoitov 32513d40d9 selftests/bpf: Fix progs/find_vma_fail1.c build error.
The commit 11e456cae9 ("selftests/bpf: Fix compilation errors: Assign a value to a constant")
fixed the issue cleanly in bpf-next.
This is an alternative fix in bpf tree to avoid merge conflict between bpf and bpf-next.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 12:41:18 -08:00
Martin KaFai Lau 4659fba121 selftests/bpf: Add local-storage-create benchmark
This patch tests how many kmallocs is needed to create and free
a batch of UDP sockets and each socket has a 64bytes bpf storage.
It also measures how fast the UDP sockets can be created.

The result is from my qemu setup.

Before bpf_mem_cache_alloc/free:
./bench -p 1 local-storage-create
Setting up benchmark 'local-storage-create'...
Benchmark 'local-storage-create' started.
Iter   0 ( 73.193us): creates  213.552k/s (213.552k/prod), 3.09 kmallocs/create
Iter   1 (-20.724us): creates  211.908k/s (211.908k/prod), 3.09 kmallocs/create
Iter   2 (  9.280us): creates  212.574k/s (212.574k/prod), 3.12 kmallocs/create
Iter   3 ( 11.039us): creates  213.209k/s (213.209k/prod), 3.12 kmallocs/create
Iter   4 (-11.411us): creates  213.351k/s (213.351k/prod), 3.12 kmallocs/create
Iter   5 ( -7.915us): creates  214.754k/s (214.754k/prod), 3.12 kmallocs/create
Iter   6 ( 11.317us): creates  210.942k/s (210.942k/prod), 3.12 kmallocs/create
Summary: creates  212.789 ± 1.310k/s (212.789k/prod), 3.12 kmallocs/create

After bpf_mem_cache_alloc/free:
./bench -p 1 local-storage-create
Setting up benchmark 'local-storage-create'...
Benchmark 'local-storage-create' started.
Iter   0 ( 68.265us): creates  243.984k/s (243.984k/prod), 1.04 kmallocs/create
Iter   1 ( 30.357us): creates  238.424k/s (238.424k/prod), 1.04 kmallocs/create
Iter   2 (-18.712us): creates  232.963k/s (232.963k/prod), 1.04 kmallocs/create
Iter   3 (-15.885us): creates  238.879k/s (238.879k/prod), 1.04 kmallocs/create
Iter   4 (  5.590us): creates  237.490k/s (237.490k/prod), 1.04 kmallocs/create
Iter   5 (  8.577us): creates  237.521k/s (237.521k/prod), 1.04 kmallocs/create
Iter   6 ( -6.263us): creates  238.508k/s (238.508k/prod), 1.04 kmallocs/create
Summary: creates  237.298 ± 2.198k/s (237.298k/prod), 1.04 kmallocs/create

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-18-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 11:06:02 -08:00
Martin KaFai Lau 1f443d0f2b selftests/bpf: Check freeing sk->sk_local_storage with sk_local_storage->smap is NULL
This patch tweats the socket_bind bpf prog to test the
local_storage->smap == NULL case in the bpf_local_storage_free()
code path. The idea is to create the local_storage with
the sk_storage_map's selem first. Then add the sk_storage_map2's selem
and then delete the earlier sk_storeage_map's selem.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-17-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 11:06:02 -08:00
Martin KaFai Lau 57ef77152b selftests/bpf: Replace CHECK with ASSERT in test_local_storage
This patch migrates the CHECK macro to ASSERT macro.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-16-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 11:06:02 -08:00
David Vernet 4a54de6596 bpf/selftests: Fix send_signal tracepoint tests
The send_signal tracepoint tests are non-deterministically failing in
CI. The test works as follows:

1. Two pairs of file descriptors are created using the pipe() function.
   One pair is used to communicate between a parent process -> child
   process, and the other for the reverse direction.

2. A child is fork()'ed. The child process registers a signal handler,
   notifies its parent that the signal handler is registered, and then
   and waits for its parent to have enabled a BPF program that sends a
   signal.

3. The parent opens and loads a BPF skeleton with programs that send
   signals to the child process. The different programs are triggered by
   different perf events (either NMI or normal perf), or by regular
   tracepoints. The signal is delivered to the child whenever the child
   triggers the program.

4. The child's signal handler is invoked, which sets a flag saying that
   the signal handler was reached. The child then signals to the parent
   that it received the signal, and the test ends.

The perf testcases (send_signal_perf{_thread} and
send_signal_nmi{_thread}) work 100% of the time, but the tracepoint
testcases fail non-deterministically because the tracepoint is not
always being fired for the child.

There are two tracepoint programs registered in the test:
'tracepoint/sched/sched_switch', and
'tracepoint/syscalls/sys_enter_nanosleep'. The child never intentionally
blocks, nor sleeps, so neither tracepoint is guaranteed to be triggered.
To fix this, we can have the child trigger the nanosleep program with a
usleep().

Before this patch, the test would fail locally every 2-3 runs. Now, it
doesn't fail after more than 1000 runs.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230310061909.1420887-1-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 10:36:46 -08:00
Andrii Nakryiko 3d5a55ddc2 selftests/bpf: make BPF compiler flags stricter
We recently added -Wuninitialized, but it's not enough to catch various
silly mistakes or omissions. Let's go all the way to -Wall, just like we
do for user-space code.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230309054015.4068562-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 08:14:08 -08:00
Andrii Nakryiko c8ed668593 selftests/bpf: fix lots of silly mistakes pointed out by compiler
Once we enable -Wall for BPF sources, compiler will complain about lots
of unused variables, variables that are set but never read, etc.

Fix all these issues first before enabling -Wall in Makefile.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230309054015.4068562-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 08:14:08 -08:00
Andrii Nakryiko 713461b895 selftests/bpf: add __sink() macro to fake variable consumption
Add __sink(expr) macro that forces compiler to believe that passed in
expression is both read and written. It used a simple embedded asm for
this. This is useful in a lot of tests where we assign value to some variable
to trigger some action, but later don't read variable, causing compiler
to complain (if corresponding compiler warnings are turned on, which
we'll do in the next patch).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230309054015.4068562-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 08:14:07 -08:00
Andrii Nakryiko 2498e6231b selftests/bpf: prevent unused variable warning in bpf_for()
Add __attribute__((unused)) to inner __p variable inside bpf_for(),
bpf_for_each(), and bpf_repeat() macros to avoid compiler warnings about
unused variable.

Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230309054015.4068562-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 08:14:07 -08:00
Jakub Kicinski d0ddf5065f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Documentation/bpf/bpf_devel_QA.rst
  b7abcd9c65 ("bpf, doc: Link to submitting-patches.rst for general patch submission info")
  d56b0c461d ("bpf, docs: Fix link to netdev-FAQ target")
https://lore.kernel.org/all/20230307095812.236eb1be@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-09 22:22:11 -08:00
Yonghong Song 63d78b7e8c selftests/bpf: Workaround verification failure for fexit_bpf2bpf/func_replace_return_code
With latest llvm17, selftest fexit_bpf2bpf/func_replace_return_code
has the following verification failure:

  0: R1=ctx(off=0,imm=0) R10=fp0
  ; int connect_v4_prog(struct bpf_sock_addr *ctx)
  0: (bf) r7 = r1                       ; R1=ctx(off=0,imm=0) R7_w=ctx(off=0,imm=0)
  1: (b4) w6 = 0                        ; R6_w=0
  ; memset(&tuple.ipv4.saddr, 0, sizeof(tuple.ipv4.saddr));
  ...
  ; return do_bind(ctx) ? 1 : 0;
  179: (bf) r1 = r7                     ; R1=ctx(off=0,imm=0) R7=ctx(off=0,imm=0)
  180: (85) call pc+147
  Func#3 is global and valid. Skipping.
  181: R0_w=scalar()
  181: (bc) w6 = w0                     ; R0_w=scalar() R6_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
  182: (05) goto pc-129
  ; }
  54: (bc) w0 = w6                      ; R0_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R6_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
  55: (95) exit
  At program exit the register R0 has value (0x0; 0xffffffff) should have been in (0x0; 0x1)
  processed 281 insns (limit 1000000) max_states_per_insn 1 total_states 26 peak_states 26 mark_read 13
  -- END PROG LOAD LOG --
  libbpf: prog 'connect_v4_prog': failed to load: -22

The corresponding source code:

  __attribute__ ((noinline))
  int do_bind(struct bpf_sock_addr *ctx)
  {
        struct sockaddr_in sa = {};

        sa.sin_family = AF_INET;
        sa.sin_port = bpf_htons(0);
        sa.sin_addr.s_addr = bpf_htonl(SRC_REWRITE_IP4);

        if (bpf_bind(ctx, (struct sockaddr *)&sa, sizeof(sa)) != 0)
                return 0;

        return 1;
  }
  ...
  SEC("cgroup/connect4")
  int connect_v4_prog(struct bpf_sock_addr *ctx)
  {
  ...
        return do_bind(ctx) ? 1 : 0;
  }

Insn 180 is a call to 'do_bind'. The call's return value is also the return value
for the program. Since do_bind() returns 0/1, so it is legitimate for compiler to
optimize 'return do_bind(ctx) ? 1 : 0' to 'return do_bind(ctx)'. However, such
optimization breaks verifier as the return value of 'do_bind()' is marked as any
scalar which violates the requirement of prog return value 0/1.

There are two ways to fix this problem, (1) changing 'return 1' in do_bind() to
e.g. 'return 10' so the compiler has to do 'do_bind(ctx) ? 1 :0', or (2)
suggested by Andrii, marking do_bind() with __weak attribute so the compiler
cannot make any assumption on do_bind() return value.

This patch adopted adding __weak approach which is simpler and more resistant
to potential compiler optimizations.

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230310012410.2920570-1-yhs@fb.com
2023-03-09 18:59:54 -08:00
Lorenzo Bianconi c1cd734c1b selftests/bpf: Improve error logs in XDP compliance test tool
Improve some error logs reported in the XDP compliance test tool.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/212fc5bd214ff706f6ef1acbe7272cf4d803ca9c.1678382940.git.lorenzo@kernel.org
2023-03-09 20:52:40 +01:00
Lorenzo Bianconi 27a36bc3cd selftests/bpf: Use ifname instead of ifindex in XDP compliance test tool
Rely on interface name instead of interface index in error messages or
logs from XDP compliance test tool.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/7dc5a8ff56c252b1a7ae29b059d0b2b1543c8b5d.1678382940.git.lorenzo@kernel.org
2023-03-09 20:52:30 +01:00
Martin KaFai Lau a686557631 selftests/bpf: Fix flaky fib_lookup test
There is a report that fib_lookup test is flaky when running in parallel.
A symptom of slowness or delay. An example:

Testing IPv6 stale neigh
set_lookup_params:PASS:inet_pton(IPV6_IFACE_ADDR) 0 nsec
test_fib_lookup:PASS:bpf_prog_test_run_opts 0 nsec
test_fib_lookup:FAIL:fib_lookup_ret unexpected fib_lookup_ret: actual 0 != expected 7
test_fib_lookup:FAIL:dmac not match unexpected dmac not match: actual 1 != expected 0
dmac expected 11:11:11:11:11:11 actual 00:00:00:00:00:00

[ Note that the "fib_lookup_ret unexpected fib_lookup_ret actual 0 ..."
  is reversed in terms of expected and actual value. Fixing in this
  patch also. ]

One possibility is the testing stale neigh entry was marked dead by the
gc (in neigh_periodic_work). The default gc_stale_time sysctl is 60s.
This patch increases it to 15 mins.

It also:

- fixes the reversed arg (actual vs expected) in one of the
  ASSERT_EQ test
- removes the nodad command arg when adding v4 neigh entry which
  currently has a warning.

Fixes: 168de02335 ("selftests/bpf: Add bpf_fib_lookup test")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230309060244.3242491-1-martin.lau@linux.dev
2023-03-09 20:37:55 +01:00
Linus Torvalds 44889ba56c Networking fixes for 6.3-rc2, including fixes from netfilter, bpf
Current release - regressions:
 
   - core: avoid skb end_offset change in __skb_unclone_keeptruesize()
 
   - sched:
     - act_connmark: handle errno on tcf_idr_check_alloc
     - flower: fix fl_change() error recovery path
 
   - ieee802154: prevent user from crashing the host
 
 Current release - new code bugs:
 
   - eth: bnxt_en: fix the double free during device removal
 
   - tools: ynl:
     - fix enum-as-flags in the generic CLI
     - fully inherit attrs in subsets
     - re-license uniformly under GPL-2.0 or BSD-3-clause
 
 Previous releases - regressions:
 
   - core: use indirect calls helpers for sk_exit_memory_pressure()
 
   - tls:
     - fix return value for async crypto
     - avoid hanging tasks on the tx_lock
 
   - eth: ice: copy last block omitted in ice_get_module_eeprom()
 
 Previous releases - always broken:
 
   - core: avoid double iput when sock_alloc_file fails
 
   - af_unix: fix struct pid leaks in OOB support
 
   - tls:
     - fix possible race condition
     - fix device-offloaded sendpage straddling records
 
   - bpf:
     - sockmap: fix an infinite loop error
     - test_run: fix &xdp_frame misplacement for LIVE_FRAMES
     - fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR
 
   - netfilter: tproxy: fix deadlock due to missing BH disable
 
   - phylib: get rid of unnecessary locking
 
   - eth: bgmac: fix *initial* chip reset to support BCM5358
 
   - eth: nfp: fix csum for ipsec offload
 
   - eth: mtk_eth_soc: fix RX data corruption issue
 
 Misc:
 
   - usb: qmi_wwan: add telit 0x1080 composition
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmQJzQISHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOky5YP/04Dbsbeqpk0Q94axmjoaS0J/4rW49js
 RaA7v8ci7sL1omW8k5tILPXniAouN4YHNOCW1KbLBMR5O7lyn9qM1RteHOIpOmte
 TLzAw+6Wl7CyGiiqirv2GU96Wd/jZoZpPXFZz/gXP59GnkChSHzQcpexmz0nrmxI
 eCRSs+qm+re3wmDKTYm5C+g+420PNXu9JItPnTNf+nTkTBxpmOEMyry03I0taXKS
 wceQHB2q5E0sSWXDfkxG/pmUuYTj3AdRSQ+vo+FLevSs/LWeThs2I6pT5sn8XS76
 1S8Lh6FytfBhyalFmRtrpqIJYyGae5MwEXQ29ddfmF4OFFLedx3IH0+JFQxTE9So
 i4gaXmM5SUI7c5vhib097xUISoLxKqqXQVQQSQ1MPZRfXtVubbA2gCv+vh6fXVoj
 zQYatZOLM7KT9q4Pw8A+9bJPof/FV+ObC67pbGQbJJgBoy+oOixDuP+x5DYT384L
 /5XS+23OZiFe7bvQoE/0SQMeRk3lF2XkS5l9gSbdSnGQPiaOqKhDgkoCmdkn1jvg
 qtkBS6+tRRoOBNsjC4r4eFXBVOQ1+myyjZetBnEOaSp22FaTJFQh9qX3AMFIHbUy
 m0jDi9OJZSWHICd6KNWPm3JK43cMjiyZbGftYqOHhuY5HN30vQN6sl7DXIJ0rIcE
 myHMfizwqmGT
 =hSXM
 -----END PGP SIGNATURE-----

Merge tag 'net-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter and bpf.

  Current release - regressions:

   - core: avoid skb end_offset change in __skb_unclone_keeptruesize()

   - sched:
      - act_connmark: handle errno on tcf_idr_check_alloc
      - flower: fix fl_change() error recovery path

   - ieee802154: prevent user from crashing the host

  Current release - new code bugs:

   - eth: bnxt_en: fix the double free during device removal

   - tools: ynl:
      - fix enum-as-flags in the generic CLI
      - fully inherit attrs in subsets
      - re-license uniformly under GPL-2.0 or BSD-3-clause

  Previous releases - regressions:

   - core: use indirect calls helpers for sk_exit_memory_pressure()

   - tls:
      - fix return value for async crypto
      - avoid hanging tasks on the tx_lock

   - eth: ice: copy last block omitted in ice_get_module_eeprom()

  Previous releases - always broken:

   - core: avoid double iput when sock_alloc_file fails

   - af_unix: fix struct pid leaks in OOB support

   - tls:
      - fix possible race condition
      - fix device-offloaded sendpage straddling records

   - bpf:
      - sockmap: fix an infinite loop error
      - test_run: fix &xdp_frame misplacement for LIVE_FRAMES
      - fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR

   - netfilter: tproxy: fix deadlock due to missing BH disable

   - phylib: get rid of unnecessary locking

   - eth: bgmac: fix *initial* chip reset to support BCM5358

   - eth: nfp: fix csum for ipsec offload

   - eth: mtk_eth_soc: fix RX data corruption issue

  Misc:

   - usb: qmi_wwan: add telit 0x1080 composition"

* tag 'net-6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits)
  tools: ynl: fix enum-as-flags in the generic CLI
  tools: ynl: move the enum classes to shared code
  net: avoid double iput when sock_alloc_file fails
  af_unix: fix struct pid leaks in OOB support
  eth: fealnx: bring back this old driver
  net: dsa: mt7530: permit port 5 to work without port 6 on MT7621 SoC
  net: microchip: sparx5: fix deletion of existing DSCP mappings
  octeontx2-af: Unlock contexts in the queue context cache in case of fault detection
  net/smc: fix fallback failed while sendmsg with fastopen
  ynl: re-license uniformly under GPL-2.0 OR BSD-3-Clause
  mailmap: update entries for Stephen Hemminger
  mailmap: add entry for Maxim Mikityanskiy
  nfc: change order inside nfc_se_io error path
  ethernet: ice: avoid gcc-9 integer overflow warning
  ice: don't ignore return codes in VSI related code
  ice: Fix DSCP PFC TLV creation
  net: usb: qmi_wwan: add Telit 0x1080 composition
  net: usb: cdc_mbim: avoid altsetting toggling for Telit FE990
  netfilter: conntrack: adopt safer max chain length
  net: tls: fix device-offloaded sendpage straddling records
  ...
2023-03-09 10:56:58 -08:00
Andrii Nakryiko 7e86a8c4ac selftests/bpf: implement and test custom testmod_seq iterator
Implement a trivial iterator returning same specified integer value
N times as part of bpf_testmod kernel module. Add selftests to validate
everything works end to end.

We also reuse these tests as "verification-only" tests to validate that
kernel prints the state of custom kernel module-defined iterator correctly:

  fp-16=iter_testmod_seq(ref_id=1,state=drained,depth=0)

"testmod_seq" part is an iterator type, and is coming from module's BTF
data dynamically at runtime.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230308184121.1165081-9-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-08 16:19:51 -08:00
Andrii Nakryiko f59b146092 selftests/bpf: add number iterator tests
Add number iterator (bpf_iter_num_{new,next,destroy}()) tests,
validating the correct handling of various corner and common cases
*at runtime*.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230308184121.1165081-8-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-08 16:19:51 -08:00
Andrii Nakryiko 57400dcce6 selftests/bpf: add iterators tests
Add various tests for open-coded iterators. Some of them excercise
various possible coding patterns in C, some go down to low-level
assembly for more control over various conditions, especially invalid
ones.

We also make use of bpf_for(), bpf_for_each(), bpf_repeat() macros in
some of these tests.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230308184121.1165081-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-08 16:19:51 -08:00
Andrii Nakryiko 8c2b5e9050 selftests/bpf: add bpf_for_each(), bpf_for(), and bpf_repeat() macros
Add bpf_for_each(), bpf_for(), and bpf_repeat() macros that make writing
open-coded iterator-based loops much more convenient and natural. These
macros utilize cleanup attribute to ensure proper destruction of the
iterator and thanks to that manage to provide the ergonomics that is
very close to C language's for() construct. Typical loop would look like:

  int i;
  int arr[N];

  bpf_for(i, 0, N) {
      /* verifier will know that i >= 0 && i < N, so could be used to
       * directly access array elements with no extra checks
       */
       arr[i] = i;
  }

bpf_repeat() is very similar, but it doesn't expose iteration number and
is meant as a simple "repeat action N times" loop:

  bpf_repeat(N) { /* whatever, N times */ }

Note that `break` and `continue` statements inside the {} block work as
expected.

bpf_for_each() is a generalization over any kind of BPF open-coded
iterator allowing to use for-each-like approach instead of calling
low-level bpf_iter_<type>_{new,next,destroy}() APIs explicitly. E.g.:

  struct cgroup *cg;

  bpf_for_each(cgroup, cg, some, input, args) {
      /* do something with each cg */
  }

would call (not-yet-implemented) bpf_iter_cgroup_{new,next,destroy}()
functions to form a loop over cgroups, where `some, input, args` are
passed verbatim into constructor as

  bpf_iter_cgroup_new(&it, some, input, args).

As a first demonstration, add pyperf variant based on the bpf_for() loop.

Also clean up a few tests that either included bpf_misc.h header
unnecessarily from the user-space, which is unsupported, or included it
before any common types are defined (and thus leading to unnecessary
compilation warnings, potentially).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230308184121.1165081-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-08 16:19:51 -08:00
Roberto Sassu 12fabae03c selftests/bpf: Fix IMA test
Commit 62622dab0a ("ima: return IMA digest value only when IMA_COLLECTED
flag is set") caused bpf_ima_inode_hash() to refuse to give non-fresh
digests. IMA test #3 assumed the old behavior, that bpf_ima_inode_hash()
still returned also non-fresh digests.

Correct the test by accepting both cases. If the samples returned are 1,
assume that the commit above is applied and that the returned digest is
fresh. If the samples returned are 2, assume that the commit above is not
applied, and check both the non-fresh and fresh digest.

Fixes: 62622dab0a ("ima: return IMA digest value only when IMA_COLLECTED flag is set")
Reported-by: David Vernet <void@manifault.com>
Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://lore.kernel.org/bpf/20230308103713.1681200-1-roberto.sassu@huaweicloud.com
2023-03-08 11:15:39 -08:00
Lorenz Bauer dfdd608c3b selftests/bpf: check that modifier resolves after pointer
Add a regression test that ensures that a VAR pointing at a
modifier which follows a PTR (or STRUCT or ARRAY) is resolved
correctly by the datasec validator.

Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/r/20230306112138.155352-3-lmb@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-06 11:44:14 -08:00
Alexander Lobakin 294635a816 bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES
&xdp_buff and &xdp_frame are bound in a way that

xdp_buff->data_hard_start == xdp_frame

It's always the case and e.g. xdp_convert_buff_to_frame() relies on
this.
IOW, the following:

	for (u32 i = 0; i < 0xdead; i++) {
		xdpf = xdp_convert_buff_to_frame(&xdp);
		xdp_convert_frame_to_buff(xdpf, &xdp);
	}

shouldn't ever modify @xdpf's contents or the pointer itself.
However, "live packet" code wrongly treats &xdp_frame as part of its
context placed *before* the data_hard_start. With such flow,
data_hard_start is sizeof(*xdpf) off to the right and no longer points
to the XDP frame.

Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several
places and praying that there are no more miscalcs left somewhere in the
code, unionize ::frm with ::data in a flex array, so that both starts
pointing to the actual data_hard_start and the XDP frame actually starts
being a part of it, i.e. a part of the headroom, not the context.
A nice side effect is that the maximum frame size for this mode gets
increased by 40 bytes, as xdp_buff::frame_sz includes everything from
data_hard_start (-> includes xdpf already) to the end of XDP/skb shared
info.
Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it
hardcoded for 64 bit && 4k pages, it can be made more flexible later on.

Minor: align `&head->data` with how `head->frm` is assigned for
consistency.
Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for
clarity.

(was found while testing XDP traffic generator on ice, which calls
 xdp_convert_frame_to_buff() for each XDP frame)

Fixes: b530e9e106 ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230224163607.2994755-1-aleksander.lobakin@intel.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-06 11:15:54 -08:00
Menglong Dong c7aec81b31 selftests/bpf: Add test for legacy/perf kprobe/uprobe attach mode
Add the testing for kprobe/uprobe attaching in default, legacy, perf and
link mode. And the testing passed:

./test_progs -t attach_probe
$5/1     attach_probe/manual-default:OK
$5/2     attach_probe/manual-legacy:OK
$5/3     attach_probe/manual-perf:OK
$5/4     attach_probe/manual-link:OK
$5/5     attach_probe/auto:OK
$5/6     attach_probe/kprobe-sleepable:OK
$5/7     attach_probe/uprobe-lib:OK
$5/8     attach_probe/uprobe-sleepable:OK
$5/9     attach_probe/uprobe-ref_ctr:OK
$5       attach_probe:OK
Summary: 1/9 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Biao Jiang <benbjiang@tencent.com>
Link: https://lore.kernel.org/bpf/20230306064833.7932-4-imagedong@tencent.com
2023-03-06 09:38:08 -08:00
Menglong Dong 7391ec6391 selftests/bpf: Split test_attach_probe into multi subtests
In order to adapt to the older kernel, now we split the "attach_probe"
testing into multi subtests:

  manual // manual attach tests for kprobe/uprobe
  auto // auto-attach tests for kprobe and uprobe
  kprobe-sleepable // kprobe sleepable test
  uprobe-lib // uprobe tests for library function by name
  uprobe-sleepable // uprobe sleepable test
  uprobe-ref_ctr // uprobe ref_ctr test

As sleepable kprobe needs to set BPF_F_SLEEPABLE flag before loading,
we need to move it to a stand alone skel file, in case of it is not
supported by kernel and make the whole loading fail.

Therefore, we can only enable part of the subtests for older kernel.

Signed-off-by: Menglong Dong <imagedong@tencent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Biao Jiang <benbjiang@tencent.com>
Link: https://lore.kernel.org/bpf/20230306064833.7932-3-imagedong@tencent.com
2023-03-06 09:38:08 -08:00
Andrii Nakryiko fffc893b6b selftests/bpf: adjust log_fixup's buffer size for proper truncation
Adjust log_fixup's expected buffer length to fix the test. It's pretty
finicky in its length expectation, but it doesn't break often. So just
adjust the length to work on current kernel and with follow up iterator
changes as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230302235015.2044271-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04 11:14:32 -08:00
Andrii Nakryiko 6f876e75d3 selftests/bpf: enhance align selftest's expected log matching
Allow to search for expected register state in all the verifier log
output that's related to specified instruction number.

See added comment for an example of possible situation that is happening
due to a simple enhancement done in the next patch, which fixes handling
of env->test_state_freq flag in state checkpointing logic.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230302235015.2044271-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-04 11:14:31 -08:00
Eduard Zingerman 71cf4d027a selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access()
Function verifier.c:convert_ctx_access() applies some rewrites to BPF
instructions that read or write BPF program context. This commit adds
machinery to allow test cases that inspect BPF program after these
rewrites are applied.

An example of a test case:

  {
        // Shorthand for field offset and size specification
	N(CGROUP_SOCKOPT, struct bpf_sockopt, retval),

        // Pattern generated for field read
	.read  = "$dst = *(u64 *)($ctx + bpf_sockopt_kern::current_task);"
		 "$dst = *(u64 *)($dst + task_struct::bpf_ctx);"
		 "$dst = *(u32 *)($dst + bpf_cg_run_ctx::retval);",

        // Pattern generated for field write
	.write = "*(u64 *)($ctx + bpf_sockopt_kern::tmp_reg) = r9;"
		 "r9 = *(u64 *)($ctx + bpf_sockopt_kern::current_task);"
		 "r9 = *(u64 *)(r9 + task_struct::bpf_ctx);"
		 "*(u32 *)(r9 + bpf_cg_run_ctx::retval) = $src;"
		 "r9 = *(u64 *)($ctx + bpf_sockopt_kern::tmp_reg);" ,
  },

For each test case, up to three programs are created:
- One that uses BPF_LDX_MEM to read the context field.
- One that uses BPF_STX_MEM to write to the context field.
- One that uses BPF_ST_MEM to write to the context field.

The disassembly of each program is compared with the pattern specified
in the test case.

Kernel code for disassembly is reused (as is in the bpftool).
To keep Makefile changes to the minimum, symbolic links to
`kernel/bpf/disasm.c` and `kernel/bpf/disasm.h ` are added.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230304011247.566040-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-03 21:41:46 -08:00
Eduard Zingerman 806f81cd1e selftests/bpf: test if pointer type is tracked for BPF_ST_MEM
Check that verifier tracks pointer types for BPF_ST_MEM instructions
and reports error if pointer types do not match for different
execution branches.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230304011247.566040-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-03 21:41:46 -08:00
Eduard Zingerman 0d80a619c1 bpf: allow ctx writes using BPF_ST_MEM instruction
Lift verifier restriction to use BPF_ST_MEM instructions to write to
context data structures. This requires the following changes:
 - verifier.c:do_check() for BPF_ST updated to:
   - no longer forbid writes to registers of type PTR_TO_CTX;
   - track dst_reg type in the env->insn_aux_data[...].ptr_type field
     (same way it is done for BPF_STX and BPF_LDX instructions).
 - verifier.c:convert_ctx_access() and various callbacks invoked by
   it are updated to handled BPF_ST instruction alongside BPF_STX.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230304011247.566040-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-03 21:41:46 -08:00
Alexei Starovoitov 6fcd486b3a bpf: Refactor RCU enforcement in the verifier.
bpf_rcu_read_lock/unlock() are only available in clang compiled kernels. Lack
of such key mechanism makes it impossible for sleepable bpf programs to use RCU
pointers.

Allow bpf_rcu_read_lock/unlock() in GCC compiled kernels (though GCC doesn't
support btf_type_tag yet) and allowlist certain field dereferences in important
data structures like tast_struct, cgroup, socket that are used by sleepable
programs either as RCU pointer or full trusted pointer (which is valid outside
of RCU CS). Use BTF_TYPE_SAFE_RCU and BTF_TYPE_SAFE_TRUSTED macros for such
tagging. They will be removed once GCC supports btf_type_tag.

With that refactor check_ptr_to_btf_access(). Make it strict in enforcing
PTR_TRUSTED and PTR_UNTRUSTED while deprecating old PTR_TO_BTF_ID without
modifier flags. There is a chance that this strict enforcement might break
existing programs (especially on GCC compiled kernels), but this cleanup has to
start sooner than later. Note PTR_TO_CTX access still yields old deprecated
PTR_TO_BTF_ID. Once it's converted to strict PTR_TRUSTED or PTR_UNTRUSTED the
kfuncs and helpers will be able to default to KF_TRUSTED_ARGS. KF_RCU will
remain as a weaker version of KF_TRUSTED_ARGS where obj refcnt could be 0.

Adjust rcu_read_lock selftest to run on gcc and clang compiled kernels.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230303041446.3630-7-alexei.starovoitov@gmail.com
2023-03-03 17:42:20 +01:00
Alexei Starovoitov 0047d8343f selftests/bpf: Tweak cgroup kfunc test.
Adjust cgroup kfunc test to dereference RCU protected cgroup pointer
as PTR_TRUSTED and pass into KF_TRUSTED_ARGS kfunc.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230303041446.3630-6-alexei.starovoitov@gmail.com
2023-03-03 17:42:20 +01:00
Alexei Starovoitov 838bd4ac9a selftests/bpf: Add a test case for kptr_rcu.
Tweak existing map_kptr test to check kptr_rcu.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230303041446.3630-5-alexei.starovoitov@gmail.com
2023-03-03 17:42:20 +01:00
Alexei Starovoitov 20c09d92fa bpf: Introduce kptr_rcu.
The life time of certain kernel structures like 'struct cgroup' is protected by RCU.
Hence it's safe to dereference them directly from __kptr tagged pointers in bpf maps.
The resulting pointer is MEM_RCU and can be passed to kfuncs that expect KF_RCU.
Derefrence of other kptr-s returns PTR_UNTRUSTED.

For example:
struct map_value {
   struct cgroup __kptr *cgrp;
};

SEC("tp_btf/cgroup_mkdir")
int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path)
{
  struct cgroup *cg, *cg2;

  cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0
  bpf_kptr_xchg(&v->cgrp, cg);

  cg2 = v->cgrp; // This is new feature introduced by this patch.
  // cg2 is PTR_MAYBE_NULL | MEM_RCU.
  // When cg2 != NULL, it's a valid cgroup, but its percpu_ref could be zero

  if (cg2)
    bpf_cgroup_ancestor(cg2, level); // safe to do.
}

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230303041446.3630-4-alexei.starovoitov@gmail.com
2023-03-03 17:42:20 +01:00
Alexei Starovoitov 03b77e17ae bpf: Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted.
__kptr meant to store PTR_UNTRUSTED kernel pointers inside bpf maps.
The concept felt useful, but didn't get much traction,
since bpf_rdonly_cast() was added soon after and bpf programs received
a simpler way to access PTR_UNTRUSTED kernel pointers
without going through restrictive __kptr usage.

Rename __kptr_ref -> __kptr and __kptr -> __kptr_untrusted to indicate
its intended usage.
The main goal of __kptr_untrusted was to read/write such pointers
directly while bpf_kptr_xchg was a mechanism to access refcnted
kernel pointers. The next patch will allow RCU protected __kptr access
with direct read. At that point __kptr_untrusted will be deprecated.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230303041446.3630-2-alexei.starovoitov@gmail.com
2023-03-03 17:42:20 +01:00
Tero Kristo 944459e88b selftests/bpf: Add absolute timer test
Add test for the absolute BPF timer under the existing timer tests. This
will run the timer two times with 1us expiration time, and then re-arm
the timer at ~35s in the future. At the end, it is verified that the
absolute timer expired exactly two times.

Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com>
Link: https://lore.kernel.org/r/20230302114614.2985072-3-tero.kristo@linux.intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-02 22:41:32 -08:00
Dave Marchevsky ec97a76f11 selftests/bpf: Add -Wuninitialized flag to bpf prog flags
Per C99 standard [0], Section 6.7.8, Paragraph 10:

  If an object that has automatic storage duration is not initialized
  explicitly, its value is indeterminate.

And in the same document, in appendix "J.2 Undefined behavior":

  The behavior is undefined in the following circumstances:
  [...]
  The value of an object with automatic storage duration is used while
  it is indeterminate (6.2.4, 6.7.8, 6.8).

This means that use of an uninitialized stack variable is undefined
behavior, and therefore that clang can choose to do a variety of scary
things, such as not generating bytecode for "bunch of useful code" in
the below example:

  void some_func()
  {
    int i;
    if (!i)
      return;
    // bunch of useful code
  }

To add insult to injury, if some_func above is a helper function for
some BPF program, clang can choose to not generate an "exit" insn,
causing verifier to fail with "last insn is not an exit or jmp". Going
from that verification failure to the root cause of uninitialized use
is certain to be frustrating.

This patch adds -Wuninitialized to the cflags for selftest BPF progs and
fixes up existing instances of uninitialized use.

  [0]: https://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Cc: David Vernet <void@manifault.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230303005500.1614874-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-02 22:38:50 -08:00
Andrii Nakryiko 35cbf7f915 selftests/bpf: Support custom per-test flags and multiple expected messages
Extend __flag attribute by allowing to specify one of the following:
 * BPF_F_STRICT_ALIGNMENT
 * BPF_F_ANY_ALIGNMENT
 * BPF_F_TEST_RND_HI32
 * BPF_F_TEST_STATE_FREQ
 * BPF_F_SLEEPABLE
 * BPF_F_XDP_HAS_FRAGS
 * Some numeric value

Extend __msg attribute by allowing to specify multiple exepcted messages.
All messages are expected to be present in the verifier log in the
order of application.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230301175417.3146070-2-eddyz87@gmail.com

[ Eduard: added commit message, formatting, comments ]
2023-03-01 11:13:42 -08:00
Tiezhu Yang be35f4af71 selftests/bpf: Set __BITS_PER_LONG if target is bpf for LoongArch
If target is bpf, there is no __loongarch__ definition, __BITS_PER_LONG
defaults to 32, __NR_nanosleep is not defined:

  #if defined(__ARCH_WANT_TIME32_SYSCALLS) || __BITS_PER_LONG != 32
  #define __NR_nanosleep 101
  __SC_3264(__NR_nanosleep, sys_nanosleep_time32, sys_nanosleep)
  #endif

Work around this problem, by explicitly setting __BITS_PER_LONG to
__loongarch_grlen which is defined by compiler as 64 for LA64.

This is similar with commit 36e70b9b06 ("selftests, bpf: Fix broken
riscv build").

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1677585781-21628-1-git-send-email-yangtiezhu@loongson.cn
2023-03-01 11:05:50 -08:00
Kumar Kartikeya Dwivedi 85521e1ea4 selftests/bpf: Add more tests for kptrs in maps
Firstly, ensure programs successfully load when using all of the
supported maps. Then, extend existing tests to test more cases at
runtime. We are currently testing both the synchronous freeing of items
and asynchronous destruction when map is freed, but the code needs to be
adjusted a bit to be able to also accomodate support for percpu maps.

We now do a delete on the item (and update for array maps which has a
similar effect for kptrs) to perform a synchronous free of the kptr, and
test destruction both for the synchronous and asynchronous deletion.
Next time the program runs, it should observe the refcount as 1 since
all existing references should have been released by then. By running
the program after both possible paths freeing kptrs, we establish that
they correctly release resources. Next, we augment the existing test to
also test the same code path shared by all local storage maps using a
task local storage map.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230225154010.391965-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01 10:24:33 -08:00
Joanne Koong cfa7b01189 selftests/bpf: tests for using dynptrs to parse skb and xdp buffers
Test skb and xdp dynptr functionality in the following ways:

1) progs/test_cls_redirect_dynptr.c
   * Rewrite "progs/test_cls_redirect.c" test to use dynptrs to parse
     skb data

   * This is a great example of how dynptrs can be used to simplify a
     lot of the parsing logic for non-statically known values.

     When measuring the user + system time between the original version
     vs. using dynptrs, and averaging the time for 10 runs (using
     "time ./test_progs -t cls_redirect"):
         original version: 0.092 sec
         with dynptrs: 0.078 sec

2) progs/test_xdp_dynptr.c
   * Rewrite "progs/test_xdp.c" test to use dynptrs to parse xdp data

     When measuring the user + system time between the original version
     vs. using dynptrs, and averaging the time for 10 runs (using
     "time ./test_progs -t xdp_attach"):
         original version: 0.118 sec
         with dynptrs: 0.094 sec

3) progs/test_l4lb_noinline_dynptr.c
   * Rewrite "progs/test_l4lb_noinline.c" test to use dynptrs to parse
     skb data

     When measuring the user + system time between the original version
     vs. using dynptrs, and averaging the time for 10 runs (using
     "time ./test_progs -t l4lb_all"):
         original version: 0.062 sec
         with dynptrs: 0.081 sec

     For number of processed verifier instructions:
         original version: 6268 insns
         with dynptrs: 2588 insns

4) progs/test_parse_tcp_hdr_opt_dynptr.c
   * Add sample code for parsing tcp hdr opt lookup using dynptrs.
     This logic is lifted from a real-world use case of packet parsing
     in katran [0], a layer 4 load balancer. The original version
     "progs/test_parse_tcp_hdr_opt.c" (not using dynptrs) is included
     here as well, for comparison.

     When measuring the user + system time between the original version
     vs. using dynptrs, and averaging the time for 10 runs (using
     "time ./test_progs -t parse_tcp_hdr_opt"):
         original version: 0.031 sec
         with dynptrs: 0.045 sec

5) progs/dynptr_success.c
   * Add test case "test_skb_readonly" for testing attempts at writes
     on a prog type with read-only skb ctx.
   * Add "test_dynptr_skb_data" for testing that bpf_dynptr_data isn't
     supported for skb progs.

6) progs/dynptr_fail.c
   * Add test cases "skb_invalid_data_slice{1,2,3,4}" and
     "xdp_invalid_data_slice{1,2}" for testing that helpers that modify the
     underlying packet buffer automatically invalidate the associated
     data slice.
   * Add test cases "skb_invalid_ctx" and "xdp_invalid_ctx" for testing
     that prog types that do not support bpf_dynptr_from_skb/xdp don't
     have access to the API.
   * Add test case "dynptr_slice_var_len{1,2}" for testing that
     variable-sized len can't be passed in to bpf_dynptr_slice
   * Add test case "skb_invalid_slice_write" for testing that writes to a
     read-only data slice are rejected by the verifier.
   * Add test case "data_slice_out_of_bounds_skb" for testing that
     writes to an area outside the slice are rejected.
   * Add test case "invalid_slice_rdwr_rdonly" for testing that prog
     types that don't allow writes to packet data don't accept any calls
     to bpf_dynptr_slice_rdwr.

[0] https://github.com/facebookincubator/katran/blob/main/katran/lib/bpf/pckt_parsing.h

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230301154953.641654-11-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-01 10:05:19 -08:00
Linus Torvalds f122a08b19 capability: just use a 'u64' instead of a 'u32[2]' array
Back in 2008 we extended the capability bits from 32 to 64, and we did
it by extending the single 32-bit capability word from one word to an
array of two words.  It was then obfuscated by hiding the "2" behind two
macro expansions, with the reasoning being that maybe it gets extended
further some day.

That reasoning may have been valid at the time, but the last thing we
want to do is to extend the capability set any more.  And the array of
values not only causes source code oddities (with loops to deal with
it), but also results in worse code generation.  It's a lose-lose
situation.

So just change the 'u32[2]' into a 'u64' and be done with it.

We still have to deal with the fact that the user space interface is
designed around an array of these 32-bit values, but that was the case
before too, since the array layouts were different (ie user space
doesn't use an array of 32-bit values for individual capability masks,
but an array of 32-bit slices of multiple masks).

So that marshalling of data is actually simplified too, even if it does
remain somewhat obscure and odd.

This was all triggered by my reaction to the new "cap_isidentical()"
introduced recently.  By just using a saner data structure, it went from

	unsigned __capi;
	CAP_FOR_EACH_U32(__capi) {
		if (a.cap[__capi] != b.cap[__capi])
			return false;
	}
	return true;

to just being

	return a.val == b.val;

instead.  Which is rather more obvious both to humans and to compilers.

Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-01 10:01:22 -08:00
Rong Tao 11e456cae9 selftests/bpf: Fix compilation errors: Assign a value to a constant
Commit bc292ab00f6c("mm: introduce vma->vm_flags wrapper functions")
turns the vm_flags into a const variable.

Added bpf_find_vma test in commit f108662b27c9("selftests/bpf: Add tests
for bpf_find_vma") to assign values to variables that declare const in
find_vma_fail1.c programs, which is an error to the compiler and does not
test BPF verifiers. It is better to replace 'const vm_flags_t vm_flags'
with 'unsigned long vm_start' for testing.

    $ make -C tools/testing/selftests/bpf/ -j8
    ...
    progs/find_vma_fail1.c:16:16: error: cannot assign to non-static data
    member 'vm_flags' with const-qualified type 'const vm_flags_t' (aka
    'const unsigned long')
            vma->vm_flags |= 0x55;
            ~~~~~~~~~~~~~ ^
    ../tools/testing/selftests/bpf/tools/include/vmlinux.h:1898:20:
    note: non-static data member 'vm_flags' declared const here
                    const vm_flags_t vm_flags;
                    ~~~~~~~~~~~`~~~~~~^~~~~~~~

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/tencent_CB281722B3C1BD504C16CDE586CACC2BE706@qq.com
2023-02-27 11:47:48 -08:00
Tiezhu Yang 84c22fa83f selftests/bpf: Use __NR_prlimit64 instead of __NR_getrlimit in user_ringbuf test
After commit 80d7da1cac ("asm-generic: Drop getrlimit and setrlimit
syscalls from default list"), new architectures won't need to include
getrlimit and setrlimit, they are superseded with prlimit64.

In order to maintain compatibility for the new architectures, such as
LoongArch which does not define __NR_getrlimit, it is better to use
__NR_prlimit64 instead of __NR_getrlimit in user_ringbuf test to fix
the following build error:

    TEST-OBJ [test_progs] user_ringbuf.test.o
  tools/testing/selftests/bpf/prog_tests/user_ringbuf.c: In function 'kick_kernel_cb':
  tools/testing/selftests/bpf/prog_tests/user_ringbuf.c:593:17: error: '__NR_getrlimit' undeclared (first use in this function)
    593 |         syscall(__NR_getrlimit);
        |                 ^~~~~~~~~~~~~~
  tools/testing/selftests/bpf/prog_tests/user_ringbuf.c:593:17: note: each undeclared identifier is reported only once for each function it appears in
  make: *** [Makefile:573: tools/testing/selftests/bpf/user_ringbuf.test.o] Error 1
  make: Leaving directory 'tools/testing/selftests/bpf'

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1677235015-21717-4-git-send-email-yangtiezhu@loongson.cn
2023-02-27 09:49:40 -08:00
Hangbin Liu 02d6a057c7 selftests/bpf: run mptcp in a dedicated netns
The current mptcp test is run in init netns. If the user or default
system config disabled mptcp, the test will fail. Let's run the mptcp
test in a dedicated netns to avoid none kernel default mptcp setting.

Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20230224061343.506571-3-liuhangbin@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-02-24 17:34:23 -08:00
Hangbin Liu b61987d37c selftests/bpf: move SYS() macro into the test_progs.h
A lot of tests defined SYS() macro to run system calls with goto label.
Let's move this macro to test_progs.h and add configurable
"goto_label" as the first arg.

Suggested-by: Martin KaFai Lau <martin.lau@linux.dev>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20230224061343.506571-2-liuhangbin@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-02-24 17:34:09 -08:00
Tejun Heo d0093aaefa selftests/bpf: Add a test case for bpf_cgroup_from_id()
Add a test case for bpf_cgroup_from_id.

Signed-off-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/Y/bBlt+tPozcQgws@slm.duckdns.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-23 08:33:43 -08:00
Stanislav Fomichev 9fa0289285 selftests/bpf: Fix BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL for empty flow label
Kernel's flow dissector continues to parse the packet when
the (optional) IPv6 flow label is empty even when instructed
to stop (via BPF_FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL). Do
the same in our reference BPF reimplementation.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20230221180518.2139026-1-sdf@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-22 13:13:13 -08:00
Eduard Zingerman 6338a94d5a selftests/bpf: Tests for uninitialized stack reads
Three testcases to make sure that stack reads from uninitialized
locations are accepted by verifier when executed in privileged mode:
- read from a fixed offset;
- read from a variable offset;
- passing a pointer to stack to a helper converts
  STACK_INVALID to STACK_MISC.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230219200427.606541-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-22 12:34:50 -08:00
Eduard Zingerman 6715df8d5d bpf: Allow reads from uninit stack
This commits updates the following functions to allow reads from
uninitialized stack locations when env->allow_uninit_stack option is
enabled:
- check_stack_read_fixed_off()
- check_stack_range_initialized(), called from:
  - check_stack_read_var_off()
  - check_helper_mem_access()

Such change allows to relax logic in stacksafe() to treat STACK_MISC
and STACK_INVALID in a same way and make the following stack slot
configurations equivalent:

  |  Cached state    |  Current state   |
  |   stack slot     |   stack slot     |
  |------------------+------------------|
  | STACK_INVALID or | STACK_INVALID or |
  | STACK_MISC       | STACK_SPILL   or |
  |                  | STACK_MISC    or |
  |                  | STACK_ZERO    or |
  |                  | STACK_DYNPTR     |

This leads to significant verification speed gains (see below).

The idea was suggested by Andrii Nakryiko [1] and initial patch was
created by Alexei Starovoitov [2].

Currently the env->allow_uninit_stack is allowed for programs loaded
by users with CAP_PERFMON or CAP_SYS_ADMIN capabilities.

A number of test cases from verifier/*.c were expecting uninitialized
stack access to be an error. These test cases were updated to execute
in unprivileged mode (thus preserving the tests).

The test progs/test_global_func10.c expected "invalid indirect read
from stack" error message because of the access to uninitialized
memory region. This error is no longer possible in privileged mode.
The test is updated to provoke an error "invalid indirect access to
stack" because of access to invalid stack address (such error is not
verified by progs/test_global_func*.c series of tests).

The following tests had to be removed because these can't be made
unprivileged:
- verifier/sock.c:
  - "sk_storage_get(map, skb->sk, &stack_value, 1): partially init
  stack_value"
  BPF_PROG_TYPE_SCHED_CLS programs are not executed in unprivileged mode.
- verifier/var_off.c:
  - "indirect variable-offset stack access, max_off+size > max_initialized"
  - "indirect variable-offset stack access, uninitialized"
  These tests verify that access to uninitialized stack values is
  detected when stack offset is not a constant. However, variable
  stack access is prohibited in unprivileged mode, thus these tests
  are no longer valid.

 * * *

Here is veristat log comparing this patch with current master on a
set of selftest binaries listed in tools/testing/selftests/bpf/veristat.cfg
and cilium BPF binaries (see [3]):

$ ./veristat -e file,prog,states -C -f 'states_pct<-30' master.log current.log
File                        Program                     States (A)  States (B)  States    (DIFF)
--------------------------  --------------------------  ----------  ----------  ----------------
bpf_host.o                  tail_handle_ipv6_from_host         349         244    -105 (-30.09%)
bpf_host.o                  tail_handle_nat_fwd_ipv4          1320         895    -425 (-32.20%)
bpf_lxc.o                   tail_handle_nat_fwd_ipv4          1320         895    -425 (-32.20%)
bpf_sock.o                  cil_sock4_connect                   70          48     -22 (-31.43%)
bpf_sock.o                  cil_sock4_sendmsg                   68          46     -22 (-32.35%)
bpf_xdp.o                   tail_handle_nat_fwd_ipv4          1554         803    -751 (-48.33%)
bpf_xdp.o                   tail_lb_ipv4                      6457        2473   -3984 (-61.70%)
bpf_xdp.o                   tail_lb_ipv6                      7249        3908   -3341 (-46.09%)
pyperf600_bpf_loop.bpf.o    on_event                           287         145    -142 (-49.48%)
strobemeta.bpf.o            on_event                         15915        4772  -11143 (-70.02%)
strobemeta_nounroll2.bpf.o  on_event                         17087        3820  -13267 (-77.64%)
xdp_synproxy_kern.bpf.o     syncookie_tc                     21271        6635  -14636 (-68.81%)
xdp_synproxy_kern.bpf.o     syncookie_xdp                    23122        6024  -17098 (-73.95%)
--------------------------  --------------------------  ----------  ----------  ----------------

Note: I limited selection by states_pct<-30%.

Inspection of differences in pyperf600_bpf_loop behavior shows that
the following patch for the test removes almost all differences:

    - a/tools/testing/selftests/bpf/progs/pyperf.h
    + b/tools/testing/selftests/bpf/progs/pyperf.h
    @ -266,8 +266,8 @ int __on_event(struct bpf_raw_tracepoint_args *ctx)
            }

            if (event->pthread_match || !pidData->use_tls) {
    -               void* frame_ptr;
    -               FrameData frame;
    +               void* frame_ptr = 0;
    +               FrameData frame = {};
                    Symbol sym = {};
                    int cur_cpu = bpf_get_smp_processor_id();

W/o this patch the difference comes from the following pattern
(for different variables):

    static bool get_frame_data(... FrameData *frame ...)
    {
        ...
        bpf_probe_read_user(&frame->f_code, ...);
        if (!frame->f_code)
            return false;
        ...
        bpf_probe_read_user(&frame->co_name, ...);
        if (frame->co_name)
            ...;
    }

    int __on_event(struct bpf_raw_tracepoint_args *ctx)
    {
        FrameData frame;
        ...
        get_frame_data(... &frame ...) // indirectly via a bpf_loop & callback
        ...
    }

    SEC("raw_tracepoint/kfree_skb")
    int on_event(struct bpf_raw_tracepoint_args* ctx)
    {
        ...
        ret |= __on_event(ctx);
        ret |= __on_event(ctx);
        ...
    }

With regards to value `frame->co_name` the following is important:
- Because of the conditional `if (!frame->f_code)` each call to
  __on_event() produces two states, one with `frame->co_name` marked
  as STACK_MISC, another with it as is (and marked STACK_INVALID on a
  first call).
- The call to bpf_probe_read_user() does not mark stack slots
  corresponding to `&frame->co_name` as REG_LIVE_WRITTEN but it marks
  these slots as BPF_MISC, this happens because of the following loop
  in the check_helper_call():

	for (i = 0; i < meta.access_size; i++) {
		err = check_mem_access(env, insn_idx, meta.regno, i, BPF_B,
				       BPF_WRITE, -1, false);
		if (err)
			return err;
	}

  Note the size of the write, it is a one byte write for each byte
  touched by a helper. The BPF_B write does not lead to write marks
  for the target stack slot.
- Which means that w/o this patch when second __on_event() call is
  verified `if (frame->co_name)` will propagate read marks first to a
  stack slot with STACK_MISC marks and second to a stack slot with
  STACK_INVALID marks and these states would be considered different.

[1] https://lore.kernel.org/bpf/CAEf4BzY3e+ZuC6HUa8dCiUovQRg2SzEk7M-dSkqNZyn=xEmnPA@mail.gmail.com/
[2] https://lore.kernel.org/bpf/CAADnVQKs2i1iuZ5SUGuJtxWVfGYR9kDgYKhq3rNV+kBLQCu7rA@mail.gmail.com/
[3] git@github.com:anakryiko/cilium.git

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Co-developed-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230219200427.606541-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-22 12:34:50 -08:00
Linus Torvalds 5b7c4cabbb Networking changes for 6.3.
Core
 ----
 
  - Add dedicated kmem_cache for typical/small skb->head, avoid having
    to access struct page at kfree time, and improve memory use.
 
  - Introduce sysctl to set default RPS configuration for new netdevs.
 
  - Define Netlink protocol specification format which can be used
    to describe messages used by each family and auto-generate parsers.
    Add tools for generating kernel data structures and uAPI headers.
 
  - Expose all net/core sysctls inside netns.
 
  - Remove 4s sleep in netpoll if carrier is instantly detected on boot.
 
  - Add configurable limit of MDB entries per port, and port-vlan.
 
  - Continue populating drop reasons throughout the stack.
 
  - Retire a handful of legacy Qdiscs and classifiers.
 
 Protocols
 ---------
 
  - Support IPv4 big TCP (TSO frames larger than 64kB).
 
  - Add IP_LOCAL_PORT_RANGE socket option, to control local port range
    on socket by socket basis.
 
  - Track and report in procfs number of MPTCP sockets used.
 
  - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP
    path manager.
 
  - IPv6: don't check net.ipv6.route.max_size and rely on garbage
    collection to free memory (similarly to IPv4).
 
  - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
 
  - ICMP: add per-rate limit counters.
 
  - Add support for user scanning requests in ieee802154.
 
  - Remove static WEP support.
 
  - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
    reporting.
 
  - WiFi 7 EHT channel puncturing support (client & AP).
 
 BPF
 ---
 
  - Add a rbtree data structure following the "next-gen data structure"
    precedent set by recently added linked list, that is, by using
    kfunc + kptr instead of adding a new BPF map type.
 
  - Expose XDP hints via kfuncs with initial support for RX hash and
    timestamp metadata.
 
  - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key
    to better support decap on GRE tunnel devices not operating
    in collect metadata.
 
  - Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
 
  - Remove the need for trace_printk_lock for bpf_trace_printk
    and bpf_trace_vprintk helpers.
 
  - Extend libbpf's bpf_tracing.h support for tracing arguments of
    kprobes/uprobes and syscall as a special case.
 
  - Significantly reduce the search time for module symbols
    by livepatch and BPF.
 
  - Enable cpumasks to be used as kptrs, which is useful for tracing
    programs tracking which tasks end up running on which CPUs in
    different time intervals.
 
  - Add support for BPF trampoline on s390x and riscv64.
 
  - Add capability to export the XDP features supported by the NIC.
 
  - Add __bpf_kfunc tag for marking kernel functions as kfuncs.
 
  - Add cgroup.memory=nobpf kernel parameter option to disable BPF
    memory accounting for container environments.
 
 Netfilter
 ---------
 
  - Remove the CLUSTERIP target. It has been marked as obsolete
    for years, and we still have WARN splats wrt. races of
    the out-of-band /proc interface installed by this target.
 
  - Add 'destroy' commands to nf_tables. They are identical to
    the existing 'delete' commands, but do not return an error if
    the referenced object (set, chain, rule...) did not exist.
 
 Driver API
 ----------
 
  - Improve cpumask_local_spread() locality to help NICs set the right
    IRQ affinity on AMD platforms.
 
  - Separate C22 and C45 MDIO bus transactions more clearly.
 
  - Introduce new DCB table to control DSCP rewrite on egress.
 
  - Support configuration of Physical Layer Collision Avoidance (PLCA)
    Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
    shared medium Ethernet.
 
  - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
    preemption of low priority frames by high priority frames.
 
  - Add support for controlling MACSec offload using netlink SET.
 
  - Rework devlink instance refcounts to allow registration and
    de-registration under the instance lock. Split the code into multiple
    files, drop some of the unnecessarily granular locks and factor out
    common parts of netlink operation handling.
 
  - Add TX frame aggregation parameters (for USB drivers).
 
  - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
    messages with notifications for debug.
 
  - Allow offloading of UDP NEW connections via act_ct.
 
  - Add support for per action HW stats in TC.
 
  - Support hardware miss to TC action (continue processing in SW from
    a specific point in the action chain).
 
  - Warn if old Wireless Extension user space interface is used with
    modern cfg80211/mac80211 drivers. Do not support Wireless Extensions
    for Wi-Fi 7 devices at all. Everyone should switch to using nl80211
    interface instead.
 
  - Improve the CAN bit timing configuration. Use extack to return error
    messages directly to user space, update the SJW handling, including
    the definition of a new default value that will benefit CAN-FD
    controllers, by increasing their oscillator tolerance.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - nVidia BlueField-3 support (control traffic driver)
    - Ethernet support for imx93 SoCs
    - Motorcomm yt8531 gigabit Ethernet PHY
    - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
    - Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
    - Amlogic gxl MDIO mux
 
  - WiFi:
    - RealTek RTL8188EU (rtl8xxxu)
    - Qualcomm Wi-Fi 7 devices (ath12k)
 
  - CAN:
    - Renesas R-Car V4H
 
 Drivers
 -------
 
  - Bluetooth:
    - Set Per Platform Antenna Gain (PPAG) for Intel controllers.
 
  - Ethernet NICs:
    - Intel (1G, igc):
      - support TSN / Qbv / packet scheduling features of i226 model
    - Intel (100G, ice):
      - use GNSS subsystem instead of TTY
      - multi-buffer XDP support
      - extend support for GPIO pins to E823 devices
    - nVidia/Mellanox:
      - update the shared buffer configuration on PFC commands
      - implement PTP adjphase function for HW offset control
      - TC support for Geneve and GRE with VF tunnel offload
      - more efficient crypto key management method
      - multi-port eswitch support
    - Netronome/Corigine:
      - add DCB IEEE support
      - support IPsec offloading for NFP3800
    - Freescale/NXP (enetc):
      - enetc: support XDP_REDIRECT for XDP non-linear buffers
      - enetc: improve reconfig, avoid link flap and waiting for idle
      - enetc: support MAC Merge layer
    - Other NICs:
      - sfc/ef100: add basic devlink support for ef100
      - ionic: rx_push mode operation (writing descriptors via MMIO)
      - bnxt: use the auxiliary bus abstraction for RDMA
      - r8169: disable ASPM and reset bus in case of tx timeout
      - cpsw: support QSGMII mode for J721e CPSW9G
      - cpts: support pulse-per-second output
      - ngbe: add an mdio bus driver
      - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
      - r8152: handle devices with FW with NCM support
      - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
      - virtio-net: support multi buffer XDP
      - virtio/vsock: replace virtio_vsock_pkt with sk_buff
      - tsnep: XDP support
 
  - Ethernet high-speed switches:
    - nVidia/Mellanox (mlxsw):
      - add support for latency TLV (in FW control messages)
    - Microchip (sparx5):
      - separate explicit and implicit traffic forwarding rules, make
        the implicit rules always active
      - add support for egress DSCP rewrite
      - IS0 VCAP support (Ingress Classification)
      - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.)
      - ES2 VCAP support (Egress Access Control)
      - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1)
 
  - Ethernet embedded switches:
    - Marvell (mv88e6xxx):
      - add MAB (port auth) offload support
      - enable PTP receive for mv88e6390
    - NXP (ocelot):
      - support MAC Merge layer
      - support for the the vsc7512 internal copper phys
    - Microchip:
      - lan9303: convert to PHYLINK
      - lan966x: support TC flower filter statistics
      - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
      - lan937x: support Credit Based Shaper configuration
      - ksz9477: support Energy Efficient Ethernet
    - other:
      - qca8k: convert to regmap read/write API, use bulk operations
      - rswitch: Improve TX timestamp accuracy
 
  - Intel WiFi (iwlwifi):
    - EHT (Wi-Fi 7) rate reporting
    - STEP equalizer support: transfer some STEP (connection to radio
      on platforms with integrated wifi) related parameters from the
      BIOS to the firmware.
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - IPQ5018 support
    - Fine Timing Measurement (FTM) responder role support
    - channel 177 support
 
  - MediaTek WiFi (mt76):
    - per-PHY LED support
    - mt7996: EHT (Wi-Fi 7) support
    - Wireless Ethernet Dispatch (WED) reset support
    - switch to using page pool allocator
 
  - RealTek WiFi (rtw89):
    - support new version of Bluetooth co-existance
 
  - Mobile:
    - rmnet: support TX aggregation.
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmP1VIYACgkQMUZtbf5S
 IrvsChAApz0rNL/sPKxXTEfxZ1tN7D3sYxYKQPomxvl5BV+MvicrLddJy3KmzEFK
 nnJNO3nuRNuH422JQ/ylZ4mGX1opa6+5QJb0UINImXUI7Fm8HHBIuPGkv7d5CheZ
 7JexFqjPJXUy9nPyh1Rra+IA9AcRd2U7jeGEZR38wb99bHJQj5Bzdk20WArEB0el
 n44aqg49LXH71bSeXRz77x5SjkwVtYiccQxLcnmTbjLU2xVraLvI2J+wAhHnVXWW
 9lrU1+V4Ex2Xcd1xR0L0cHeK+meP1TrPRAeF+JDpVI3a/zJiE7cZjfHdG/jH5xWl
 leZJqghVozrZQNtewWWO7XhUFhMDgFu3W/1vNLjSHPZEqaz1JpM67J1+ql6s63l4
 LMWoXbcYZz+SL9ZRCoPkbGue/5fKSHv8/Jl9Sh58+eTS+c/zgN8uFGRNFXLX1+EP
 n8uvt985PxMd6x1+dHumhOUzxnY4Sfi1vjitSunTsNFQ3Cmp4SO0IfBVJWfLUCuC
 xz5hbJGJJbSpvUsO+HWyCg83E5OWghRE/Onpt2jsQSZCrO9HDg4FRTEf3WAMgaqc
 edb5KfbRZPTJQM08gWdluXzSk1nw3FNP2tXW4XlgUrEbjb+fOk0V9dQg2gyYTxQ1
 Nhvn8ZQPi6/GMMELHAIPGmmW1allyOGiAzGlQsv8EmL+OFM6WDI=
 =xXhC
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core:

   - Add dedicated kmem_cache for typical/small skb->head, avoid having
     to access struct page at kfree time, and improve memory use.

   - Introduce sysctl to set default RPS configuration for new netdevs.

   - Define Netlink protocol specification format which can be used to
     describe messages used by each family and auto-generate parsers.
     Add tools for generating kernel data structures and uAPI headers.

   - Expose all net/core sysctls inside netns.

   - Remove 4s sleep in netpoll if carrier is instantly detected on
     boot.

   - Add configurable limit of MDB entries per port, and port-vlan.

   - Continue populating drop reasons throughout the stack.

   - Retire a handful of legacy Qdiscs and classifiers.

  Protocols:

   - Support IPv4 big TCP (TSO frames larger than 64kB).

   - Add IP_LOCAL_PORT_RANGE socket option, to control local port range
     on socket by socket basis.

   - Track and report in procfs number of MPTCP sockets used.

   - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
     manager.

   - IPv6: don't check net.ipv6.route.max_size and rely on garbage
     collection to free memory (similarly to IPv4).

   - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).

   - ICMP: add per-rate limit counters.

   - Add support for user scanning requests in ieee802154.

   - Remove static WEP support.

   - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
     reporting.

   - WiFi 7 EHT channel puncturing support (client & AP).

  BPF:

   - Add a rbtree data structure following the "next-gen data structure"
     precedent set by recently added linked list, that is, by using
     kfunc + kptr instead of adding a new BPF map type.

   - Expose XDP hints via kfuncs with initial support for RX hash and
     timestamp metadata.

   - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
     better support decap on GRE tunnel devices not operating in collect
     metadata.

   - Improve x86 JIT's codegen for PROBE_MEM runtime error checks.

   - Remove the need for trace_printk_lock for bpf_trace_printk and
     bpf_trace_vprintk helpers.

   - Extend libbpf's bpf_tracing.h support for tracing arguments of
     kprobes/uprobes and syscall as a special case.

   - Significantly reduce the search time for module symbols by
     livepatch and BPF.

   - Enable cpumasks to be used as kptrs, which is useful for tracing
     programs tracking which tasks end up running on which CPUs in
     different time intervals.

   - Add support for BPF trampoline on s390x and riscv64.

   - Add capability to export the XDP features supported by the NIC.

   - Add __bpf_kfunc tag for marking kernel functions as kfuncs.

   - Add cgroup.memory=nobpf kernel parameter option to disable BPF
     memory accounting for container environments.

  Netfilter:

   - Remove the CLUSTERIP target. It has been marked as obsolete for
     years, and we still have WARN splats wrt races of the out-of-band
     /proc interface installed by this target.

   - Add 'destroy' commands to nf_tables. They are identical to the
     existing 'delete' commands, but do not return an error if the
     referenced object (set, chain, rule...) did not exist.

  Driver API:

   - Improve cpumask_local_spread() locality to help NICs set the right
     IRQ affinity on AMD platforms.

   - Separate C22 and C45 MDIO bus transactions more clearly.

   - Introduce new DCB table to control DSCP rewrite on egress.

   - Support configuration of Physical Layer Collision Avoidance (PLCA)
     Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
     shared medium Ethernet.

   - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
     preemption of low priority frames by high priority frames.

   - Add support for controlling MACSec offload using netlink SET.

   - Rework devlink instance refcounts to allow registration and
     de-registration under the instance lock. Split the code into
     multiple files, drop some of the unnecessarily granular locks and
     factor out common parts of netlink operation handling.

   - Add TX frame aggregation parameters (for USB drivers).

   - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
     messages with notifications for debug.

   - Allow offloading of UDP NEW connections via act_ct.

   - Add support for per action HW stats in TC.

   - Support hardware miss to TC action (continue processing in SW from
     a specific point in the action chain).

   - Warn if old Wireless Extension user space interface is used with
     modern cfg80211/mac80211 drivers. Do not support Wireless
     Extensions for Wi-Fi 7 devices at all. Everyone should switch to
     using nl80211 interface instead.

   - Improve the CAN bit timing configuration. Use extack to return
     error messages directly to user space, update the SJW handling,
     including the definition of a new default value that will benefit
     CAN-FD controllers, by increasing their oscillator tolerance.

  New hardware / drivers:

   - Ethernet:
      - nVidia BlueField-3 support (control traffic driver)
      - Ethernet support for imx93 SoCs
      - Motorcomm yt8531 gigabit Ethernet PHY
      - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
      - Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
      - Amlogic gxl MDIO mux

   - WiFi:
      - RealTek RTL8188EU (rtl8xxxu)
      - Qualcomm Wi-Fi 7 devices (ath12k)

   - CAN:
      - Renesas R-Car V4H

  Drivers:

   - Bluetooth:
      - Set Per Platform Antenna Gain (PPAG) for Intel controllers.

   - Ethernet NICs:
      - Intel (1G, igc):
         - support TSN / Qbv / packet scheduling features of i226 model
      - Intel (100G, ice):
         - use GNSS subsystem instead of TTY
         - multi-buffer XDP support
         - extend support for GPIO pins to E823 devices
      - nVidia/Mellanox:
         - update the shared buffer configuration on PFC commands
         - implement PTP adjphase function for HW offset control
         - TC support for Geneve and GRE with VF tunnel offload
         - more efficient crypto key management method
         - multi-port eswitch support
      - Netronome/Corigine:
         - add DCB IEEE support
         - support IPsec offloading for NFP3800
      - Freescale/NXP (enetc):
         - support XDP_REDIRECT for XDP non-linear buffers
         - improve reconfig, avoid link flap and waiting for idle
         - support MAC Merge layer
      - Other NICs:
         - sfc/ef100: add basic devlink support for ef100
         - ionic: rx_push mode operation (writing descriptors via MMIO)
         - bnxt: use the auxiliary bus abstraction for RDMA
         - r8169: disable ASPM and reset bus in case of tx timeout
         - cpsw: support QSGMII mode for J721e CPSW9G
         - cpts: support pulse-per-second output
         - ngbe: add an mdio bus driver
         - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
         - r8152: handle devices with FW with NCM support
         - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
         - virtio-net: support multi buffer XDP
         - virtio/vsock: replace virtio_vsock_pkt with sk_buff
         - tsnep: XDP support

   - Ethernet high-speed switches:
      - nVidia/Mellanox (mlxsw):
         - add support for latency TLV (in FW control messages)
      - Microchip (sparx5):
         - separate explicit and implicit traffic forwarding rules, make
           the implicit rules always active
         - add support for egress DSCP rewrite
         - IS0 VCAP support (Ingress Classification)
         - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
           etc.)
         - ES2 VCAP support (Egress Access Control)
         - support for Per-Stream Filtering and Policing (802.1Q,
           8.6.5.1)

   - Ethernet embedded switches:
      - Marvell (mv88e6xxx):
         - add MAB (port auth) offload support
         - enable PTP receive for mv88e6390
      - NXP (ocelot):
         - support MAC Merge layer
         - support for the the vsc7512 internal copper phys
      - Microchip:
         - lan9303: convert to PHYLINK
         - lan966x: support TC flower filter statistics
         - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
         - lan937x: support Credit Based Shaper configuration
         - ksz9477: support Energy Efficient Ethernet
      - other:
         - qca8k: convert to regmap read/write API, use bulk operations
         - rswitch: Improve TX timestamp accuracy

   - Intel WiFi (iwlwifi):
      - EHT (Wi-Fi 7) rate reporting
      - STEP equalizer support: transfer some STEP (connection to radio
        on platforms with integrated wifi) related parameters from the
        BIOS to the firmware.

   - Qualcomm 802.11ax WiFi (ath11k):
      - IPQ5018 support
      - Fine Timing Measurement (FTM) responder role support
      - channel 177 support

   - MediaTek WiFi (mt76):
      - per-PHY LED support
      - mt7996: EHT (Wi-Fi 7) support
      - Wireless Ethernet Dispatch (WED) reset support
      - switch to using page pool allocator

   - RealTek WiFi (rtw89):
      - support new version of Bluetooth co-existance

   - Mobile:
      - rmnet: support TX aggregation"

* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
  page_pool: add a comment explaining the fragment counter usage
  net: ethtool: fix __ethtool_dev_mm_supported() implementation
  ethtool: pse-pd: Fix double word in comments
  xsk: add linux/vmalloc.h to xsk.c
  sefltests: netdevsim: wait for devlink instance after netns removal
  selftest: fib_tests: Always cleanup before exit
  net/mlx5e: Align IPsec ASO result memory to be as required by hardware
  net/mlx5e: TC, Set CT miss to the specific ct action instance
  net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
  net/mlx5: Refactor tc miss handling to a single function
  net/mlx5: Kconfig: Make tc offload depend on tc skb extension
  net/sched: flower: Support hardware miss to tc action
  net/sched: flower: Move filter handle initialization earlier
  net/sched: cls_api: Support hardware miss to tc action
  net/sched: Rename user cookie and act cookie
  sfc: fix builds without CONFIG_RTC_LIB
  sfc: clean up some inconsistent indentings
  net/mlx4_en: Introduce flexible array to silence overflow warning
  net: lan966x: Fix possible deadlock inside PTP
  net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
  ...
2023-02-21 18:24:12 -08:00
Linus Torvalds 05e6295f7b fs.idmapped.v6.3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCY+5NlQAKCRCRxhvAZXjc
 orOaAP9i2h3OJy95nO2Fpde0Bt2UT+oulKCCcGlvXJ8/+TQpyQD/ZQq47gFQ0EAz
 Br5NxeyGeecAb0lHpFz+CpLGsxMrMwQ=
 =+BG5
 -----END PGP SIGNATURE-----

Merge tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping

Pull vfs idmapping updates from Christian Brauner:

 - Last cycle we introduced the dedicated struct mnt_idmap type for
   mount idmapping and the required infrastucture in 256c8aed2b ("fs:
   introduce dedicated idmap type for mounts"). As promised in last
   cycle's pull request message this converts everything to rely on
   struct mnt_idmap.

   Currently we still pass around the plain namespace that was attached
   to a mount. This is in general pretty convenient but it makes it easy
   to conflate namespaces that are relevant on the filesystem with
   namespaces that are relevant on the mount level. Especially for
   non-vfs developers without detailed knowledge in this area this was a
   potential source for bugs.

   This finishes the conversion. Instead of passing the plain namespace
   around this updates all places that currently take a pointer to a
   mnt_userns with a pointer to struct mnt_idmap.

   Now that the conversion is done all helpers down to the really
   low-level helpers only accept a struct mnt_idmap argument instead of
   two namespace arguments.

   Conflating mount and other idmappings will now cause the compiler to
   complain loudly thus eliminating the possibility of any bugs. This
   makes it impossible for filesystem developers to mix up mount and
   filesystem idmappings as they are two distinct types and require
   distinct helpers that cannot be used interchangeably.

   Everything associated with struct mnt_idmap is moved into a single
   separate file. With that change no code can poke around in struct
   mnt_idmap. It can only be interacted with through dedicated helpers.
   That means all filesystems are and all of the vfs is completely
   oblivious to the actual implementation of idmappings.

   We are now also able to extend struct mnt_idmap as we see fit. For
   example, we can decouple it completely from namespaces for users that
   don't require or don't want to use them at all. We can also extend
   the concept of idmappings so we can cover filesystem specific
   requirements.

   In combination with the vfs{g,u}id_t work we finished in v6.2 this
   makes this feature substantially more robust and thus difficult to
   implement wrong by a given filesystem and also protects the vfs.

 - Enable idmapped mounts for tmpfs and fulfill a longstanding request.

   A long-standing request from users had been to make it possible to
   create idmapped mounts for tmpfs. For example, to share the host's
   tmpfs mount between multiple sandboxes. This is a prerequisite for
   some advanced Kubernetes cases. Systemd also has a range of use-cases
   to increase service isolation. And there are more users of this.

   However, with all of the other work going on this was way down on the
   priority list but luckily someone other than ourselves picked this
   up.

   As usual the patch is tiny as all the infrastructure work had been
   done multiple kernel releases ago. In addition to all the tests that
   we already have I requested that Rodrigo add a dedicated tmpfs
   testsuite for idmapped mounts to xfstests. It is to be included into
   xfstests during the v6.3 development cycle. This should add a slew of
   additional tests.

* tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (26 commits)
  shmem: support idmapped mounts for tmpfs
  fs: move mnt_idmap
  fs: port vfs{g,u}id helpers to mnt_idmap
  fs: port fs{g,u}id helpers to mnt_idmap
  fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap
  fs: port i_{g,u}id_{needs_}update() to mnt_idmap
  quota: port to mnt_idmap
  fs: port privilege checking helpers to mnt_idmap
  fs: port inode_owner_or_capable() to mnt_idmap
  fs: port inode_init_owner() to mnt_idmap
  fs: port acl to mnt_idmap
  fs: port xattr to mnt_idmap
  fs: port ->permission() to pass mnt_idmap
  fs: port ->fileattr_set() to pass mnt_idmap
  fs: port ->set_acl() to pass mnt_idmap
  fs: port ->get_acl() to pass mnt_idmap
  fs: port ->tmpfile() to pass mnt_idmap
  fs: port ->rename() to pass mnt_idmap
  fs: port ->mknod() to pass mnt_idmap
  fs: port ->mkdir() to pass mnt_idmap
  ...
2023-02-20 11:53:11 -08:00
Martin KaFai Lau 168de02335 selftests/bpf: Add bpf_fib_lookup test
This patch tests the bpf_fib_lookup helper when looking up
a neigh in NUD_FAILED and NUD_STALE state. It also adds test
for the new BPF_FIB_LOOKUP_SKIP_NEIGH flag.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217205515.3583372-2-martin.lau@linux.dev
2023-02-17 22:12:04 +01:00
Martin KaFai Lau 181127fb76 Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES"
This reverts commit 6c20822fad.

build bot failed on arch with different cache line size:
https://lore.kernel.org/bpf/50c35055-afa9-d01e-9a05-ea5351280e4f@intel.com/

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-02-17 12:24:33 -08:00
Andrii Nakryiko e2b5cfc978 selftests/bpf: Add global subprog context passing tests
Add tests validating that it's possible to pass context arguments into
global subprogs for various types of programs, including a particularly
tricky KPROBE programs (which cover kprobes, uprobes, USDTs, a vast and
important class of programs).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230216045954.3002473-4-andrii@kernel.org
2023-02-17 21:21:50 +01:00
Andrii Nakryiko 95ebb37617 selftests/bpf: Convert test_global_funcs test to test_loader framework
Convert 17 test_global_funcs subtests into test_loader framework for
easier maintenance and more declarative way to define expected
failures/successes.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230216045954.3002473-3-andrii@kernel.org
2023-02-17 21:20:44 +01:00
Taichi Nishimura df71a42cc3 Fix typos in selftest/bpf files
Run spell checker on files in selftest/bpf and fixed typos.

Signed-off-by: Taichi Nishimura <awkrail01@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/bpf/20230216085537.519062-1-awkrail01@gmail.com
2023-02-16 16:56:17 -08:00
Ilya Leoshkevich c5a237a4db selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
Use the new type-safe wrappers around bpf_obj_get_info_by_fd().
Fix a prog/map mixup in prog_holds_map().

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230214231221.249277-6-iii@linux.ibm.com
2023-02-16 15:32:46 -08:00
Alexander Lobakin 6c20822fad bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES
&xdp_buff and &xdp_frame are bound in a way that

xdp_buff->data_hard_start == xdp_frame

It's always the case and e.g. xdp_convert_buff_to_frame() relies on
this.
IOW, the following:

	for (u32 i = 0; i < 0xdead; i++) {
		xdpf = xdp_convert_buff_to_frame(&xdp);
		xdp_convert_frame_to_buff(xdpf, &xdp);
	}

shouldn't ever modify @xdpf's contents or the pointer itself.
However, "live packet" code wrongly treats &xdp_frame as part of its
context placed *before* the data_hard_start. With such flow,
data_hard_start is sizeof(*xdpf) off to the right and no longer points
to the XDP frame.

Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several
places and praying that there are no more miscalcs left somewhere in the
code, unionize ::frm with ::data in a flex array, so that both starts
pointing to the actual data_hard_start and the XDP frame actually starts
being a part of it, i.e. a part of the headroom, not the context.
A nice side effect is that the maximum frame size for this mode gets
increased by 40 bytes, as xdp_buff::frame_sz includes everything from
data_hard_start (-> includes xdpf already) to the end of XDP/skb shared
info.
Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it
hardcoded for 64 bit && 4k pages, it can be made more flexible later on.

Minor: align `&head->data` with how `head->frm` is assigned for
consistency.
Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for
clarity.

(was found while testing XDP traffic generator on ice, which calls
 xdp_convert_frame_to_buff() for each XDP frame)

Fixes: b530e9e106 ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230215185440.4126672-1-aleksander.lobakin@intel.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-02-15 17:39:36 -08:00
Anton Protopopov f371f2dc53 selftest/bpf/benchs: Add benchmark for hashmap lookups
Add a new benchmark which measures hashmap lookup operations speed.  A user can
control the following parameters of the benchmark:

    * key_size (max 1024): the key size to use
    * max_entries: the hashmap max entries
    * nr_entries: the number of entries to insert/lookup
    * nr_loops: the number of loops for the benchmark
    * map_flags The hashmap flags passed to BPF_MAP_CREATE

The BPF program performing the benchmarks calls two nested bpf_loop:

    bpf_loop(nr_loops/nr_entries)
            bpf_loop(nr_entries)
                     bpf_map_lookup()

So the nr_loops determines the number of actual map lookups. All lookups are
successful.

Example (the output is generated on a AMD Ryzen 9 3950X machine):

    for nr_entries in `seq 4096 4096 65536`; do echo -n "$((nr_entries*100/65536))% full: "; sudo ./bench -d2 -a bpf-hashmap-lookup --key_size=4 --nr_entries=$nr_entries --max_entries=65536 --nr_loops=1000000 --map_flags=0x40 | grep cpu; done
    6% full: cpu01: lookup 50.739M ± 0.018M events/sec (approximated from 32 samples of ~19ms)
    12% full: cpu01: lookup 47.751M ± 0.015M events/sec (approximated from 32 samples of ~20ms)
    18% full: cpu01: lookup 45.153M ± 0.013M events/sec (approximated from 32 samples of ~22ms)
    25% full: cpu01: lookup 43.826M ± 0.014M events/sec (approximated from 32 samples of ~22ms)
    31% full: cpu01: lookup 41.971M ± 0.012M events/sec (approximated from 32 samples of ~23ms)
    37% full: cpu01: lookup 41.034M ± 0.015M events/sec (approximated from 32 samples of ~24ms)
    43% full: cpu01: lookup 39.946M ± 0.012M events/sec (approximated from 32 samples of ~25ms)
    50% full: cpu01: lookup 38.256M ± 0.014M events/sec (approximated from 32 samples of ~26ms)
    56% full: cpu01: lookup 36.580M ± 0.018M events/sec (approximated from 32 samples of ~27ms)
    62% full: cpu01: lookup 36.252M ± 0.012M events/sec (approximated from 32 samples of ~27ms)
    68% full: cpu01: lookup 35.200M ± 0.012M events/sec (approximated from 32 samples of ~28ms)
    75% full: cpu01: lookup 34.061M ± 0.009M events/sec (approximated from 32 samples of ~29ms)
    81% full: cpu01: lookup 34.374M ± 0.010M events/sec (approximated from 32 samples of ~29ms)
    87% full: cpu01: lookup 33.244M ± 0.011M events/sec (approximated from 32 samples of ~30ms)
    93% full: cpu01: lookup 32.182M ± 0.013M events/sec (approximated from 32 samples of ~31ms)
    100% full: cpu01: lookup 31.497M ± 0.016M events/sec (approximated from 32 samples of ~31ms)

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-8-aspsk@isovalent.com
2023-02-15 16:29:31 -08:00
Anton Protopopov a237dda05e selftest/bpf/benchs: Print less if the quiet option is set
The bench utility will print

    Setting up benchmark '<bench-name>'...
    Benchmark '<bench-name>' started.

on startup to stdout. Suppress this output if --quiet option if given. This
makes it simpler to parse benchmark output by a script.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-7-aspsk@isovalent.com
2023-02-15 16:29:31 -08:00
Anton Protopopov 90c22503cd selftest/bpf/benchs: Make quiet option common
The "local-storage-tasks-trace" benchmark has a `--quiet` option. Move it to
the list of common options, so that the main code and other benchmarks can use
(new) env.quiet variable. Patch the run_bench_local_storage_rcu_tasks_trace.sh
helper script accordingly.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-6-aspsk@isovalent.com
2023-02-15 16:29:31 -08:00
Anton Protopopov 9644546260 selftest/bpf/benchs: Remove an unused header
The benchs/bench_bpf_hashmap_full_update.c doesn't set a custom argp,
so it shouldn't include the <argp.h> header.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-5-aspsk@isovalent.com
2023-02-15 16:29:31 -08:00
Anton Protopopov 22ff7aeaa9 selftest/bpf/benchs: Enhance argp parsing
To parse command line the bench utility uses the argp_parse() function. This
function takes as an argument a parent 'struct argp' structure which defines
common command line options and an array of children 'struct argp' structures
which defines additional command line options for particular benchmarks. This
implementation doesn't allow benchmarks to share option names, e.g., if two
benchmarks want to use, say, the --option option, then only one of them will
succeed (the first one encountered in the array).  This will be convenient if
same option names could be used in different benchmarks (with the same
semantics, e.g., --nr_loops=N).

Fix this by calling the argp_parse() function twice. The first call is the same
as it was before, with all children argps, and helps to find the benchmark name
and to print a combined help message if anything is wrong.  Given the name, we
can call the argp_parse the second time, but now the children array points only
to a correct benchmark thus always calling the correct parsers. (If there's no
a specific list of arguments, then only one call to argp_parse will be done.)

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-4-aspsk@isovalent.com
2023-02-15 16:29:31 -08:00
Anton Protopopov 2f1c59637f selftest/bpf/benchs: Make a function static in bpf_hashmap_full_update
The hashmap_report_final callback function defined in the
benchs/bench_bpf_hashmap_full_update.c file should be static.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-3-aspsk@isovalent.com
2023-02-15 16:29:31 -08:00
Anton Protopopov 4db98ab445 selftest/bpf/benchs: Fix a typo in bpf_hashmap_full_update
To call the bpf_hashmap_full_update benchmark, one should say:

    bench bpf-hashmap-ful-update

The patch adds a missing 'l' to the benchmark name.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230213091519.1202813-2-aspsk@isovalent.com
2023-02-15 16:29:31 -08:00
Hou Tao f88da2d46c selftests/bpf: Add test case for element reuse in htab map
The reinitialization of spin-lock in map value after immediate reuse may
corrupt lookup with BPF_F_LOCK flag and result in hard lock-up, so add
one test case to demonstrate the problem.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230215082132.3856544-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15 15:40:06 -08:00
Eduard Zingerman 2a33c5a25e selftests/bpf: check if BPF_ST with variable offset preserves STACK_ZERO
A test case to verify that variable offset BPF_ST instruction
preserves STACK_ZERO marks when writes zeros, e.g. in the following
situation:

  *(u64*)(r10 - 8) = 0   ; STACK_ZERO marks for fp[-8]
  r0 = random(-7, -1)    ; some random number in range of [-7, -1]
  r0 += r10              ; r0 is now variable offset pointer to stack
  *(u8*)(r0) = 0         ; BPF_ST writing zero, STACK_ZERO mark for
                         ; fp[-8] should be preserved.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230214232030.1502829-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15 11:48:48 -08:00
Eduard Zingerman 1a24af65bb selftests/bpf: check if verifier tracks constants spilled by BPF_ST_MEM
Check that verifier tracks the value of 'imm' spilled to stack by
BPF_ST_MEM instruction. Cover the following cases:
- write of non-zero constant to stack;
- write of a zero constant to stack.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230214232030.1502829-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15 11:48:47 -08:00
Eduard Zingerman ecdf985d76 bpf: track immediate values written to stack by BPF_ST instruction
For aligned stack writes using BPF_ST instruction track stored values
in a same way BPF_STX is handled, e.g. make sure that the following
commands produce similar verifier knowledge:

  fp[-8] = 42;             r1 = 42;
                       fp[-8] = r1;

This covers two cases:
 - non-null values written to stack are stored as spill of fake
   registers;
 - null values written to stack are stored as STACK_ZERO marks.

Previously both cases above used STACK_MISC marks instead.

Some verifier test cases relied on the old logic to obtain STACK_MISC
marks for some stack values. These test cases are updated in the same
commit to avoid failures during bisect.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230214232030.1502829-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15 11:48:47 -08:00
Alexei Starovoitov 62d101d5f4 selftests/bpf: Fix map_kptr test.
The compiler is optimizing out majority of unref_ptr read/writes, so the test
wasn't testing much. For example, one could delete '__kptr' tag from
'struct prog_test_ref_kfunc __kptr *unref_ptr;' and the test would still "pass".

Convert it to volatile stores. Confirmed by comparing bpf asm before/after.

Fixes: 2cbc469a6f ("selftests/bpf: Add C tests for kptr")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230214235051.22938-1-alexei.starovoitov@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-02-15 09:24:59 -08:00
Björn Töpel 5e53e5c7ed selftests/bpf: Cross-compile bpftool
When the BPF selftests are cross-compiled, only the a host version of
bpftool is built. This version of bpftool is used on the host-side to
generate various intermediates, e.g., skeletons.

The test runners are also using bpftool, so the Makefile will symlink
bpftool from the selftest/bpf root, where the test runners will look
the tool:

  | $(Q)ln -sf $(if $2,..,.)/tools/build/bpftool/bootstrap/bpftool \
  |    $(OUTPUT)/$(if $2,$2/)bpftool

There are two problems for cross-compilation builds:

 1. There is no native (cross-compilation target) of bpftool
 2. The bootstrap/bpftool is never cross-compiled (by design)

Make sure that a native/cross-compiled version of bpftool is built,
and if CROSS_COMPILE is set, symlink the native/non-bootstrap version.

Acked-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230214161253.183458-1-bjorn@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-15 08:50:20 -08:00
Joanne Koong 50a7cedb15 selftests/bpf: Clean up dynptr prog_tests
Clean up prog_tests/dynptr.c by removing the unneeded "expected_err_msg"
in the dynptr_tests struct, which is a remnant from converting the fail
tests cases to use the generic verification tester.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230214051332.4007131-2-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-13 21:42:29 -08:00
Joanne Koong 8032cad103 selftests/bpf: Clean up user_ringbuf, cgrp_kfunc, kfunc_dynptr_param tests
Clean up user_ringbuf, cgrp_kfunc, and kfunc_dynptr_param tests to use
the generic verification tester for checking verifier rejections.
The generic verification tester uses btf_decl_tag-based annotations
for verifying that the tests fail with the expected log messages.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Link: https://lore.kernel.org/r/20230214051332.4007131-1-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-13 21:42:29 -08:00
Dave Marchevsky 215249f6ad selftests/bpf: Add rbtree selftests
This patch adds selftests exercising the logic changed/added in the
previous patches in the series. A variety of successful and unsuccessful
rbtree usages are validated:

Success:
  * Add some nodes, let map_value bpf_rbtree_root destructor clean them
    up
  * Add some nodes, remove one using the non-owning ref leftover by
    successful rbtree_add() call
  * Add some nodes, remove one using the non-owning ref returned by
    rbtree_first() call

Failure:
  * BTF where bpf_rb_root owns bpf_list_node should fail to load
  * BTF where node of type X is added to tree containing nodes of type Y
    should fail to load
  * No calling rbtree api functions in 'less' callback for rbtree_add
  * No releasing lock in 'less' callback for rbtree_add
  * No removing a node which hasn't been added to any tree
  * No adding a node which has already been added to a tree
  * No escaping of non-owning references past their lock's
    critical section
  * No escaping of non-owning references past other invalidation points
    (rbtree_remove)

These tests mostly focus on rbtree-specific additions, but some of the
failure cases revalidate scenarios common to both linked_list and rbtree
which are covered in the former's tests. Better to be a bit redundant in
case linked_list and rbtree semantics deviate over time.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230214004017.2534011-8-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-13 19:40:53 -08:00
Dave Marchevsky c834df847e bpf: Add bpf_rbtree_{add,remove,first} decls to bpf_experimental.h
These kfuncs will be used by selftests in following patches

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230214004017.2534011-7-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-13 19:40:53 -08:00
Dave Marchevsky a40d363243 bpf: Special verifier handling for bpf_rbtree_{remove, first}
Newly-added bpf_rbtree_{remove,first} kfuncs have some special properties
that require handling in the verifier:

  * both bpf_rbtree_remove and bpf_rbtree_first return the type containing
    the bpf_rb_node field, with the offset set to that field's offset,
    instead of a struct bpf_rb_node *
    * mark_reg_graph_node helper added in previous patch generalizes
      this logic, use it

  * bpf_rbtree_remove's node input is a node that's been inserted
    in the tree - a non-owning reference.

  * bpf_rbtree_remove must invalidate non-owning references in order to
    avoid aliasing issue. Use previously-added
    invalidate_non_owning_refs helper to mark this function as a
    non-owning ref invalidation point.

  * Unlike other functions, which convert one of their input arg regs to
    non-owning reference, bpf_rbtree_first takes no arguments and just
    returns a non-owning reference (possibly null)
    * For now verifier logic for this is special-cased instead of
      adding new kfunc flag.

This patch, along with the previous one, complete special verifier
handling for all rbtree API functions added in this series.

With functional verifier handling of rbtree_remove, under current
non-owning reference scheme, a node type with both bpf_{list,rb}_node
fields could cause the verifier to accept programs which remove such
nodes from collections they haven't been added to.

In order to prevent this, this patch adds a check to btf_parse_fields
which rejects structs with both bpf_{list,rb}_node fields. This is a
temporary measure that can be removed after "collection identity"
followup. See comment added in btf_parse_fields. A linked_list BTF test
exercising the new check is added in this patch as well.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230214004017.2534011-6-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-02-13 19:40:53 -08:00