Commit graph

1168764 commits

Author SHA1 Message Date
Alexei Starovoitov 3c2611bac0 selftests/bpf: Fix trace_virtqueue_add_sgs test issue with LLVM 17.
LLVM commit https://reviews.llvm.org/D143726 introduced the hoistMinMax optimization
that transformed
  (i < VIRTIO_MAX_SGS) && (i < out_sgs)
into
  i < MIN(VIRTIO_MAX_SGS, out_sgs)

and caused the verifier to stop recognizing such a loop as bounded.

This resulted in the following test failure:

libbpf: prog 'trace_virtqueue_add_sgs': BPF program load failed: Bad address
libbpf: prog 'trace_virtqueue_add_sgs': -- BEGIN PROG LOAD LOG --
The sequence of 8193 jumps is too complex.
verification time 789206 usec
stack depth 56
processed 156446 insns (limit 1000000) max_states_per_insn 7 total_states 1746 peak_states 1701 mark_read 12
-- END PROG LOAD LOG --
libbpf: prog 'trace_virtqueue_add_sgs': failed to load: -14
libbpf: failed to load object 'loop6.bpf.o'

Work around the verifier limitation for now with inline asm that
prevents this particular optimization.
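
A minimal sketch of the kind of barrier meant here (the variable names follow the transformed condition above; this is illustrative, not the verbatim loop6.bpf.c code):

  for (i = 0; i < VIRTIO_MAX_SGS; i++) {
          /* hide out_sgs from the optimizer so the bounds check below
           * stays a separate comparison instead of being hoisted into
           * a MIN() with the loop condition
           */
          asm volatile("" : "+r"(out_sgs));
          if (i >= out_sgs)
                  break;
          /* ... loop body ... */
  }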

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-14 15:28:11 -07:00
Alexei Starovoitov 5584d9e63e Merge branch 'xdp: recycle Page Pool backed skbs built from XDP frames'
Alexander Lobakin says:

====================

Yeah, I still remember that "Who needs cpumap nowadays" (c), but anyway.

__xdp_build_skb_from_frame() missed the moment when the networking stack
became able to recycle skb pages backed by a page_pool. This was making
e.g. cpumap redirect even less effective than simple %XDP_PASS. veth was
also affected in some scenarios.
A lot of drivers already use skb_mark_for_recycle(); it's been almost
two years and there seem to be no issues with using it in the generic
code too. {__,}xdp_release_frame() can then be removed, as it has lost
its last user.
Page Pool then becomes zero-alloc (or almost) in the above-mentioned
cases, too. Other memory type models (who needs them at this point)
have no changes.

Some numbers on 1 Xeon Platinum core bombed with 27 Mpps of 64-byte
IPv6 UDP, iavf w/XDP[0] (CONFIG_PAGE_POOL_STATS is enabled):

Plain %XDP_PASS on baseline, Page Pool driver:

src cpu Rx     drops  dst cpu Rx
  2.1 Mpps       N/A    2.1 Mpps

cpumap redirect (cross-core, w/o leaving its NUMA node) on baseline:

  6.8 Mpps  5.0 Mpps    1.8 Mpps

cpumap redirect with skb PP recycling:

  7.9 Mpps  5.7 Mpps    2.2 Mpps
                       +22% (from cpumap redir on baseline)

[0] https://github.com/alobakin/linux/commits/iavf-xdp
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-14 15:20:05 -07:00
Alexander Lobakin d4e492338d xdp: remove unused {__,}xdp_release_frame()
__xdp_build_skb_from_frame() was the last user of
{__,}xdp_release_frame(), which detaches pages from the page_pool.
All the consumers now recycle Page Pool skbs and pages, except the mlx5,
stmmac and tsnep drivers, which use page_pool_release_page() directly
(might change one day). It's safe to assume this functionality is not
needed anymore and can be removed (in favor of recycling).

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230313215553.1045175-5-aleksander.lobakin@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-14 15:20:05 -07:00
Alexander Lobakin 9c94bbf9a8 xdp: recycle Page Pool backed skbs built from XDP frames
__xdp_build_skb_from_frame() state(d):

/* Until page_pool get SKB return path, release DMA here */

Page Pool got skb page recycling in April 2021, but this function
was missed.

xdp_release_frame() is relevant only for Page Pool backed frames and it
detaches the page from the corresponding page_pool in order to make it
freeable via page_frag_free(). It can instead just mark the output skb
as eligible for recycling if the frame is backed by a pp. No change for
other memory model types (the same condition check as before).
cpumap redirect and veth on Page Pool drivers now become zero-alloc (or
almost).
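
The essence of the change, as a sketch (not the verbatim diff):

  /* in __xdp_build_skb_from_frame(), instead of detaching the page: */
  if (xdpf->mem.type == MEM_TYPE_PAGE_POOL)
          skb_mark_for_recycle(skb);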

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230313215553.1045175-4-aleksander.lobakin@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-14 15:20:05 -07:00
Alexander Lobakin 2c854e5fcd net: page_pool, skbuff: make skb_mark_for_recycle() always available
skb_mark_for_recycle() is guarded with CONFIG_PAGE_POOL, which creates
unneeded complication when using it in generic code. For now, it's
only used in drivers that always select Page Pool, so this works.
Move the guards so that the preprocessor cuts out only the operation
itself; the function then remains a no-op on !PAGE_POOL systems,
but is available there as well.
No functional changes.
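
The resulting shape of the helper is roughly (a sketch of the guard movement described above):

  static inline void skb_mark_for_recycle(struct sk_buff *skb)
  {
  #ifdef CONFIG_PAGE_POOL
          skb->pp_recycle = 1;
  #endif
  }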

Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202303020342.Wi2PRFFH-lkp@intel.com
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230313215553.1045175-3-aleksander.lobakin@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-14 15:20:04 -07:00
Alexander Lobakin 487deb3e33 selftests/bpf: robustify test_xdp_do_redirect with more payload magics
Currently, the test relies on that only dropped ("xmitted") frames will
be recycled and if a frame became an skb, it will be freed later by the
stack and never come back to its page_pool.
So, it easily gets broken by trying to recycle skbs[0]:

  test_xdp_do_redirect:PASS:pkt_count_xdp 0 nsec
  test_xdp_do_redirect:FAIL:pkt_count_zero unexpected pkt_count_zero: actual 9936 != expected 2
  test_xdp_do_redirect:PASS:pkt_count_tc 0 nsec

That huge mismatch happened because after the TC ingress hook zeroes the
magic, the page gets recycled when the skb is freed, not returned to the MM
layer. "Live frames" mode initializes only new pages and keeps the
recycled ones as is by design, so they appear with zeroed magic on the
Rx path again.
Expand the possible magic values from two: 0 (was "xmitted"/dropped or
did hit the TC hook) and 0x42 (hit the input XDP prog) to three: the new
one will mark frames that hit the TC hook, so that they will elide both
@pkt_count_zero and @pkt_count_xdp. They can then be recycled to their
page_pool or returned to the page allocator; this won't affect the
counters anyhow. Just make sure to mark them as "input" (0x42) when they
appear on the Rx path again.
Also make an enum from those magics, so that they will always be visible
and can be changed in just one place anytime. This also eases adding any
new marks later on.
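
For illustration, such an enum could look like this (the names and the new third value are hypothetical; only 0 and 0x42 come from the message):

  enum frame_mark {
          MARK_XMIT = 0x00, /* dropped ("xmitted") frames stay zeroed */
          MARK_IN   = 0x42, /* frames seen by the input XDP prog */
          MARK_SKB  = 0x45, /* new: frames which hit the TC hook */
  };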

Link: https://github.com/kernel-patches/bpf/actions/runs/4386538411/jobs/7681081789
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230313215553.1045175-2-aleksander.lobakin@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-14 15:20:04 -07:00
Martin KaFai Lau 283b40c52d Merge branch 'bpf: Allow helpers access ptr_to_btf_id.'
Alexei Starovoitov says:

====================

From: Alexei Starovoitov <ast@kernel.org>

Allow code like:
bpf_strncmp(task->comm, 16, "foo");
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-13 23:08:21 -07:00
Alexei Starovoitov f25fd60882 selftests/bpf: Add various tests to check helper access into ptr_to_btf_id.
Add various tests to check helper access into ptr_to_btf_id.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230313235845.61029-4-alexei.starovoitov@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-13 23:08:21 -07:00
Alexei Starovoitov 3e30be4288 bpf: Allow helpers access trusted PTR_TO_BTF_ID.
The verifier rejects the code:
  bpf_strncmp(task->comm, 16, "my_task");
with the message:
  16: (85) call bpf_strncmp#182
  R1 type=trusted_ptr_ expected=fp, pkt, pkt_meta, map_key, map_value, mem, ringbuf_mem, buf

Teach the verifier that such an access pattern is safe.
Do not allow untrusted and legacy ptr_to_btf_id to be passed into helpers.

Reported-by: David Vernet <void@manifault.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230313235845.61029-3-alexei.starovoitov@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-13 23:08:21 -07:00
Alexei Starovoitov c9267aa8b7 bpf: Fix bpf_strncmp proto.
bpf_strncmp() doesn't write into its first argument.
Make sure that the verifier knows about it.
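
The fix amounts to tagging the first argument read-only in the helper proto, roughly (a sketch assuming the usual bpf_func_proto layout in kernel/bpf/helpers.c):

  const struct bpf_func_proto bpf_strncmp_proto = {
          .func      = bpf_strncmp,
          .gpl_only  = false,
          .ret_type  = RET_INTEGER,
          .arg1_type = ARG_PTR_TO_MEM | MEM_RDONLY, /* was ARG_PTR_TO_MEM */
          .arg2_type = ARG_CONST_SIZE,
          .arg3_type = ARG_PTR_TO_CONST_STR,
  };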

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230313235845.61029-2-alexei.starovoitov@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-03-13 23:08:21 -07:00
Dave Thaler b9fe8e8d03 bpf, docs: Add signed comparison example
Improve clarity by adding an example of a signed comparison instruction.

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
Acked-by: David Vernet <void@manifault.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/r/20230310233814.4641-1-dthaler1968@googlemail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-13 22:10:45 -07:00
Ross Zwisler ab4c15feb2 selftests/bpf: use canonical ftrace path
The canonical location for the tracefs filesystem is at
/sys/kernel/tracing.

But, from Documentation/trace/ftrace.rst:

  Before 4.1, all ftrace tracing control files were within the debugfs
  file system, which is typically located at /sys/kernel/debug/tracing.
  For backward compatibility, when mounting the debugfs file system,
  the tracefs file system will be automatically mounted at:

  /sys/kernel/debug/tracing

Many tests in the bpf selftest code still refer to this older debugfs
path, so let's update them to avoid confusion.

Signed-off-by: Ross Zwisler <zwisler@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20230313205628.1058720-3-zwisler@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-13 21:51:30 -07:00
Ross Zwisler 27d7fdf06f bpf: use canonical ftrace path
The canonical location for the tracefs filesystem is at /sys/kernel/tracing.

But, from Documentation/trace/ftrace.rst:

  Before 4.1, all ftrace tracing control files were within the debugfs
  file system, which is typically located at /sys/kernel/debug/tracing.
  For backward compatibility, when mounting the debugfs file system,
  the tracefs file system will be automatically mounted at:

  /sys/kernel/debug/tracing

Many comments and samples in the bpf code still refer to this older
debugfs path, so let's update them to avoid confusion.  There are a few
spots where the bpf code explicitly checks both tracefs and debugfs
(tools/bpf/bpftool/tracelog.c and tools/lib/api/fs/fs.c) and I've left
those alone so that the tools can continue to work with both paths.
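
The dual-path probing that stays intact looks roughly like this (a sketch, not the verbatim tool code):

  /* prefer the canonical tracefs mount, fall back to the legacy one */
  int fd = open("/sys/kernel/tracing/trace_pipe", O_RDONLY);

  if (fd < 0)
          fd = open("/sys/kernel/debug/tracing/trace_pipe", O_RDONLY);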

Signed-off-by: Ross Zwisler <zwisler@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20230313205628.1058720-2-zwisler@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-13 21:51:30 -07:00
Dave Marchevsky 9e36a204bd bpf: Disable migration when freeing stashed local kptr using obj drop
When a local kptr is stashed in a map and freed when the map goes away,
currently an error like the one below appears:

[   39.195695] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/u32:15/2875
[   39.196549] caller is bpf_mem_free+0x56/0xc0
[   39.196958] CPU: 15 PID: 2875 Comm: kworker/u32:15 Tainted: G           O       6.2.0-13016-g22df776a9a86 #4477
[   39.197897] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[   39.198949] Workqueue: events_unbound bpf_map_free_deferred
[   39.199470] Call Trace:
[   39.199703]  <TASK>
[   39.199911]  dump_stack_lvl+0x60/0x70
[   39.200267]  check_preemption_disabled+0xbf/0xe0
[   39.200704]  bpf_mem_free+0x56/0xc0
[   39.201032]  ? bpf_obj_new_impl+0xa0/0xa0
[   39.201430]  bpf_obj_free_fields+0x1cd/0x200
[   39.201838]  array_map_free+0xad/0x220
[   39.202193]  ? finish_task_switch+0xe5/0x3c0
[   39.202614]  bpf_map_free_deferred+0xea/0x210
[   39.203006]  ? lockdep_hardirqs_on_prepare+0xe/0x220
[   39.203460]  process_one_work+0x64f/0xbe0
[   39.203822]  ? pwq_dec_nr_in_flight+0x110/0x110
[   39.204264]  ? do_raw_spin_lock+0x107/0x1c0
[   39.204662]  ? lockdep_hardirqs_on_prepare+0xe/0x220
[   39.205107]  worker_thread+0x74/0x7a0
[   39.205451]  ? process_one_work+0xbe0/0xbe0
[   39.205818]  kthread+0x171/0x1a0
[   39.206111]  ? kthread_complete_and_exit+0x20/0x20
[   39.206552]  ret_from_fork+0x1f/0x30
[   39.206886]  </TASK>

This happens because the call to __bpf_obj_drop_impl I added in the patch
adding support for stashing local kptrs doesn't disable migration. Prior
to that patch, __bpf_obj_drop_impl logic only ran when called by a BPF
program, whereas now it can be called from the map free path, so it's
necessary to explicitly disable migration.

Also, refactor a bit to just call __bpf_obj_drop_impl directly instead
of bothering w/ dtor union and setting pointer-to-obj_drop.
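
The core of the fix is roughly the following (a sketch; 'obj' and 'meta' are illustrative names, not the exact identifiers from the patch):

  /* in bpf_obj_free_fields(), when dropping a stashed local kptr */
  migrate_disable();
  __bpf_obj_drop_impl(obj, meta ? meta->record : NULL);
  migrate_enable();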

Fixes: c8e1875409 ("bpf: Support __kptr to local kptrs")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230313214641.3731908-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-13 16:55:04 -07:00
David Vernet 22df776a9a tasks: Extract rcu_users out of union
In commit 3fbd7ee285 ("tasks: Add a count of task RCU users"), a
count on the number of RCU users was added to struct task_struct. This
was done so as to enable the removal of task_rcu_dereference(), and
allow tasks to be protected by RCU even after exiting and being removed
from the runqueue. In this commit, the 'refcount_t rcu_users' field that
keeps track of this refcount was put into a union co-located with
'struct rcu_head rcu', so as to avoid taking up any extra space in
task_struct. This was possible to do safely, because the field was only
ever decremented by a static set of specific callers, and then never
incremented again.

While this restriction of there only being a small, static set of users
of this field has worked fine, it prevents us from leveraging the field
to use RCU to protect tasks in other contexts.

During tracing, for example, it would be useful to be able to collect
some tasks that performed a certain operation, put them in a map, and
then periodically summarize who they are, which cgroup they're in, how
much CPU time they've utilized, etc. While this can currently be done
with 'usage', it becomes tricky when a task is already in a map, or if a
reference should only be taken if a task is valid and will not soon be
reaped. Ideally, we could do something like pass a reference to a map
value, and then try to acquire a reference to the task in an RCU read
region by using refcount_inc_not_zero().

Similarly, in sched_ext, schedulers are using integer pids to remember
tasks, and then looking them up with find_task_by_pid_ns(). This is
slow, error prone, and adds complexity. It would be more convenient and
performant if BPF schedulers could instead store tasks directly in maps,
and then leverage RCU to ensure they can be safely accessed with low
overhead.

Finally, overloading fields like this is error prone. Someone that wants
to use 'rcu_users' could easily overlook the fact that once the rcu
callback is scheduled, the refcount will go back to being nonzero, thus
precluding the use of refcount_inc_not_zero(). Furthermore, as described
below, it's possible to extract the fields of the union without changing
the size of task_struct.

There are several possible ways to enable this:

1. The lightest touch approach is likely the one proposed in this patch,
   which is to simply extract 'rcu_users' and 'rcu' from the union, so
   that scheduling the 'rcu' callback doesn't overwrite the 'rcu_users'
   refcount. If we have a trusted task pointer, this would allow us to
   use refcount_inc_not_zero() inside of an RCU region to determine if we
   can safely acquire a reference to the task and store it in a map. As
   mentioned below, this can be done without changing the size of
   task_struct, by moving the location of the union to another location
   that has padding gaps we can fill in.

2. Removing 'refcount_t rcu_users', and instead having the entire task
   be freed in an rcu callback. This is likely the most sound overall
   design, though it changes the behavioral semantics exposed to
   callers, who currently expect that a task that's successfully looked
   up in e.g. the pid_list with find_task_by_pid_ns(), can always have a
   'usage' reference acquired on them, as it's guaranteed to be >
   0 until after the next gp. In order for this approach to work, we'd
   have to audit all callers. This approach also slightly changes
   behavior observed by user space by not invoking
   trace_sched_process_free() until the whole task_struct is actually being
   freed, rather than just after it's exited. It also may change
   timings, as memory will be freed in an RCU callback rather than
   immediately when the final 'usage' refcount drops to 0. This also is
   arguably a benefit, as it provides more predictable performance to
   callers who are refcounting tasks.

3. There may be other solutions as well that don't require changing the
   layout of task_struct. For example, we could possibly do something
   complex from the BPF side, such as listen for task exit and remove a
   task from a map when the task is exiting. This would likely require
   significant custom handling for task_struct in the verifier, so a
   more generalizable solution is likely warranted.

As mentioned above, this patch proposes the lightest-touch approach
which allows callers elsewhere in the kernel to use 'rcu_users' to
ensure the lifetime of a task, by extracting 'rcu_users' and 'rcu' from
the union. There is no size change in task_struct with this patch.
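
The acquisition pattern this unlocks looks roughly like (a sketch; the map-value plumbing is elided):

  /* with 'rcu_users' out of the union, the count stays usable here */
  rcu_read_lock();
  if (refcount_inc_not_zero(&task->rcu_users)) {
          /* got a reference; safe to stash the task */
  }
  rcu_read_unlock();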

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: David Vernet <void@manifault.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/r/20230215233033.889644-1-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-13 12:42:24 -07:00
Andrii Nakryiko 34f0677e7a bpf: fix precision propagation verbose logging
Fix wrong order of frame index vs register/slot index in precision
propagation verbose (level 2) output. It's wrong and very confusing as is.

Fixes: 529409ea92 ("bpf: propagate precision across all frames, not just the last one")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230313184017.4083374-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-13 11:46:44 -07:00
Alexei Starovoitov 49b5300f1f Merge branch 'Support stashing local kptrs with bpf_kptr_xchg'
Dave Marchevsky says:

====================

Local kptrs are kptrs allocated via bpf_obj_new with a type specified in program
BTF. A BPF program which creates a local kptr has exclusive control of the
lifetime of the kptr, and, prior to terminating, must:

  * free the kptr via bpf_obj_drop
  * If the kptr is a {list,rbtree} node, add the node to a {list, rbtree},
    thereby passing control of the lifetime to the collection

This series adds a third option:

  * stash the kptr in a map value using bpf_kptr_xchg

As indicated by the use of "stash" to describe this behavior, the intended use
of this feature is temporary storage of local kptrs. For example, a sched_ext
([0]) scheduler may want to create an rbtree node for each new cgroup on cgroup
init, but to add that node to the rbtree as part of a separate program which
runs on enqueue. Stashing the node in a map_value allows its lifetime to outlive
the execution of the cgroup_init program.

Behavior:

There is no semantic difference between adding a kptr to a graph collection and
"stashing" it in a map. In both cases exclusive ownership of the kptr's lifetime
is passed to some containing data structure, which is responsible for
bpf_obj_drop'ing it when the container goes away.

Since graph collections also expect exclusive ownership of the nodes they
contain, graph nodes cannot be both stashed in a map_value and contained by
their corresponding collection.

Implementation:

Two observations simplify the verifier changes for this feature. First, kptrs
("referenced kptrs" until a recent renaming) require registration of a
dtor function as part of their acquire/release semantics, so that a referenced
kptr which is placed in a map_value is properly released when the map goes away.
We want this exact behavior for local kptrs, but with bpf_obj_drop as the dtor
instead of a per-btf_id dtor.

The second observation is that, in terms of identification, "referenced kptr"
and "local kptr" already don't interfere with one another. Consider the
following example:

  struct node_data {
          long key;
          long data;
          struct bpf_rb_node node;
  };

  struct map_value {
          struct node_data __kptr *node;
  };

  struct {
          __uint(type, BPF_MAP_TYPE_ARRAY);
          __type(key, int);
          __type(value, struct map_value);
          __uint(max_entries, 1);
  } some_nodes SEC(".maps");

  struct map_value *mapval;
  struct node_data *res;
  int key = 0;

  res = bpf_obj_new(typeof(*res));
  if (!res) { /* err handling */ }

  mapval = bpf_map_lookup_elem(&some_nodes, &key);
  if (!mapval) { /* err handling */ }

  res = bpf_kptr_xchg(&mapval->node, res);
  if (res)
          bpf_obj_drop(res);

The __kptr tag identifies map_value's node as a referenced kptr, while the
PTR_TO_BTF_ID which bpf_obj_new returns - a type in some non-vmlinux,
non-module BTF - identifies res as a local kptr. The type tag on the pointer
indicates a referenced kptr, while the type of the pointee indicates a local kptr.
So using existing facilities we can tell the verifier about a "referenced kptr"
pointer to a "local kptr" pointee.

When kptr_xchg'ing a kptr into a map_value, the verifier can recognize local
kptr types and treat them like referenced kptrs with a properly-typed
bpf_obj_drop as a dtor.

Other implementation notes:
  * We don't need to do anything special to enforce "graph nodes cannot be
    both stashed in a map_value and contained by their corresponding collection"
    * bpf_kptr_xchg both returns and takes as input a (possibly-null) owning
      reference. It does not accept non-owning references as input by virtue
      of requiring a ref_obj_id. By definition, if a program has an owning
      ref to a node, the node isn't in a collection, so it's safe to pass
      ownership via bpf_kptr_xchg.

Summary of patches:

  * Patch 1 modifies BTF plumbing to support using bpf_obj_drop as a dtor
  * Patch 2 adds verifier plumbing to support MEM_ALLOC-flagged param for
    bpf_kptr_xchg
  * Patch 3 adds selftests exercising the new behavior

Changelog:

v1 -> v2: https://lore.kernel.org/bpf/20230309180111.1618459-1-davemarchevsky@fb.com/

Patch #s used below refer to the patch's position in v1 unless otherwise
specified.

Patches 1-3 were applied and are not included in v2.
Rebase onto latest bpf-next: "libbpf: Revert poisoning of strlcpy"

Patch 4: "bpf: Support __kptr to local kptrs"
  * Remove !btf_is_kernel(btf) check, WARN_ON_ONCE instead (Alexei)

Patch 6: "selftests/bpf: Add local kptr stashing test"
  * Add test which stashes 2 nodes and later unstashes one of them using a
    separate BPF program (Alexei)
  * Fix incorrect runner subtest name for original test (was
    "rbtree_add_nodes")
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 16:38:06 -08:00
Dave Marchevsky 5d8d6634cc selftests/bpf: Add local kptr stashing test
Add a new selftest, local_kptr_stash, which uses bpf_kptr_xchg to stash
a bpf_obj_new-allocated object in a map. Test the following scenarios:

  * Stash two rb_nodes in an arraymap, don't unstash them, rely on map
    free to destruct them
  * Stash two rb_nodes in an arraymap, unstash the second one in a
    separate program, rely on map free to destruct first

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230310230743.2320707-4-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 16:38:05 -08:00
Dave Marchevsky 738c96d5e2 bpf: Allow local kptrs to be exchanged via bpf_kptr_xchg
The previous patch added the necessary plumbing for the verifier and runtime to
know what to do with non-kernel PTR_TO_BTF_IDs in map values, but didn't
provide any way to get such local kptrs into a map value. This patch
modifies verifier handling of bpf_kptr_xchg to allow MEM_ALLOC kptr
types.

check_reg_type is modified to accept MEM_ALLOC-flagged input to
bpf_kptr_xchg despite such types not being in btf_ptr_types. This could
have been done with a MAYBE_MEM_ALLOC equivalent to MAYBE_NULL, but
bpf_kptr_xchg is the only helper that I can foresee using
MAYBE_MEM_ALLOC, so keep it special-cased for now.

The verifier tags bpf_kptr_xchg retval MEM_ALLOC if and only if the BTF
associated with the retval is not kernel BTF.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230310230743.2320707-3-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 16:38:05 -08:00
Dave Marchevsky c8e1875409 bpf: Support __kptr to local kptrs
If a PTR_TO_BTF_ID type comes from program BTF - not vmlinux or module
BTF - it must have been allocated by bpf_obj_new and therefore must be
free'd with bpf_obj_drop. Such a PTR_TO_BTF_ID is considered a "local
kptr" and is tagged with MEM_ALLOC type tag by bpf_obj_new.

This patch adds support for treating __kptr-tagged pointers to "local
kptrs" as having an implicit bpf_obj_drop destructor for referenced kptr
acquire / release semantics. Consider the following example:

  struct node_data {
          long key;
          long data;
          struct bpf_rb_node node;
  };

  struct map_value {
          struct node_data __kptr *node;
  };

  struct {
          __uint(type, BPF_MAP_TYPE_ARRAY);
          __type(key, int);
          __type(value, struct map_value);
          __uint(max_entries, 1);
  } some_nodes SEC(".maps");

If struct node_data had a matching definition in kernel BTF, the verifier would
expect a destructor for the type to be registered. Since struct node_data does
not match any type in kernel BTF, the verifier knows that there is no kfunc
that provides a PTR_TO_BTF_ID to this type, and that such a PTR_TO_BTF_ID can
only come from bpf_obj_new. So instead of searching for a registered dtor,
a bpf_obj_drop dtor can be assumed.

This allows the runtime to properly destruct such kptrs in
bpf_obj_free_fields, which enables maps to clean up map_vals w/ such
kptrs when going away.

Implementation notes:
  * "kernel_btf" variable is renamed to "kptr_btf" in btf_parse_kptr.
    Before this patch, the variable would only ever point to vmlinux or
    module BTFs, but now it can point to some program BTF for local kptr
    type. It's later used to populate the (btf, btf_id) pair in kptr btf
    field.
  * It's necessary to btf_get the program BTF when populating btf_field
    for local kptr. btf_record_free later does a btf_put.
  * Behavior for non-local referenced kptrs is not modified, as
    bpf_find_btf_id helper only searches vmlinux and module BTFs for
    matching BTF type. If such a type is found, btf_field_kptr's btf will
    pass btf_is_kernel check, and the associated release function is
    some one-argument dtor. If btf_is_kernel check fails, associated
    release function is two-arg bpf_obj_drop_impl. Before this patch
    only btf_field_kptr's w/ kernel or module BTFs were created.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230310230743.2320707-2-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 16:38:05 -08:00
Dave Thaler c1f9e14e3b bpf, docs: Explain helper functions
Add brief text about the existence of helper functions, with details to go in
a separate psABI text.

Note that text about runtime functions (kfuncs) is part of a separate patch,
not this one.

Signed-off-by: Dave Thaler <dthaler@microsoft.com>
Link: https://lore.kernel.org/r/20230308205303.1308-1-dthaler1968@googlemail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 13:02:00 -08:00
Dave Marchevsky 74843b57ec bpf: Change btf_record_find enum parameter to field_mask
btf_record_find's 3rd parameter can be multiple enum btf_field_type's
masked together. The function is called with BPF_KPTR in two places in
verifier.c, so it works with masked values already.
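
For illustration, a masked call looks like (a sketch; BPF_KPTR itself is a mask of BPF_KPTR_UNREF | BPF_KPTR_REF):

  /* matches either kptr flavor at this offset */
  field = btf_record_find(rec, offset, BPF_KPTR);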

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230309180111.1618459-4-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 12:16:37 -08:00
Dave Marchevsky a4aa38897b bpf: btf: Remove unused btf_field_info_type enum
This enum was added and used in commit aa3496accc ("bpf: Refactor kptr_off_tab
into btf_record"). Later refactoring in commit db55911782 ("bpf: Consolidate
spin_lock, timer management into btf_record") resulted in the enum
values no longer being used anywhere.

Let's remove them.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230309180111.1618459-3-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 12:16:37 -08:00
Dave Marchevsky b32a5dae44 bpf: verifier: Rename kernel_type_name helper to btf_type_name
kernel_type_name was introduced in commit 9e15db6613 ("bpf: Implement accurate raw_tp context access via BTF")
with type signature:

  const char *kernel_type_name(u32 id)

At that time the function used global btf_vmlinux BTF for all id lookups. Later,
in commit 22dc4a0f5e ("bpf: Remove hard-coded btf_vmlinux assumption from BPF verifier"),
the type signature was changed to:

  static const char *kernel_type_name(const struct btf* btf, u32 id)

The btf parameter is now used for lookups instead of the global btf_vmlinux.

The helper will function as expected for type name lookup using non-kernel BTFs,
and will be used for such in further patches in the series. Let's rename it to
avoid incorrect assumptions that might arise when seeing the current name.

Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230309180111.1618459-2-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 12:16:37 -08:00
Martin KaFai Lau 4659fba121 selftests/bpf: Add local-storage-create benchmark
This patch tests how many kmallocs are needed to create and free
a batch of UDP sockets, where each socket has a 64-byte bpf storage.
It also measures how fast the UDP sockets can be created.

The result is from my qemu setup.

Before bpf_mem_cache_alloc/free:
./bench -p 1 local-storage-create
Setting up benchmark 'local-storage-create'...
Benchmark 'local-storage-create' started.
Iter   0 ( 73.193us): creates  213.552k/s (213.552k/prod), 3.09 kmallocs/create
Iter   1 (-20.724us): creates  211.908k/s (211.908k/prod), 3.09 kmallocs/create
Iter   2 (  9.280us): creates  212.574k/s (212.574k/prod), 3.12 kmallocs/create
Iter   3 ( 11.039us): creates  213.209k/s (213.209k/prod), 3.12 kmallocs/create
Iter   4 (-11.411us): creates  213.351k/s (213.351k/prod), 3.12 kmallocs/create
Iter   5 ( -7.915us): creates  214.754k/s (214.754k/prod), 3.12 kmallocs/create
Iter   6 ( 11.317us): creates  210.942k/s (210.942k/prod), 3.12 kmallocs/create
Summary: creates  212.789 ± 1.310k/s (212.789k/prod), 3.12 kmallocs/create

After bpf_mem_cache_alloc/free:
./bench -p 1 local-storage-create
Setting up benchmark 'local-storage-create'...
Benchmark 'local-storage-create' started.
Iter   0 ( 68.265us): creates  243.984k/s (243.984k/prod), 1.04 kmallocs/create
Iter   1 ( 30.357us): creates  238.424k/s (238.424k/prod), 1.04 kmallocs/create
Iter   2 (-18.712us): creates  232.963k/s (232.963k/prod), 1.04 kmallocs/create
Iter   3 (-15.885us): creates  238.879k/s (238.879k/prod), 1.04 kmallocs/create
Iter   4 (  5.590us): creates  237.490k/s (237.490k/prod), 1.04 kmallocs/create
Iter   5 (  8.577us): creates  237.521k/s (237.521k/prod), 1.04 kmallocs/create
Iter   6 ( -6.263us): creates  238.508k/s (238.508k/prod), 1.04 kmallocs/create
Summary: creates  237.298 ± 2.198k/s (237.298k/prod), 1.04 kmallocs/create

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-18-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 11:06:02 -08:00
Martin KaFai Lau 1f443d0f2b selftests/bpf: Check freeing sk->sk_local_storage with sk_local_storage->smap is NULL
This patch tweaks the socket_bind bpf prog to test the
local_storage->smap == NULL case in the bpf_local_storage_free()
code path. The idea is to create the local_storage with
the sk_storage_map's selem first, then add the sk_storage_map2's selem,
and then delete the earlier sk_storage_map's selem.
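
In BPF-prog terms, the sequence is roughly (a sketch; the map names follow the message):

  /* create local_storage via the first map's selem */
  bpf_sk_storage_get(&sk_storage_map, sk, 0, BPF_SK_STORAGE_GET_F_CREATE);
  /* add a second selem from another map */
  bpf_sk_storage_get(&sk_storage_map2, sk, 0, BPF_SK_STORAGE_GET_F_CREATE);
  /* delete the first selem; local_storage->smap becomes NULL */
  bpf_sk_storage_delete(&sk_storage_map, sk);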

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-17-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 11:06:02 -08:00
Martin KaFai Lau 57ef77152b selftests/bpf: Replace CHECK with ASSERT in test_local_storage
This patch migrates the CHECK macro to the ASSERT macros.
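
A typical conversion looks like this (an illustrative before/after, not a line from the actual patch):

  /* before */
  CHECK(err, "map_update", "err %d errno %d\n", err, errno);
  /* after */
  ASSERT_OK(err, "map_update");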

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-16-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 11:06:02 -08:00
Martin KaFai Lau 7e30a8477b bpf: Add bpf_local_storage_free()
This patch refactors local_storage freeing logic into
bpf_local_storage_free(). It is a preparation work for a later
patch that uses bpf_mem_cache_alloc/free. The other kfree(local_storage)
cases are also changed to bpf_local_storage_free(..., reuse_now = true).

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-12-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 11:05:29 -08:00
Martin KaFai Lau 1288aaa278 bpf: Add bpf_local_storage_rcu callback
The existing bpf_local_storage_free_rcu is renamed to
bpf_local_storage_free_trace_rcu. A new bpf_local_storage_rcu
callback is added to do the kfree instead of using kfree_rcu.
It is a preparation work for a later patch using
bpf_mem_cache_alloc/free.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-11-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 11:05:29 -08:00
Martin KaFai Lau c0d63f3091 bpf: Add bpf_selem_free()
This patch refactors the selem freeing logic into bpf_selem_free().
It is a preparation work for a later patch using
bpf_mem_cache_alloc/free. The other kfree(selem) cases
are also changed to bpf_selem_free(..., reuse_now = true).

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-10-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 11:05:28 -08:00
Martin KaFai Lau f8ccf30c17 bpf: Add bpf_selem_free_rcu callback
Add bpf_selem_free_rcu() callback to do the kfree() instead
of using kfree_rcu. It is a preparation work for using
bpf_mem_cache_alloc/free in a later patch.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-9-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 11:05:28 -08:00
Martin KaFai Lau c609981342 bpf: Remove bpf_selem_free_fields*_rcu
This patch removes bpf_selem_free_fields*_rcu. The
bpf_obj_free_fields() can be done before the call_rcu_tasks_trace()
and kfree_rcu(). It is needed when a later patch uses
bpf_mem_cache_alloc/free. In bpf hashtab, bpf_obj_free_fields()
is also called before calling bpf_mem_cache_free. The discussion
can be found in
https://lore.kernel.org/bpf/f67021ee-21d9-bfae-6134-4ca542fab843@linux.dev/

Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-8-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 11:05:28 -08:00
Martin KaFai Lau a47eabf216 bpf: Repurpose use_trace_rcu to reuse_now in bpf_local_storage
This patch re-purposes use_trace_rcu to mean
whether the freed memory can be reused immediately or not.
The use_trace_rcu is renamed to reuse_now. Other than
the boolean test being reversed, it should be a no-op.

The following explains the reason for the rename and how it will
be used in a later patch.

In a later patch, bpf_mem_cache_alloc/free will be used
in the bpf_local_storage. The bpf mem allocator will reuse
the freed memory immediately. Some of the free paths in
bpf_local_storage do not support memory being reused immediately.
These paths are the "delete" elem cases from the bpf_*_storage_delete()
helper and the map_delete_elem() syscall. Note that deleting an elem
before the owner's (sk/task/cgrp/inode) lifetime ends is not
the common usage for the local storage.

The common free path, bpf_local_storage_destroy(), can reuse the
memory immediately. This common path means the storage stays with
its owner until the owner is destroyed.

The above mentioned "delete" elem paths that cannot
reuse immediately always has the 'use_trace_rcu ==  true'.
The cases that is safe for immediate reuse always have
'use_trace_rcu == false'. Instead of adding another arg
in a later patch, this patch re-purpose this arg
to reuse_now and have the test logic reversed.

In a later patch, 'reuse_now == true' will free to the
bpf_mem_cache_free() where the memory can be reused
immediately. 'reuse_now == false' will go through the
call_rcu_tasks_trace().
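
In other words, the later dispatch will be shaped like (a hypothetical sketch; the allocator field and callback names are assumptions, not the final code):

  if (reuse_now)
          bpf_mem_cache_free(&smap->selem_ma, selem);
  else
          call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_rcu);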

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-7-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 11:05:28 -08:00
Martin KaFai Lau fc6652aab6 bpf: Remember smap in bpf_local_storage
This patch remembers which smap triggers the allocation
of a 'struct bpf_local_storage' object. The local_storage is
allocated when the very first selem is added to the owner.
The smap pointer is needed when using the bpf_mem_cache_free
in a later patch because it needs to free to the correct
smap's bpf_mem_alloc object.

When a selem is being removed, it needs to check if it is
the selem that triggered the creation of the local_storage.
If it is, the local_storage->smap pointer will be reset to NULL.
This NULL reset is done under the local_storage->lock in
bpf_selem_unlink_storage_nolock() when a selem is being removed.
Also note that the local_storage may not go away even when
local_storage->smap is NULL, because there may be other
selems still stored in the local_storage.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-6-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 11:05:28 -08:00
Martin KaFai Lau 121f31f3e0 bpf: Remove the preceding __ from __bpf_selem_unlink_storage
__bpf_selem_unlink_storage takes the spin lock, and there is
no name collision either. Having the preceding '__' is confusing
when reviewing the later patch.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-5-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 11:05:28 -08:00
Martin KaFai Lau 62827d612a bpf: Remove __bpf_local_storage_map_alloc
bpf_local_storage_map_alloc() is the only caller of
__bpf_local_storage_map_alloc().  The remaining logic in
bpf_local_storage_map_alloc() is only a one-liner setting
the smap->cache_idx.

Remove __bpf_local_storage_map_alloc() to simplify code.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-4-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 11:05:28 -08:00
Martin KaFai Lau 2ffcb6fc50 bpf: Refactor codes into bpf_local_storage_destroy
This patch first renames bpf_local_storage_unlink_nolock to
bpf_local_storage_destroy(). It better reflects that it is only
used when the storage's owner (sk/task/cgrp/inode) is being kfree()'d.

All of bpf_local_storage_destroy's callers take the spin lock and
then free the storage. This patch also moves these two steps into
bpf_local_storage_destroy().

This is a preparation work for a later patch that uses
bpf_mem_cache_alloc/free in the bpf_local_storage.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-3-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 11:05:28 -08:00
Martin KaFai Lau 4cbd23cc92 bpf: Move a few bpf_local_storage functions to static scope
This patch moves bpf_local_storage_free_rcu() and
bpf_selem_unlink_map() to static scope because they are
not used outside of bpf_local_storage.c.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20230308065936.1550103-2-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 11:05:28 -08:00
David Vernet 4a54de6596 bpf/selftests: Fix send_signal tracepoint tests
The send_signal tracepoint tests are non-deterministically failing in
CI. The test works as follows:

1. Two pairs of file descriptors are created using the pipe() function.
   One pair is used to communicate between a parent process -> child
   process, and the other for the reverse direction.

2. A child is fork()'ed. The child process registers a signal handler,
   notifies its parent that the signal handler is registered, and then
   and waits for its parent to have enabled a BPF program that sends a
   signal.

3. The parent opens and loads a BPF skeleton with programs that send
   signals to the child process. The different programs are triggered by
   different perf events (either NMI or normal perf), or by regular
   tracepoints. The signal is delivered to the child whenever the child
   triggers the program.

4. The child's signal handler is invoked, which sets a flag saying that
   the signal handler was reached. The child then signals to the parent
   that it received the signal, and the test ends.

The perf testcases (send_signal_perf{_thread} and
send_signal_nmi{_thread}) work 100% of the time, but the tracepoint
testcases fail non-deterministically because the tracepoint is not
always being fired for the child.

There are two tracepoint programs registered in the test:
'tracepoint/sched/sched_switch', and
'tracepoint/syscalls/sys_enter_nanosleep'. The child never intentionally
blocks, nor sleeps, so neither tracepoint is guaranteed to be triggered.
To fix this, we can have the child trigger the nanosleep program with a
usleep().
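
Roughly, the child's wait loop becomes (a sketch; the flag name is illustrative):

  /* each iteration enters nanosleep, firing the
   * tracepoint/syscalls/sys_enter_nanosleep program
   */
  while (!signal_received)
          usleep(1);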

Before this patch, the test would fail locally every 2-3 runs. Now, it
doesn't fail after more than 1000 runs.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230310061909.1420887-1-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 10:36:46 -08:00
Andrii Nakryiko 52c2b005a3 bpf: take into account liveness when propagating precision
When doing state comparison, if the old state has a register that is not
marked as REG_LIVE_READ, then we just skip the comparison, regardless of
the state of the corresponding register in the current state. This is
because a register that is not REG_LIVE_READ is irrelevant for further
program execution and correctness. All good here.

But when we get to precision propagation, after two states were declared
equivalent, we don't take into account the old register's liveness, and thus
attempt to propagate precision for a register in the current state even if
that register in the old state was not REG_LIVE_READ anymore. This is bad,
because register in current state could be anything at all and this
could cause -EFAULT due to internal logic bugs.

Fix by taking into account REG_LIVE_READ liveness mark to keep the logic
in state comparison in sync with precision propagation.

Fixes: a3ce685dd0 ("bpf: fix precision tracking")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230309224131.57449-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 10:11:42 -08:00
Andrii Nakryiko 4b5ce570db bpf: ensure state checkpointing at iter_next() call sites
The state equivalence check and checkpointing performed in is_state_visited()
employ certain heuristics to try to save memory by avoiding state checkpoints
if not enough jumps and instructions happened since the last checkpoint. This leads
to unpredictability of whether a particular instruction will be checkpointed
and how regularly. While normally this does not cause many problems (except
inconveniences for predictable verifier tests, which we overcome with the
BPF_F_TEST_STATE_FREQ flag), it turns out this is not the case for open-coded
iterators.

Checking and saving state checkpoints at iter_next() calls is crucial for fast
convergence of open-coded iterator loop logic, so we need to force it. If we
don't do that, is_state_visited() might skip saving a checkpoint, causing
unnecessarily long sequence of not checkpointed instructions and jumps, leading
to exhaustion of jump history buffer, and potentially other undesired outcomes.
It is expected that with correct open-coded iterators convergence will happen
quickly, so we don't run a risk of exhausting memory.

This patch adds, in addition to prune and jump instruction marks, also a
"forced checkpoint" mark, and makes sure that any iter_next() call instruction
is marked as such.
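
A hypothetical sketch of what such a mark could look like on the verifier side (the field and helper names are assumptions):

  static void mark_force_checkpoint(struct bpf_verifier_env *env, int idx)
  {
          env->insn_aux_data[idx].force_checkpoint = true;
  }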

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230310060149.625887-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 08:31:42 -08:00
Alexei Starovoitov 1456ddcce5 Merge branch 'selftests/bpf: make BPF_CFLAGS stricter with -Wall'
Andrii Nakryiko says:

====================

Make BPF-side compiler flags stricter by adding -Wall. Fix tons of small
issues pointed out by the compiler immediately after that. That includes the newly
added bpf_for(), bpf_for_each(), and bpf_repeat() macros.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 08:14:08 -08:00
Andrii Nakryiko 3d5a55ddc2 selftests/bpf: make BPF compiler flags stricter
We recently added -Wuninitialized, but it's not enough to catch various
silly mistakes or omissions. Let's go all the way to -Wall, just like we
do for user-space code.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230309054015.4068562-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 08:14:08 -08:00
Andrii Nakryiko c8ed668593 selftests/bpf: fix lots of silly mistakes pointed out by compiler
Once we enable -Wall for BPF sources, the compiler will complain about lots
of unused variables, variables that are set but never read, etc.

Fix all these issues first before enabling -Wall in Makefile.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230309054015.4068562-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 08:14:08 -08:00
Andrii Nakryiko 713461b895 selftests/bpf: add __sink() macro to fake variable consumption
Add __sink(expr) macro that forces compiler to believe that passed in
expression is both read and written. It used a simple embedded asm for
this. This is useful in a lot of tests where we assign value to some variable
to trigger some action, but later don't read variable, causing compiler
to complain (if corresponding compiler warnings are turned on, which
we'll do in the next patch).
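
The macro itself is tiny; a sketch of the asm-based sink described (this matches the shape in bpf_misc.h):

  /* "+g" tells the compiler the operand is both read and written */
  #define __sink(expr) asm volatile("" : "+g"(expr))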

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230309054015.4068562-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 08:14:07 -08:00
Andrii Nakryiko 2498e6231b selftests/bpf: prevent unused variable warning in bpf_for()
Add __attribute__((unused)) to the inner __p variable inside the bpf_for(),
bpf_for_each(), and bpf_repeat() macros to avoid compiler warnings about
an unused variable.

Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230309054015.4068562-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-10 08:14:07 -08:00
Yonghong Song 63d78b7e8c selftests/bpf: Workaround verification failure for fexit_bpf2bpf/func_replace_return_code
With latest llvm17, selftest fexit_bpf2bpf/func_replace_return_code
has the following verification failure:

  0: R1=ctx(off=0,imm=0) R10=fp0
  ; int connect_v4_prog(struct bpf_sock_addr *ctx)
  0: (bf) r7 = r1                       ; R1=ctx(off=0,imm=0) R7_w=ctx(off=0,imm=0)
  1: (b4) w6 = 0                        ; R6_w=0
  ; memset(&tuple.ipv4.saddr, 0, sizeof(tuple.ipv4.saddr));
  ...
  ; return do_bind(ctx) ? 1 : 0;
  179: (bf) r1 = r7                     ; R1=ctx(off=0,imm=0) R7=ctx(off=0,imm=0)
  180: (85) call pc+147
  Func#3 is global and valid. Skipping.
  181: R0_w=scalar()
  181: (bc) w6 = w0                     ; R0_w=scalar() R6_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
  182: (05) goto pc-129
  ; }
  54: (bc) w0 = w6                      ; R0_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R6_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
  55: (95) exit
  At program exit the register R0 has value (0x0; 0xffffffff) should have been in (0x0; 0x1)
  processed 281 insns (limit 1000000) max_states_per_insn 1 total_states 26 peak_states 26 mark_read 13
  -- END PROG LOAD LOG --
  libbpf: prog 'connect_v4_prog': failed to load: -22

The corresponding source code:

  __attribute__ ((noinline))
  int do_bind(struct bpf_sock_addr *ctx)
  {
        struct sockaddr_in sa = {};

        sa.sin_family = AF_INET;
        sa.sin_port = bpf_htons(0);
        sa.sin_addr.s_addr = bpf_htonl(SRC_REWRITE_IP4);

        if (bpf_bind(ctx, (struct sockaddr *)&sa, sizeof(sa)) != 0)
                return 0;

        return 1;
  }
  ...
  SEC("cgroup/connect4")
  int connect_v4_prog(struct bpf_sock_addr *ctx)
  {
  ...
        return do_bind(ctx) ? 1 : 0;
  }

Insn 180 is a call to 'do_bind'. The call's return value is also the return value
for the program. Since do_bind() returns 0/1, it is legitimate for the compiler to
optimize 'return do_bind(ctx) ? 1 : 0' to 'return do_bind(ctx)'. However, such an
optimization breaks the verifier, as the return value of 'do_bind()' is marked as an
arbitrary scalar, which violates the requirement that the prog return value be 0/1.

There are two ways to fix this problem: (1) changing 'return 1' in do_bind() to
e.g. 'return 10' so the compiler has to do 'do_bind(ctx) ? 1 : 0', or (2), as
suggested by Andrii, marking do_bind() with the __weak attribute so the compiler
cannot make any assumptions about do_bind()'s return value.

This patch adopts the __weak approach, which is simpler and more resistant
to potential compiler optimizations.
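
With bpf_helpers.h's __weak wrapper (it expands to __attribute__((weak))), the fix is just the annotation on the definition from the listing above (sketch):

  __weak int do_bind(struct bpf_sock_addr *ctx)
  {
        struct sockaddr_in sa = {};

        sa.sin_family = AF_INET;
        sa.sin_port = bpf_htons(0);
        sa.sin_addr.s_addr = bpf_htonl(SRC_REWRITE_IP4);

        /* the compiler can no longer assume the 0/1 return value,
         * so the caller's '? 1 : 0' is preserved
         */
        if (bpf_bind(ctx, (struct sockaddr *)&sa, sizeof(sa)) != 0)
                return 0;

        return 1;
  }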

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20230310012410.2920570-1-yhs@fb.com
2023-03-09 18:59:54 -08:00
Lorenzo Bianconi c1cd734c1b selftests/bpf: Improve error logs in XDP compliance test tool
Improve some error logs reported in the XDP compliance test tool.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/212fc5bd214ff706f6ef1acbe7272cf4d803ca9c.1678382940.git.lorenzo@kernel.org
2023-03-09 20:52:40 +01:00
Lorenzo Bianconi 27a36bc3cd selftests/bpf: Use ifname instead of ifindex in XDP compliance test tool
Rely on the interface name instead of the interface index in error messages
or logs from the XDP compliance test tool.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/7dc5a8ff56c252b1a7ae29b059d0b2b1543c8b5d.1678382940.git.lorenzo@kernel.org
2023-03-09 20:52:30 +01:00
Michael Weiß 5a70f4a630 bpf: Fix a typo for BPF_F_ANY_ALIGNMENT in bpf.h
Fix s/BPF_PROF_LOAD/BPF_PROG_LOAD/ typo in the documentation comment
for BPF_F_ANY_ALIGNMENT in bpf.h.

Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230309133823.944097-1-michael.weiss@aisec.fraunhofer.de
2023-03-09 20:42:57 +01:00