WSL2-Linux-Kernel/net/core
Taehee Yoo 6c390ef198 xdp: fix invalid wait context of page_pool_destroy()
[ Upstream commit 59a931c5b732ca5fc2ca727f5a72aeabaafa85ec ]

If a driver uses a page pool, it creates one with
page_pool_create().
The reference count of a page pool is 1 by default.
A page pool is destroyed only when its reference count reaches 0.
page_pool_destroy() is used to destroy a page pool; it decreases the
reference count.
When a page pool is destroyed, ->disconnect() is called, which is
mem_allocator_disconnect().
This function internally acquires a mutex with mutex_lock().

If the driver uses XDP, it registers a memory model with
xdp_rxq_info_reg_mem_model().
xdp_rxq_info_reg_mem_model() internally increases the page pool
reference count if the memory model is a page pool.
The reference count is now 2.

To destroy a page pool, the driver should call both page_pool_destroy()
and xdp_unreg_mem_model().
The xdp_unreg_mem_model() internally calls page_pool_destroy().
Only page_pool_destroy() decreases a reference count.
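
The reference counting described above can be sketched as a small
userspace model. This is only an illustration, not kernel code: struct
pool_model and the pool_model_*() helpers are hypothetical names, and
the real page pool does more than flip a flag when the count reaches 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the kernel's struct page_pool. */
struct pool_model {
	int refcnt;     /* 1 after creation, as described above */
	bool released;  /* set once the count reaches 0 */
};

/* Models page_pool_create(): the pool starts with one reference. */
static void pool_model_create(struct pool_model *p)
{
	p->refcnt = 1;
	p->released = false;
}

/* Models xdp_rxq_info_reg_mem_model() for a page pool memory model. */
static void pool_model_reg_mem_model(struct pool_model *p)
{
	assert(!p->released);
	p->refcnt++;
}

/* Models page_pool_destroy(): drop one reference, release on zero. */
static void pool_model_destroy(struct pool_model *p)
{
	assert(p->refcnt > 0);
	if (--p->refcnt == 0)
		p->released = true;
}

/* Models xdp_unreg_mem_model(), which calls page_pool_destroy(). */
static void pool_model_unreg_mem_model(struct pool_model *p)
{
	pool_model_destroy(p);
}
```

In this model, the pool is released only after both
pool_model_destroy() and pool_model_unreg_mem_model() have run, in
either order.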

If a driver calls page_pool_destroy() and then xdp_unreg_mem_model(), we
hit an invalid wait context warning, because xdp_unreg_mem_model() calls
page_pool_destroy() while holding rcu_read_lock(), and
page_pool_destroy() internally acquires a mutex with mutex_lock().
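
A rough userspace model of why this ordering is flagged: a sleeping
lock is taken inside an RCU read-side section. All model_*() names and
the two counters are hypothetical, and the check is a simplification
of what lockdep actually does.

```c
#include <assert.h>

/* Hypothetical model of the wait-context check; not kernel code. */
static int rcu_depth;          /* nesting of modeled rcu_read_lock() */
static int invalid_wait_hits;  /* sleeping lock taken inside RCU section */

static void model_rcu_read_lock(void)
{
	rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(rcu_depth > 0);
	rcu_depth--;
}

/* Models mutex_lock(): it may sleep, so it is invalid under RCU. */
static void model_mutex_lock(void)
{
	if (rcu_depth > 0)
		invalid_wait_hits++;
}

/* Models the final page_pool_destroy() reaching the disconnect path. */
static void model_page_pool_destroy(void)
{
	model_mutex_lock();
}

/* The call ordering the commit message describes as problematic. */
static void model_unreg_buggy(void)
{
	model_rcu_read_lock();
	model_page_pool_destroy();
	model_rcu_read_unlock();
}
```

Calling model_unreg_buggy() once records one violation, which
corresponds to the splat below.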

Splat looks like:
=============================
[ BUG: Invalid wait context ]
6.10.0-rc6+ #4 Tainted: G W
-----------------------------
ethtool/1806 is trying to lock:
ffffffff90387b90 (mem_id_lock){+.+.}-{4:4}, at: mem_allocator_disconnect+0x73/0x150
other info that might help us debug this:
context-{5:5}
3 locks held by ethtool/1806:
stack backtrace:
CPU: 0 PID: 1806 Comm: ethtool Tainted: G W 6.10.0-rc6+ #4 f916f41f172891c800f2fed
Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
Call Trace:
<TASK>
dump_stack_lvl+0x7e/0xc0
__lock_acquire+0x1681/0x4de0
? _printk+0x64/0xe0
? __pfx_mark_lock.part.0+0x10/0x10
? __pfx___lock_acquire+0x10/0x10
lock_acquire+0x1b3/0x580
? mem_allocator_disconnect+0x73/0x150
? __wake_up_klogd.part.0+0x16/0xc0
? __pfx_lock_acquire+0x10/0x10
? dump_stack_lvl+0x91/0xc0
__mutex_lock+0x15c/0x1690
? mem_allocator_disconnect+0x73/0x150
? __pfx_prb_read_valid+0x10/0x10
? mem_allocator_disconnect+0x73/0x150
? __pfx_llist_add_batch+0x10/0x10
? console_unlock+0x193/0x1b0
? lockdep_hardirqs_on+0xbe/0x140
? __pfx___mutex_lock+0x10/0x10
? tick_nohz_tick_stopped+0x16/0x90
? __irq_work_queue_local+0x1e5/0x330
? irq_work_queue+0x39/0x50
? __wake_up_klogd.part.0+0x79/0xc0
? mem_allocator_disconnect+0x73/0x150
mem_allocator_disconnect+0x73/0x150
? __pfx_mem_allocator_disconnect+0x10/0x10
? mark_held_locks+0xa5/0xf0
? rcu_is_watching+0x11/0xb0
page_pool_release+0x36e/0x6d0
page_pool_destroy+0xd7/0x440
xdp_unreg_mem_model+0x1a7/0x2a0
? __pfx_xdp_unreg_mem_model+0x10/0x10
? kfree+0x125/0x370
? bnxt_free_ring.isra.0+0x2eb/0x500
? bnxt_free_mem+0x5ac/0x2500
xdp_rxq_info_unreg+0x4a/0xd0
bnxt_free_mem+0x1356/0x2500
bnxt_close_nic+0xf0/0x3b0
? __pfx_bnxt_close_nic+0x10/0x10
? ethnl_parse_bit+0x2c6/0x6d0
? __pfx___nla_validate_parse+0x10/0x10
? __pfx_ethnl_parse_bit+0x10/0x10
bnxt_set_features+0x2a8/0x3e0
__netdev_update_features+0x4dc/0x1370
? ethnl_parse_bitset+0x4ff/0x750
? __pfx_ethnl_parse_bitset+0x10/0x10
? __pfx___netdev_update_features+0x10/0x10
? mark_held_locks+0xa5/0xf0
? _raw_spin_unlock_irqrestore+0x42/0x70
? __pm_runtime_resume+0x7d/0x110
ethnl_set_features+0x32d/0xa20

To fix this problem, this patch uses rhashtable_lookup_fast() instead of
rhashtable_lookup() under rcu_read_lock().
Using xa without rcu_read_lock() here is safe.
xa is freed by __xdp_mem_allocator_rcu_free(), which is called via
call_rcu() from mem_xa_remove().
mem_xa_remove() is called by page_pool_destroy() when the reference
count reaches 0.
In the control plane, xa is already well protected by the reference
count mechanism.
So removing rcu_read_lock() around page_pool_destroy() is safe.
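
The shape of the fix can be sketched in the same userspace style. This
assumes, as a simplification, that rhashtable_lookup_fast() holds the
RCU read lock only for the duration of the lookup. The table, struct
model_allocator, and all model_*() helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; not the kernel's rhashtable or xdp_mem_allocator. */
static int rcu_depth;          /* nesting of modeled RCU read sections */
static int invalid_wait_hits;  /* sleeping lock taken inside RCU section */

struct model_allocator {
	int id;
};

static struct model_allocator table[1] = { { 7 } };

/* Models rhashtable_lookup_fast(): RCU is held only during the lookup. */
static struct model_allocator *model_lookup_fast(int id)
{
	struct model_allocator *xa = NULL;

	rcu_depth++;
	if (table[0].id == id)
		xa = &table[0];
	rcu_depth--;
	return xa;
}

/* Models page_pool_destroy() taking a sleeping lock on final release. */
static void model_page_pool_destroy(struct model_allocator *xa)
{
	assert(xa != NULL);
	if (rcu_depth > 0)
		invalid_wait_hits++;
}

/* Fixed ordering: the lookup finishes before destroy is called. */
static bool model_unreg_fixed(int id)
{
	struct model_allocator *xa = model_lookup_fast(id);

	if (!xa)
		return false;
	model_page_pool_destroy(xa);
	return true;
}
```

Here the destroy step runs with rcu_depth at 0, so no violation is
recorded.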

Fixes: c3f812cea0 ("page_pool: do not release pool until inflight == 0.")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240712095116.3801586-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-19 05:44:59 +02:00
Makefile
bpf_sk_storage.c
datagram.c net: fix rc7's __skb_datagram_iter() 2024-07-18 13:07:37 +02:00
datagram.h
dev.c net: give more chances to rcu in netdev_wait_allrefs_any() 2024-06-16 13:39:23 +02:00
dev_addr_lists.c
dev_ioctl.c net: dev: Convert sa_data to flexible array in struct sockaddr 2024-03-01 13:21:59 +01:00
devlink.c devlink: report devlink_port_type_warn source device 2024-03-01 13:21:55 +01:00
drop_monitor.c drop_monitor: replace spin_lock by raw_spin_lock 2024-07-05 09:14:26 +02:00
dst.c ipv6: remove max_size check inline with ipv4 2024-01-15 18:51:25 +01:00
dst_cache.c
failover.c
fib_notifier.c
fib_rules.c
filter.c bpf: Add a check for struct bpf_fib_lookup size 2024-07-05 09:14:43 +02:00
flow_dissector.c
flow_offload.c
gen_estimator.c
gen_stats.c
gro_cells.c
hwbm.c
link_watch.c
lwt_bpf.c
lwtunnel.c
neighbour.c neighbour: Don't let neigh_forced_gc() disable preemption for long 2024-01-25 14:52:29 -08:00
net-procfs.c
net-sysfs.c
net-sysfs.h
net-traces.c
net_namespace.c netns: Make get_net_ns() handle zero refcount net 2024-07-05 09:14:30 +02:00
netclassid_cgroup.c
netevent.c
netpoll.c netpoll: Fix race condition in netpoll_owner_active 2024-07-05 09:14:26 +02:00
netprio_cgroup.c
of_net.c
page_pool.c
pktgen.c net: pktgen: Fix interface flags printing 2023-10-25 11:58:58 +02:00
ptp_classifier.c
request_sock.c tcp: make sure init the accept_queue's spinlocks once 2024-02-23 08:54:27 +01:00
rtnetlink.c rtnetlink: Correct nested IFLA_VF_VLAN_LIST attribute validation 2024-05-17 11:50:58 +02:00
scm.c io_uring/unix: drop usage of io_uring socket 2024-03-26 18:21:11 -04:00
secure_seq.c
selftests.c
skbuff.c skbuff: introduce skb_pull_data 2024-07-05 09:14:11 +02:00
skmsg.c skmsg: Skip zero length skb in sk_msg_recvmsg 2024-07-18 13:07:37 +02:00
sock.c ipv6: Fix data races around sk->sk_prot. 2024-07-05 09:14:50 +02:00
sock_destructor.h
sock_diag.c sock_diag: annotate data-races around sock_diag_handlers[family] 2024-03-26 18:21:17 -04:00
sock_map.c sock_map: avoid race between sock_map_close and sk_psock_put 2024-07-05 09:14:20 +02:00
sock_reuseport.c
stream.c
sysctl_net_core.c
timestamping.c
tso.c
utils.c
xdp.c xdp: fix invalid wait context of page_pool_destroy() 2024-08-19 05:44:59 +02:00