WSL2-Linux-Kernel/net
Taehee Yoo 6c390ef198 xdp: fix invalid wait context of page_pool_destroy()
[ Upstream commit 59a931c5b732ca5fc2ca727f5a72aeabaafa85ec ]

If a driver uses a page pool, it creates one with
page_pool_create().
The reference count of a page pool is 1 by default.
A page pool is destroyed only when its reference count reaches 0.
page_pool_destroy() is used to destroy a page pool; it decreases the
reference count.
When a page pool is destroyed, ->disconnect() is called, which is
mem_allocator_disconnect().
This function internally acquires a mutex with mutex_lock().

If the driver uses XDP, it registers a memory model with
xdp_rxq_info_reg_mem_model().
xdp_rxq_info_reg_mem_model() internally increases the page pool
reference count if the memory model is a page pool.
Now the reference count is 2.

To destroy a page pool, the driver should call both page_pool_destroy()
and xdp_unreg_mem_model().
xdp_unreg_mem_model() internally calls page_pool_destroy().
Only page_pool_destroy() decreases the reference count.
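
The reference counting described above can be sketched as a small
userspace model. This is only an illustration: struct model_pool and the
model_*() helpers are hypothetical names, not kernel APIs, and the sketch
ignores in-flight pages and everything else the real page pool tracks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; this is not the kernel's struct page_pool. */
struct model_pool {
	int refcnt;        /* number of users holding the pool */
	bool disconnected; /* set once the ->disconnect() step has run */
};

/* Models page_pool_create(): a new pool starts with one reference. */
static void model_pool_create(struct model_pool *pool)
{
	pool->refcnt = 1;
	pool->disconnected = false;
}

/* Models xdp_rxq_info_reg_mem_model() taking an extra reference. */
static void model_reg_mem_model(struct model_pool *pool)
{
	pool->refcnt++;
}

/*
 * Models page_pool_destroy(): drop one reference and release the pool
 * only when the count reaches zero. Returns true if this call released it.
 */
static bool model_pool_destroy(struct model_pool *pool)
{
	assert(pool->refcnt > 0);
	if (--pool->refcnt > 0)
		return false;
	pool->disconnected = true; /* stands in for mem_allocator_disconnect() */
	return true;
}

/* Models xdp_unreg_mem_model(), which ends up in page_pool_destroy(). */
static bool model_unreg_mem_model(struct model_pool *pool)
{
	return model_pool_destroy(pool);
}
```

In this model, whichever of the two teardown calls runs second is the one
that drops the last reference and therefore reaches the disconnect step.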

If a driver calls page_pool_destroy() and then xdp_unreg_mem_model(), we
hit an invalid wait context warning.
This is because xdp_unreg_mem_model() calls page_pool_destroy() while
holding rcu_read_lock(), and page_pool_destroy() internally acquires a
mutex with mutex_lock().
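
The problematic nesting can be illustrated with a toy userspace model,
assuming a simple depth counter in place of real RCU and a violation
counter in place of lockdep. All model_*() names are hypothetical and
exist only for this sketch.

```c
#include <assert.h>

/* Hypothetical model state: nesting depth of RCU read-side sections. */
static int model_rcu_depth;
/* Number of sleeping-lock acquisitions attempted inside such a section. */
static int model_invalid_wait_ctx;

static void model_rcu_read_lock(void)
{
	model_rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(model_rcu_depth > 0);
	model_rcu_depth--;
}

/* Models mutex_lock(): a sleeping lock must not be taken under RCU. */
static void model_mutex_lock(void)
{
	if (model_rcu_depth > 0)
		model_invalid_wait_ctx++; /* the real kernel prints the splat here */
}

/* Models page_pool_destroy() reaching mem_allocator_disconnect(). */
static void model_page_pool_destroy(void)
{
	model_mutex_lock();
}

/* Models the pre-fix xdp_unreg_mem_model() described above. */
static void model_unreg_buggy(void)
{
	model_rcu_read_lock();
	model_page_pool_destroy();
	model_rcu_read_unlock();
}
```

Calling model_unreg_buggy() once records one violation, which mirrors the
single warning in the report.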

Splat looks like:
=============================
[ BUG: Invalid wait context ]
6.10.0-rc6+ #4 Tainted: G W
-----------------------------
ethtool/1806 is trying to lock:
ffffffff90387b90 (mem_id_lock){+.+.}-{4:4}, at: mem_allocator_disconnect+0x73/0x150
other info that might help us debug this:
context-{5:5}
3 locks held by ethtool/1806:
stack backtrace:
CPU: 0 PID: 1806 Comm: ethtool Tainted: G W 6.10.0-rc6+ #4 f916f41f172891c800f2fed
Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
Call Trace:
<TASK>
dump_stack_lvl+0x7e/0xc0
__lock_acquire+0x1681/0x4de0
? _printk+0x64/0xe0
? __pfx_mark_lock.part.0+0x10/0x10
? __pfx___lock_acquire+0x10/0x10
lock_acquire+0x1b3/0x580
? mem_allocator_disconnect+0x73/0x150
? __wake_up_klogd.part.0+0x16/0xc0
? __pfx_lock_acquire+0x10/0x10
? dump_stack_lvl+0x91/0xc0
__mutex_lock+0x15c/0x1690
? mem_allocator_disconnect+0x73/0x150
? __pfx_prb_read_valid+0x10/0x10
? mem_allocator_disconnect+0x73/0x150
? __pfx_llist_add_batch+0x10/0x10
? console_unlock+0x193/0x1b0
? lockdep_hardirqs_on+0xbe/0x140
? __pfx___mutex_lock+0x10/0x10
? tick_nohz_tick_stopped+0x16/0x90
? __irq_work_queue_local+0x1e5/0x330
? irq_work_queue+0x39/0x50
? __wake_up_klogd.part.0+0x79/0xc0
? mem_allocator_disconnect+0x73/0x150
mem_allocator_disconnect+0x73/0x150
? __pfx_mem_allocator_disconnect+0x10/0x10
? mark_held_locks+0xa5/0xf0
? rcu_is_watching+0x11/0xb0
page_pool_release+0x36e/0x6d0
page_pool_destroy+0xd7/0x440
xdp_unreg_mem_model+0x1a7/0x2a0
? __pfx_xdp_unreg_mem_model+0x10/0x10
? kfree+0x125/0x370
? bnxt_free_ring.isra.0+0x2eb/0x500
? bnxt_free_mem+0x5ac/0x2500
xdp_rxq_info_unreg+0x4a/0xd0
bnxt_free_mem+0x1356/0x2500
bnxt_close_nic+0xf0/0x3b0
? __pfx_bnxt_close_nic+0x10/0x10
? ethnl_parse_bit+0x2c6/0x6d0
? __pfx___nla_validate_parse+0x10/0x10
? __pfx_ethnl_parse_bit+0x10/0x10
bnxt_set_features+0x2a8/0x3e0
__netdev_update_features+0x4dc/0x1370
? ethnl_parse_bitset+0x4ff/0x750
? __pfx_ethnl_parse_bitset+0x10/0x10
? __pfx___netdev_update_features+0x10/0x10
? mark_held_locks+0xa5/0xf0
? _raw_spin_unlock_irqrestore+0x42/0x70
? __pm_runtime_resume+0x7d/0x110
ethnl_set_features+0x32d/0xa20

To fix this problem, use rhashtable_lookup_fast() instead of
rhashtable_lookup() under rcu_read_lock(), so that page_pool_destroy()
is no longer called inside the RCU read-side critical section.
Using xa without rcu_read_lock() here is safe.
xa is freed by __xdp_mem_allocator_rcu_free(), which is invoked through
call_rcu() in mem_xa_remove().
mem_xa_remove() is called by page_pool_destroy() when the reference
count reaches 0.
The xa is therefore already well protected by the reference count
mechanism in the control plane.
So removing rcu_read_lock() around page_pool_destroy() is safe.
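
The shape of the fix can be sketched in the same userspace style: the
read-side section covers only the lookup, and the destroy step runs after
it has ended. The table, struct model_xa and the model_*() helpers are
hypothetical; the sketch assumes only that rhashtable_lookup_fast() takes
and releases the RCU read lock internally around the lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xdp_mem_allocator. */
struct model_xa {
	int id;
	int destroyed_at_depth; /* RCU depth seen by destroy, -1 if never run */
};

static int model_rcu_depth;
static struct model_xa model_table[2] = { { 1, -1 }, { 2, -1 } };

/* Models rhashtable_lookup_fast(): RCU is held only during the lookup. */
static struct model_xa *model_lookup_fast(int id)
{
	struct model_xa *found = NULL;
	size_t i;

	model_rcu_depth++; /* rcu_read_lock() */
	for (i = 0; i < sizeof(model_table) / sizeof(model_table[0]); i++) {
		if (model_table[i].id == id) {
			found = &model_table[i];
			break;
		}
	}
	model_rcu_depth--; /* rcu_read_unlock() */
	return found;
}

/* Models page_pool_destroy(): records the context it was called in. */
static void model_page_pool_destroy(struct model_xa *xa)
{
	xa->destroyed_at_depth = model_rcu_depth;
}

/* Models the post-fix xdp_unreg_mem_model(): destroy runs outside RCU. */
static int model_unreg_fixed(int id)
{
	struct model_xa *xa = model_lookup_fast(id);

	if (!xa)
		return -1;
	model_page_pool_destroy(xa);
	return 0;
}
```

Here the entry stays valid after the lookup only because the caller still
holds a reference, which is the argument the commit message makes for
dropping rcu_read_lock().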

Fixes: c3f812cea0 ("page_pool: do not release pool until inflight == 0.")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://patch.msgid.link/20240712095116.3801586-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-19 05:44:59 +02:00
..
6lowpan 6lowpan: iphc: Fix an off-by-one check of array index 2021-07-22 16:19:03 +02:00
9p net/9p: fix uninit-value in p9_client_rpc() 2024-06-16 13:39:58 +02:00
802 mrp: introduce active flags to prevent UAF when applicant uninit 2022-12-31 13:14:42 +01:00
8021q vlan: skip nested type that is not IFLA_VLAN_QOS_MAPPING 2024-02-23 08:54:27 +01:00
appletalk appletalk: Fix Use-After-Free in atalk_ioctl 2023-12-20 15:17:37 +01:00
atm atm: Fix Use-After-Free in do_vcc_ioctl 2023-12-20 15:17:35 +01:00
ax25 net: ax25: Fix deadlock caused by skb_recv_datagram in ax25_recvmsg 2022-06-22 14:22:01 +02:00
batman-adv batman-adv: Don't accept TT entries for out-of-spec VIDs 2024-07-05 09:14:49 +02:00
bluetooth Bluetooth: hci_core: cancel all works upon hci_unregister_dev() 2024-07-27 10:46:13 +02:00
bpf bpf: Set run context for rawtp test_run callback 2024-07-05 09:14:06 +02:00
bpfilter bpfilter: Specify the log level for the kmsg message 2021-06-25 13:13:50 +02:00
bridge net: bridge: fix corrupted ethernet header on multicast-to-unicast 2024-05-17 11:50:58 +02:00
caif net: caif: Fix use-after-free in cfusbl_device_notify() 2023-03-17 08:48:54 +01:00
can net: can: j1939: enhanced error handling for tightly received RTS messages in xtp_rx_rts_session_new 2024-07-05 09:14:48 +02:00
ceph libceph: fix race between delayed_work() and ceph_monc_stop() 2024-07-18 13:07:42 +02:00
core xdp: fix invalid wait context of page_pool_destroy() 2024-08-19 05:44:59 +02:00
dcb net: dcb: choose correct policy to parse DCB_ATTR_BCN 2023-08-11 15:13:53 +02:00
dccp Fix race for duplicate reqsk on identical SYN 2024-07-05 09:14:41 +02:00
dns_resolver keys, dns: Fix size check of V1 server-list header 2024-01-25 14:52:46 -08:00
dsa net: dsa: tag_sja1105: always prefer source port information from INCL_SRCPT 2024-06-16 13:39:54 +02:00
ethernet ethernet: Add helper for assigning packet type when dest address does not match device address 2024-05-02 16:24:49 +02:00
ethtool ethtool: netlink: do not return SQI value if link is down 2024-07-18 13:07:38 +02:00
hsr hsr: Handle failures in module init 2024-03-26 18:21:36 -04:00
ieee802154 net: drop nopreempt requirement on sock_prot_inuse_add() 2024-07-05 09:14:08 +02:00
ife net: sched: ife: fix potential use-after-free 2024-01-05 15:13:29 +01:00
ipv4 tcp: fix races in tcp_v[46]_err() 2024-08-19 05:44:56 +02:00
ipv6 tcp: fix races in tcp_v[46]_err() 2024-08-19 05:44:56 +02:00
iucv net/iucv: Avoid explicit cpumask var allocation on stack 2024-07-05 09:14:43 +02:00
kcm net: kcm: fix incorrect parameter validation in the kcm_getsockopt) function 2024-03-26 18:21:23 -04:00
key net: af_key: fix sadb_x_filter validation 2023-08-26 14:23:32 +02:00
l2tp net l2tp: drop flow hash on forward 2024-05-17 11:50:48 +02:00
l3mdev l3mdev: l3mdev_master_upper_ifindex_by_index_rcu should be using netdev_master_upper_dev_get_rcu 2022-04-27 14:38:53 +02:00
lapb net: lapb: Use list_for_each_entry() to simplify code in lapb_iface.c 2021-06-08 16:31:25 -07:00
llc llc: call sock_orphan() at release time 2024-02-23 08:54:54 +01:00
mac80211 wifi: mac80211: disable softirqs for queued frame handling 2024-07-27 10:46:15 +02:00
mac802154 net: mac802154: Fix racy device stats updates by DEV_STATS_INC() and DEV_STATS_ADD() 2024-07-27 10:46:13 +02:00
mctp mctp: perform route lookups under a RCU read-side lock 2023-10-25 11:58:59 +02:00
mpls net: mpls: fix stale pointer if allocation fails during device rename 2023-02-22 12:57:09 +01:00
mptcp mptcp: pm: update add_addr counters after connect 2024-07-05 09:14:23 +02:00
ncsi net/ncsi: Fix the multi thread manner of NCSI driver 2024-07-05 09:14:06 +02:00
netfilter ipvs: Avoid unnecessary calls to skb_is_gso_sctp 2024-08-19 05:44:57 +02:00
netlabel calipso: fix memory leak in netlbl_calipso_add_pass() 2024-01-25 14:52:33 -08:00
netlink net: drop nopreempt requirement on sock_prot_inuse_add() 2024-07-05 09:14:08 +02:00
netrom netrom: Fix a memory leak in nr_heartbeat_expiry() 2024-07-05 09:14:29 +02:00
nfc nfc: nci: Fix handling of zero-length payload packets in nci_rx_work() 2024-06-16 13:39:48 +02:00
nsh nsh: Restore skb->{protocol,data,mac_header} for outer header in nsh_gso_segment(). 2024-05-17 11:50:48 +02:00
openvswitch openvswitch: Set the skbuff pkt_type for proper pmtud support. 2024-06-16 13:39:47 +02:00
packet af_packet: avoid a false positive warning in packet_setsockopt() 2024-07-05 09:14:26 +02:00
phonet phonet: fix rtm_phonet_notify() skb allocation 2024-05-17 11:50:58 +02:00
psample psample: Require 'CAP_NET_ADMIN' when joining "packets" group 2023-12-13 18:36:37 +01:00
qrtr net: qrtr: ns: Fix module refcnt 2024-06-16 13:39:33 +02:00
rds net/rds: fix possible cp null dereference 2024-04-10 16:19:37 +02:00
rfkill net: rfkill: gpio: set GPIO direction 2024-01-05 15:13:34 +01:00
rose net/rose: fix races in rose_kill_by_device() 2024-01-05 15:13:29 +01:00
rxrpc rxrpc: Fix response to PING RESPONSE ACKs to a dead call 2024-02-23 08:54:58 +01:00
sched net/sched: Fix UAF when resolving a clash 2024-07-18 13:07:38 +02:00
sctp sctp: prefer struct_size over open coded arithmetic 2024-07-18 13:07:27 +02:00
smc net/smc: set rmb's SG_MAX_SINGLE_ALLOC limitation only when CONFIG_ARCH_NO_SG_CHAIN is defined 2024-08-19 05:44:56 +02:00
strparser bpf: sockmap, strparser, and tls are reusing qdisc_skb_cb and colliding 2021-11-18 19:17:11 +01:00
sunrpc gss_krb5: Fix the error handling path for crypto_sync_skcipher_setkey 2024-08-19 05:44:58 +02:00
switchdev net: make switchdev_bridge_port_{,unoffload} loosely coupled with the bridge 2021-08-04 12:35:07 +01:00
tipc tipc: force a dst refcount before doing decryption 2024-07-05 09:14:30 +02:00
tls tls: fix missing memory barrier in tls_init 2024-06-16 13:39:48 +02:00
unix af_unix: Read with MSG_PEEK loops if the first unread byte is OOB 2024-07-05 09:14:19 +02:00
vmw_vsock virtio/vsock: fix logic which reduces credit update messages 2024-01-25 14:52:38 -08:00
wireless wifi: cfg80211: handle 2x996 RU allocation in cfg80211_calculate_bitrate_he() 2024-08-19 05:44:57 +02:00
x25 net/x25: fix incorrect parameter validation in the x25_getsockopt() function 2024-03-26 18:21:23 -04:00
xdp net: drop nopreempt requirement on sock_prot_inuse_add() 2024-07-05 09:14:08 +02:00
xfrm net: fix __dst_negative_advice() race 2024-06-16 13:39:59 +02:00
Kconfig Remove DECnet support from kernel 2023-06-21 15:59:15 +02:00
Makefile Remove DECnet support from kernel 2023-06-21 15:59:15 +02:00
compat.c net: Return the correct errno code 2021-06-03 15:13:56 -07:00
devres.c net: devres: Correct a grammatical error 2021-06-11 12:55:28 -07:00
socket.c net: Save and restore msg_namelen in sock_sendmsg 2024-01-15 18:51:16 +01:00
sysctl_net.c net: Ensure net namespace isolation of sysctls 2021-04-12 13:27:11 -07:00