WSL2-Linux-Kernel/net
Kolmakov Dmitriy 7845989cb4 net: tipc: fix stall during bclink wakeup procedure
If an attempt to wake up users of broadcast link is made when there is
no enough place in send queue than it may hang up inside the
tipc_sk_rcv() function since the loop breaks only after the wake up
queue becomes empty. This can lead to complete CPU stall with the
following message generated by RCU:

INFO: rcu_sched self-detected stall on CPU { 0}  (t=2101 jiffies
					g=54225 c=54224 q=11465)
Task dump for CPU 0:
tpch            R  running task        0 39949  39948 0x0000000a
 ffffffff818536c0 ffff88181fa037a0 ffffffff8106a4be 0000000000000000
 ffffffff818536c0 ffff88181fa037c0 ffffffff8106d8a8 ffff88181fa03800
 0000000000000001 ffff88181fa037f0 ffffffff81094a50 ffff88181fa15680
Call Trace:
 <IRQ>  [<ffffffff8106a4be>] sched_show_task+0xae/0x120
 [<ffffffff8106d8a8>] dump_cpu_task+0x38/0x40
 [<ffffffff81094a50>] rcu_dump_cpu_stacks+0x90/0xd0
 [<ffffffff81097c3b>] rcu_check_callbacks+0x3eb/0x6e0
 [<ffffffff8106e53f>] ? account_system_time+0x7f/0x170
 [<ffffffff81099e64>] update_process_times+0x34/0x60
 [<ffffffff810a84d1>] tick_sched_handle.isra.18+0x31/0x40
 [<ffffffff810a851c>] tick_sched_timer+0x3c/0x70
 [<ffffffff8109a43d>] __run_hrtimer.isra.34+0x3d/0xc0
 [<ffffffff8109aa95>] hrtimer_interrupt+0xc5/0x1e0
 [<ffffffff81030d52>] ? native_smp_send_reschedule+0x42/0x60
 [<ffffffff81032f04>] local_apic_timer_interrupt+0x34/0x60
 [<ffffffff810335bc>] smp_apic_timer_interrupt+0x3c/0x60
 [<ffffffff8165a3fb>] apic_timer_interrupt+0x6b/0x70
 [<ffffffff81659129>] ? _raw_spin_unlock_irqrestore+0x9/0x10
 [<ffffffff8107eb9f>] __wake_up_sync_key+0x4f/0x60
 [<ffffffffa313ddd1>] tipc_write_space+0x31/0x40 [tipc]
 [<ffffffffa313dadf>] filter_rcv+0x31f/0x520 [tipc]
 [<ffffffffa313d699>] ? tipc_sk_lookup+0xc9/0x110 [tipc]
 [<ffffffff81659259>] ? _raw_spin_lock_bh+0x19/0x30
 [<ffffffffa314122c>] tipc_sk_rcv+0x2dc/0x3e0 [tipc]
 [<ffffffffa312e7ff>] tipc_bclink_wakeup_users+0x2f/0x40 [tipc]
 [<ffffffffa313ce26>] tipc_node_unlock+0x186/0x190 [tipc]
 [<ffffffff81597c1c>] ? kfree_skb+0x2c/0x40
 [<ffffffffa313475c>] tipc_rcv+0x2ac/0x8c0 [tipc]
 [<ffffffffa312ff58>] tipc_l2_rcv_msg+0x38/0x50 [tipc]
 [<ffffffff815a76d3>] __netif_receive_skb_core+0x5a3/0x950
 [<ffffffff815a98d3>] __netif_receive_skb+0x13/0x60
 [<ffffffff815a993e>] netif_receive_skb_internal+0x1e/0x90
 [<ffffffff815aa138>] napi_gro_receive+0x78/0xa0
 [<ffffffffa07f93f4>] tg3_poll_work+0xc54/0xf40 [tg3]
 [<ffffffff81597c8c>] ? consume_skb+0x2c/0x40
 [<ffffffffa07f9721>] tg3_poll_msix+0x41/0x160 [tg3]
 [<ffffffff815ab0f2>] net_rx_action+0xe2/0x290
 [<ffffffff8104b92a>] __do_softirq+0xda/0x1f0
 [<ffffffff8104bc26>] irq_exit+0x76/0xa0
 [<ffffffff81004355>] do_IRQ+0x55/0xf0
 [<ffffffff8165a12b>] common_interrupt+0x6b/0x6b
 <EOI>

The issue occurs only when tipc_sk_rcv() is used to wake up postponed
senders:

	tipc_bclink_wakeup_users()
		// wakeupq - is a queue which consists of special
		// 		 messages with SOCK_WAKEUP type.
		tipc_sk_rcv(wakeupq)
			...
			while (skb_queue_len(inputq)) {
				filter_rcv(skb)
					// Here the type of message is checked
					// and if it is SOCK_WAKEUP then
					// it tries to wake up a sender.
					tipc_write_space(sk)
						wake_up_interruptible_sync_poll()
			}

After the sender thread is woke up it can gather control and perform
an attempt to send a message. But if there is no enough place in send
queue it will call link_schedule_user() function which puts a message
of type SOCK_WAKEUP to the wakeup queue and put the sender to sleep.
Thus the size of the queue actually is not changed and the while()
loop never exits.

The approach I proposed is to wake up only senders for which there is
enough place in send queue so the described issue can't occur.
Moreover the same approach is already used to wake up senders on
unicast links.

I have got into the issue on our product code but to reproduce the
issue I changed a benchmark test application (from
tipcutils/demos/benchmark) to perform the following scenario:
	1. Run 64 instances of test application (nodes). It can be done
	   on the one physical machine.
	2. Each application connects to all other using TIPC sockets in
	   RDM mode.
	3. When setup is done all nodes start simultaneously send
	   broadcast messages.
	4. Everything hangs up.

The issue is reproducible only when a congestion on broadcast link
occurs. For example, when there are only 8 nodes it works fine since
congestion doesn't occur. Send queue limit is 40 in my case (I use a
critical importance level) and when 64 nodes send a message at the
same moment a congestion occurs every time.

Signed-off-by: Dmitry S Kolmakov <kolmakov.dmitriy@huawei.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-08 22:50:26 -07:00
..
6lowpan 6lowpan: move module_init into core functionality 2015-08-11 22:05:36 +02:00
9p 9p: ensure err is initialized to 0 in p9_client_read/write 2015-08-22 21:35:02 -04:00
802 net: Kill dev_rebuild_header 2015-03-02 16:43:41 -05:00
8021q net: 8021q: convert to using IFF_NO_QUEUE 2015-08-18 11:55:06 -07:00
appletalk net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
atm br2684: Remove unnecessary formatting macros b1 and bs 2015-07-31 15:25:52 -07:00
ax25 NET: AX.25: Stop heartbeat timer on disconnect. 2015-07-15 15:59:58 -07:00
batman-adv batman-adv: turn batadv_neigh_node_get() into local function 2015-08-27 20:15:34 +02:00
bluetooth Bluetooth: Fix SCO link type handling on connection complete 2015-08-28 21:03:00 +02:00
bridge net: bridge: remove unnecessary switchdev include 2015-09-08 22:33:14 -07:00
caif net: caif: convert to using IFF_NO_QUEUE 2015-08-18 11:55:07 -07:00
can can: replace timestamp as unique skb attribute 2015-07-12 21:13:22 +02:00
ceph libceph: treat sockaddr_storage with uninitialized family as blank 2015-07-09 20:30:34 +03:00
core sock, diag: fix panic in sock_diag_put_filterinfo 2015-09-02 11:29:29 -07:00
dcb net/dcb: Add IEEE QCN attribute 2015-03-06 21:50:02 -05:00
dccp tcp: fix recv with flags MSG_WAITALL | MSG_PEEK 2015-07-27 01:06:53 -07:00
decnet net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
dns_resolver
dsa net: dsa: Allow DSA and CPU ports to have a phy-mode property 2015-08-31 14:48:02 -07:00
ethernet flow_dissector: Add flags argument to skb_flow_dissector functions 2015-09-01 15:06:22 -07:00
hsr net: hsr: convert to using IFF_NO_QUEUE 2015-08-18 11:55:07 -07:00
ieee802154 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2015-08-29 13:15:03 -07:00
ipv4 netfilter: nf_dup{4, 6}: fix build error when nf_conntrack disabled 2015-09-02 16:28:06 -07:00
ipv6 net/ipv6: Correct PIM6 mrt_lock handling 2015-09-05 17:31:30 -07:00
ipx net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
irda irda: use msecs_to_jiffies for conversion to jiffies 2015-05-25 17:46:21 -04:00
iucv net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
key net: Fix RCU splat in af_key 2015-08-24 14:48:10 -07:00
l2tp net: Modify sk_alloc to not reference count the netns of kernel sockets. 2015-05-11 10:50:18 -04:00
lapb
llc tcp: fix recv with flags MSG_WAITALL | MSG_PEEK 2015-07-27 01:06:53 -07:00
mac80211 mac80211: reject software RSSI CQM with beacon filtering 2015-09-04 15:23:22 +02:00
mac802154 ieee802154: add ack request default handling 2015-08-10 20:43:06 +02:00
mpls mpls: fix mpls_net_init memory leak 2015-08-31 12:45:09 -07:00
netfilter Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2015-09-05 21:57:42 -07:00
netlabel netlink: implement nla_put_in_addr and nla_put_in6_addr 2015-03-31 13:58:35 -04:00
netlink netlink: rx mmap: fix POLLIN condition 2015-08-30 21:55:51 -07:00
netrom netfilter: Remove spurios included of netfilter.h 2015-06-18 21:14:32 +02:00
nfc nfc: netlink: Add capability to reply to vendor_cmd with data 2015-08-20 22:00:11 +02:00
openvswitch openvswitch: Remove conntrack Kconfig option. 2015-09-06 23:48:33 -07:00
packet packet: add extended BPF fanout mode 2015-08-17 14:22:48 -07:00
phonet net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
rds RDS: rds_conn_lookup() should factor in the struct net for a match 2015-09-05 22:04:58 -07:00
rfkill rfkill: Copy "all" global state to other types 2015-09-04 14:26:56 +02:00
rose Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-06-24 02:58:51 -07:00
rxrpc net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
sched flow_dissector: Add flags argument to skb_flow_dissector functions 2015-09-01 15:06:22 -07:00
sctp sctp: add routing output fallback 2015-09-03 15:43:05 -07:00
sunrpc NFS client bugfixes for Linux 4.2 2015-07-28 09:37:44 -07:00
switchdev switchdev: fix return value of switchdev_port_fdb_dump in case of error 2015-09-05 22:02:11 -07:00
tipc net: tipc: fix stall during bclink wakeup procedure 2015-09-08 22:50:26 -07:00
unix net/unix: support SCM_SECURITY for stream sockets 2015-06-10 22:49:20 -07:00
vmw_vsock net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
wimax net:wimax: Fix doucble word "the the" in networking.xml 2015-08-09 22:43:52 -07:00
wireless cfg80211: regulatory: restore proper user alpha2 2015-09-04 14:29:25 +02:00
x25 net: Pass kern from net_proto_family.create to sk_alloc 2015-05-11 10:50:17 -04:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2015-09-03 08:08:17 -07:00
Kconfig lwtunnel: infrastructure for handling light weight tunnels like mpls 2015-07-21 10:39:03 -07:00
Makefile mpls: Refactor how the mpls module is built 2015-03-04 00:26:06 -05:00
compat.c net: switch importing msghdr from userland to {compat_,}import_iovec() 2015-04-09 00:02:26 -04:00
socket.c net: Add a struct net parameter to sock_create_kern 2015-05-11 10:50:17 -04:00
sysctl_net.c