WSL2-Linux-Kernel/net
Jesper Dangaard Brouer 8788370a1d pktgen: RCU-ify "if_list" to remove lock in next_to_run()
The if_lock()/if_unlock() in next_to_run() adds a significant
overhead, because its called for every packet in busy loop of
pktgen_thread_worker().  (Thomas Graf originally pointed me
at this lock problem).

Removing these two "LOCK" operations should in theory save us approx
16ns (8ns x 2), as illustrated below we do save 16ns when removing
the locks and introducing RCU protection.

Performance data with CLONE_SKB==100000, TX-size=512, rx-usecs=30:
 (single CPU performance, ixgbe 10Gbit/s, E5-2630)
 * Prev   : 5684009 pps --> 175.93ns (1/5684009*10^9)
 * RCU-fix: 6272204 pps --> 159.43ns (1/6272204*10^9)
 * Diff   : +588195 pps --> -16.50ns

To understand this RCU patch, I describe the pktgen thread model
below.

In pktgen there is several kernel threads, but there is only one CPU
running each kernel thread.  Communication with the kernel threads are
done through some thread control flags.  This allow the thread to
change data structures at a know synchronization point, see main
thread func pktgen_thread_worker().

Userspace changes are communicated through proc-file writes.  There
are three types of changes, general control changes "pgctrl"
(func:pgctrl_write), thread changes "kpktgend_X"
(func:pktgen_thread_write), and interface config changes "etcX@N"
(func:pktgen_if_write).

Userspace "pgctrl" and "thread" changes are synchronized via the mutex
pktgen_thread_lock, thus only a single userspace instance can run.
The mutex is taken while the packet generator is running, by pgctrl
"start".  Thus e.g. "add_device" cannot be invoked when pktgen is
running/started.

All "pgctrl" and all "thread" changes, except thread "add_device",
communicate via the thread control flags.  The main problem is the
exception "add_device", that modifies threads "if_list" directly.

Fortunately "add_device" cannot be invoked while pktgen is running.
But there exists a race between "rem_device_all" and "add_device"
(which normally don't occur, because "rem_device_all" waits 125ms
before returning). Background'ing "rem_device_all" and running
"add_device" immediately allow the race to occur.

The race affects the threads (list of devices) "if_list".  The if_lock
is used for protecting this "if_list".  Other readers are given
lock-free access to the list under RCU read sections.

Note, interface config changes (via proc) can occur while pktgen is
running, which worries me a bit.  I'm assuming proc_remove() takes
appropriate locks, to assure no writers exists after proc_remove()
finish.

I've been running a script exercising the race condition (leading me
to fix the proc_remove order), without any issues.  The script also
exercises concurrent proc writes, while the interface config is
getting removed.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-01 15:50:23 -07:00
..
9p A bunch of updates and cleanup within the transport layer, 2014-04-11 14:14:57 -07:00
802
8021q 8021q: fix a potential memory leak 2014-06-21 15:12:13 -07:00
appletalk net: Split sk_no_check into sk_no_check_{rx,tx} 2014-05-23 16:28:53 -04:00
atm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-06-12 14:27:40 -07:00
ax25 net: Fix use after free by removing length arg from sk_data_ready callbacks. 2014-04-11 16:15:36 -04:00
batman-adv net: add __pskb_copy_fclone and pskb_copy_for_clone 2014-06-11 15:38:02 -07:00
bluetooth Bluetooth: Allow change security level on ATT_CID in slave role 2014-06-13 14:36:39 +02:00
bridge bridge: use list_for_each_entry_continue_reverse 2014-06-21 15:33:22 -07:00
caif net: Fix use after free by removing length arg from sk_data_ready callbacks. 2014-04-11 16:15:36 -04:00
can can: add hash based access to single EFF frame filters 2014-05-19 09:38:24 +02:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2014-06-12 23:06:23 -07:00
core pktgen: RCU-ify "if_list" to remove lock in next_to_run() 2014-07-01 15:50:23 -07:00
dcb net: Use netlink_ns_capable to verify the permisions of netlink messages 2014-04-24 13:44:54 -04:00
dccp net: remove inet6_reqsk_alloc 2014-06-27 15:53:35 -07:00
decnet net: Split sk_no_check into sk_no_check_{rx,tx} 2014-05-23 16:28:53 -04:00
dns_resolver Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-06-11 16:02:55 -07:00
dsa net/dsa/dsa.c: remove unnecessary null test before kfree 2014-06-25 16:15:16 -07:00
ethernet
hsr
ieee802154 6lowpan_rtnl: fix off by one while fragmentation 2014-06-02 10:39:42 -07:00
ipv4 tcp: tcp_conn_request: fix build error when IPv6 is disabled 2014-06-29 23:46:38 -07:00
ipv6 ipv6: Allow accepting RA from local IP addresses. 2014-07-01 12:16:24 -07:00
ipx net: Split sk_no_check into sk_no_check_{rx,tx} 2014-05-23 16:28:53 -04:00
irda trivial: net/irda/irlmp.c: Fix closing brace followed by if 2014-06-23 15:04:33 -07:00
iucv af_iucv: correct cleanup if listen backlog is full 2014-05-30 17:35:23 -07:00
key af_key: Replace comma with semicolon 2014-05-30 17:48:58 -07:00
l2tp l2tp: call udp{6}_set_csum 2014-06-04 22:46:38 -07:00
lapb
llc
mac80211 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-06-11 16:02:55 -07:00
mac802154 mac802154: don't deliver packets to devices that are down 2014-06-11 12:10:19 -07:00
mpls gre: Call gso_make_checksum 2014-06-04 22:46:38 -07:00
netfilter netfilter: nf_nat: fix oops on netns removal 2014-06-16 13:58:54 +02:00
netlabel
netlink Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-06-03 23:32:12 -07:00
netrom net: Fix use after free by removing length arg from sk_data_ready callbacks. 2014-04-11 16:15:36 -04:00
nfc net: add __pskb_copy_fclone and pskb_copy_for_clone 2014-06-11 15:38:02 -07:00
openvswitch openvswitch: introduce rtnl ops stub 2014-07-01 14:40:17 -07:00
packet net: Use netlink_ns_capable to verify the permisions of netlink messages 2014-04-24 13:44:54 -04:00
phonet net: Use netlink_ns_capable to verify the permisions of netlink messages 2014-04-24 13:44:54 -04:00
rds Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-06-12 14:27:40 -07:00
rfkill Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2014-05-22 13:58:36 -04:00
rose net: Fix use after free by removing length arg from sk_data_ready callbacks. 2014-04-11 16:15:36 -04:00
rxrpc af_rxrpc: Fix XDR length check in rxrpc key demarshalling. 2014-05-16 15:24:47 -04:00
sched net: fix some typos in comment 2014-07-01 14:20:32 -07:00
sctp net: sctp: check proc_dointvec result in proc_sctp_do_auth 2014-06-19 21:30:19 -07:00
sunrpc NFSv4: test SECINFO RPC_AUTH_GSS pseudoflavors for support 2014-06-24 18:46:58 -04:00
tipc tipc: simplify connection congestion handling 2014-06-27 12:50:56 -07:00
unix Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-06-12 14:27:40 -07:00
vmw_vsock vsock: Make transport the proto owner 2014-05-05 13:13:50 -04:00
wimax
wireless Merge git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2014-05-29 12:55:38 -04:00
x25 net: Fix use after free by removing length arg from sk_data_ready callbacks. 2014-04-11 16:15:36 -04:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-06-03 23:32:12 -07:00
Kconfig
Makefile
compat.c
nonet.c
socket.c net: use SYSCALL_DEFINEx for sys_recv 2014-04-16 15:15:05 -04:00
sysctl_net.c