WSL2-Linux-Kernel/net
Jesper Dangaard Brouer 93bb0ceb75 netfilter: conntrack: remove central spinlock nf_conntrack_lock
nf_conntrack_lock is a monolithic lock and suffers from huge contention
on current generation servers (8 or more core/threads).

Perf locking congestion is clear on base kernel:

-  72.56%  ksoftirqd/6  [kernel.kallsyms]    [k] _raw_spin_lock_bh
   - _raw_spin_lock_bh
      + 25.33% init_conntrack
      + 24.86% nf_ct_delete_from_lists
      + 24.62% __nf_conntrack_confirm
      + 24.38% destroy_conntrack
      + 0.70% tcp_packet
+   2.21%  ksoftirqd/6  [kernel.kallsyms]    [k] fib_table_lookup
+   1.15%  ksoftirqd/6  [kernel.kallsyms]    [k] __slab_free
+   0.77%  ksoftirqd/6  [kernel.kallsyms]    [k] inet_getpeer
+   0.70%  ksoftirqd/6  [nf_conntrack]       [k] nf_ct_delete
+   0.55%  ksoftirqd/6  [ip_tables]          [k] ipt_do_table

This patch change conntrack locking and provides a huge performance
improvement.  SYN-flood attack tested on a 24-core E5-2695v2(ES) with
10Gbit/s ixgbe (with tool trafgen):

 Base kernel:   810.405 new conntrack/sec
 After patch: 2.233.876 new conntrack/sec

Notice other floods attack (SYN+ACK or ACK) can easily be deflected using:
 # iptables -A INPUT -m state --state INVALID -j DROP
 # sysctl -w net/netfilter/nf_conntrack_tcp_loose=0

Use an array of hashed spinlocks to protect insertions/deletions of
conntracks into the hash table. 1024 spinlocks seem to give good
results, at minimal cost (4KB memory). Due to lockdep max depth,
1024 becomes 8 if CONFIG_LOCKDEP=y

The hash resize is a bit tricky, because we need to take all locks in
the array. A seqcount_t is used to synchronize the hash table users
with the resizing process.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-07 11:41:13 +01:00
..
9p 9p/trans_virtio.c: Fix broken zero-copy on vmalloc() buffers 2014-02-10 17:48:54 -08:00
802 neigh: use NEIGH_VAR_INIT in ndo_neigh_setup functions. 2014-01-16 11:31:58 -08:00
8021q net: introduce netdev_alloc_pcpu_stats() for drivers 2014-02-14 15:49:55 -05:00
appletalk appletalk: fix checkpatch error with indent 2014-02-14 16:18:32 -05:00
atm net: Fix some fallout from the etner_addr_copy() changes. 2014-01-21 18:57:26 -08:00
ax25 net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
batman-adv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-02-19 01:24:22 -05:00
bluetooth Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid 2014-02-18 16:29:46 -08:00
bridge net: introduce netdev_alloc_pcpu_stats() for drivers 2014-02-14 15:49:55 -05:00
caif net: Include appropriate header file in caif/cfsrvl.c 2014-02-09 17:32:49 -08:00
can linux-can-fixes-for-3.14-20140129 2014-01-30 16:48:17 -08:00
ceph net: remove unnecessary return's 2014-02-13 18:33:38 -05:00
core Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-02-19 01:24:22 -05:00
dcb dcb: use __dev_get_by_name instead of dev_get_by_name to find interface 2014-01-14 18:50:46 -08:00
dccp dccp: re-enable debug macro 2014-02-16 23:45:00 -05:00
decnet net: Move prototype declaration to header file include/net/dn.h from net/decnet/af_decnet.c 2014-02-09 17:32:49 -08:00
dns_resolver net/*: Fix FSF address in file headers 2013-12-06 12:37:57 -05:00
dsa dsa: Use ether_addr_copy 2014-01-21 18:13:05 -08:00
ethernet net: eth_type_trans() should use skb_header_pointer() 2014-01-16 15:30:31 -08:00
hsr hsr: Use ether_addr_copy 2014-02-18 18:14:09 -05:00
ieee802154 ieee802154: fix faulty check in set_phy_params api 2014-02-18 18:11:05 -05:00
ipv4 netfilter: remove double colon 2014-02-19 11:41:25 +01:00
ipv6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-02-19 01:24:22 -05:00
ipx ipx: implement shutdown() 2014-02-12 19:26:32 -05:00
irda net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
iucv net: rework recvmsg handler msg_name and msg_namelen logic 2013-11-20 21:52:30 -05:00
key xfrm: export verify_userspi_info for pkfey and netlink interface 2013-12-16 12:54:02 +01:00
l2tp net: remove unnecessary return's 2014-02-13 18:33:38 -05:00
lapb net/lapb: re-send packets on timeout 2013-09-23 16:52:45 -04:00
llc llc: remove noisy WARN from llc_mac_hdr_init 2014-01-28 18:01:32 -08:00
mac80211 netdevice: add queue selection fallback handler for ndo_select_queue 2014-02-17 00:36:34 -05:00
mac802154 ieee802154: add netlink APIs for smartMAC configuration 2014-02-17 16:42:39 -05:00
mpls ipip: add GSO/TSO support 2013-10-19 19:36:19 -04:00
netfilter netfilter: conntrack: remove central spinlock nf_conntrack_lock 2014-03-07 11:41:13 +01:00
netlabel netlabel: Fix FSF address in file headers 2013-12-06 12:37:56 -05:00
netlink netlink: fix checkpatch errors space and "foo *bar" 2014-02-17 16:57:28 -05:00
netrom net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-01-25 11:17:34 -08:00
openvswitch openvswitch: rename ->sync to ->syncp 2014-02-15 02:06:23 -05:00
packet af_packet: remove a stray tab in packet_set_ring() 2014-02-18 18:02:25 -05:00
phonet net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
rds net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
rfkill Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-01-25 11:17:34 -08:00
rose net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
rxrpc RxRPC fixes 2014-01-28 18:04:18 -08:00
sched Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-02-19 01:24:22 -05:00
sctp Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-02-19 01:24:22 -05:00
sunrpc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-02-11 12:05:55 -08:00
tipc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-02-19 01:24:22 -05:00
unix net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
vmw_vsock net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
wimax wimax: remove dead code 2013-11-21 13:09:42 -05:00
wireless net: remove unnecessary return's 2014-02-13 18:33:38 -05:00
x25 net: add build-time checks for msg->msg_name size 2014-01-18 23:04:16 -08:00
xfrm Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-01-25 11:17:34 -08:00
Kconfig net: netprio: rename config to be more consistent with cgroup configs 2014-01-03 23:41:42 +01:00
Makefile net: move 6lowpan compression code to separate module 2014-01-15 15:36:38 -08:00
compat.c x86, x32: Correct invalid use of user timespec in the kernel 2014-01-30 18:44:13 -08:00
nonet.c
socket.c socket: replace some printk with pr_* 2014-02-13 18:15:10 -05:00
sysctl_net.c net: Update the sysctl permissions handler to test effective uid/gid 2013-10-07 15:57:56 -04:00