WSL2-Linux-Kernel/include
Jesper Dangaard Brouer 93bb0ceb75 netfilter: conntrack: remove central spinlock nf_conntrack_lock
nf_conntrack_lock is a monolithic lock and suffers from heavy contention
on current-generation servers (8 or more cores/threads).

Lock contention is clearly visible in perf output on the base kernel:

-  72.56%  ksoftirqd/6  [kernel.kallsyms]    [k] _raw_spin_lock_bh
   - _raw_spin_lock_bh
      + 25.33% init_conntrack
      + 24.86% nf_ct_delete_from_lists
      + 24.62% __nf_conntrack_confirm
      + 24.38% destroy_conntrack
      + 0.70% tcp_packet
+   2.21%  ksoftirqd/6  [kernel.kallsyms]    [k] fib_table_lookup
+   1.15%  ksoftirqd/6  [kernel.kallsyms]    [k] __slab_free
+   0.77%  ksoftirqd/6  [kernel.kallsyms]    [k] inet_getpeer
+   0.70%  ksoftirqd/6  [nf_conntrack]       [k] nf_ct_delete
+   0.55%  ksoftirqd/6  [ip_tables]          [k] ipt_do_table

This patch changes conntrack locking and provides a huge performance
improvement. A SYN-flood attack was tested on a 24-core E5-2695v2(ES) with
10Gbit/s ixgbe (using the trafgen tool):

 Base kernel:   810,405 new conntrack/sec
 After patch: 2,233,876 new conntrack/sec

Note that other flood attacks (SYN+ACK or ACK) can easily be deflected using:
 # iptables -A INPUT -m state --state INVALID -j DROP
 # sysctl -w net/netfilter/nf_conntrack_tcp_loose=0

Use an array of hashed spinlocks to protect insertions/deletions of
conntracks into the hash table. 1024 spinlocks seem to give good
results at minimal cost (4KB memory). Due to the lockdep max depth,
1024 becomes 8 if CONFIG_LOCKDEP=y.
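
The hashed lock array described above can be sketched in userspace C roughly as follows. This is only an illustration under stated assumptions, not the patch's code: pthread mutexes stand in for kernel spinlocks, and the names `bucket_locks`, `lock_index`, `double_lock`, `double_unlock`, `insert_entry` and the table size `NUM_BUCKETS` are hypothetical. It assumes an entry is linked into two buckets (one per direction of the connection), so two locks may be needed; they are always taken in ascending index order so that two CPUs cannot deadlock on each other.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_LOCKS   1024   /* lock count quoted in the commit message */
#define NUM_BUCKETS 16384  /* hypothetical hash table size */

/* pthread mutexes stand in for kernel spinlocks here, so this array
 * is larger than the 4KB quoted in the commit message. */
static pthread_mutex_t bucket_locks[NUM_LOCKS];
static unsigned int bucket_count[NUM_BUCKETS]; /* stand-in for hash chains */

static void locks_init(void)
{
	for (size_t i = 0; i < NUM_LOCKS; i++)
		pthread_mutex_init(&bucket_locks[i], NULL);
}

/* Many buckets share one lock: bucket b is covered by lock b % NUM_LOCKS. */
static unsigned int lock_index(unsigned int bucket)
{
	return bucket % NUM_LOCKS;
}

/* Take the locks covering two buckets, lowest lock index first. */
static void double_lock(unsigned int b1, unsigned int b2)
{
	unsigned int lo = lock_index(b1), hi = lock_index(b2);

	if (lo > hi) {
		unsigned int tmp = lo;
		lo = hi;
		hi = tmp;
	}
	pthread_mutex_lock(&bucket_locks[lo]);
	if (hi != lo) /* both buckets may map to the same lock */
		pthread_mutex_lock(&bucket_locks[hi]);
}

static void double_unlock(unsigned int b1, unsigned int b2)
{
	unsigned int l1 = lock_index(b1), l2 = lock_index(b2);

	pthread_mutex_unlock(&bucket_locks[l1]);
	if (l2 != l1)
		pthread_mutex_unlock(&bucket_locks[l2]);
}

/* Insert one entry that is reachable from two buckets. */
static void insert_entry(unsigned int b1, unsigned int b2)
{
	double_lock(b1, b2);
	bucket_count[b1]++;
	bucket_count[b2]++;
	double_unlock(b1, b2);
}
```

With this layout, two insertions contend only when their buckets happen to map to the same lock index, instead of always serializing on one global lock.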

The hash resize is a bit tricky, because we need to take all locks in
the array. A seqcount_t is used to synchronize the hash table users
with the resizing process.
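
The resize synchronization can be illustrated with a small userspace sketch, again under assumptions rather than as the patch's implementation: a C11 atomic counter plays the role of the seqcount_t, pthread mutexes replace spinlocks, and `generation`, `table_size`, `resize_table`, `lock_bucket` and `unlock_bucket` are hypothetical names. The resizer takes every lock in the array and bumps the counter around the change; a user samples the counter, computes its bucket, takes the bucket lock, and retries if the counter moved in the meantime.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NLOCKS 8 /* small array, like the CONFIG_LOCKDEP=y case */

static pthread_mutex_t locks[NLOCKS];
static atomic_uint generation; /* even: stable, odd: resize in progress */
static atomic_uint table_size; /* number of buckets */

static void table_init(unsigned int size)
{
	for (size_t i = 0; i < NLOCKS; i++)
		pthread_mutex_init(&locks[i], NULL);
	atomic_init(&generation, 0);
	atomic_init(&table_size, size);
}

/* Resizer: hold every lock (in index order) while the table changes. */
static void resize_table(unsigned int new_size)
{
	for (size_t i = 0; i < NLOCKS; i++)
		pthread_mutex_lock(&locks[i]);

	atomic_fetch_add(&generation, 1); /* now odd: users must wait */
	atomic_store(&table_size, new_size);
	atomic_fetch_add(&generation, 1); /* even again, new generation */

	for (size_t i = 0; i < NLOCKS; i++)
		pthread_mutex_unlock(&locks[i]);
}

/* User: returns the bucket for hash with the covering lock held. */
static unsigned int lock_bucket(unsigned int hash)
{
	for (;;) {
		unsigned int seq = atomic_load(&generation);

		if (seq & 1)
			continue; /* resize in progress, sample again */

		unsigned int bucket = hash % atomic_load(&table_size);

		pthread_mutex_lock(&locks[bucket % NLOCKS]);
		if (atomic_load(&generation) == seq)
			return bucket; /* table did not change under us */

		/* The table was resized after we computed the bucket. */
		pthread_mutex_unlock(&locks[bucket % NLOCKS]);
	}
}

static void unlock_bucket(unsigned int bucket)
{
	pthread_mutex_unlock(&locks[bucket % NLOCKS]);
}
```

Because the resizer holds all locks while the counter changes, a user that sees an unchanged counter after acquiring its bucket lock knows the bucket index it computed is still valid for as long as it holds that lock.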

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-03-07 11:41:13 +01:00
acpi Merge branches 'acpi-processor', 'acpi-hotplug', 'acpi-init', 'acpi-pm' and 'acpica' 2014-01-29 11:47:18 +01:00
asm-generic mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit 2014-02-17 11:19:36 +11:00
clocksource
crypto
drm Merge tag 'ttm-fixes-3.14-2014-02-18' of git://people.freedesktop.org/~thomash/linux into drm-fixes 2014-02-19 08:21:26 +10:00
dt-bindings Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc 2014-01-30 17:07:18 -08:00
keys
kvm
linux netfilter: ipset: add forceadd kernel support for hash set types 2014-03-06 09:31:43 +01:00
math-emu
media
memory
misc
net netfilter: conntrack: remove central spinlock nf_conntrack_lock 2014-03-07 11:41:13 +01:00
pcmcia
ras
rdma IB: Report using RoCE IP based gids in port caps 2014-02-13 14:46:03 -08:00
rxrpc
scsi Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending 2014-01-31 15:31:23 -08:00
sound ARM: SoC board updates for 3.14 2014-01-23 18:48:28 -08:00
target target: Simplify command completion by removing CMD_T_FAILED flag 2014-02-12 15:14:30 -08:00
trace intel_pstate: Remove energy reporting from pstate_sample tracepoint 2014-02-13 02:11:18 +01:00
uapi netfilter: ipset: add forceadd kernel support for hash set types 2014-03-06 09:31:43 +01:00
video video: pxa168fb: Cleanup pxa168fb.h file 2014-01-17 10:57:43 +02:00
xen Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2014-02-14 10:45:18 -08:00
Kbuild