WSL2-Linux-Kernel/include/net
Eric Dumazet 75c119afe1 tcp: implement rb-tree based retransmit queue
Using a linear list to store all skbs in the write queue has been okay
for quite a while: O(N) is not too bad when N < 500.

Things get messy when N is on the order of 100,000: modern TCP stacks
want 10Gbit+ of throughput even with 200 ms RTT flows.

At 40 ns per cache line miss and roughly one miss per skb, a full scan
can take 4 ms, blowing away CPU caches.
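
The arithmetic behind these figures can be reproduced with a small sketch. The 1500-byte packet size and the one-miss-per-skb cost model are assumptions made here for illustration, and both function names are hypothetical; none of them come from the commit.

```c
#include <stdint.h>

/* Packets needed in flight to fill the pipe: bandwidth-delay product
 * divided by the packet size (packet size is an assumed parameter). */
static uint64_t packets_in_flight(uint64_t bits_per_sec, uint64_t rtt_ms,
                                  uint64_t bytes_per_packet)
{
    uint64_t bdp_bytes = bits_per_sec / 8 * rtt_ms / 1000;

    return bdp_bytes / bytes_per_packet;
}

/* Cost of walking a linear queue end to end, assuming each skb visited
 * costs exactly one cache line miss. */
static uint64_t full_scan_ns(uint64_t n_skbs, uint64_t miss_ns)
{
    return n_skbs * miss_ns;
}
```

Under these assumptions, a 10 Gbit/s flow with a 200 ms RTT keeps about 166,000 packets in flight, and full_scan_ns(100000, 40) gives 4,000,000 ns, the 4 ms quoted above.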

SACK processing can often use various hints to avoid parsing the
whole retransmit queue. But with high packet loss and/or heavy
reordering, the hints no longer work.

The sender then has to process thousands of unfriendly SACKs, accumulating
a huge socket backlog, burning a CPU and massively dropping packets.

Using an rb-tree for the retransmit queue has been avoided for years
because it added complexity and overhead, but now is the time
to be more resilient and say no to quadratic behavior.
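
As a rough illustration of why a tree keyed by sequence number avoids the full scan, here is a minimal sketch. It uses a plain binary search tree with no rebalancing as a stand-in for the kernel's rb-tree. The struct and function names are hypothetical, and only the wraparound-safe comparison mirrors the idea of the kernel's before() helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an skb held in the retransmit queue. */
struct rtx_node {
    uint32_t seq;      /* first sequence number covered */
    uint32_t end_seq;  /* one past the last sequence number covered */
    struct rtx_node *left, *right;
};

/* Wraparound-safe "a comes before b" on 32-bit sequence numbers. */
static int seq_before(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* Insert ordered by start sequence. A real rb-tree would also recolor
 * and rotate here to stay balanced; this sketch does not. */
static void rtx_insert(struct rtx_node **root, struct rtx_node *node)
{
    node->left = node->right = NULL;
    while (*root)
        root = seq_before(node->seq, (*root)->seq) ? &(*root)->left
                                                   : &(*root)->right;
    *root = node;
}

/* Return the node whose [seq, end_seq) range contains seq, or NULL. */
static struct rtx_node *rtx_find(struct rtx_node *root, uint32_t seq)
{
    while (root) {
        if (seq_before(seq, root->seq))
            root = root->left;
        else if (!seq_before(seq, root->end_seq))
            root = root->right;
        else
            return root;
    }
    return NULL;
}
```

When the tree is kept balanced, locating the skb that a SACK block refers to touches on the order of log2(N) nodes, about 17 for N = 100,000, instead of up to N.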

1) The RTX queue is no longer part of the write queue: already sent skbs
are stored in one rb-tree.

2) Since reaching the head of the write queue no longer needs
sk->sk_send_head, we added a union of sk_send_head and tcp_rtx_queue.
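
The union in point 2 can be pictured with the sketch below. Only the two member names come from the commit message. The stub types, the surrounding struct and the helper function are hypothetical, and the real definitions in sock.h may differ.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct sk_buff and struct rb_root. */
struct skb_stub { int len; };
struct rb_root_stub { void *rb_node; };

/* Illustrative socket: both members live in one anonymous union, so
 * they occupy the same bytes and only one is meaningful at a time. */
struct sock_stub {
    int other_state;
    union {
        struct skb_stub *sk_send_head;
        struct rb_root_stub tcp_rtx_queue;
    };
};

/* Returns 1 when both members start at the same offset in the struct. */
static int members_overlap(void)
{
    return offsetof(struct sock_stub, sk_send_head) ==
           offsetof(struct sock_stub, tcp_rtx_queue);
}
```

Because the members overlap, the tree root adds no storage beyond the larger of the two fields in this sketch.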

Tested:

 On receiver:
 netem on ingress: delay 150ms 200us loss 1
 GRO disabled to force stress and SACK storms.

for f in `seq 1 10`
do
 ./netperf -H lpaa6 -l30 -- -K bbr -o THROUGHPUT|tail -1
done | awk '{print $0} {sum += $0} END {printf "%7u\n",sum}'

Before patch:

323.87
351.48
339.59
338.62
306.72
204.07
304.93
291.88
202.47
176.88
   2840

After patch:

1700.83
2207.98
2070.17
1544.26
2114.76
2124.89
1693.14
1080.91
2216.82
1299.94
  18053
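
The two totals, and the gain they imply, can be checked with a short sketch that adds up the ten runs the same way the awk pipeline does. The sample values are copied from the listings above. The array and function names are hypothetical.

```c
#include <stddef.h>

/* Per-run throughput samples, copied from the two listings above. */
static const double before_patch[] = {
    323.87, 351.48, 339.59, 338.62, 306.72,
    204.07, 304.93, 291.88, 202.47, 176.88,
};
static const double after_patch[] = {
    1700.83, 2207.98, 2070.17, 1544.26, 2114.76,
    2124.89, 1693.14, 1080.91, 2216.82, 1299.94,
};

/* Sum the samples, as awk's "sum += $0" does over the ten lines. */
static double sum_runs(const double *runs, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += runs[i];
    return sum;
}
```

Truncating each sum to an integer, as the %7u format does, reproduces 2840 and 18053. The ratio of the two sums is a little over 6, so aggregate throughput in this test rose by roughly a factor of six.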

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-07 00:28:54 +01:00
..
9p 9p: Implement show_options 2017-07-11 06:08:58 -04:00
bluetooth Bluetooth: make baswap src const 2017-09-01 22:49:47 +02:00
caif
iucv
netfilter netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to rhashtable" 2017-09-08 18:55:50 +02:00
netns ipv4: Namespaceify tcp_fastopen_blackhole_timeout knob 2017-10-01 17:55:54 -07:00
nfc
phonet
sctp sctp: introduce round robin stream scheduler 2017-10-03 16:27:29 -07:00
tc_act net: sched: introduce helper to identify gact pass action 2017-09-26 20:26:45 -07:00
6lowpan.h
Space.h
act_api.h net_sched: get rid of tcfa_rcu 2017-09-12 20:41:02 -07:00
addrconf.h net: Convert int functions to bool 2017-09-18 11:40:03 -07:00
af_ieee802154.h ieee802154: af_ieee802154: fix typo in comment. 2015-09-17 13:20:05 +02:00
af_rxrpc.h rxrpc: Allow failed client calls to be retried 2017-08-29 10:55:20 +01:00
af_unix.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-07-21 03:38:43 +01:00
af_vsock.h VSOCK: use TCP state constants for sk_state 2017-10-05 18:44:17 -07:00
ah.h
arp.h
atmclip.h
ax25.h
ax88796.h
bond_3ad.h
bond_alb.h
bond_options.h
bonding.h net: Add extack to ndo_add_slave 2017-10-04 21:39:33 -07:00
busy_poll.h net: fix compilation when busy poll is not enabled 2017-08-11 14:59:24 -07:00
calipso.h
cfg80211-wext.h
cfg80211.h
cfg802154.h
checksum.h
cipso_ipv4.h
cls_cgroup.h
codel.h
codel_impl.h
codel_qdisc.h
compat.h
datalink.h
dcbevent.h
dcbnl.h
devlink.h devlink: Add IPv6 header for dpipe 2017-08-31 14:42:19 -07:00
dn.h
dn_dev.h
dn_fib.h
dn_neigh.h
dn_nsp.h
dn_route.h
dsa.h net: dsa: remove tag ops from the switch tree 2017-10-01 04:15:07 +01:00
dsfield.h
dst.h net: prevent dst uses after free 2017-09-21 20:42:15 -07:00
dst_cache.h
dst_metadata.h net/dst: Make skb parameter of skb{metadata_dst, tunnel_info}() const 2017-10-02 11:06:07 -07:00
dst_ops.h
erspan.h gre: introduce native tunnel support for ERSPAN 2017-08-22 14:29:30 -07:00
esp.h
ethoc.h
fib_notifier.h fib: notifier: Add VIF add and delete event types 2017-09-27 11:33:27 -07:00
fib_rules.h net: fib_rules: Implement notification logic in core 2017-08-03 15:35:59 -07:00
firewire.h
flow.h net: Extend struct flowi6 with multipath hash 2017-08-24 18:21:17 -07:00
flow_dissector.h flow_dissector: Cleanup control flow 2017-09-05 11:40:08 -07:00
fou.h
fq.h
fq_impl.h
garp.h
gen_stats.h
genetlink.h
geneve.h
gre.h
gro_cells.h
gtp.h
gue.h
hwbm.h
icmp.h
ieee80211_radiotap.h
ieee802154_netdev.h
if_inet6.h
ife.h
ila.h
inet6_connection_sock.h
inet6_hashtables.h net: ipv6: add second dif to inet6 socket lookups 2017-08-07 11:39:22 -07:00
inet_common.h
inet_connection_sock.h
inet_ecn.h net-ipv6: remove unused IP6_ECN_clear() function 2017-10-01 17:55:54 -07:00
inet_frag.h Revert "net: fix percpu memory leaks" 2017-09-03 11:01:05 -07:00
inet_hashtables.h net: ipv4: add second dif to inet socket lookups 2017-08-07 11:39:21 -07:00
inet_sock.h
inet_timewait_sock.h
inetpeer.h inetpeer: remove AVL implementation in favor of RB tree 2017-07-17 08:59:01 -07:00
ip.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-08-21 17:06:42 -07:00
ip6_checksum.h
ip6_fib.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-09-01 17:42:05 -07:00
ip6_route.h ipv6: Use rt6i_idev index for echo replies to a local address 2017-08-29 15:32:25 -07:00
ip6_tunnel.h
ip_fib.h net: ipv4: remove fib_weight 2017-09-29 06:19:32 +01:00
ip_tunnels.h ipv4: speedup ipv6 tunnels dismantle 2017-09-19 16:32:24 -07:00
ip_vs.h
ipcomp.h
ipconfig.h
ipv6.h net/ipv6: Convert icmpv6_push_pending_frames to void 2017-10-06 09:52:31 -07:00
ipx.h
iw_handler.h
kcm.h
l3mdev.h
lapb.h
lib80211.h
llc.h
llc_c_ac.h
llc_c_ev.h
llc_c_st.h
llc_conn.h
llc_if.h
llc_pdu.h
llc_s_ac.h
llc_s_ev.h
llc_s_st.h
llc_sap.h
lwtunnel.h
mac80211.h mac80211: fix VLAN handling with TXQs 2017-09-05 11:28:43 +02:00
mac802154.h
mip6.h
mld.h
mpls.h
mpls_iptunnel.h
mrp.h
ncsi.h net/ncsi: fix ncsi_vlan_rx_{add,kill}_vid references 2017-09-05 09:11:45 -07:00
ndisc.h
neighbour.h neigh: make strucrt neigh_table::entry_size unsigned int 2017-09-25 20:36:17 -07:00
net_namespace.h net: core: Make the FIB notification chain generic 2017-08-03 15:35:59 -07:00
net_ratelimit.h
netevent.h
netlabel.h
netlink.h netlink: fix nla_put_{u8,u16,u32} for KASAN 2017-09-25 20:18:27 -07:00
netprio_cgroup.h
netrom.h
nexthop.h
nl802154.h
nsh.h net: add NSH header structures and helpers 2017-08-29 15:16:52 -07:00
p8022.h
ping.h
pkt_cls.h net: sched: remove cops->tcf_cl_offload 2017-08-11 13:47:01 -07:00
pkt_sched.h net: sched: Add helpers to identify classids 2017-08-11 13:47:00 -07:00
pptp.h
protocol.h IPv4: early demux can return an error code 2017-10-01 03:55:47 +01:00
psample.h
psnap.h
raw.h net: ipv4: add second dif to raw socket lookups 2017-08-07 11:39:21 -07:00
rawv6.h net: ipv6: add second dif to raw socket lookups 2017-08-07 11:39:22 -07:00
red.h
regulatory.h
request_sock.h
rose.h
route.h udp: perform source validation for mcast early demux 2017-10-01 03:55:47 +01:00
rtnetlink.h rtnetlink: remove __rtnl_af_unregister 2017-10-04 10:33:59 -07:00
sch_generic.h net_sched: no need to free qdisc in RCU callback 2017-09-19 16:30:03 -07:00
scm.h
secure_seq.h
seg6.h ipv6: sr: add support for ip4ip6 encapsulation 2017-08-25 17:10:23 -07:00
seg6_hmac.h
slhc_vj.h
smc.h
snmp.h
sock.h tcp: implement rb-tree based retransmit queue 2017-10-07 00:28:54 +01:00
sock_reuseport.h
stp.h
strparser.h strparser: initialize all callbacks 2017-08-24 21:57:50 -07:00
switchdev.h net: switchdev: Remove bridge bypass support from switchdev 2017-08-07 14:48:48 -07:00
tcp.h tcp: implement rb-tree based retransmit queue 2017-10-07 00:28:54 +01:00
tcp_states.h
timewait_sock.h
tls.h
transp_v6.h
tso.h net: define the TSO header size in net/tso.h 2017-08-23 20:42:09 -07:00
tun_proto.h vxlan: factor out VXLAN-GPE next protocol 2017-08-29 15:16:52 -07:00
udp.h IPv4: early demux can return an error code 2017-10-01 03:55:47 +01:00
udp_tunnel.h net: add infrastructure to un-offload UDP tunnel port 2017-07-24 13:52:59 -07:00
udplite.h
vsock_addr.h
vxlan.h vxlan: factor out VXLAN-GPE next protocol 2017-08-29 15:16:52 -07:00
wext.h
wimax.h
x25.h
x25device.h
xfrm.h xfrm: Add support for network devices capable of removing the ESP trailer 2017-08-31 09:04:03 +02:00