tcp: extend F-RTO to catch more spurious timeouts

Current F-RTO reverts the cwnd reset whenever a never-retransmitted
packet is (s)acked. The timeout can be declared spurious because the
packets acknowledged by this ACK were transmitted before the timeout,
so clearly not all packets were lost and the cwnd reset was unnecessary.

This detection does not really depend on F-RTO internals, so this
patch applies it universally. On Google servers this change detected
20% more spurious timeouts.

Suggested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 22:11:37 -08:00 · 2017-01-12 22:11:37 -08:00 · 89fe18e44f
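The rule described in the commit message can be illustrated outside the kernel with a small stand-alone C sketch. It is a simplification, not the kernel code: `struct conn_state`, `ORIG_SACK_ACKED`, and `timeout_is_spurious()` are hypothetical names invented for this example, loosely modelled on `tp->undo_marker` and `FLAG_ORIG_SACK_ACKED`. The real `tcp_try_undo_loss()` performs additional checks and bookkeeping that are omitted here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's FLAG_ORIG_SACK_ACKED bit:
 * set when an ACK (s)acks data that was never retransmitted.
 * The numeric value is arbitrary in this sketch.
 */
#define ORIG_SACK_ACKED 0x1

/* Hypothetical per-connection state, loosely modelled on tp->undo_marker. */
struct conn_state {
	unsigned int undo_marker;	/* nonzero while an undo is still possible */
};

/* Return true if the preceding timeout should be treated as spurious:
 * the ACK covers never-retransmitted data, so not every packet sent
 * before the timeout was lost, and an undo is still possible.
 * Note that no F-RTO state is consulted.
 */
static bool timeout_is_spurious(const struct conn_state *cs, int ack_flags)
{
	return (ack_flags & ORIG_SACK_ACKED) && cs->undo_marker != 0;
}
```

Calling `timeout_is_spurious()` with `ORIG_SACK_ACKED` set and a nonzero `undo_marker` returns true whether or not F-RTO was armed, which is the behaviour this patch extends to all timeouts.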
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1939,7 +1939,6 @@ void tcp_enter_loss(struct sock *sk)
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct net *net = sock_net(sk);
 	struct sk_buff *skb;
-	bool new_recovery = icsk->icsk_ca_state < TCP_CA_Recovery;
 	bool is_reneg;			/* is receiver reneging on SACKs? */
 	bool mark_lost;
@@ -2000,13 +1999,15 @@ void tcp_enter_loss(struct sock *sk)
 	tp->high_seq = tp->snd_nxt;
 	tcp_ecn_queue_cwr(tp);
-	/* F-RTO RFC5682 sec 3.1 step 1: retransmit SND.UNA if no previous
-	 * loss recovery is underway except recurring timeout(s) on
-	 * the same SND.UNA (sec 3.2). Disable F-RTO on path MTU probing
+	/* F-RTO RFC5682 sec 3.1 step 1 mandates to disable F-RTO
+	 * if a previous recovery is underway, otherwise it may incorrectly
+	 * call a timeout spurious if some previously retransmitted packets
+	 * are s/acked (sec 3.2). We do not apply that retriction since
+	 * retransmitted skbs are permanently tagged with TCPCB_EVER_RETRANS
+	 * so FLAG_ORIG_SACK_ACKED is always correct. But we do disable F-RTO
+	 * on PTMU discovery to avoid sending new data.
 	 */
-	tp->frto = sysctl_tcp_frto &&
-		   (new_recovery || icsk->icsk_retransmits) &&
-		   !inet_csk(sk)->icsk_mtup.probe_size;
+	tp->frto = sysctl_tcp_frto && !inet_csk(sk)->icsk_mtup.probe_size;
 }
 /* If ACK arrived pointing to a remembered SACK, it means that our
@@ -2740,14 +2741,18 @@ static void tcp_process_loss(struct sock *sk, int flag, bool is_dupack,
 	    tcp_try_undo_loss(sk, false))
 		return;
-	if (tp->frto) { /* F-RTO RFC5682 sec 3.1 (sack enhanced version). */
-		/* Step 3.b. A timeout is spurious if not all data are
-		 * lost, i.e., never-retransmitted data are (s)acked.
-		 */
-		if ((flag & FLAG_ORIG_SACK_ACKED) &&
-		    tcp_try_undo_loss(sk, true))
-			return;
+	/* The ACK (s)acks some never-retransmitted data meaning not all
+	 * the data packets before the timeout were lost. Therefore we
+	 * undo the congestion window and state. This is essentially
+	 * the operation in F-RTO (RFC5682 section 3.1 step 3.b). Since
+	 * a retransmitted skb is permantly marked, we can apply such an
+	 * operation even if F-RTO was not used.
+	 */
+	if ((flag & FLAG_ORIG_SACK_ACKED) &&
+	    tcp_try_undo_loss(sk, tp->undo_marker))
+		return;
+	if (tp->frto) { /* F-RTO RFC5682 sec 3.1 (sack enhanced version). */
 		if (after(tp->snd_nxt, tp->high_seq)) {
 			if (flag & FLAG_DATA_SACKED || is_dupack)
 				tp->frto = 0; /* Step 3.a. loss was real */