From 310886dd5fa3606d9325b10caf7c8ba5e9f9ab03 Mon Sep 17 00:00:00 2001 From: "Herton R. Krzesinski" Date: Wed, 1 Oct 2014 18:49:52 -0300 Subject: [PATCH 1/3] net/rds: call rds_conn_drop instead of open code it at rds_connect_complete Signed-off-by: Herton R. Krzesinski Signed-off-by: David S. Miller --- net/rds/threads.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/rds/threads.c b/net/rds/threads.c index 65eaefcab241..dc2402e871fd 100644 --- a/net/rds/threads.c +++ b/net/rds/threads.c @@ -78,8 +78,7 @@ void rds_connect_complete(struct rds_connection *conn) "current state is %d\n", __func__, atomic_read(&conn->c_state)); - atomic_set(&conn->c_state, RDS_CONN_ERROR); - queue_work(rds_wq, &conn->c_down_w); + rds_conn_drop(conn); return; } From eb74cc97b830c1e438dc1d6b049f17bdb2b9aae5 Mon Sep 17 00:00:00 2001 From: "Herton R. Krzesinski" Date: Wed, 1 Oct 2014 18:49:53 -0300 Subject: [PATCH 2/3] net/rds: do proper house keeping if connection fails in rds_tcp_conn_connect I see two problems if we consider the sock->ops->connect attempt to fail in rds_tcp_conn_connect. The first issue is that for example we don't remove the previously added rds_tcp_connection item to rds_tcp_tc_list at rds_tcp_set_callbacks, which means that on a next reconnect attempt for the same rds_connection, when rds_tcp_conn_connect is called we can again call rds_tcp_set_callbacks, resulting in duplicated items on rds_tcp_tc_list, leading to list corruption: to avoid this just make sure we call properly rds_tcp_restore_callbacks before we exit. The second issue is that we should also release the sock properly, by setting sock = NULL only if we are returning without error. Signed-off-by: Herton R. Krzesinski Signed-off-by: David S. Miller --- net/rds/tcp_connect.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/net/rds/tcp_connect.c b/net/rds/tcp_connect.c index a65ee78db0c5..f9f564a6c960 100644 --- a/net/rds/tcp_connect.c +++ b/net/rds/tcp_connect.c @@ -106,11 +106,14 @@ int rds_tcp_conn_connect(struct rds_connection *conn) rds_tcp_set_callbacks(sock, conn); ret = sock->ops->connect(sock, (struct sockaddr *)&dest, sizeof(dest), O_NONBLOCK); - sock = NULL; rdsdebug("connect to address %pI4 returned %d\n", &conn->c_faddr, ret); if (ret == -EINPROGRESS) ret = 0; + if (ret == 0) + sock = NULL; + else + rds_tcp_restore_callbacks(sock, conn->c_transport_data); out: if (sock) From 593cbb3ec6a3f2424966832727f394b1696d0d72 Mon Sep 17 00:00:00 2001 From: "Herton R. Krzesinski" Date: Wed, 1 Oct 2014 18:49:54 -0300 Subject: [PATCH 3/3] net/rds: fix possible double free on sock tear down I got a report of a double free happening at RDS slab cache. One suspicion was that may be somewhere we were doing a sock_hold/sock_put on an already freed sock. Thus after providing a kernel with the following change: static inline void sock_hold(struct sock *sk) { - atomic_inc(&sk->sk_refcnt); + if (!atomic_inc_not_zero(&sk->sk_refcnt)) + WARN(1, "Trying to hold sock already gone: %p (family: %hd)\n", + sk, sk->sk_family); } The warning successfuly triggered: Trying to hold sock already gone: ffff81f6dda61280 (family: 21) WARNING: at include/net/sock.h:350 sock_hold() Call Trace: [] :rds:rds_send_remove_from_sock+0xf0/0x21b [] :rds:rds_send_drop_acked+0xbf/0xcf [] :rds_rdma:rds_ib_recv_tasklet_fn+0x256/0x2dc [] tasklet_action+0x8f/0x12b [] __do_softirq+0x89/0x133 [] call_softirq+0x1c/0x28 [] do_softirq+0x2c/0x7d [] do_IRQ+0xee/0xf7 [] ret_from_intr+0x0/0xa Looking at the call chain above, the only way I think this would be possible is if somewhere we already released the same socket->sock which is assigned to the rds_message at rds_send_remove_from_sock. Which seems only possible to happen after the tear down done on rds_release. rds_release properly calls rds_send_drop_to to drop the socket from any rds_message, and some proper synchronization is in place to avoid race with rds_send_drop_acked/rds_send_remove_from_sock. However, I still see a very narrow window where it may be possible we touch a sock already released: when rds_release races with rds_send_drop_acked, we check RDS_MSG_ON_CONN to avoid cleanup on the same rds_message, but in this specific case we don't clear rm->m_rs. In this case, it seems we could then go on at rds_send_drop_to and after it returns, the sock is freed by last sock_put on rds_release, with concurrently we being at rds_send_remove_from_sock; then at some point in the loop at rds_send_remove_from_sock we process an rds_message which didn't have rm->m_rs unset for a freed sock, and a possible sock_hold on an sock already gone at rds_release happens. This hopefully address the described condition above and avoids a double free on "second last" sock_put. In addition, I removed the comment about socket destruction on top of rds_send_drop_acked: we call rds_send_drop_to in rds_release and we should have things properly serialized there, thus I can't see the comment being accurate there. Signed-off-by: Herton R. Krzesinski Signed-off-by: David S. Miller --- net/rds/send.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/net/rds/send.c b/net/rds/send.c index 23718160d71e..0a64541020b0 100644 --- a/net/rds/send.c +++ b/net/rds/send.c @@ -593,8 +593,11 @@ static void rds_send_remove_from_sock(struct list_head *messages, int status) sock_put(rds_rs_to_sk(rs)); } rs = rm->m_rs; - sock_hold(rds_rs_to_sk(rs)); + if (rs) + sock_hold(rds_rs_to_sk(rs)); } + if (!rs) + goto unlock_and_drop; spin_lock(&rs->rs_lock); if (test_and_clear_bit(RDS_MSG_ON_SOCK, &rm->m_flags)) { @@ -638,9 +641,6 @@ unlock_and_drop: * queue. This means that in the TCP case, the message may not have been * assigned the m_ack_seq yet - but that's fine as long as tcp_is_acked * checks the RDS_MSG_HAS_ACK_SEQ bit. - * - * XXX It's not clear to me how this is safely serialized with socket - * destruction. Maybe it should bail if it sees SOCK_DEAD. */ void rds_send_drop_acked(struct rds_connection *conn, u64 ack, is_acked_func is_acked) @@ -711,6 +711,9 @@ void rds_send_drop_to(struct rds_sock *rs, struct sockaddr_in *dest) */ if (!test_and_clear_bit(RDS_MSG_ON_CONN, &rm->m_flags)) { spin_unlock_irqrestore(&conn->c_lock, flags); + spin_lock_irqsave(&rm->m_rs_lock, flags); + rm->m_rs = NULL; + spin_unlock_irqrestore(&rm->m_rs_lock, flags); continue; } list_del_init(&rm->m_conn_item);