Documentation: RDS: Document Multipath RDS (mprds)
Document the design of mprds, covering a brief description of the
motivation, data-structures and modifications to the RDS control plane.

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Parent: d67214a29b
Commit: 09204a6cda
@@ -365,4 +365,59 @@ The recv path
handle CMSGs
return to application

Multipath RDS (mprds)
=====================
Mprds is multipathed-RDS, primarily intended for RDS-over-TCP
(though the concept can be extended to other transports). The classical
implementation of RDS-over-TCP demultiplexes multiple
PF_RDS sockets between any 2 endpoints (where endpoint == [IP address,
port]) over a single TCP socket between the 2 IP addresses involved. This
has the limitation that it ends up funneling multiple RDS flows over a
single TCP flow, so it
(a) is upper-bounded by the single-flow bandwidth,
(b) suffers from head-of-line blocking across all the RDS sockets.

Better throughput (for a fixed small packet size, MTU) can be achieved
by having multiple TCP/IP flows per rds/tcp connection, i.e., multipathed
RDS (mprds). Each such TCP/IP flow constitutes a path for the rds/tcp
connection. RDS sockets will be attached to a path based on some hash
(e.g., of local address and RDS port number) and packets for that RDS
socket will be sent over the attached path using TCP to segment/reassemble
RDS datagrams on that path.

Multipathed RDS is implemented by splitting the struct rds_connection into
a common (to all paths) part, and a per-path struct rds_conn_path. All
I/O workqs and reconnect threads are driven from the rds_conn_path.
Transports such as TCP that are multipath capable may then set up a
TCP socket per rds_conn_path, and this is managed by the transport via
the transport private cp_transport_data pointer.
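
The split described above can be pictured with a small userspace sketch.
This is not the kernel's definition: the sketch_ struct names, the
SKETCH_MAX_PATHS bound and every field other than cp_transport_data are
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_PATHS 8   /* hypothetical upper bound, not taken from the text */

struct sketch_connection;    /* forward declaration for the back-pointer */

/* Per-path state: one instance per TCP flow of the connection. */
struct sketch_conn_path {
	struct sketch_connection *cp_conn;  /* assumed back-pointer to the common part */
	int cp_index;                       /* assumed position in the path array */
	void *cp_transport_data;            /* transport-private, e.g. the TCP socket */
};

/* State shared by all paths of one connection. */
struct sketch_connection {
	unsigned int c_npaths;              /* assumed count of paths in use */
	struct sketch_conn_path c_path[SKETCH_MAX_PATHS];
};

/* Initialise the common part and link every path slot back to it. */
static void sketch_conn_init(struct sketch_connection *conn, unsigned int npaths)
{
	assert(npaths >= 1 && npaths <= SKETCH_MAX_PATHS);
	conn->c_npaths = npaths;
	for (unsigned int i = 0; i < SKETCH_MAX_PATHS; i++) {
		conn->c_path[i].cp_conn = conn;
		conn->c_path[i].cp_index = (int)i;
		conn->c_path[i].cp_transport_data = NULL; /* filled in by the transport */
	}
}
```

In this layout a transport would store its per-path socket in
cp_transport_data, while anything that applies to the connection as a
whole stays in the common structure.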

Transports announce themselves as multipath capable by setting the
t_mp_capable bit during registration with the rds core module. When the
transport is multipath-capable, rds_sendmsg() hashes outgoing traffic
across multiple paths. The outgoing hash is computed based on the
local address and port that the PF_RDS socket is bound to.
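
A minimal sketch of such a path selection follows. The text does not say
which hash function is used, so FNV-1a stands in here as an assumption;
sketch_hash() and sketch_select_path() are hypothetical names, and the
address is assumed to be an IPv4 address held in a 32-bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in mixing function: 32-bit FNV-1a over the 6 input bytes. */
static uint32_t sketch_hash(uint32_t addr, uint16_t port)
{
	uint8_t bytes[6] = {
		(uint8_t)(addr >> 24), (uint8_t)(addr >> 16),
		(uint8_t)(addr >> 8),  (uint8_t)addr,
		(uint8_t)(port >> 8),  (uint8_t)port,
	};
	uint32_t h = 2166136261u;       /* FNV offset basis */

	for (int i = 0; i < 6; i++) {
		h ^= bytes[i];
		h *= 16777619u;         /* FNV prime */
	}
	return h;
}

/* Map the bound (address, port) pair to a path index in [0, npaths). */
static unsigned int sketch_select_path(uint32_t bound_addr, uint16_t bound_port,
				       unsigned int npaths)
{
	assert(npaths >= 1);
	return sketch_hash(bound_addr, bound_port) % npaths;
}
```

Because the inputs are fixed once the socket is bound, every datagram
from that socket maps to the same path, which keeps its traffic on one
TCP flow.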

Additionally, even if the transport is MP capable, we may be
peering with some node that does not support mprds, or supports
a different number of paths. As a result, the peering nodes need
to agree on the number of paths to be used for the connection.
This is done by sending out a control packet exchange before the
first data packet. The control packet exchange must have completed
before the outgoing hash is computed in rds_sendmsg() when the transport
is multipath capable.

The control packet is an RDS ping packet (i.e., a packet to rds dest
port 0) carrying an rds extension header option of
type RDS_EXTHDR_NPATHS, length 2 bytes, whose value is the
number of paths supported by the sender. The "probe" ping packet will
get sent from some reserved port, RDS_FLAG_PROBE_PORT (in <linux/rds.h>).
The receiver of a ping from RDS_FLAG_PROBE_PORT will thus immediately
be able to compute the min(sender_paths, rcvr_paths). The pong
sent in response to a probe-ping should contain the rcvr's npaths
when the rcvr is mprds-capable.

If the rcvr is not mprds-capable, the exthdr in the ping will be
ignored. In this case the pong will not have any exthdrs, so the sender
of the probe-ping can default to single-path mprds.
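
The outcome of the exchange can be sketched as below. The helper names
are hypothetical, the 2-byte value is assumed to be carried in
big-endian order, and treating a value of 0 as "single path" is a
defensive choice of this sketch rather than something the text specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode a path count as the 2-byte option value (big-endian assumed). */
static void sketch_put_npaths(uint8_t out[2], uint16_t npaths)
{
	out[0] = (uint8_t)(npaths >> 8);
	out[1] = (uint8_t)(npaths & 0xff);
}

/* Decode the 2-byte option value back into a path count. */
static uint16_t sketch_get_npaths(const uint8_t in[2])
{
	return (uint16_t)((in[0] << 8) | in[1]);
}

/*
 * Decide how many paths to use. peer_opt points at the peer's 2-byte
 * option value, or is NULL when the peer sent no extension header
 * (i.e., the peer is not mprds-capable).
 */
static uint16_t sketch_negotiate(uint16_t local_npaths, const uint8_t *peer_opt)
{
	uint16_t peer_npaths;

	assert(local_npaths >= 1);
	if (peer_opt == NULL)
		return 1;               /* no exthdr: fall back to a single path */
	peer_npaths = sketch_get_npaths(peer_opt);
	if (peer_npaths == 0)
		return 1;               /* defensive: treat 0 as a single path */
	return local_npaths < peer_npaths ? local_npaths : peer_npaths;
}
```

Both sides apply the same min() rule to the same pair of values, so they
arrive at the same path count without a further round trip.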