TCP Internals
=============

Overview
~~~~~~~~

In CCF, the :term:`TCP` host layer is implemented using `libuv <https://libuv.org/>`_, allowing us to listen for connections from other nodes and for requests from clients, as well as to connect to other nodes.

Both :term:`RPC` and Node-to-Node connections use TCP to communicate with external resources, and then pass the packets through the :term:`ring buffer` to communicate with the enclave.

CCF uses an HTTP :term:`REST` interface to call programs inside the enclave, so the usual sequence is: read a request, call the enclave function and receive its response (via a `ring buffer` message), then send the response back to the client.

However, the TCP implementation in CCF is generic and could adapt to other common communication patterns, although that would likely require changes to how its users (RPC, Node-to-node) interact with it.
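
To make that sequence concrete, here is a purely illustrative sketch of the host-side flow. None of the names below (``RingBufferWriter``, ``Socket``, ``on_request_read``, ``on_enclave_response``) are real CCF APIs; they only stand in for the ring-buffer plumbing described above.

.. code-block:: cpp

    // Hypothetical sketch of the host-side request flow: bytes read from a
    // client socket are forwarded to the enclave over the ring buffer, and
    // the enclave's response is later written back to the same client.
    #include <cstdint>
    #include <vector>

    using Bytes = std::vector<uint8_t>;

    struct RingBufferWriter
    {
      // Stand-in for posting an inbound message to the enclave.
      void post_to_enclave(const Bytes& request) { (void)request; }
    };

    struct Socket
    {
      // Stand-in for queuing a write on the underlying libuv handle.
      void send(const Bytes& response) { (void)response; }
    };

    // Called by the host when a request has been read from a client socket.
    void on_request_read(RingBufferWriter& to_enclave, const Bytes& request)
    {
      to_enclave.post_to_enclave(request);
    }

    // Called by the host when the enclave posts a response message back.
    void on_enclave_response(Socket& client, const Bytes& response)
    {
      client.send(response);
    }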

Overall structure
~~~~~~~~~~~~~~~~~

The `TCPImpl` class (in ``src/host/tcp.h``) implements all of the TCP logic (using the asynchronous `libuv`) and is used by both `RPCConnections` and `NodeConnections`.

Because `TCPImpl` does not have access to the `ring buffer`, it must use behaviour classes that allow users to register callbacks on actions (e.g. `on_read`, `on_accept`, etc.).

Most of the callbacks are for logging purposes, but the two important ones are:

- `on_accept` on servers, which creates a new socket to communicate with the particular connecting client
- `on_read`, which takes the data that was read and writes it to the `ring buffer`
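
As a rough sketch of this pattern (the signatures here are assumptions, not the actual ones in ``src/host/tcp.h``), the behaviours can be pictured as a small virtual-callback hierarchy that `TCPImpl` invokes:

.. code-block:: cpp

    // Simplified, hypothetical sketch of the behaviour pattern: TCPImpl
    // calls these virtual hooks, and users override only the ones they need.
    #include <cstddef>
    #include <cstdint>

    struct TCPBehaviour
    {
      virtual ~TCPBehaviour() = default;

      // Mostly-logging callbacks have empty defaults.
      virtual void on_connect() {}
      virtual void on_disconnect() {}

      // Data read from the socket; users forward it to the ring buffer.
      virtual void on_read(size_t len, uint8_t* data)
      {
        (void)len;
        (void)data;
      }
    };

    struct TCPServerBehaviour : public TCPBehaviour
    {
      // A client connected to the listening socket: create a dedicated peer
      // socket (with its own behaviour) to talk to that client.
      virtual void on_accept() {}
    };

The server behaviours below derive from the server variant, while the client/incoming/outgoing behaviours derive from the plain one, as shown in the diagram further down.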

For node-to-node connections, the behaviours are:

- `NodeServerBehaviour`, the main listening socket, which on `on_accept` creates a new socket to communicate with the particular connecting client
- `NodeIncomingBehaviour`, the socket created above, which waits for input and passes it to the enclave
- `NodeOutgoingBehaviour`, a socket created by the enclave (via ring buffer messages into the host) to connect to external nodes

For RPC connections, the behaviours are:

- `RPCServerBehaviour`, the same as `NodeServerBehaviour` above
- `RPCClientBehaviour`, a misnomer, used for both the incoming and outgoing roles above

Here's a diagram with the types of behaviours and their relationships:

.. mermaid::

    graph BT
        subgraph TCP
            TCPBehaviour
            TCPServerBehaviour
        end

        subgraph RPCConnections
            RPCClientBehaviour
            RPCServerBehaviour
        end

        subgraph NodeConnections
            NodeConnectionBehaviour
            NodeIncomingBehaviour
            NodeOutgoingBehaviour
            NodeServerBehaviour
        end

        RPCClientBehaviour --> TCPBehaviour
        NodeConnectionBehaviour --> TCPBehaviour
        NodeIncomingBehaviour --> NodeConnectionBehaviour
        NodeOutgoingBehaviour --> NodeConnectionBehaviour
        NodeServerBehaviour --> TCPServerBehaviour
        RPCServerBehaviour --> TCPServerBehaviour

State machine
~~~~~~~~~~~~~

`TCPImpl` has an internal state machine, whose state changes in reaction to callbacks from `libuv`.

Since it implements both server (listen, peer, read) and client (connect, write) logic, the state lets the common functions know where to continue on completion.

The complete state machine diagram, without the failure states, is:

.. mermaid::

    stateDiagram-v2
        %% Server side
        FRESH --> LISTENING_RESOLVING : server
        LISTENING_RESOLVING --> LISTENING : uv_listen

        %% Client side
        state client_host <<choice>>
        FRESH --> client_host : client
        client_host --> BINDING : client_host != null
        BINDING --> CONNECTING_RESOLVING : client_host resolved
        client_host --> CONNECTING_RESOLVING : client_host == null
        CONNECTING_RESOLVING --> CONNECTING : host resolved
        CONNECTING --> CONNECTING_RESOLVING : retry
        CONNECTING --> CONNECTED : uv_tcp_connect

        %% Peer side
        FRESH --> CONNECTED : peer

        %% Disconnect / reconnect
        CONNECTED --> DISCONNECTED : error<br>close
        DISCONNECTED --> RECONNECTING : retry
        RECONNECTING --> FRESH : init

Some failure states lead to retries / reconnects, while others are terminal and close the connection.
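
Sketched as code, the states in the diagram map onto an enumeration along these lines (an illustrative approximation; the exact enumerators and the omitted failure states live in ``src/host/tcp.h``):

.. code-block:: cpp

    // Connection states from the diagram above, failure states omitted.
    // Illustrative sketch, not the exact definition used by TCPImpl.
    enum class Status
    {
      FRESH,                // freshly initialised, not yet server/client/peer
      LISTENING_RESOLVING,  // server: resolving the address to listen on
      LISTENING,            // server: uv_listen succeeded, accepting peers
      BINDING,              // client: resolving/binding the optional client_host
      CONNECTING_RESOLVING, // client: resolving the target host
      CONNECTING,           // client: uv_tcp_connect in flight
      CONNECTED,            // traffic flows (peers start here directly)
      DISCONNECTED,         // error or close
      RECONNECTING          // retry in progress, re-initialises back to FRESH
    };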

Server logic
~~~~~~~~~~~~

The main cycle of a server is the following:

- create a main socket and listen for connections
- on accepting a new connection, create a new (`peer`) socket to communicate with that client
- read the request, communicate with the enclave, get the response back
- send the response to the client
- close the socket

There can be several `peer` sockets open at the same time, communicating with different clients, and it's up to `libuv` to handle the asynchronous tasks.
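
In plain `libuv` terms, that cycle looks roughly like the standalone listener below. This is not CCF code: the port is an arbitrary placeholder and error handling is minimal; in CCF the equivalent calls are wrapped inside `TCPImpl`, and the data read is handed to the behaviours.

.. code-block:: cpp

    // Minimal standalone libuv listener illustrating the listen/accept/read
    // cycle: one listening handle, plus one "peer" handle per accepted client.
    #include <cstdlib>
    #include <uv.h>

    static void on_alloc(uv_handle_t*, size_t suggested_size, uv_buf_t* buf)
    {
      buf->base = static_cast<char*>(std::malloc(suggested_size));
      buf->len = suggested_size;
    }

    static void on_close(uv_handle_t* handle)
    {
      delete reinterpret_cast<uv_tcp_t*>(handle);
    }

    static void on_read(uv_stream_t* peer, ssize_t nread, const uv_buf_t* buf)
    {
      if (nread > 0)
      {
        // In CCF, this is where on_read would push data towards the enclave.
      }
      else if (nread < 0)
      {
        // EOF or error: close the peer socket.
        uv_close(reinterpret_cast<uv_handle_t*>(peer), on_close);
      }
      std::free(buf->base);
    }

    static void on_accept(uv_stream_t* server, int status)
    {
      if (status < 0)
        return;

      // One dedicated peer handle per connecting client.
      uv_tcp_t* peer = new uv_tcp_t;
      uv_tcp_init(server->loop, peer);
      if (uv_accept(server, reinterpret_cast<uv_stream_t*>(peer)) == 0)
        uv_read_start(reinterpret_cast<uv_stream_t*>(peer), on_alloc, on_read);
      else
        uv_close(reinterpret_cast<uv_handle_t*>(peer), on_close);
    }

    int main()
    {
      uv_loop_t* loop = uv_default_loop();
      uv_tcp_t server;
      sockaddr_in addr;

      uv_tcp_init(loop, &server);
      uv_ip4_addr("0.0.0.0", 8080, &addr);
      uv_tcp_bind(&server, reinterpret_cast<const sockaddr*>(&addr), 0);
      uv_listen(reinterpret_cast<uv_stream_t*>(&server), 128, on_accept);

      return uv_run(loop, UV_RUN_DEFAULT);
    }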

Here's a diagram of the control flow for a server connection:

.. mermaid::

    graph TD
        subgraph RPCConnections
            rl(listen)
            subgraph RPCServerBehaviour
                rsboa(on_accept)
            end
        end

        subgraph TCPImpl
            tl(listen)
            tr(resolve)
            tor(on_resolved)
            tlr(listen_resolved)
            toa(on_accept)
            tp[TCP peer]
        end

        subgraph NodeConnections
            nctor(NodeConnections)
            subgraph NodeServerBehaviour
                nsboa(on_accept)
            end
        end

        %% Entry Points
        rl --> tl
        nctor --> tl

        %% Listen path
        tl -- LISTENING_RESOLVING --> tr
        tr -. via: DNS::resolve .-> tor
        tor --> tlr
        tlr -. LISTENING<br>via: uv_listen .-> toa
        toa --> rsboa
        toa --> nsboa
        toa ==> tp

The control flow of the `peer` connection is similar to the client's (below), but in the reverse order.

The client first writes the request and then waits for the response, while the peer first waits for the request and then writes the response back.

Client logic
~~~~~~~~~~~~

Clients don't have a cycle: they connect to an existing server, send the request, wait for the response and disconnect.

Clients are used from the enclave side (Node-to-node and RPC), via a `ring buffer` message.

Node-to-node clients are used for pings across nodes, electing a new leader, etc.

RPC clients are used for REST service callbacks from other services, e.g. metrics.

Here's the diagram of the client control flow:

.. mermaid::

    graph TD
        subgraph RPCConnections
            rc(connect)
            rw(write)
            subgraph RPCClientBehaviour
                rsbor(on_read)
            end
        end

        subgraph TCPImpl
            tc(connect)
            tocr(on_client_resolved)
            tcb(client_bind)
            tr(resolve)
            tor(on_resolved)
            tcr(connect_resolved)
            toc(on_connect<br>CONNECTED)

            trs(read_start)
            toa(on_alloc)
            tore(on_read)
            tof(on_free)

            tw(write)
            tow(on_write)
            tfw(free_write)
            tsw(send_write)
        end

        subgraph NodeConnections
            ncc(create_connection)
            nw(ccf::node_outbound)
            subgraph NodeConnectionBehaviour
                nsbor(on_read)
            end
        end

        %% Entry Points
        rc --> tc
        ncc --> tc
        rw --> tw
        nw --> tw

        %% Connect path
        tc -- CONNECTING_RESOLVING --> tr
        tc -. BINDING<br>via: DNS::resolve .-> tocr
        tocr --> tcb
        tcb -- uv_tcp_bind<br>CONNECTING_RESOLVING --> tr
        tr -. via: DNS::resolve .-> tor
        tor --> tcr
        tcr -. CONNECTING<br>via: uv_tcp_connect .-> toc
        toc -- retry<br>CONNECTING_RESOLVING --> tcr
        toc -- pending writes --> tw
        toc --> trs

        %% Read path
        trs -. via: uv_read_start .-> toa
        trs -. via: uv_read_start .-> tore
        tore -- DISCONNECTED<br>uv_read_stop --> tof
        tore --> rsbor
        tore --> nsbor

        %% Write path
        tw -- CONNECTED --> tsw
        tw -- DISCONNECTED<br>no data --> tfw
        tsw -. via: uv_write .-> tow
        tow --> tfw

Note that some clients have a `client_host` parameter, separate from `host`, that is used for testing and relies on the `BINDING` state.

The `client_host` is resolved separately and bound to the client handle (via `uv_tcp_bind`), but the call to `uv_tcp_connect` is still made on the `host` address.

This lets us bind a separate address on the client side while connecting to the `host`, so that external packet filters (like `iptables`) can restrict traffic.
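
In plain `libuv` terms (again a standalone sketch rather than CCF code, with placeholder addresses and port), the `client_host` handling amounts to binding a local source address before connecting to the remote `host`:

.. code-block:: cpp

    // Standalone libuv sketch of the client_host behaviour: bind the local
    // (source) address first, then connect to the remote host. Addresses
    // and port are arbitrary placeholders.
    #include <uv.h>

    static void on_connect(uv_connect_t*, int status)
    {
      // status == 0 means the handle is CONNECTED and reads/writes can
      // start; otherwise CCF would retry by resolving and connecting again.
      (void)status;
    }

    int main()
    {
      uv_loop_t* loop = uv_default_loop();
      uv_tcp_t client;
      uv_connect_t connect_req;
      sockaddr_in local_addr;  // plays the role of client_host
      sockaddr_in remote_addr; // plays the role of host

      uv_tcp_init(loop, &client);
      uv_ip4_addr("10.0.0.1", 0, &local_addr);
      uv_ip4_addr("10.0.0.2", 8080, &remote_addr);

      // BINDING: pin the source address so packet filters can match on it.
      uv_tcp_bind(&client, reinterpret_cast<const sockaddr*>(&local_addr), 0);

      // CONNECTING: the connection itself still targets the host address.
      uv_tcp_connect(
        &connect_req,
        &client,
        reinterpret_cast<const sockaddr*>(&remote_addr),
        on_connect);

      return uv_run(loop, UV_RUN_DEFAULT);
    }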
|