Segmentation Offloads in the Linux Networking Stack

Introduction
============

This document describes a set of techniques in the Linux networking stack
to take advantage of segmentation offload capabilities of various NICs.

The following technologies are described:
 * TCP Segmentation Offload - TSO
 * UDP Fragmentation Offload - UFO
 * IPIP, SIT, GRE, and UDP Tunnel Offloads
 * Generic Segmentation Offload - GSO
 * Generic Receive Offload - GRO
 * Partial Generic Segmentation Offload - GSO_PARTIAL
 * SCTP acceleration with GSO - GSO_BY_FRAGS

TCP Segmentation Offload
========================

TCP segmentation allows a device to segment a single frame into multiple
frames with a data payload size specified in skb_shinfo()->gso_size.
When TCP segmentation is requested, the bit for either SKB_GSO_TCPV4 or
SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and
skb_shinfo()->gso_size should be set to a non-zero value.

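As a rough illustration of how gso_size relates to the number of frames a
device emits, the following userspace sketch rounds the payload length up to
whole segments. It is a model only, not kernel code; the function name
tso_segment_count() is hypothetical and chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace model: number of frames produced when a TCP
 * payload of payload_len bytes is cut into pieces of at most gso_size
 * bytes. gso_size must be non-zero, as the text above requires.
 */
size_t tso_segment_count(size_t payload_len, size_t gso_size)
{
    assert(gso_size != 0);
    /* Round up: the last segment may carry fewer than gso_size bytes. */
    return (payload_len + gso_size - 1) / gso_size;
}
```

With a gso_size of 1448, a 4344-byte payload yields three full frames, and
one additional byte of payload adds a fourth, one-byte frame.
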
TCP segmentation is dependent on support for the use of partial checksum
offload. For this reason TSO is normally disabled if the Tx checksum
offload for a given device is disabled.

In order to support TCP segmentation offload it is necessary to populate
the network and transport header offsets of the skbuff so that the device
drivers will be able to determine the offsets of the IP or IPv6 header and
the TCP header. In addition, as CHECKSUM_PARTIAL is required, csum_start
should also point to the TCP header of the packet.

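The role of csum_start can be sketched in userspace as follows. This is a
simplified model of how a device might complete a partial checksum, not the
kernel implementation. It assumes the checksum field already holds the
pseudo-header sum and that the field lives at csum_start + csum_offset
(csum_offset is not mentioned above and is introduced here for the example).
The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One's-complement sum of len bytes added to an initial sum, folded to
 * 16 bits (not inverted). Intended for buffers well under 128 KiB so the
 * 32-bit accumulator cannot overflow.
 */
uint16_t csum_fold_bytes(const uint8_t *data, size_t len, uint32_t sum)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (i < len)                    /* odd trailing byte, zero padded */
        sum += (uint32_t)data[i] << 8;
    while (sum >> 16)               /* fold carries into the low 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/*
 * Model of completing CHECKSUM_PARTIAL: sum everything from csum_start to
 * the end of the frame (including the seeded checksum field) and store the
 * inverted result at csum_start + csum_offset.
 */
void complete_partial_csum(uint8_t *frame, size_t len,
                           size_t csum_start, size_t csum_offset)
{
    uint16_t csum;

    assert(csum_start + csum_offset + 2 <= len);
    csum = (uint16_t)~csum_fold_bytes(frame + csum_start,
                                      len - csum_start, 0);
    frame[csum_start + csum_offset] = (uint8_t)(csum >> 8);
    frame[csum_start + csum_offset + 1] = (uint8_t)(csum & 0xff);
}
```

A receiver that adds the pseudo-header sum to the sum of the completed
transport header and payload should obtain 0xffff.
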
For IPv4 segmentation we support one of two types in terms of the IP ID.
The default behavior is to increment the IP ID with every segment. If the
GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP
ID and all segments will use the same IP ID. If a device has
NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO
and we will either increment the IP ID for all frames, or leave it at a
static value based on driver preference.

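The two IP ID behaviors can be modelled with a short userspace sketch. The
function name tso_fill_ip_ids() and the fixed_id flag (standing in for
SKB_GSO_TCP_FIXEDID) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model: fill ids[0..nsegs-1] with the IPv4 ID each segment
 * would carry. With fixed_id set every segment repeats first_id; otherwise
 * the ID increments per segment and wraps at 16 bits.
 */
void tso_fill_ip_ids(uint16_t first_id, bool fixed_id,
                     uint16_t *ids, size_t nsegs)
{
    size_t i;

    assert(ids != NULL || nsegs == 0);
    for (i = 0; i < nsegs; i++)
        ids[i] = fixed_id ? first_id : (uint16_t)(first_id + i);
}
```

A device with NETIF_F_TSO_MANGLEID set is free to produce either of these
two sequences.
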
UDP Fragmentation Offload
=========================

UDP fragmentation offload allows a device to fragment an oversized UDP
datagram into multiple IPv4 fragments. Many of the requirements for UDP
fragmentation offload are the same as TSO. However, the IPv4 ID for
fragments should not increment, as a single IPv4 datagram is being
fragmented.

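To make the contrast with TSO concrete, the sketch below plans the IPv4
fragments for one datagram. It relies on the standard IPv4 rules that
fragment offsets are expressed in units of 8 bytes and that every fragment
except the last sets the "more fragments" flag. The struct and function
names are hypothetical, and this is a userspace model rather than what any
particular device does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct ipv4_frag {
    uint16_t id;        /* same for every fragment of the datagram */
    uint16_t frag_off;  /* offset in units of 8 bytes */
    uint16_t len;       /* bytes of IP payload carried */
    bool more_frags;    /* MF flag */
};

/*
 * Fill out[] with the fragments for an IP payload of payload_len bytes
 * (UDP header plus data), given at most max_frag bytes of IP payload per
 * frame. Returns the number of fragments, or 0 if out[] is too small.
 */
size_t ufo_plan_fragments(uint16_t id, size_t payload_len, size_t max_frag,
                          struct ipv4_frag *out, size_t out_cap)
{
    size_t step = max_frag & ~(size_t)7;  /* non-final: multiple of 8 */
    size_t off = 0, n = 0;

    assert(max_frag >= 8 && max_frag <= 65535);
    while (off < payload_len) {
        size_t remaining = payload_len - off;
        size_t chunk = remaining <= max_frag ? remaining : step;

        if (n == out_cap)
            return 0;
        out[n].id = id;                   /* the ID never increments */
        out[n].frag_off = (uint16_t)(off / 8);
        out[n].len = (uint16_t)chunk;
        out[n].more_frags = off + chunk < payload_len;
        off += chunk;
        n++;
    }
    return n;
}
```

For a 3008-byte IP payload and 1480 bytes per fragment, this yields three
fragments at offsets 0, 185 and 370, all carrying the same ID.
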
UFO is deprecated: modern kernels will no longer generate UFO skbs, but can
still receive them from tuntap and similar devices. Offload of UDP-based
tunnel protocols is still supported.

IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
========================================================

In addition to the offloads described above it is possible for a frame to
contain additional headers such as an outer tunnel. In order to account
for such instances an additional set of segmentation offload types was
introduced including SKB_GSO_IPXIP4, SKB_GSO_IPXIP6, SKB_GSO_GRE, and
SKB_GSO_UDP_TUNNEL. These extra segmentation types are used to identify
cases where there is more than one set of headers. For example, in the
case of IPIP and SIT we should have the network and transport headers moved
from the standard list of headers to "inner" header offsets.

Currently only two levels of headers are supported. The convention is to
refer to the tunnel headers as the outer headers, while the encapsulated
data is normally referred to as the inner headers. Below is the list of
calls to access the given headers:

IPIP/SIT Tunnel:
             Outer                   Inner
MAC          skb_mac_header
Network      skb_network_header      skb_inner_network_header
Transport    skb_transport_header

UDP/GRE Tunnel:
             Outer                   Inner
MAC          skb_mac_header          skb_inner_mac_header
Network      skb_network_header      skb_inner_network_header
Transport    skb_transport_header    skb_inner_transport_header

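The UDP tunnel layout above can be illustrated by computing where each
header would start in a frame. The sketch assumes an Ethernet-in-UDP tunnel
over IPv4 with 14-byte MAC headers, no VLAN tags, and an 8-byte UDP header;
the struct, constants, and function name are hypothetical and only mirror
the accessor names in the table.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offsets from the start of the frame, one per table entry. */
struct tunnel_offsets {
    size_t mac, network, transport;                    /* outer */
    size_t inner_mac, inner_network, inner_transport;  /* inner */
};

/* Fixed sizes assumed by this sketch. */
enum { SKETCH_ETH_LEN = 14, SKETCH_UDP_LEN = 8 };

/*
 * outer_ip_len and inner_ip_len are the IP header lengths in bytes;
 * tunnel_hdr_len is the tunnel header that follows the outer UDP header
 * (8 bytes for a VXLAN-style header, for example).
 */
struct tunnel_offsets udp_tunnel_offsets(size_t outer_ip_len,
                                         size_t tunnel_hdr_len,
                                         size_t inner_ip_len)
{
    struct tunnel_offsets o;

    assert(outer_ip_len >= 20 && inner_ip_len >= 20);
    o.mac = 0;
    o.network = o.mac + SKETCH_ETH_LEN;
    o.transport = o.network + outer_ip_len;
    o.inner_mac = o.transport + SKETCH_UDP_LEN + tunnel_hdr_len;
    o.inner_network = o.inner_mac + SKETCH_ETH_LEN;
    o.inner_transport = o.inner_network + inner_ip_len;
    return o;
}
```

With 20-byte IPv4 headers and an 8-byte tunnel header, the inner MAC header
starts at byte 50 and the inner transport header at byte 84.
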
In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and
SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types indicate that
the outer header also requests a non-zero checksum.

Finally, there is SKB_GSO_TUNNEL_REMCSUM, which indicates that a given
tunnel header has requested a remote checksum offload. In this case the
inner headers will be left with a partial checksum and only the outer
header checksum will be computed.

Generic Segmentation Offload
============================

Generic segmentation offload is a pure software offload that is meant to
deal with cases where device drivers cannot perform the offloads described
above. What occurs in GSO is that a given skbuff will have its data broken
out over multiple skbuffs that have been resized to match the MSS provided
via skb_shinfo()->gso_size.

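The splitting step can be modelled in userspace as a loop that carves the
payload into MSS-sized pieces. The sketch below is not the kernel's
skb_segment(); the struct and function names are hypothetical. Tracking a
TCP sequence number per piece is an assumption added for the TCP case, based
on each segment's sequence number advancing by the payload that precedes it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct gso_seg {
    uint32_t seq;   /* TCP sequence number of the first payload byte */
    size_t offset;  /* offset of the payload within the original data */
    size_t len;     /* payload bytes in this segment */
};

/*
 * Split payload_len bytes starting at sequence number seq into pieces of
 * at most gso_size bytes. Returns the number of segments written to out[],
 * or 0 if out[] is too small.
 */
size_t gso_split(uint32_t seq, size_t payload_len, size_t gso_size,
                 struct gso_seg *out, size_t out_cap)
{
    size_t off = 0, n = 0;

    assert(gso_size != 0);
    while (off < payload_len) {
        size_t chunk = payload_len - off;

        if (chunk > gso_size)
            chunk = gso_size;       /* every segment but the last is full */
        if (n == out_cap)
            return 0;
        out[n].seq = seq + (uint32_t)off;   /* wraps modulo 2^32 */
        out[n].offset = off;
        out[n].len = chunk;
        off += chunk;
        n++;
    }
    return n;
}
```

Splitting 3000 bytes with a gso_size of 1448 gives two full segments and a
final 104-byte segment.
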
Before enabling any hardware segmentation offload a corresponding software
offload is required in GSO. Otherwise it becomes possible for a frame to
be re-routed between devices and end up on a device that cannot transmit
it.

Generic Receive Offload
=======================

Generic receive offload is the complement to GSO. Ideally any frame
assembled by GRO should be segmented to create an identical sequence of
frames using GSO, and any sequence of frames segmented by GSO should be
able to be reassembled back to the original by GRO. The only exception to
this is the IPv4 ID in the case that the DF bit is set for a given IP
header. If the value of the IPv4 ID is not sequentially incrementing, it
will be altered so that it is sequentially incrementing when a frame
assembled via GRO is segmented via GSO.

Partial Generic Segmentation Offload
====================================

Partial generic segmentation offload is a hybrid between TSO and GSO. What
it effectively does is take advantage of certain traits of TCP and tunnels
so that instead of having to rewrite the packet headers for each segment
only the inner-most transport header and possibly the outer-most network
header need to be updated. This allows devices that do not support tunnel
offloads or tunnel offloads with checksum to still make use of segmentation.

With the partial offload, all headers excluding the inner transport header
are updated so that they contain the correct values for the case where the
header is simply duplicated. The one exception to this is the outer IPv4 ID
field. It is up to the device drivers to guarantee that the IPv4 ID field
is incremented in the case that a given header does not have the DF bit
set.

SCTP acceleration with GSO
==========================

SCTP - despite the lack of hardware support - can still take advantage of
GSO to pass one large packet through the network stack, rather than
multiple small packets.

This requires a different approach from other offloads, as SCTP packets
cannot be just segmented to (P)MTU. Rather, the chunks must be contained in
IP segments, padding respected. So unlike regular GSO, SCTP can't just
generate a big skb, set gso_size to the fragmentation point and deliver it
to the IP layer.

Instead, the SCTP protocol layer builds an skb with the segments correctly
padded and stored as chained skbs, and skb_segment() splits based on those.
To signal this, gso_size is set to the special value GSO_BY_FRAGS.

Therefore, any code in the core networking stack must be aware of the
possibility that gso_size will be GSO_BY_FRAGS and handle that case
appropriately.

There are some helpers to make this easier:

 - skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if
   an skb is an SCTP GSO skb.

 - For size checks, the skb_gso_validate_*_len family of helpers correctly
   considers GSO_BY_FRAGS.

 - For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size
   will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs.

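The kind of guard these helpers apply can be sketched in userspace as
follows. This is a model, not the kernel source: the struct and function
names are hypothetical, the numeric value given to GSO_BY_FRAGS is an
assumption made for the example, and returning false stands in for the
WARN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed marker value for this sketch; check the kernel headers. */
#define GSO_BY_FRAGS 0xFFFF

/* Minimal stand-in for the one shared-info field used here. */
struct shinfo_model {
    uint16_t gso_size;
};

/*
 * Refuse to do arithmetic on gso_size when it holds the GSO_BY_FRAGS
 * marker. Returns false and leaves gso_size untouched in that case.
 */
bool model_increase_gso_size(struct shinfo_model *sh, uint16_t increment)
{
    assert(sh != NULL);
    if (sh->gso_size == GSO_BY_FRAGS)
        return false;
    sh->gso_size = (uint16_t)(sh->gso_size + increment);
    return true;
}
```

Code that adjusts gso_size directly, without such a check, would silently
turn the marker into an ordinary (and wrong) segment size.
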
This also affects drivers with the NETIF_F_FRAGLIST and NETIF_F_GSO_SCTP
bits set. Note also that NETIF_F_GSO_SCTP is included in
NETIF_F_GSO_SOFTWARE.