WSL2-Linux-Kernel/Documentation/networking
Ilya Maximets edd944b70a xsk: Honor SO_BINDTODEVICE on bind
[ Upstream commit f7306acec9 ]

Initial creation of an AF_XDP socket requires CAP_NET_RAW capability. A
privileged process might create the socket and pass it to a non-privileged
process for later use. However, that process will be able to bind the socket
to any network interface. Even though it will not be able to receive any
traffic without modification of the BPF map, the situation is not ideal.

Sockets already have a mechanism that can be used to restrict what interface
they can be attached to. That is SO_BINDTODEVICE.

To change the SO_BINDTODEVICE binding, the process will need CAP_NET_RAW.

Make xsk_bind() honor SO_BINDTODEVICE in order to allow a safer workflow
when a non-privileged process is using AF_XDP.

The intended workflow is as follows:

  1. First process creates a bare socket with socket(AF_XDP, ...).
  2. First process loads the XSK program to the interface.
  3. First process adds the socket fd to a BPF map.
  4. First process ties socket fd to a particular interface using
     SO_BINDTODEVICE.
  5. First process sends socket fd to a second process.
  6. Second process allocates UMEM.
  7. Second process binds socket to the interface with bind(...).
  8. Second process sends/receives the traffic.

All the steps above are possible today if the first process is privileged
and the second one has sufficient RLIMIT_MEMLOCK and no capabilities.
However, the second process will be able to bind the socket to any interface
it wants at step 7 and send traffic from it. With the proposed change, the
second process will be able to bind the socket only to a specific interface
chosen by the first process at step 4.
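
Step 4 of the workflow could be sketched roughly as below. This is only an
illustration, not code from the patch: the helper name
xsk_restrict_to_ifname() is hypothetical, and the sketch assumes the
privileged first process already holds a socket fd obtained from
socket(AF_XDP, ...). It relies only on the standard setsockopt() call with
the SO_BINDTODEVICE option.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <net/if.h>      /* IFNAMSIZ */
#include <string.h>
#include <sys/socket.h>  /* setsockopt(), SOL_SOCKET, SO_BINDTODEVICE */

/*
 * Hypothetical helper (not part of the kernel or of libxdp): tie an
 * already-created socket to a single interface, as in step 4 above.
 *
 * Returns 0 on success, -1 with errno set on failure.
 */
int xsk_restrict_to_ifname(int fd, const char *ifname)
{
	size_t len;

	/* Reject names that cannot be valid interface names. */
	if (ifname == NULL) {
		errno = EINVAL;
		return -1;
	}
	len = strlen(ifname);
	if (len == 0 || len >= IFNAMSIZ) {
		errno = EINVAL;
		return -1;
	}

	/*
	 * Record the interface on the socket. Per the commit message,
	 * changing this binding afterwards requires CAP_NET_RAW, so a
	 * receiver without capabilities cannot undo it.
	 */
	return setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE,
			  ifname, (socklen_t)(len + 1));
}
```

The first process would call something like xsk_restrict_to_ifname(fd, "eth0")
(the interface name is an assumption) before sending the fd to the second
process at step 5. With this change applied, a later bind() by the second
process is expected to succeed only for that interface.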

Fixes: 965a990984 ("xsk: add support for bind for Rx")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/bpf/20230703175329.3259672-1-i.maximets@ovn.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-23 13:47:29 +02:00
caif
device_drivers ixgbe: Document how to enable NBASE-T support 2021-12-22 09:32:43 +01:00
devlink ice: Print the api_patch as part of the fw.mgmt.api 2021-10-14 10:14:46 -07:00
dsa docs: net: dsa: sja1105: fix reference to sja1105.txt 2021-09-19 12:45:03 +01:00
mac80211_hwsim
6lowpan.rst
6pack.rst
af_xdp.rst xsk: Honor SO_BINDTODEVICE on bind 2023-07-23 13:47:29 +02:00
alias.rst
arcnet-hardware.rst
arcnet.rst
atm.rst
ax25.rst
bareudp.rst
batman-adv.rst batman-adv: Move IRC channel to hackint.org 2021-08-08 20:05:46 +02:00
bonding.rst Bonding: add arp_missed_max option 2023-06-05 09:21:19 +02:00
bridge.rst
can.rst
can_ucan_protocol.rst
cdc_mbim.rst
checksum-offloads.rst
dccp.rst
dctcp.rst
dns_resolver.rst
driver.rst
eql.rst
ethtool-netlink.rst ethtool: add two coalesce attributes for CQE mode 2021-08-24 07:38:28 -07:00
failover.rst
fib_trie.rst
filter.rst bpf: Refactor BPF_PROG_RUN into a function 2021-08-17 00:45:07 +02:00
gen_stats.rst
generic-hdlc.rst
generic_netlink.rst
gtp.rst
ieee802154.rst
ila.rst
index.rst Remove DECnet support from kernel 2023-06-21 15:59:15 +02:00
ioam6-sysctl.rst ipv6: ioam: Documentation for new IOAM sysctls 2021-07-21 08:14:33 -07:00
ip-sysctl.rst tcp: restrict net.ipv4.tcp_app_win 2023-04-20 12:13:53 +02:00
ip_dynaddr.rst
ipddp.rst
ipsec.rst
ipv6.rst
ipvlan.rst
ipvs-sysctl.rst netfilter: ipvs: Fix reuse connection if RS weight is 0 2021-12-01 09:04:45 +01:00
j1939.rst
kapi.rst
kcm.rst
l2tp.rst
lapb-module.rst
mac80211-auth-assoc-deauth.txt
mac80211-injection.rst
mctp.rst mctp: unify sockaddr_mctp types 2021-10-18 13:47:09 +01:00
mpls-sysctl.rst
mptcp-sysctl.rst mptcp: faster active backup recovery 2021-08-14 11:37:25 +01:00
msg_zerocopy.rst
multiqueue.rst
net_dim.rst
net_failover.rst
netconsole.rst
netdev-FAQ.rst docs: networking: netdevsim rules 2021-08-04 12:43:27 +01:00
netdev-features.rst
netdevices.rst net: bonding: move ioctl handling to private ndo operation 2021-07-27 20:11:45 +01:00
netfilter-sysctl.rst
netif-msg.rst
nexthop-group-resilient.rst
nf_conntrack-sysctl.rst Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf 2021-09-03 16:20:37 -07:00
nf_flowtable.rst
nfc.rst
openvswitch.rst
operstates.rst docs: operstates: document IF_OPER_TESTING 2021-08-02 15:16:04 +01:00
packet_mmap.rst docs: networking: Replace strncpy() with strscpy() 2021-06-04 11:21:43 -06:00
page_pool.rst
phonet.rst
phy.rst net: phy: Add 25G BASE-R interface mode 2021-06-12 13:08:57 -07:00
pktgen.rst pktgen: document the latest pktgen usage options 2021-08-25 13:44:30 +01:00
plip.rst
ppp_generic.rst
proc_net_tcp.rst
radiotap-headers.rst
rds.rst
regulatory.rst
rxrpc.rst
scaling.rst
sctp.rst
secid.rst
seg6-sysctl.rst
segmentation-offloads.rst
sfp-phylink.rst
snmp_counter.rst
statistics.rst
strparser.rst
switchdev.rst
sysfs-tagging.rst
tc-actions-env-rules.rst
tcp-thin.rst
team.rst
timestamping.rst dev_ioctl: split out ndo_eth_ioctl 2021-07-27 20:11:45 +01:00
tipc.rst Documentation: add more details in tipc.rst 2021-07-01 13:18:18 -07:00
tls-offload-layers.svg
tls-offload-reorder-bad.svg
tls-offload-reorder-good.svg
tls-offload.rst
tls.rst
tproxy.rst
tuntap.rst docs: networking: Replace strncpy() with strscpy() 2021-06-04 11:21:43 -06:00
udplite.rst
vrf.rst doc: Document unexpected tcp_l3mdev_accept=1 behavior 2021-08-23 11:53:24 +01:00
vxlan.rst
x25-iface.rst
x25.rst
xfrm_device.rst
xfrm_proc.rst
xfrm_sync.rst
xfrm_sysctl.rst