WSL2-Linux-Kernel/net
David Ahern 8f34e53b60 ipv6: Use global sernum for dst validation with nexthop objects
Nik reported a bug with pcpu dst cache when nexthop objects are
used illustrated by the following:
    $ ip netns add foo
    $ ip -netns foo li set lo up
    $ ip -netns foo addr add 2001:db8:11::1/128 dev lo
    $ ip netns exec foo sysctl net.ipv6.conf.all.forwarding=1
    $ ip li add veth1 type veth peer name veth2
    $ ip li set veth1 up
    $ ip addr add 2001:db8:10::1/64 dev veth1
    $ ip li set dev veth2 netns foo
    $ ip -netns foo li set veth2 up
    $ ip -netns foo addr add 2001:db8:10::2/64 dev veth2
    $ ip -6 nexthop add id 100 via 2001:db8:10::2 dev veth1
    $ ip -6 route add 2001:db8:11::1/128 nhid 100

    Create a pcpu entry on cpu 0:
    $ taskset -a -c 0 ip -6 route get 2001:db8:11::1

    Re-add the route entry:
    $ ip -6 ro del 2001:db8:11::1
    $ ip -6 route add 2001:db8:11::1/128 nhid 100

    Route get on cpu 0 returns the stale pcpu:
    $ taskset -a -c 0 ip -6 route get 2001:db8:11::1
    RTNETLINK answers: Network is unreachable

    While cpu 1 works:
    $ taskset -a -c 1 ip -6 route get 2001:db8:11::1
    2001:db8:11::1 from :: via 2001:db8:10::2 dev veth1 src 2001:db8:10::1 metric 1024 pref medium

Conversion of FIB entries to work with external nexthop objects
missed an important difference between IPv4 and IPv6 - how dst
entries are invalidated when the FIB changes. IPv4 has a per-network
namespace generation id (rt_genid) that is bumped on changes to the FIB.
Checking if a dst_entry is still valid means comparing rt_genid in the
rtable to the current value of rt_genid for the namespace.

IPv6 also has a per network namespace counter, fib6_sernum, but the
count is saved per fib6_node. With the per-node counter only dst_entries
based on fib entries under the node are invalidated when changes are
made to the routes - limiting the scope of invalidations. IPv6 uses a
reference in the rt6_info, 'from', to track the corresponding fib entry
used to create the dst_entry. When validating a dst_entry, the 'from'
is used to backtrack to the fib6_node and check the sernum of it to the
cookie passed to the dst_check operation.

With the inline format (nexthop definition inline with the fib6_info),
dst_entries cached in the fib6_nh have a 1:1 correlation between fib
entries, nexthop data and dst_entries. With external nexthops, IPv6
looks more like IPv4 which means multiple fib entries across disparate
fib6_nodes can all reference the same fib6_nh. That means validation
of dst_entries based on external nexthops needs to use the IPv4 format
- the per-network namespace counter.

Add sernum to rt6_info and set it when creating a pcpu dst entry. Update
rt6_get_cookie to return sernum if it is set and update dst_check for
IPv6 to look for sernum set and based the check on it if so. Finally,
rt6_get_pcpu_route needs to validate the cached entry before returning
a pcpu entry (similar to the rt_cache_valid calls in __mkroute_input and
__mkroute_output for IPv4).

This problem only affects routes using the new, external nexthops.

Thanks to the kbuild test robot for catching the IS_ENABLED needed
around rt_genid_ipv6 before I sent this out.

Fixes: 5b98324ebe ("ipv6: Allow routes to use nexthop objects")
Reported-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-01 12:46:30 -07:00
..
6lowpan
9p 9pnet: allow making incomplete read requests 2020-03-27 09:29:56 +00:00
802 net: 802: psnap.c: Use built-in RCU list checking 2020-02-24 13:02:53 -08:00
8021q net: vlan: suppress "failed to kill vid" warnings 2020-02-17 14:30:54 -08:00
appletalk
atm proc: convert everything to "struct proc_ops" 2020-02-04 03:05:26 +00:00
ax25 net: Make sock protocol value checks more specific 2020-01-09 18:41:40 -08:00
batman-adv batman-adv: Fix refcnt leak in batadv_v_ogm_process 2020-04-21 10:08:05 +02:00
bluetooth Bluetooth: L2CAP: Use DEFER_SETUP to group ECRED connections 2020-03-25 22:16:08 +01:00
bpf bpf: Fix build warning regarding missing prototypes 2020-03-28 18:13:18 +01:00
bpfilter SPDX patches for 5.7-rc1. 2020-04-03 13:12:26 -07:00
bridge net: bridge: vlan: Add a schedule point during VLAN processing 2020-04-30 17:45:41 -07:00
caif net: caif: Add lockdep expression to RCU traversal primitive 2020-03-11 22:55:25 -07:00
can can: j1939: j1939_sk_bind(): take priv after lock is held 2019-12-08 11:52:02 +01:00
ceph libceph: directly skip to the end of redirect reply 2020-03-30 12:42:41 +02:00
core net: remove obsolete comment 2020-04-25 20:49:32 -07:00
dcb
dccp Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-02-29 15:53:35 -08:00
decnet Remove DST_HOST 2020-03-23 21:57:44 -07:00
dns_resolver KEYS: Don't write out to userspace while holding key semaphore 2020-03-29 12:40:41 +01:00
dsa net: dsa: don't fail to probe if we couldn't set the MTU 2020-04-22 19:22:59 -07:00
ethernet net: remove eth_change_mtu 2020-01-27 11:09:31 +01:00
ethtool ethtool: provide timestamping information with TSINFO_GET request 2020-03-29 22:32:37 -07:00
hsr hsr: check protocol version in hsr_newlink() 2020-04-07 18:34:18 -07:00
ieee802154 nl802154: add missing attribute validation for dev_type 2020-03-03 13:28:48 -08:00
ife
ipv4 mptcp: move option parsing into mptcp_incoming_options() 2020-04-30 12:23:22 -07:00
ipv6 ipv6: Use global sernum for dst validation with nexthop objects 2020-05-01 12:46:30 -07:00
iucv treewide: Use sizeof_field() macro 2019-12-09 10:36:44 -08:00
kcm net: kcm: kcmproc.c: Fix RCU list suspicious usage warning 2020-03-16 17:14:02 -07:00
key
l2tp l2tp: Allow management of tunnels and session in user namespace 2020-04-08 14:30:46 -07:00
l3mdev
lapb
llc af_llc: fix if-statement empty body warning 2020-02-26 20:38:13 -08:00
mac80211 mac80211: sta_info: Add lockdep condition for RCU list usage 2020-04-24 11:31:20 +02:00
mac802154
mpls net: add net available in build_state 2020-03-29 22:30:57 -07:00
mptcp mptcp: fix uninitialized value access 2020-04-30 12:34:07 -07:00
ncsi net/ncsi: Support for multi host mellanox card 2020-01-09 18:36:22 -08:00
netfilter netfilter: nf_osf: avoid passing pointer to local var 2020-04-29 21:17:57 +02:00
netlabel netlabel: Kconfig: Update reference for NetLabel Tools project 2020-04-22 19:55:01 -07:00
netlink Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-25 18:58:11 -07:00
netrom net: netrom: Fix potential nr_neigh refcnt leak in nr_add_node 2020-04-18 13:09:46 -07:00
nfc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
nsh
openvswitch net: openvswitch: ovs_ct_exit to be done under ovs_lock 2020-04-20 10:53:54 -07:00
packet net/packet: tpacket_rcv: avoid a producer race condition 2020-03-15 00:25:25 -07:00
phonet net: Remove redundant BUG_ON() check in phonet_pernet 2020-01-03 12:25:50 -08:00
psample net: psample: fix skb_over_panic 2019-11-26 14:40:13 -08:00
qrtr net: qrtr: send msgs from local of same id as broadcast 2020-04-09 10:08:31 -07:00
rds net/rds: Use ERR_PTR for rds_message_alloc_sgs() 2020-04-15 12:33:29 -07:00
rfkill rfkill: Fix incorrect check to avoid NULL pointer dereference 2019-12-16 10:15:49 +01:00
rose Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-01-26 10:40:21 +01:00
rxrpc rxrpc: Fix DATA Tx to disable nofrag for UDP on AF_INET6 socket 2020-04-14 16:26:47 -07:00
sched sch_sfq: validate silly quantum values 2020-04-27 11:49:57 -07:00
sctp sctp: Fix SHUTDOWN CTSN Ack in the peer restart case 2020-04-22 19:27:40 -07:00
smc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-03-12 22:34:48 -07:00
strparser
sunrpc svcrdma: Fix leak of svc_rdma_recv_ctxt objects 2020-04-17 12:40:38 -04:00
switchdev net: switchdev: do not propagate bridge updates across bridges 2020-02-26 20:58:33 -08:00
tipc tipc: Fix potential tipc_node refcnt leak in tipc_rcv 2020-04-18 13:24:20 -07:00
tls net/tls: Fix sk_psock refcnt leak when in tls_data_ready() 2020-04-27 11:22:38 -07:00
unix net: datagram: drop 'destructor' argument from several helpers 2020-02-28 12:12:53 -08:00
vmw_vsock vsock/virtio: fix multiple packet delivery to monitoring devices 2020-04-27 10:18:01 -07:00
wimax
wireless nl80211: fix NL80211_ATTR_FTM_RESPONDER policy 2020-04-14 12:28:48 +02:00
x25 net/x25: Fix null-ptr-deref in x25_disconnect 2020-04-28 14:08:59 -07:00
xdp xsk: Add missing check on user supplied headroom size 2020-04-15 13:07:18 +02:00
xfrm Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2020-03-30 10:59:20 -07:00
Kconfig net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build 2020-03-25 12:24:33 -07:00
Makefile mptcp: Add MPTCP socket stubs 2020-01-24 13:44:07 +01:00
compat.c net: abstract out normal and compat msghdr import 2020-03-10 09:12:49 -06:00
socket.c for-5.7/io_uring-2020-03-29 2020-03-30 12:18:49 -07:00
sysctl_net.c