WSL2-Linux-Kernel

Commit graph

Author	SHA1	Message	Date
Ido Schimmel	06ec313eea	vxlan: Do not assume RTNL is held in vxlan_fdb_info() vxlan_fdb_info() is not always called with RTNL held or from an RCU read-side critical section. For example, in the following call path: vxlan_cleanup() vxlan_fdb_destroy() vxlan_fdb_notify() __vxlan_fdb_notify() vxlan_fdb_info() The use of rtnl_dereference() can therefore result in the following splat [1]. Fix this by dereferencing the nexthop under RCU read-side critical section. [1] [May24 22:56] ============================= [ +0.004676] WARNING: suspicious RCU usage [ +0.004614] 5.7.0-rc5-custom-16219-g201392003491 #2772 Not tainted [ +0.007116] ----------------------------- [ +0.004657] drivers/net/vxlan.c:276 suspicious rcu_dereference_check() usage! [ +0.008164] other info that might help us debug this: [ +0.009126] rcu_scheduler_active = 2, debug_locks = 1 [ +0.007504] 5 locks held by bash/6892: [ +0.004392] #0: ffff8881d47e3410 (&sig->cred_guard_mutex){+.+.}-{3:3}, at: __do_execve_file.isra.27+0x392/0x23c0 [ +0.011795] #1: ffff8881d47e34b0 (&sig->exec_update_mutex){+.+.}-{3:3}, at: flush_old_exec+0x510/0x2030 [ +0.010947] #2: ffff8881a141b0b0 (ptlock_ptr(page)#2){+.+.}-{2:2}, at: unmap_page_range+0x9c0/0x2590 [ +0.010585] #3: ffff888230009d50 ((&vxlan->age_timer)){+.-.}-{0:0}, at: call_timer_fn+0xe8/0x800 [ +0.010192] #4: ffff888183729bc8 (&vxlan->hash_lock[h]){+.-.}-{2:2}, at: vxlan_cleanup+0x133/0x4a0 [ +0.010382] stack backtrace: [ +0.005103] CPU: 1 PID: 6892 Comm: bash Not tainted 5.7.0-rc5-custom-16219-g201392003491 #2772 [ +0.009675] Hardware name: Mellanox Technologies Ltd. 
MSN2100-CB2FO/SA001017, BIOS 5.6.5 06/07/2016 [ +0.010155] Call Trace: [ +0.002775] <IRQ> [ +0.002313] dump_stack+0xfd/0x178 [ +0.003895] lockdep_rcu_suspicious+0x14a/0x153 [ +0.005157] vxlan_fdb_info+0xe39/0x12a0 [ +0.004775] __vxlan_fdb_notify+0xb8/0x160 [ +0.004672] vxlan_fdb_notify+0x8e/0xe0 [ +0.004370] vxlan_fdb_destroy+0x117/0x330 [ +0.004662] vxlan_cleanup+0x1aa/0x4a0 [ +0.004329] call_timer_fn+0x1c4/0x800 [ +0.004357] run_timer_softirq+0x129d/0x17e0 [ +0.004762] __do_softirq+0x24c/0xaef [ +0.004232] irq_exit+0x167/0x190 [ +0.003767] smp_apic_timer_interrupt+0x1dd/0x6a0 [ +0.005340] apic_timer_interrupt+0xf/0x20 [ +0.004620] </IRQ> Fixes: `1274e1cc42` ("vxlan: ecmp support for mac fdb entries") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Amit Cohen <amitc@mellanox.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:34:11 -07:00
Roopa Prabhu	c7cdbe2efc	vxlan: support for nexthop notifiers vxlan driver registers for nexthop add/del notifiers to cleanup fdb entries pointing to such nexthops. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 14:00:38 -07:00
Roopa Prabhu	1274e1cc42	vxlan: ecmp support for mac fdb entries Todays vxlan mac fdb entries can point to multiple remote ips (rdsts) with the sole purpose of replicating broadcast-multicast and unknown unicast packets to those remote ips. E-VPN multihoming [1,2,3] requires bridged vxlan traffic to be load balanced to remote switches (vteps) belonging to the same multi-homed ethernet segment (E-VPN multihoming is analogous to multi-homed LAG implementations, but with the inter-switch peerlink replaced with a vxlan tunnel). In other words it needs support for mac ecmp. Furthermore, for faster convergence, E-VPN multihoming needs the ability to update fdb ecmp nexthops independent of the fdb entries. New route nexthop API is perfect for this usecase. This patch extends the vxlan fdb code to take a nexthop id pointing to an ecmp nexthop group. Changes include: - New NDA_NH_ID attribute for fdbs - Use the newly added fdb nexthop groups - makes vxlan rdsts and nexthop handling code mutually exclusive - since this is a new use-case and the requirement is for ecmp nexthop groups, the fdb add and update path checks that the nexthop is really an ecmp nexthop group. This check can be relaxed in the future, if we want to introduce replication fdb nexthop groups and allow its use in lieu of current rdst lists. - fdb update requests with nexthop id's only allowed for existing fdb's that have nexthop id's - learning will not override an existing fdb entry with nexthop group - I have wrapped the switchdev offload code around the presence of rdst [1] E-VPN RFC https://tools.ietf.org/html/rfc7432 [2] E-VPN with vxlan https://tools.ietf.org/html/rfc8365 [3] http://vger.kernel.org/lpc_net2018_talks/scaling_bridge_fdb_database_slidesV3.pdf Includes a null check fix in vxlan_xmit from Nikolay v2 - Fixed build issue: Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. 
Miller <davem@davemloft.net>	2020-05-22 14:00:38 -07:00
Sabrina Dubroca	cc8e7c69db	vxlan: use the correct nlattr array in NL_SET_ERR_MSG_ATTR IFLA_VXLAN_* attributes are in the data array, which is correctly used when fetching the value, but not when setting the extended ack. Because IFLA_VXLAN_MAX < IFLA_MAX, we avoid out of bounds array accesses, but we don't provide a pointer to the invalid attribute to userspace. Fixes: `653ef6a3e4` ("vxlan: change vxlan_[config_]validate() to use netlink_ext_ack for error reporting") Fixes: `b4d3069783` ("vxlan: Allow configuration of DF behaviour") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-04-23 12:39:09 -07:00
Taehee Yoo	384d91c267	vxlan: check return value of gro_cells_init() gro_cells_init() returns error if memory allocation is failed. But the vxlan module doesn't check the return value of gro_cells_init(). Fixes: `58ce31cca1` ("vxlan: GRO support at tunnel layer") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-03-18 16:43:12 -07:00
David S. Miller	a2d6d7ae59	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net The ungrafting from PRIO bug fixes in net, when merged into net-next, merge cleanly but create a build failure. The resolution used here is from Petr Machata. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-09 12:13:43 -08:00
Hangbin Liu	71130f2997	vxlan: fix tos value before xmit Before ip_tunnel_ecn_encap() and udp_tunnel_xmit_skb() we should filter tos value by RT_TOS() instead of using config tos directly. vxlan_get_route() would filter the tos to fl4.flowi4_tos but we didn't return it back, as geneve_get_v4_rt() did. So we have to use RT_TOS() directly in function ip_tunnel_ecn_encap(). Fixes: `206aaafcd2` ("VXLAN: Use IP Tunnels tunnel ENC encap API") Fixes: `1400615d64` ("vxlan: allow setting ipv6 traffic class") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-02 16:35:48 -08:00
Niu Xilei	98c8147648	vxlan: Fix alignment and code style of vxlan.c Fixed Coding function and style issues Signed-off-by: Niu Xilei <niu_xilei@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-01-02 15:41:33 -08:00
Pankaj Bharadiya	c593642c8b	treewide: Use sizeof_field() macro Replace all the occurrences of FIELD_SIZEOF() with sizeof_field() except at places where these are defined. Later patches will remove the unused definition of FIELD_SIZEOF(). This patch is generated using following script: EXCLUDE_FILES="include/linux/stddef.h|include/linux/kernel.h" git grep -l -e "\bFIELD_SIZEOF\b" | while read file; do if [[ "$file" =~ $EXCLUDE_FILES ]]; then continue fi sed -i -e 's/\bFIELD_SIZEOF\b/sizeof_field/g' $file; done Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Link: https://lore.kernel.org/r/20190924105839.110713-3-pankaj.laxminarayan.bharadiya@intel.com Co-developed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: David Miller <davem@davemloft.net> # for net	2019-12-09 10:36:44 -08:00
Sabrina Dubroca	6c8991f415	net: ipv6_stub: use ip6_dst_lookup_flow instead of ip6_dst_lookup ipv6_stub uses the ip6_dst_lookup function to allow other modules to perform IPv6 lookups. However, this function skips the XFRM layer entirely. All users of ipv6_stub->ip6_dst_lookup use ip_route_output_flow (via the ip_route_output_key and ip_route_output helpers) for their IPv4 lookups, which calls xfrm_lookup_route(). This patch fixes this inconsistent behavior by switching the stub to ip6_dst_lookup_flow, which also calls xfrm_lookup_route(). This requires some changes in all the callers, as these two functions take different arguments and have different return types. Fixes: `5f81bd2e5d` ("ipv6: export a stub for IPv6 symbols used by vxlan") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-12-04 12:27:13 -08:00
Matthias Schiffer	36fe3a61aa	vxlan: implement get_link_ksettings ethtool method Similar to VLAN and similar drivers, we can forward get_link_ksettings to the lower dev if we have one to get meaningful speed/duplex data. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-12 19:52:15 -08:00
David S. Miller	d31e95585c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net The only slightly tricky merge conflict was the netdevsim because the mutex locking fix overlapped a lot of driver reload reorganization. The rest were (relatively) trivial in nature. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-02 13:54:56 -07:00
Guillaume Nault	1d7a55267f	vxlan: drop "vxlan" parameter in vxlan_fdb_alloc() This parameter has never been used. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-30 17:41:50 -07:00
Taehee Yoo	c6761cf521	vxlan: fix unexpected failure of vxlan_changelink() After commit `0ce1822c2a` ("vxlan: add adjacent link to limit depth level"), vxlan_changelink() could fail because of netdev_adjacent_change_prepare(). netdev_adjacent_change_prepare() returns -EEXIST when old lower device and new lower device are same. (old lower device is "dst->remote_dev" and new lower device is "lowerdev") So, before calling it, lowerdev should be NULL if these devices are same. Test command1: ip link add dummy0 type dummy ip link add vxlan0 type vxlan dev dummy0 dstport 4789 vni 1 ip link set vxlan0 type vxlan ttl 5 RTNETLINK answers: File exists Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: `0ce1822c2a` ("vxlan: add adjacent link to limit depth level") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-30 11:52:47 -07:00
Xin Long	eadf52cf18	vxlan: check tun_info options_len properly This patch is to improve the tun_info options_len by dropping the skb when TUNNEL_VXLAN_OPT is set but options_len is less than vxlan_metadata. This can void a potential out-of-bounds access on ip_tun_info. Fixes: `ee122c79d4` ("vxlan: Flow based tunneling") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-29 17:39:26 -07:00
Taehee Yoo	0ce1822c2a	vxlan: add adjacent link to limit depth level Current vxlan code doesn't limit the number of nested devices. Nested devices would be handled recursively and this routine needs huge stack memory. So, unlimited nested devices could make stack overflow. In order to fix this issue, this patch adds adjacent links. The adjacent link APIs internally check the depth level. Test commands: ip link add dummy0 type dummy ip link add vxlan0 type vxlan id 0 group 239.1.1.1 dev dummy0 \ dstport 4789 for i in {1..100} do let A=$i-1 ip link add vxlan$i type vxlan id $i group 239.1.1.1 \ dev vxlan$A dstport 4789 done ip link del dummy0 The top upper link is vxlan100 and the lowest link is vxlan0. When vxlan0 is deleting, the upper devices will be deleted recursively. It needs huge stack memory so it makes stack overflow. Splat looks like: [ 229.628477] ============================================================================= [ 229.629785] BUG page->ptl (Not tainted): Padding overwritten. 0x0000000026abf214-0x0000000091f6abb2 [ 229.629785] ----------------------------------------------------------------------------- [ 229.629785] [ 229.655439] ================================================================== [ 229.629785] INFO: Slab 0x00000000ff7cfda8 objects=19 used=19 fp=0x00000000fe33776c flags=0x200000000010200 [ 229.655688] BUG: KASAN: stack-out-of-bounds in unmap_single_vma+0x25a/0x2e0 [ 229.655688] Read of size 8 at addr ffff888113076928 by task vlan-network-in/2334 [ 229.655688] [ 229.629785] Padding 0000000026abf214: 00 80 14 0d 81 88 ff ff 68 91 81 14 81 88 ff ff ........h....... [ 229.629785] Padding 0000000001e24790: 38 91 81 14 81 88 ff ff 68 91 81 14 81 88 ff ff 8.......h....... [ 229.629785] Padding 00000000b39397c8: 33 30 62 a7 ff ff ff ff ff eb 60 22 10 f1 ff 1f 30b.......`".... [ 229.629785] Padding 00000000bc98f53a: 80 60 07 13 81 88 ff ff 00 80 14 0d 81 88 ff ff .`.............. 
[ 229.629785] Padding 000000002aa8123d: 68 91 81 14 81 88 ff ff f7 21 17 a7 ff ff ff ff h........!...... [ 229.629785] Padding 000000001c8c2369: 08 81 14 0d 81 88 ff ff 03 02 00 00 00 00 00 00 ................ [ 229.629785] Padding 000000004e290c5d: 21 90 a2 21 10 ed ff ff 00 00 00 00 00 fc ff df !..!............ [ 229.629785] Padding 000000000e25d731: 18 60 07 13 81 88 ff ff c0 8b 13 05 81 88 ff ff .`.............. [ 229.629785] Padding 000000007adc7ab3: b3 8a b5 41 00 00 00 00 ...A.... [ 229.629785] FIX page->ptl: Restoring 0x0000000026abf214-0x0000000091f6abb2=0x5a [ ... ] Fixes: `acaf4e7099` ("net: vxlan: when lower dev unregisters remove vxlan dev as well") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-10-24 14:53:49 -07:00
David S. Miller	af144a9834	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Two cases of overlapping changes, nothing fancy. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-08 19:48:57 -07:00
Taehee Yoo	7c31e54aee	vxlan: do not destroy fdb if register_netdevice() is failed __vxlan_dev_create() destroys FDB using specific pointer which indicates a fdb when error occurs. But that pointer should not be used when register_netdevice() fails because register_netdevice() internally destroys fdb when error occurs. This patch makes vxlan_fdb_create() to do not link fdb entry to vxlan dev internally. Instead, a new function vxlan_fdb_insert() is added to link fdb to vxlan dev. vxlan_fdb_insert() is called after calling register_netdevice(). This routine can avoid situation that ->ndo_uninit() destroys fdb entry in error path of register_netdevice(). Hence, error path of __vxlan_dev_create() routine can have an opportunity to destroy default fdb entry by hand. Test command ip link add bonding_masters type vxlan id 0 group 239.1.1.1 \ dev enp0s9 dstport 4789 Splat looks like: [ 213.392816] kasan: GPF could be caused by NULL-ptr deref or user memory access [ 213.401257] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [ 213.402178] CPU: 0 PID: 1414 Comm: ip Not tainted 5.2.0-rc5+ #256 [ 213.402178] RIP: 0010:vxlan_fdb_destroy+0x120/0x220 [vxlan] [ 213.402178] Code: df 48 8b 2b 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 06 01 00 00 4c 8b 63 08 48 b8 00 00 00 00 00 fc d [ 213.402178] RSP: 0018:ffff88810cb9f0a0 EFLAGS: 00010202 [ 213.402178] RAX: dffffc0000000000 RBX: ffff888101d4a8c8 RCX: 0000000000000000 [ 213.402178] RDX: 1bd5a00000000040 RSI: ffff888101d4a8c8 RDI: ffff888101d4a8d0 [ 213.402178] RBP: 0000000000000000 R08: fffffbfff22b72d9 R09: 0000000000000000 [ 213.402178] R10: 00000000ffffffef R11: 0000000000000000 R12: dead000000000200 [ 213.402178] R13: ffff88810cb9f1f8 R14: ffff88810efccda0 R15: ffff88810efccda0 [ 213.402178] FS: 00007f7f6621a0c0(0000) GS:ffff88811b000000(0000) knlGS:0000000000000000 [ 213.402178] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 213.402178] CR2: 000055746f0807d0 CR3: 00000001123e0000 CR4: 00000000001006f0 [ 
213.402178] Call Trace: [ 213.402178] __vxlan_dev_create+0x3a9/0x7d0 [vxlan] [ 213.402178] ? vxlan_changelink+0x740/0x740 [vxlan] [ 213.402178] ? rcu_read_unlock+0x60/0x60 [vxlan] [ 213.402178] ? __kasan_kmalloc.constprop.3+0xa0/0xd0 [ 213.402178] vxlan_newlink+0x8d/0xc0 [vxlan] [ 213.402178] ? __vxlan_dev_create+0x7d0/0x7d0 [vxlan] [ 213.554119] ? __netlink_ns_capable+0xc3/0xf0 [ 213.554119] __rtnl_newlink+0xb75/0x1180 [ 213.554119] ? rtnl_link_unregister+0x230/0x230 [ ... ] Fixes: `0241b83673` ("vxlan: fix default fdb entry netlink notify ordering during netdev create") Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-07-01 19:06:02 -07:00
David S. Miller	92ad6325cb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Minor SPDX change conflict. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-22 08:59:24 -04:00
Linus Torvalds	c884d8ac7f	SPDX update for 5.2-rc6 Another round of SPDX updates for 5.2-rc6 Here is what I am guessing is going to be the last "big" SPDX update for 5.2. It contains all of the remaining GPLv2 and GPLv2+ updates that were "easy" to determine by pattern matching. The ones after this are going to be a bit more difficult and the people on the spdx list will be discussing them on a case-by-case basis now. Another 5000+ files are fixed up, so our overall totals are: Files checked: 64545 Files with SPDX: 45529 Compared to the 5.1 kernel which was: Files checked: 63848 Files with SPDX: 22576 This is a huge improvement. Also, we deleted another 20000 lines of boilerplate license crud, always nice to see in a diffstat. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXQyQYA8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ymnGQCghETUBotn1p3hTjY56VEs6dGzpHMAnRT0m+lv kbsjBGEJpLbMRB2krnaU =RMcT -----END PGP SIGNATURE----- Merge tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx Pull still more SPDX updates from Greg KH: "Another round of SPDX updates for 5.2-rc6 Here is what I am guessing is going to be the last "big" SPDX update for 5.2. It contains all of the remaining GPLv2 and GPLv2+ updates that were "easy" to determine by pattern matching. The ones after this are going to be a bit more difficult and the people on the spdx list will be discussing them on a case-by-case basis now. Another 5000+ files are fixed up, so our overall totals are: Files checked: 64545 Files with SPDX: 45529 Compared to the 5.1 kernel which was: Files checked: 63848 Files with SPDX: 22576 This is a huge improvement. 
Also, we deleted another 20000 lines of boilerplate license crud, always nice to see in a diffstat" * tag 'spdx-5.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx: (65 commits) treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 507 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 506 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 505 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 504 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 503 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 502 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 501 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 499 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 498 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 497 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 496 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 495 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 491 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 490 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 489 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 488 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 487 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 486 treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 485 ...	2019-06-21 09:58:42 -07:00
Thomas Gleixner	d2912cb15b	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-06-19 17:09:55 +02:00
David S. Miller	13091aa305	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Honestly all the conflicts were simple overlapping changes, nothing really interesting to report. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-17 20:20:36 -07:00
Stefano Brivio	8399a6930d	vxlan: Don't assume linear buffers in error handler In commit `c3a43b9fec` ("vxlan: ICMP error lookup handler") I wrongly assumed buffers from icmp_socket_deliver() would be linear. This is not the case: icmp_socket_deliver() only guarantees we have 8 bytes of linear data. Eric fixed this same issue for fou and fou6 in commits `26fc181e6c` ("fou, fou6: do not assume linear skbs") and `5355ed6388` ("fou, fou6: avoid uninit-value in gue_err() and gue6_err()"). Use pskb_may_pull() instead of checking skb->len, and take into account the fact we later access the VXLAN header with udp_hdr(), so we also need to sum skb_transport_header() here. Reported-by: Guillaume Nault <gnault@redhat.com> Fixes: `c3a43b9fec` ("vxlan: ICMP error lookup handler") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-11 12:07:33 -07:00
Litao jiao	fe1e0713bb	vxlan: Use FDB_HASH_SIZE hash_locks to reduce contention The monolithic hash_lock could cause huge contention when inserting/deletiing vxlan_fdbs into the fdb_head. Use FDB_HASH_SIZE hash_locks to protect insertions/deletions of vxlan_fdbs into the fdb_head hash table. Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Litao jiao <jiaolitao@raisecom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-06 11:08:55 -07:00
Enrico Weigelt	478db1f1fc	drivers: net: vxlan: drop unneeded likely() call around IS_ERR() IS_ERR() already calls unlikely(), so this extra likely() call around the !IS_ERR() is not needed. Signed-off-by: Enrico Weigelt <info@metux.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-06-05 16:57:23 -07:00
David Ahern	3616d08bcb	ipv6: Move ipv6 stubs to a separate header file The number of stubs is growing and has nothing to do with addrconf. Move the definition of the stubs to a separate header file and update users. In the move, drop the vxlan specific comment before ipv6_stub. Code move only; no functional change intended. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-29 10:53:45 -07:00
Zhiqiang Liu	cc4807bb60	vxlan: Don't call gro_cells_destroy() before device is unregistered Commit `ad6c9986bc` ("vxlan: Fix GRO cells race condition between receive and link delete") fixed a race condition for the typical case a vxlan device is dismantled from the current netns. But if a netns is dismantled, vxlan_destroy_tunnels() is called to schedule a unregister_netdevice_queue() of all the vxlan tunnels that are related to this netns. In vxlan_destroy_tunnels(), gro_cells_destroy() is called and finished before unregister_netdevice_queue(). This means that the gro_cells_destroy() call is done too soon, for the same reasons explained in above commit. So we need to fully respect the RCU rules, and thus must remove the gro_cells_destroy() call or risk use after-free. Fixes: `58ce31cca1` ("vxlan: GRO support at tunnel layer") Signed-off-by: Suanming.Mou <mousuanming@huawei.com> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-18 17:07:27 -07:00
Eric Dumazet	59cbf56fcd	vxlan: test dev->flags & IFF_UP before calling gro_cells_receive() Same reasons than the ones explained in commit `4179cb5a4c` ("vxlan: test dev->flags & IFF_UP before calling netif_rx()") netif_rx() or gro_cells_receive() must be called under a strict contract. At device dismantle phase, core networking clears IFF_UP and flush_all_backlogs() is called after rcu grace period to make sure no incoming packet might be in a cpu backlog and still referencing the device. A similar protocol is used for gro_cells infrastructure, as gro_cells_destroy() will be called only after a full rcu grace period is observed after IFF_UP has been cleared. Most drivers call netif_rx() from their interrupt handler, and since the interrupts are disabled at device dismantle, netif_rx() does not have to check dev->flags & IFF_UP Virtual drivers do not have this guarantee, and must therefore make the check themselves. Otherwise we risk use-after-free and/or crashes. Fixes: `d342894c5d` ("vxlan: virtual extensible lan") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-10 11:05:52 -07:00
Litao Jiao	f98ec78851	vxlan: do not need BH again in vxlan_cleanup() vxlan_cleanup() is a timer callback, it is already and only running in BH context. Signed-off-by: Litao Jiao <jiaolitao@raisecom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-08 14:45:42 -08:00
Stefano Brivio	ad6c9986bc	vxlan: Fix GRO cells race condition between receive and link delete If we receive a packet while deleting a VXLAN device, there's a chance vxlan_rcv() is called at the same time as vxlan_dellink(). This is fine, except that vxlan_dellink() should never ever touch stuff that's still in use, such as the GRO cells list. Otherwise, vxlan_rcv() crashes while queueing packets via gro_cells_receive(). Move the gro_cells_destroy() to vxlan_uninit(), which runs after the RCU grace period is elapsed and nothing needs the gro_cells anymore. This is now done in the same way as commit `8e816df879` ("geneve: Use GRO cells infrastructure.") originally implemented for GENEVE. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: `58ce31cca1` ("vxlan: GRO support at tunnel layer") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-03-08 11:27:21 -08:00
Roopa Prabhu	70fb082880	vxlan: add extack support for create and changelink This patch adds extack coverage in vxlan link create and changelink paths. Introduces a new helper vxlan_nl2flags to consolidate flag attribute validation. thanks to Johannes Berg for some tips to construct the generic vxlan flag extack strings. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-26 08:54:37 -08:00
Andy Roulin	8f1af75df3	vxlan: add ndo_change_proto_down support Add ndo_change_proto_down support through dev_change_proto_down_generic for use by control protocols like VRRPD. Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-24 13:01:05 -08:00
David S. Miller	3313da8188	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net The netfilter conflicts were rather simple overlapping changes. However, the cls_tcindex.c stuff was a bit more complex. On the 'net' side, Cong is fixing several races and memory leaks. Whilst on the 'net-next' side we have Vlad adding the rtnl-ness support. What I've decided to do, in order to resolve this, is revert the conversion over to using a workqueue that Cong did, bringing us back to pure RCU. I did it this way because I believe that either Cong's races don't apply with have Vlad did things, or Cong will have to implement the race fix slightly differently. Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-15 12:38:38 -08:00
Eric Dumazet	4179cb5a4c	vxlan: test dev->flags & IFF_UP before calling netif_rx() netif_rx() must be called under a strict contract. At device dismantle phase, core networking clears IFF_UP and flush_all_backlogs() is called after rcu grace period to make sure no incoming packet might be in a cpu backlog and still referencing the device. Most drivers call netif_rx() from their interrupt handler, and since the interrupts are disabled at device dismantle, netif_rx() does not have to check dev->flags & IFF_UP Virtual drivers do not have this guarantee, and must therefore make the check themselves. Otherwise we risk use-after-free and/or crashes. Note this patch also fixes a small issue that came with commit `ce6502a8f9` ("vxlan: fix a use after free in vxlan_encap_bypass"), since the dev->stats.rx_dropped change was done on the wrong device. Fixes: `d342894c5d` ("vxlan: virtual extensible lan") Fixes: `ce6502a8f9` ("vxlan: fix a use after free in vxlan_encap_bypass") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Petr Machata <petrm@mellanox.com> Cc: Ido Schimmel <idosch@mellanox.com> Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Cc: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-11 12:39:51 -08:00
Petr Machata	fc4aa1ca16	net: vxlan: Free a leaked vetoed multicast rdst When an rdst is rejected by a driver, the current code removes it from the remote list, but neglects to free it. This is triggered by tools/testing/selftests/drivers/net/mlxsw/vxlan_fdb_veto.sh and shows as the following kmemleak trace: unreferenced object 0xffff88817fa3d888 (size 96): comm "softirq", pid 0, jiffies 4372702718 (age 165.252s) hex dump (first 32 bytes): 02 00 00 00 c6 33 64 03 80 f5 a2 61 81 88 ff ff .....3d....a.... 06 df 71 ae ff ff ff ff 0c 00 00 00 04 d2 6a 6b ..q...........jk backtrace: [<00000000296b27ac>] kmem_cache_alloc_trace+0x1ae/0x370 [<0000000075c86dc6>] vxlan_fdb_append.part.12+0x62/0x3b0 [vxlan] [<00000000e0414b63>] vxlan_fdb_update+0xc61/0x1020 [vxlan] [<00000000f330c4bd>] vxlan_fdb_add+0x2e8/0x3d0 [vxlan] [<0000000008f81c2c>] rtnl_fdb_add+0x4c2/0xa10 [<00000000bdc4b270>] rtnetlink_rcv_msg+0x6dd/0x970 [<000000006701f2ce>] netlink_rcv_skb+0x290/0x410 [<00000000c08a5487>] rtnetlink_rcv+0x15/0x20 [<00000000d5f54b1e>] netlink_unicast+0x43f/0x5e0 [<00000000db4336bb>] netlink_sendmsg+0x789/0xcd0 [<00000000e1ee26b6>] sock_sendmsg+0xba/0x100 [<00000000ba409802>] ___sys_sendmsg+0x631/0x960 [<000000003c332113>] __sys_sendmsg+0xea/0x180 [<00000000f4139144>] __x64_sys_sendmsg+0x78/0xb0 [<000000006d1ddc59>] do_syscall_64+0x94/0x410 [<00000000c8defa9a>] entry_SYSCALL_64_after_hwframe+0x49/0xbe Move vxlan_dst_free() up and schedule a call thereof to plug this leak. Fixes: `61f46fe8c6` ("vxlan: Allow vetoing of FDB notifications") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-02-07 11:17:08 -08:00
Petr Machata	6685987c29	switchdev: Add extack argument to call_switchdev_notifiers() A follow-up patch will enable vetoing of FDB entries. Make it possible to communicate details of why an FDB entry is not acceptable back to the user. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:18:47 -08:00
Petr Machata	4c59b7d160	vxlan: Add extack to switchdev operations There are four sources of VXLAN switchdev notifier calls: - the changelink() link operation, which already supports extack, - ndo_fdb_add() which got extack support in a previous patch, - FDB updates due to packet forwarding, - and vxlan_fdb_replay(). Extend vxlan_fdb_switchdev_call_notifiers() to include extack in the switchdev message that it sends, and propagate the argument upwards to the callers. For the first two cases, pass in the extack gotten through the operation. For case #3, pass in NULL. To cover the last case, extend vxlan_fdb_replay() to take extack argument, which might come from whatever operation necessitated the FDB replay. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:18:47 -08:00
Petr Machata	87b0984ebf	net: Add extack argument to ndo_fdb_add() Drivers may not be able to support certain FDB entries, and an error code is insufficient to give clear hints as to the reasons of rejection. In order to make it possible to communicate the rejection reason, extend ndo_fdb_add() with an extack argument. Adapt the existing implementations of ndo_fdb_add() to take the parameter (and ignore it). Pass the extack parameter when invoking ndo_fdb_add() from rtnl_fdb_add(). Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:18:47 -08:00
Petr Machata	1cdc98c271	vxlan: changelink: Delete remote after update If a change in remote address prompts a change in a default FDB entry, that change might be vetoed. If that happens, it would then be necessary to reinstate the already-removed default FDB entry corresponding to the previous remote address. Instead, arrange to have the previous address removed only after the FDB is successfully vetted. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:18:46 -08:00
Petr Machata	038a5a99e9	vxlan: changelink: Postpone vxlan_config_apply() When an FDB entry is vetoed, it is necessary to unroll the changes that have already been done. To avoid having to unroll vxlan_config_apply(), postpone the call after the point where the vetoing takes place. Since the call can't fail, it doesn't necessitate any cleanups in the preceding FDB update logic. Correspondingly, move down the mod_timer() call as well. References to *dst need to be replaced with references to conf. Additionally, old_dst and old_age_interval are not necessary anymore, and therefore drop them. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:18:46 -08:00
Petr Machata	8db9427d52	vxlan: changelink: Inline vxlan_dev_configure() The changelink operation may cause change in remote address, and therefore an FDB update, which can be vetoed. To properly handle vetoing, vxlan_changelink() needs to be gradually updated. In this patch simply replace vxlan_dev_configure() with the two constituent calls. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:18:46 -08:00
Petr Machata	61f46fe8c6	vxlan: Allow vetoing of FDB notifications Change vxlan_fdb_switchdev_call_notifiers() to return the result from calling switchdev notifiers. Propagate the error number up the stack. In vxlan_fdb_update_existing() and vxlan_fdb_update_create() add rollbacks to clean up the work that was done before the veto. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:18:46 -08:00
Petr Machata	ccdfd4f71d	vxlan: Have vxlan_fdb_replace() save original rdst value To enable rollbacks after vetoed FDB updates, extend vxlan_fdb_replace() to take an additional argument where it should store the original values of a modified rdst. Update the sole caller. The following patch will make use of the saved value. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:18:46 -08:00
Petr Machata	a76d1ca296	vxlan: Split vxlan_fdb_update() in two In order to make it easier to implement rollbacks after FDB update vetoing, separate the FDB update code to two parts: one that deals with updates of existing FDB entries, and one that creates new entries. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:18:46 -08:00
Petr Machata	c2b200e0ba	vxlan: Move up vxlan_fdb_free(), vxlan_fdb_destroy() These functions will be needed for rollbacks of vetoed FDB entries. Move them up so that they are visible at their intended point of use. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-01-17 15:18:46 -08:00
David S. Miller	3a6d528a5e	vxlan: Correct merge error. When resolving the conflict wrt. the vxlan_fdb_update call in vxlan_changelink() I made the last argument false instead of true. Fix this. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 16:14:22 -08:00
David S. Miller	2be09de7d6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Lots of conflicts, but happily all cases of overlapping changes, parallel adds, things of that nature. Thanks to Stephen Rothwell, Saeed Mahameed, and others for their guidance in these resolutions. Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-20 11:53:36 -08:00
Petr Machata	ce5e098f7a	vxlan: changelink: Fix handling of default remotes Default remotes are stored as FDB entries with an Ethernet address of 00:00:00:00:00:00. When a request is made to change a remote address of a VXLAN device, vxlan_changelink() first deletes the existing default remote, and then creates a new FDB entry. This works well as long as the list of default remotes matches exactly the configuration of a VXLAN remote address. Thus when the VXLAN device has a remote of X, there should be exactly one default remote FDB entry X. If the VXLAN device has no remote address, there should be no such entry. Besides using "ip link set", it is possible to manipulate the list of default remotes by using the "bridge fdb". It is therefore easy to break the above condition. Under such circumstances, the __vxlan_fdb_delete() call doesn't delete the FDB entry itself, but just one remote. The following vxlan_fdb_create() then creates a new FDB entry, leading to a situation where two entries exist for the address 00:00:00:00:00:00, each with a different subset of default remotes. An even more obvious breakage rooted in the same cause can be observed when a remote address is configured for a VXLAN device that did not have one before. In that case vxlan_changelink() doesn't remove any remote, and just creates a new FDB entry for the new address: $ ip link add name vx up type vxlan id 2000 dstport 4789 $ bridge fdb ap dev vx 00:00:00:00:00:00 dst 192.0.2.20 self permanent $ bridge fdb ap dev vx 00:00:00:00:00:00 dst 192.0.2.30 self permanent $ ip link set dev vx type vxlan remote 192.0.2.30 $ bridge fdb sh dev vx | grep 00:00:00:00:00:00 00:00:00:00:00:00 dst 192.0.2.30 self permanent <- new entry, 1 rdst 00:00:00:00:00:00 dst 192.0.2.20 self permanent <- orig. entry, 2 rdsts 00:00:00:00:00:00 dst 192.0.2.30 self permanent To fix this, instead of calling vxlan_fdb_create() directly, defer to vxlan_fdb_update(). That has logic to handle the duplicates properly. Additionally, it also handles notifications, so drop that call from changelink as well. Fixes: `0241b83673` ("vxlan: fix default fdb entry netlink notify ordering during netdev create") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-18 21:18:26 -08:00
Petr Machata	6db9246871	vxlan: Fix error path in __vxlan_dev_create() When a failure occurs in rtnl_configure_link(), the current code calls unregister_netdevice() to roll back the earlier call to register_netdevice(), and jumps to errout, which calls vxlan_fdb_destroy(). However unregister_netdevice() calls transitively ndo_uninit, which is vxlan_uninit(), and that already takes care of deleting the default FDB entry by calling vxlan_fdb_delete_default(). Since the entry added earlier in __vxlan_dev_create() is exactly the default entry, the cleanup code in the errout block always leads to double free and thus a panic. Besides, since vxlan_fdb_delete_default() always destroys the FDB entry with notification enabled, the deletion of the default entry is notified even before the addition was notified. Instead, move the unregister_netdevice() call after the manual destroy, which solves both problems. Fixes: `0241b83673` ("vxlan: fix default fdb entry netlink notify ordering during netdev create") Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-18 21:18:25 -08:00
Petr Machata	6ad0b5a4e0	vxlan: Unmark offloaded bit on replaced FDB entries When rdst of an offloaded FDB entry is replaced, it certainly isn't offloaded anymore. Drivers are notified about such replacements, and can re-mark the entry as offloaded again if they so wish. However until a driver does so explicitly, assume a replaced FDB entry is not offloaded. Note that replaces coming via vxlan_fdb_external_learn_add() are always immediately followed by an explicit offload marking. Fixes: `0efe117333` ("vxlan: Support marking RDSTs as offloaded") Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2018-12-18 21:18:25 -08:00
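
The veto-related commits above (61f46fe8c6 and 1cdc98c271 in particular) describe an ordering: add the new remote first, let a notifier accept or reject it, roll back on rejection, and remove the previous remote only after the update is accepted. The following is a minimal user-space sketch of that ordering, not the kernel code. Every name in it (`toy_fdb`, `toy_notifier`, `toy_change_remote`, and so on) is hypothetical, and the flat array stands in for the real FDB and rdst structures purely for illustration.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_REMOTES 8

/* Hypothetical toy FDB entry: a flat list of remote addresses. */
struct toy_fdb {
	const char *remotes[TOY_MAX_REMOTES];
	size_t count;
};

/* Hypothetical veto callback: 0 accepts the remote, non-zero rejects it. */
typedef int (*toy_notifier)(const char *remote);

/* Example notifiers, assumed for this sketch only. */
static int toy_accept_all(const char *remote)
{
	(void)remote;
	return 0;
}

static int toy_veto_30(const char *remote)
{
	return strcmp(remote, "192.0.2.30") == 0 ? -1 : 0;
}

/* Return the index of a remote in the list, or -1 if it is absent. */
static int toy_fdb_find(const struct toy_fdb *fdb, const char *remote)
{
	for (size_t i = 0; i < fdb->count; i++)
		if (strcmp(fdb->remotes[i], remote) == 0)
			return (int)i;
	return -1;
}

/* Add a remote, then ask the notifier; undo the addition if it is vetoed. */
static int toy_fdb_add(struct toy_fdb *fdb, const char *remote,
		       toy_notifier notify)
{
	int err;

	if (toy_fdb_find(fdb, remote) >= 0)
		return 0;		/* already present, nothing to do */
	if (fdb->count == TOY_MAX_REMOTES)
		return -1;		/* list is full */

	fdb->remotes[fdb->count++] = remote;
	err = notify(remote);
	if (err)
		fdb->count--;		/* rollback: drop the entry just added */
	return err;
}

/* Remove a remote if present (order of the list is not preserved). */
static void toy_fdb_del(struct toy_fdb *fdb, const char *remote)
{
	int i = toy_fdb_find(fdb, remote);

	if (i >= 0)
		fdb->remotes[i] = fdb->remotes[--fdb->count];
}

/*
 * Change the remote: add the new one first, and delete the old one only
 * after the addition was accepted. A veto leaves the list untouched.
 */
static int toy_change_remote(struct toy_fdb *fdb, const char *old_remote,
			     const char *new_remote, toy_notifier notify)
{
	int err = toy_fdb_add(fdb, new_remote, notify);

	if (err)
		return err;
	if (strcmp(old_remote, new_remote) != 0)
		toy_fdb_del(fdb, old_remote);
	return 0;
}
```

Because the old remote is deleted last, a rejected change needs no step to reinstate it, which is the motivation the 1cdc98c271 message gives for reordering the real code.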

509 commits