WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Petar Penkov	c39e342a05	tun: fix data-race in gro_normal_list() There is a race in the TUN driver between napi_busy_loop and napi_gro_frags. This commit resolves the race by adding the NAPI struct via netif_tx_napi_add, instead of netif_napi_add, which disables polling for the NAPI struct. KCSAN reported: BUG: KCSAN: data-race in gro_normal_list.part.0 / napi_busy_loop write to 0xffff8880b5d474b0 of 4 bytes by task 11205 on cpu 0: gro_normal_list.part.0+0x77/0xb0 net/core/dev.c:5682 gro_normal_list net/core/dev.c:5678 [inline] gro_normal_one net/core/dev.c:5692 [inline] napi_frags_finish net/core/dev.c:5705 [inline] napi_gro_frags+0x625/0x770 net/core/dev.c:5778 tun_get_user+0x2150/0x26a0 drivers/net/tun.c:1976 tun_chr_write_iter+0x79/0xd0 drivers/net/tun.c:2022 call_write_iter include/linux/fs.h:1895 [inline] do_iter_readv_writev+0x487/0x5b0 fs/read_write.c:693 do_iter_write fs/read_write.c:970 [inline] do_iter_write+0x13b/0x3c0 fs/read_write.c:951 vfs_writev+0x118/0x1c0 fs/read_write.c:1015 do_writev+0xe3/0x250 fs/read_write.c:1058 __do_sys_writev fs/read_write.c:1131 [inline] __se_sys_writev fs/read_write.c:1128 [inline] __x64_sys_writev+0x4e/0x60 fs/read_write.c:1128 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x44/0xa9 read to 0xffff8880b5d474b0 of 4 bytes by task 11168 on cpu 1: gro_normal_list net/core/dev.c:5678 [inline] napi_busy_loop+0xda/0x4f0 net/core/dev.c:6126 sk_busy_loop include/net/busy_poll.h:108 [inline] __skb_recv_udp+0x4ad/0x560 net/ipv4/udp.c:1689 udpv6_recvmsg+0x29e/0xe90 net/ipv6/udp.c:288 inet6_recvmsg+0xbb/0x240 net/ipv6/af_inet6.c:592 sock_recvmsg_nosec net/socket.c:871 [inline] sock_recvmsg net/socket.c:889 [inline] sock_recvmsg+0x92/0xb0 net/socket.c:885 sock_read_iter+0x15f/0x1e0 net/socket.c:967 call_read_iter include/linux/fs.h:1889 [inline] new_sync_read+0x389/0x4f0 fs/read_write.c:414 __vfs_read+0xb1/0xc0 fs/read_write.c:427 vfs_read fs/read_write.c:461 [inline] vfs_read+0x143/0x2c0 fs/read_write.c:446 ksys_read+0xd5/0x1b0 fs/read_write.c:587 __do_sys_read fs/read_write.c:597 [inline] __se_sys_read fs/read_write.c:595 [inline] __x64_sys_read+0x4c/0x60 fs/read_write.c:595 do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 11168 Comm: syz-executor.0 Not tainted 5.4.0-rc6+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: `943170998b` ("tun: enable NAPI for TUN/TAP driver") Signed-off-by: Petar Penkov <ppenkov@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:46:49 -08:00
Eric Dumazet	20021578ba	selftests: net: tcp_mmap should create detached threads Since we do not plan using pthread_join() in the server do_accept() loop, we better create detached threads, or risk increasing memory footprint over time. Fixes: `192dc405f3` ("selftests: net: add tcp_mmap program") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:46:08 -08:00
Tonghao Zhang	61ca533c0e	net: openvswitch: don't call pad_packet if not necessary The nla_put_u16/nla_put_u32 makes sure that *attrlen is align. The call tree is that: nla_put_u16/nla_put_u32 -> nla_put attrlen = sizeof(u16) or sizeof(u32) -> __nla_put attrlen -> __nla_reserve attrlen -> skb_put(skb, nla_total_size(attrlen)) nla_total_size returns the total length of attribute including padding. Cc: Joe Stringer <joe@ovn.org> Cc: William Tu <u9012063@gmail.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:43:27 -08:00
David S. Miller	3bb884a4a0	Merge branch 'DSA-driver-for-Vitesse-Felix-switch' Vladimir Oltean says: ==================== DSA driver for Vitesse Felix switch This series builds upon the previous "Accomodate DSA front-end into Ocelot" topic and does the following: - Reworks the Ocelot (VSC7514) driver to support one more switching core (VSC9959), used in NPI mode. Some code which was thought to be SoC-specific (ocelot_board.c) wasn't, and vice versa, so it is being accordingly moved. - Exports ocelot driver structures and functions to include/soc/mscc. - Adds a DSA ocelot front-end for VSC9959, which is a PCI device and uses the exported ocelot functionality for hardware configuration. - Adds a tagger driver for the Vitesse injection/extraction DSA headers. This is known to be compatible with at least Ocelot and Felix. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:32:17 -08:00
Vladimir Oltean	5605194877	net: dsa: ocelot: add driver for Felix switch family This supports an Ethernet switching core from Vitesse / Microsemi / Microchip (VSC9959) which is part of the Ocelot family (a brand name), and whose code name is Felix. The switch can be (and is) integrated on different SoCs as a PCIe endpoint device. The functionality is provided by the core of the Ocelot switch driver (drivers/net/ethernet/mscc). In this regard, the current driver is an instance of Microsemi's Ocelot core driver, with a DSA front-end. It inherits its name from VSC9959's code name, to distinguish itself from the switchdev ocelot driver. The patch adds the logic for probing a PCI device and defines the register map for the VSC9959 switch core, since it has some differences in register addresses and bitfield mappings compared to the other Ocelot switches (VSC7511, VSC7512, VSC7513, VSC7514). The Felix driver declares the register map as part of the "instance table". Currently the VSC9959 inside NXP LS1028A is the only instance, but presumably it can support other switches in the Ocelot family, when used in DSA mode (Linux running on the external CPU, and not on the embedded MIPS). In a few cases, some h/w operations have to be done differently on VSC9959 due to missing bitfields. This is the case for the switch core reset and init. Because for this operation Ocelot uses some bits that are not present on Felix, the latter has to use a register from the global registers block (GCB) instead. Although it is a PCI driver, it relies on DT bindings for compatibility with DSA (CPU port link, PHY library). It does not have any custom device tree bindings, since we would like to minimize its dependency on device tree though. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:32:16 -08:00
Vladimir Oltean	8dce89aa5f	net: dsa: ocelot: add tagger for Ocelot/Felix switches While it is entirely possible that this tagger format is in fact more generic than just these 2 switch families, I don't have that knowledge. The Seville switch in NXP T1040 has a similar frame format, but there are enough differences (e.g. DEST field starts at bit 57 instead of 56) that calling this file tag_vitesse.c is a bit of a stretch at the moment. The frame format has been listed in a comment so that people who add support for further Vitesse switches can rework this tagger while keeping compatibility with Felix. The "ocelot" name was chosen instead of "felix" because even the Ocelot switch can act as a DSA device when it is used in NPI mode, and the Felix tagger format is almost identical. Currently it is only used for the Felix switch embedded in the NXP LS1028A chip. The ABI for this tagger should be considered "not stable" at the moment. The DSA tag is always placed before the Ethernet header and therefore, we are using the long prefix for RX tags to avoid putting the DSA master port in promiscuous mode. Once there will be an API in DSA for drivers to request DSA masters to be in promiscuous mode unconditionally, we will switch to the "no prefix" extraction frame header, which will save 16 padding bytes for each RX frame. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:32:16 -08:00
Vladimir Oltean	a030dfe194	net: mscc: ocelot: publish ocelot_sys.h to include/soc/mscc The Felix DSA driver needs to write to SYS_RAM_INIT_RAM_INIT for its own chip initialization process. Also update the MAINTAINERS file such that the headers exported by the ocelot driver are under the same maintainers' umbrella as the driver itself. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:32:16 -08:00
Vladimir Oltean	5e25636502	net: mscc: ocelot: publish structure definitions to include/soc/mscc/ocelot.h We will be registering another switch driver based on ocelot, which lives under drivers/net/dsa. Make sure the Felix DSA front-end has the necessary abstractions to implement a new Ocelot driver instantiation. This includes the function prototypes for implementing DSA callbacks. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:32:16 -08:00
Vladimir Oltean	3a77b5933f	net: mscc: ocelot: separate the implementation of switch reset The Felix switch has a different reset procedure, so a function pointer needs to be created and added to the ocelot_ops structure. The reset procedure has been moved into ocelot_init. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:32:16 -08:00
Vladimir Oltean	ba551bc3bc	net: mscc: ocelot: adjust MTU on the CPU port in NPI mode When using the NPI port, the DSA tag is passed through Ethernet, so the switch's MAC needs to accept it as it comes from the DSA master. Increase the MTU on the external CPU port to account for the length of the injection header. Without this patch, MTU-sized frames are dropped by the switch's CPU port on xmit, which is especially obvious in TCP sessions. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:32:16 -08:00
Vladimir Oltean	f24711fddc	net: mscc: ocelot: export a constant for the tag length in bytes This constant will be used in a future patch to increase the MTU on NPI ports, and will also be used in the tagger driver for Felix. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:32:16 -08:00
Vladimir Oltean	fa914e9c4d	net: mscc: ocelot: create a helper for changing the port MTU Since in an NPI/DSA setup, not all ports will have the same MTU, we need to make sure the watermarks for pause frames and/or tail dropping logic that existed in the driver is still coherent for the new MTU values. We need to do this because the NPI (aka external CPU) port needs an increased MTU for the DSA tag. This will be done in a future patch. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:32:16 -08:00
Vladimir Oltean	5bc9d2e6e7	net: mscc: ocelot: move invariant configs out of adjust_link It doesn't make sense to rewrite all these registers every time the PHY library notifies us about a link state change. In a future patch we will customize the MTU for the CPU port, and since the MTU was previously configured from adjust_link, if we don't make this change, its value would have got overridden. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:32:16 -08:00
Claudiu Manoil	dc3de2a294	net: mscc: ocelot: filter out ocelot SoC specific PCS config from common path The adjust_link routine should be generic enough to be (re)used by any SoC that integrates a switch core compatible with the Ocelot core switch driver. Currently all configurations are generic except for the PCS settings that are SoC specific. Move these out to the Ocelot SoC/board instance. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:32:16 -08:00
Claudiu Manoil	259630e08c	net: mscc: ocelot: move resource ioremap and regmap init to common code Let's make this ioremap and regmap init code common. It should not be platform dependent as it should be usable by PCI devices too. Use better names where necessary to avoid clashes. Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:32:16 -08:00
David S. Miller	e7be235fa7	Merge branch 'net-smc-improve-termination-handling-part-3' Karsten Graul says: ==================== net/smc: improve termination handling (part 3) Part 3 of the SMC termination patches improves the link group termination processing and introduces the ability to immediately terminate a link group. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:28:28 -08:00
Ursula Braun	0b29ec6436	net/smc: immediate termination for SMCR link groups If the SMC module is unloaded or an IB device is thrown away, the immediate link group freeing introduced for SMCD is exploited for SMCR as well. That means SMCR-specifics are added to smc_conn_kill(). Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:28:28 -08:00
Ursula Braun	6a37ad3da5	net/smc: wait for tx completions before link freeing Make sure all pending work requests are completed before freeing a link. Dismiss tx pending slots already when terminating a link group to exploit termination shortcut in tx completion queue handler. And kill the completion queue tasklets after destroy of the completion queues, otherwise there is a time window for another tasklet schedule of an already killed tasklet. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:28:28 -08:00
Ursula Braun	2c1d3e5030	net/smc: abnormal termination without orderly flag For abnormal termination issue an LLC DELETE_LINK without the orderly flag. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:28:28 -08:00
Ursula Braun	15e1b99aad	net/smc: no WR buffer wait for terminating link group Avoid waiting for a free work request buffer, if the link group is already terminating. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:28:28 -08:00
Ursula Braun	5edd6b9cb8	net/smc: introduce bookkeeping of SMCD link groups If the ism module is unloaded return control from exit routine only, if all link groups are freed. If an IB device is thrown away return control from device removal only, if all link groups belonging to this device are freed. A counters for the total number of SMCD link groups per ISM device is introduced. ism module unloading continues only if the total number of SMCD link groups for all ISM devices is zero. ISM device removal continues only it the total number of SMCD link groups per ISM device has decreased to zero. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:28:28 -08:00
Ursula Braun	5421ec281d	net/smc: abnormal termination of SMCD link groups A final cleanup due to SMCD device removal means immediate freeing of all link groups belonging to this device in interrupt context. This patch introduces a separate SMCD link group termination routine, which terminates all link groups of an SMCD device. This new routine smcd_terminate_all ()is reused if the smc module is unloaded. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:28:28 -08:00
Ursula Braun	42bfba9eaa	net/smc: immediate termination for SMCD link groups SMCD link group termination is called when peer signals its shutdown of its corresponding link group. For regular shutdowns no connections exist anymore. For abnormal shutdowns connections must be killed and their DMBs must be unregistered immediately. That means the SMCR method to delay the link group freeing several seconds does not fit. This patch adds immediate termination of a link group and its SMCD connections and makes sure all SMCD link group related cleanup steps are finished. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:28:28 -08:00
Ursula Braun	50c6b20eff	net/smc: fix final cleanup sequence for SMCD devices If peer announces shutdown, use the link group terminate worker for local cleanup of link groups and connections to terminate link group in proper context. Make sure link groups are cleaned up first before destroying the event queue of the SMCD device, because link group cleanup may raise events. Send signal shutdown only if peer has not done it already. Send socket abort or close only, if peer has not already announced shutdown. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:28:28 -08:00
David S. Miller	43da44c876	Merge branch 'net-stmmac-CPU-Performance-Improvements' Jose Abreu says: ==================== net: stmmac: CPU Performance Improvements CPU Performance improvements for stmmac. Please check bellow for results before and after the series. Patch 1/7, allows RX Interrupt on Completion to be disabled and only use the RX HW Watchdog. Patch 2/7, setups the default RX coalesce settings instead of using the minimum value. Patch 3/7 and 4/7, removes the uneeded computations for RX Flow Control activation/de-activation, on some cases. Patch 5/7, tunes-up the default coalesce settings. Patch 6/7, re-works the TX coalesce timer activation logic. Patch 7/7, removes the now uneeded TBU interrupt. NetPerf UDP Results: -------------------- Socket Message Elapsed Messages CPU Service Size Size Time Okay Errors Throughput Util Demand bytes bytes secs # # 10^6bits/sec % SS us/KB --- XGMAC@2.5G: Before 212992 1400 10.00 2100620 0 2351.7 36.69 5.112 212992 10.00 2100539 2351.6 26.18 3.648 --- XGMAC@2.5G: After 212992 1400 10.00 2108972 0 2361.5 21.73 3.015 212992 10.00 2097038 2348.1 19.21 2.666 --- GMAC5@1G: Before 212992 1400 10.00 786000 0 880.2 34.71 12.923 212992 10.00 786000 880.2 23.42 8.719 --- GMAC5@1G: After 212992 1400 10.00 842648 0 943.7 14.12 4.903 212992 10.00 842648 943.7 12.73 4.418 Perf TCP Results on RX Path: ---------------------------- --- XGMAC@2.5G: Before 22.51% swapper [stmmac] [k] dwxgmac2_dma_interrupt 10.82% swapper [stmmac] [k] dwxgmac2_host_mtl_irq_status 5.21% swapper [stmmac] [k] dwxgmac2_host_irq_status 4.67% swapper [stmmac] [k] dwxgmac3_safety_feat_irq_status 3.63% swapper [kernel.kallsyms] [k] stack_trace_consume_entry 2.74% iperf3 [kernel.kallsyms] [k] copy_user_enhanced_fast_string 2.52% swapper [kernel.kallsyms] [k] update_stack_state 1.94% ksoftirqd/0 [stmmac] [k] dwxgmac2_dma_interrupt 1.45% iperf3 [kernel.kallsyms] [k] queued_spin_lock_slowpath 1.26% swapper [kernel.kallsyms] [k] create_object --- XGMAC@2.5G: After 7.43% swapper [kernel.kallsyms] [k] stack_trace_consume_entry 5.86% swapper [stmmac] [k] dwxgmac2_dma_interrupt 5.68% swapper [kernel.kallsyms] [k] update_stack_state 4.71% iperf3 [kernel.kallsyms] [k] copy_user_enhanced_fast_string 2.88% swapper [kernel.kallsyms] [k] create_object 2.69% swapper [stmmac] [k] dwxgmac2_host_mtl_irq_status 2.61% swapper [stmmac] [k] stmmac_napi_poll_rx 2.52% swapper [kernel.kallsyms] [k] unwind_next_frame.part.4 1.48% swapper [kernel.kallsyms] [k] unwind_get_return_address 1.38% swapper [kernel.kallsyms] [k] arch_stack_walk --- GMAC5@1G: Before 31.29% swapper [stmmac] [k] dwmac4_dma_interrupt 14.57% swapper [stmmac] [k] dwmac4_irq_mtl_status 10.66% swapper [stmmac] [k] dwmac4_irq_status 1.97% swapper [kernel.kallsyms] [k] stack_trace_consume_entry 1.73% iperf3 [kernel.kallsyms] [k] copy_user_enhanced_fast_string 1.59% swapper [kernel.kallsyms] [k] update_stack_state 1.15% iperf3 [kernel.kallsyms] [k] do_syscall_64 1.01% ksoftirqd/0 [stmmac] [k] dwmac4_dma_interrupt 0.89% swapper [kernel.kallsyms] [k] __default_send_IPI_dest_field 0.75% swapper [stmmac] [k] stmmac_napi_poll_rx --- GMAC5@1G: After 6.70% swapper [kernel.kallsyms] [k] stack_trace_consume_entry 5.79% swapper [stmmac] [k] dwmac4_dma_interrupt 5.29% swapper [kernel.kallsyms] [k] update_stack_state 3.52% iperf3 [kernel.kallsyms] [k] copy_user_enhanced_fast_string 2.83% swapper [stmmac] [k] dwmac4_irq_mtl_status 2.62% swapper [kernel.kallsyms] [k] create_object 2.46% swapper [stmmac] [k] stmmac_napi_poll_rx 2.32% swapper [kernel.kallsyms] [k] unwind_next_frame.part.4 2.19% swapper [stmmac] [k] dwmac4_irq_status 1.39% swapper [kernel.kallsyms] [k] unwind_get_return_address ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:25:42 -08:00
Jose Abreu	8d07a79304	net: stmmac: xgmac: Do not enable TBU interrupt Now that TX Coalesce has been rewritten we no longer need this additional interrupt enabled. This reduces CPU usage. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:25:41 -08:00
Jose Abreu	c2837423cb	net: stmmac: Rework TX Coalesce logic Coalesce logic currently increments the number of packets and sets the IC bit when the coalesced packets have passed a given limit. This does not reflect very well what coalesce was meant for as we can have a large number of packets that are coalesced and then a single one, sent later on that has the IC bit. Rework the logic so that it coalesces only upon a limit of packets and sets the IC bit for large number of packets. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:25:41 -08:00
Jose Abreu	da20245100	net: stmmac: Tune-up default coalesce settings Tune-up the defalt coalesce settings for optimal values. This gives the best performance in most of the use-cases. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:25:41 -08:00
Jose Abreu	52f96cd135	net: stmmac: xgmac: Remove uneeded computation for RFA/RFD RFA and RFD should not be dependent on FIFO size. In fact, the more FIFO space we have, the later we can activate Flow Control. Let's use hard-coded values for RFA and RFD for all FIFO sizes with the exception of 4k, which is a special case. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:25:41 -08:00
Jose Abreu	854248e5ec	net: stmmac: gmac4+: Remove uneeded computation for RFA/RFD RFA and RFD should not be dependent on FIFO size. In fact, the more FIFO space we have, the later we can activate Flow Control. Let's use hard-coded values for RFA and RFD for all FIFO sizes with the exception of 4k, which is a special case. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:25:41 -08:00
Jose Abreu	4e4337ccf7	net: stmmac: Setup a default RX Coalesce value instead of the minimum For performance reasons, sometimes using the minimum RX Coalesce value is not optimal. Lets setup a default value that is optimal in most of the use cases. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:25:41 -08:00
Jose Abreu	09146abebc	net: stmmac: Do not set RX IC bit if RX Coalesce is zero We may only want to use the RX Watchdog so lets check if RX Coalesce settings are non-zero and only set the RX Interrupt on Completion bit if its not. Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:25:41 -08:00
Ido Schimmel	983db6198f	mlxsw: spectrum_router: Allocate discard adjacency entry when needed Commit `0c3cbbf96d` ("mlxsw: Add specific trap for packets routed via invalid nexthops") allocated an adjacency entry during driver initialization whose purpose is to discard packets hitting the route pointing to it. These adjacency entries are allocated from a resource called KVD linear (KVDL). There are situations in which the user can decide to set the size of this resource (via devlink-resource) to 0, in which case the driver will not be able to load. Therefore, instead of pre-allocating this adjacency entry, simply allocate it only when needed. A variable indicating the validity of the entry is added and is used to ensure it is only allocated and written once and that it is freed after all the routes were flushed. Fixes: `0c3cbbf96d` ("mlxsw: Add specific trap for packets routed via invalid nexthops") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:24:54 -08:00
YueHaibing	d6649d788e	net/tls: Fix unused function warning If PROC_FS is not set, gcc warning this: net/tls/tls_proc.c:23:12: warning: 'tls_statistics_seq_show' defined but not used [-Wunused-function] Use #ifdef to guard this. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-15 12:12:28 -08:00
David S. Miller	a98cdaf73e	Merge branch 's390-next' Julian Wiedmann says: ==================== s390/qeth: updates 2019-11-14 please apply the following qeth patches to net-next. Along with the usual cleanups, this (1) reduces collateral packet loss in the RX path when dealing with bad packets and/or allocation errors, and (2) simplifies how the L3 driver deals with mcast IP addresses. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:51 -08:00
Julian Wiedmann	0b81c6c620	s390/qeth: don't check drvdata in sysfs code Given the way how the sysfs attributes are registered / unregistered, the show/store helpers will never be called with a NULL drvdata. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:51 -08:00
Julian Wiedmann	b80c08ac94	s390/qeth: replace qeth_l3_get_addr_buffer() The remaining usage effectively is a kmemdup() of the query object. By not wrapping it, some of the callers can now use GFP_KERNEL for the allocation. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:51 -08:00
Julian Wiedmann	8659c189b6	s390/qeth: remove VLAN tracking for L3 devices Use vlan_for_each() instead of tracking each registered VID internally. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:51 -08:00
Julian Wiedmann	611abe5165	s390/qeth: consolidate L3 mcast registration code Current code processes each (VLAN) device twice - once to inspect the IPv4 mcast addresses, and then a second time to walk the IPv6 mcast addresses. Unify all this into a single helper, thus removing some checks and a duplicated VLAN lookup. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:51 -08:00
Julian Wiedmann	32a186c7f9	s390/qeth: remove gratuitious RX modeset Trust the IPv4/IPv6 code to properly remove its mcast addresses when a VLAN device is unregistered, and then also trigger an RX modeset whenever it's needed. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:51 -08:00
Julian Wiedmann	ddf28100ee	s390/qeth: fine-tune L3 mcast locking Push the inet6_dev locking down into the helper that actually needs it for walking the mc_list. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:51 -08:00
Julian Wiedmann	8311c7a252	s390/qeth: clean up error path in qeth_core_probe_device() qeth_core_free_card() is meant to be the counterpart of qeth_alloc_card() - but unfortunately was also picked as the place to free the QDIO queues. This gets messy when qeth_core_probe_device() fails during qeth_add_dbf_entry(). At this point the card->qdio.state is not initialized yet, so qeth_free_qdio_queues() ends up operating on uninitialized data. Luckily for now, the whole qeth_card struct is zero-allocated and the value of the QETH_QDIO_UNINITIALIZED enum is 0 as well. So there's no real impact from this bug at the moment, it's just really fragile. Clean this up by moving the qeth_free_qdio_queues() call up one level in the hierarchy. This way it doesn't get called from the error path. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:51 -08:00
Julian Wiedmann	17caeaa476	s390/qeth: handle skb allocation error gracefully When current code fails to allocate an skb in the RX path, it drops the whole RX buffer. Considering the large number of packets that a single RX buffer might contain, this is quite drastic. Skip over the packet instead, and try to extract the next packet from the RX buffer. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:51 -08:00
Julian Wiedmann	7d4faee7c6	s390/qeth: drop unwanted packets earlier in RX path Packets with an unexpected HW format are currently first extracted from the RX buffer, passed upwards to the layer-specific driver and only then finally dropped. Enhance the RX path so that we can drop such packets before even allocating an skb. For this, add some additional logic so that when a packet is meant to be dropped, we can still walk along the packet's data chunks in the RX buffer. This allows us to extract the following packet(s) from the buffer. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:51 -08:00
Julian Wiedmann	5fd3fcbb8a	s390/qeth: support per-frame invalidation Each RX buffer may contain up to 64KB worth of data. In case the device needs to discard a packet _after_ already having reserved space for it in the buffer, the whole buffer gets set to ERROR state. As the buffer might contain any number of good packets, this can result in collateral packet loss. qeth can provide relief by enabling per-frame invalidation. The RX buffer is then presented as usual, we just need to spot & drop any individual packet that was flagged as invalid. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:50 -08:00
Julian Wiedmann	845ef9047b	s390/qeth: gather more detailed RX dropped/error statistics Where available, use the fine-grained counters in rtnl_link_stats64 to indicate different RX error causes. For drop reasons, use driver-private ethtool counters. In particular this patch allows us to keep track of driver-side drops due to unknown/unsupported HW descriptor format. Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:16:50 -08:00
David S. Miller	24df31f8d5	Merge branch 'vsock-add-multi-transports-support' Stefano Garzarella says: ==================== vsock: add multi-transports support Most of the patches are reviewed by Dexuan, Stefan, and Jorgen. The following patches need reviews: - [11/15] vsock: add multi-transports support - [12/15] vsock/vmci: register vmci_transport only when VMCI guest/host are active - [15/15] vhost/vsock: refuse CID assigned to the guest->host transport RFC: https://patchwork.ozlabs.org/cover/1168442/ v1: https://patchwork.ozlabs.org/cover/1181986/ v1 -> v2: - Patch 11: + vmci_transport: sent reset when vsock_assign_transport() fails [Jorgen] + fixed loopback in the guests, checking if the remote_addr is the same of transport_g2h->get_local_cid() + virtio_transport_common: updated space available while creating the new child socket during a connection request - Patch 12: + removed 'features' variable in vmci_transport_init() [Stefan] + added a flag to register only once the host [Jorgen] - Added patch 15 to refuse CID assigned to the guest->host transport in the vhost_transport This series adds the multi-transports support to vsock, following this proposal: https://www.spinics.net/lists/netdev/msg575792.html With the multi-transports support, we can use VSOCK with nested VMs (using also different hypervisors) loading both guest->host and host->guest transports at the same time. Before this series, vmci_transport supported this behavior but only using VMware hypervisor on L0, L1, etc. The first 9 patches are cleanups and preparations, maybe some of these can go regardless of this series. Patch 10 changes the hvs_remote_addr_init(). setting the VMADDR_CID_HOST as remote CID instead of VMADDR_CID_ANY to make the choice of transport to be used work properly. Patch 11 adds multi-transports support. Patch 12 changes a little bit the vmci_transport and the vmci driver to register the vmci_transport only when there are active host/guest. Patch 13 prevents the transport modules unloading while sockets are assigned to them. Patch 14 fixes an issue in the bind() logic discoverable only with the new multi-transport support. Patch 15 refuses CID assigned to the guest->host transport in the vhost_transport. I've tested this series with nested KVM (vsock-transport [L0,L1], virtio-transport[L1,L2]) and with VMware (L0) + KVM (L1) (vmci-transport [L0,L1], vhost-transport [L1], virtio-transport[L2]). Dexuan successfully tested the RFC series on HyperV with a Linux guest. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:12:19 -08:00
Stefano Garzarella	ed8640a961	vhost/vsock: refuse CID assigned to the guest->host transport In a nested VM environment, we have to refuse to assign to a nested guest the same CID assigned to our guest->host transport. In this way, the user can use the local CID for loopback. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:12:18 -08:00
Stefano Garzarella	36c5b48b91	vsock: fix bind() behaviour taking care of CID When we are looking for a socket bound to a specific address, we also have to take into account the CID. This patch is useful with multi-transports support because it allows the binding of the same port with different CID, and it prevents a connection to a wrong socket bound to the same port, but with different CID. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:12:18 -08:00
Stefano Garzarella	6a2c096210	vsock: prevent transport modules unloading This patch adds 'module' member in the 'struct vsock_transport' in order to get/put the transport module. This prevents the module unloading while sockets are assigned to it. We increase the module refcnt when a socket is assigned to a transport, and we decrease the module refcnt when the socket is destructed. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jorgen Hansen <jhansen@vmware.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2019-11-14 18:12:18 -08:00

1 2 3 4 5 ...

874492 Коммитов Все ветки Поиск

874492 Коммитов

Все ветки