WSL2-Linux-Kernel

Commit graph

Author	SHA1	Message	Date
Liu Jian	b556c3fd46	selftests, bpf: Fix test_txmsg_ingress_parser error After "skmsg: lose offset info in sk_psock_skb_ingress", the test case with ktls failed. This is because the ktls parser (tls_read_size) returns 285, not 256. The case looks like this: tls_sk1 --> redir_sk --> tls_sk2. tls_sk1 sent out 512 bytes of data; after TLS-related processing, redir_sk received 570 bytes of data and redirected 512 (skb_use_parser) bytes of data to tls_sk2, but tls_sk2 needs 285 * 2 bytes of data, so a receive timeout occurred. Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20211029141216.211899-2-liujian56@huawei.com	2021-11-01 17:08:21 +01:00
Liu Jian	7303524e04	skmsg: Lose offset info in sk_psock_skb_ingress If sockmap enables strparser, offset info is lost in sk_psock_skb_ingress(). If the length determined by the parse_msg function is not skb->len, the skb will be converted to sk_msg multiple times, and the userspace app will get the data multiple times. Fix this by getting the offset and length from strp_msg. And, as Cong suggested, add one bit in skb->_sk_redir to distinguish whether strparser is enabled or disabled. Fixes: `604326b41a` ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: Liu Jian <liujian56@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20211029141216.211899-1-liujian56@huawei.com	2021-11-01 17:08:21 +01:00
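The arithmetic in the two commits above can be illustrated with a small userspace sketch. It is not kernel code: `struct parsed_msg`, `split_records()` and `deliver()` are hypothetical names, loosely modelled on the offset/length pair the fix takes from `strp_msg`. Each parsed message must be delivered from its own offset with its own length. With 285-byte TLS records, a 570-byte buffer holds exactly two records, while a 512-byte redirect holds only one complete record plus a partial one, which matches why the receiver in the selftest was left waiting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the offset/length pair a stream parser reports. */
struct parsed_msg {
	size_t offset;   /* where the message starts in the buffer */
	size_t full_len; /* message length decided by the parser */
};

/*
 * Split buf_len bytes into rec_len-sized messages, as a length parser
 * would. Returns how many complete messages were recorded in out; a
 * trailing partial message is left for a later read.
 */
static size_t split_records(size_t buf_len, size_t rec_len,
			    struct parsed_msg *out, size_t max)
{
	size_t n = 0, off = 0;

	while (rec_len > 0 && off + rec_len <= buf_len && n < max) {
		out[n].offset = off;
		out[n].full_len = rec_len;
		off += rec_len;
		n++;
	}
	return n;
}

/* Deliver exactly one message: honour its offset instead of starting at 0. */
static void deliver(const unsigned char *buf, const struct parsed_msg *m,
		    unsigned char *dst)
{
	memcpy(dst, buf + m->offset, m->full_len);
}
```

Delivering every message from offset 0 with the whole buffer length is the duplication the skmsg fix describes; carrying the offset per message avoids it.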
Andrii Nakryiko	0133c20480	selftests/bpf: Fix strobemeta selftest regression After the most recent nightly Clang update, strobemeta selftests started failing with the following error (relevant portion of assembly included):
  1624: (85) call bpf_probe_read_user_str#114
  1625: (bf) r1 = r0
  1626: (18) r2 = 0xfffffffe
  1628: (5f) r1 &= r2
  1629: (55) if r1 != 0x0 goto pc+7
  1630: (07) r9 += 104
  1631: (6b) *(u16 *)(r9 +0) = r0
  1632: (67) r0 <<= 32
  1633: (77) r0 >>= 32
  1634: (79) r1 = *(u64 *)(r10 -456)
  1635: (0f) r1 += r0
  1636: (7b) *(u64 *)(r10 -456) = r1
  1637: (79) r1 = *(u64 *)(r10 -368)
  1638: (c5) if r1 s< 0x1 goto pc+778
  1639: (bf) r6 = r8
  1640: (0f) r6 += r7
  1641: (b4) w1 = 0
  1642: (6b) *(u16 *)(r6 +108) = r1
  1643: (79) r3 = *(u64 *)(r10 -352)
  1644: (79) r9 = *(u64 *)(r10 -456)
  1645: (bf) r1 = r9
  1646: (b4) w2 = 1
  1647: (85) call bpf_probe_read_user_str#114
  R1 unbounded memory access, make sure to bounds check any such access
In the above code, r0 and r1 are implicitly related. Clang knows that, but the verifier isn't able to infer this relationship. Yonghong Song narrowed down this "regression" in code generation to a recent Clang optimization change ([0]), which for the BPF target generates a code pattern that the BPF verifier can't handle, so it loses track of register boundaries. This patch works around the issue by adding a BPF assembly-based helper that helps prove to the verifier that the upper bound of the register is a given constant, by controlling the exact shape of the generated BPF instruction sequence. This fixes the immediate issue for the strobemeta selftest. [0] `acabad9ff6` Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20211029182907.166910-1-andrii@kernel.org	2021-11-01 17:08:21 +01:00
Linus Torvalds	9ac211426f	File locking changes for v5.16 Merge tag 'locks-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull file locking updates from Jeff Layton: "Most of this is just follow-on cleanup work of documentation and comments from the mandatory locking removal in v5.15. The only real functional change is that LOCK_MAND flock() support is also being removed, as it has basically been non-functional since the v2.5 days" * tag 'locks-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fs: remove leftover comments from mandatory locking removal locks: remove changelog comments docs: fs: locks.rst: update comment about mandatory file locking Documentation: remove reference to now removed mandatory-locking doc locks: remove LOCK_MAND flock lock support	2021-11-01 09:06:53 -07:00
Pawan Gupta	8a03e56b25	bpf: Disallow unprivileged bpf by default Disabling unprivileged BPF would help prevent unprivileged users from creating certain conditions required for potential speculative execution side-channel attacks on unmitigated affected hardware. A deep dive on such attacks and current mitigations is available here [0]. Sync with what many distros are currently applying already, and disable unprivileged BPF by default. An admin can enable this at runtime, if necessary, as described in `08389d8882` ("bpf: Add kconfig knob for disabling unpriv bpf by default"). [0] "BPF and Spectre: Mitigating transient execution attacks", Daniel Borkmann, eBPF Summit '21 https://ebpf.io/summit-2021-slides/eBPF_Summit_2021-Keynote-Daniel_Borkmann-BPF_and_Spectre.pdf Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Mark Rutland <mark.rutland@arm.com> Link: https://lore.kernel.org/bpf/0ace9ce3f97656d5f62d11093ad7ee81190c3c25.1635535215.git.pawan.kumar.gupta@linux.intel.com	2021-11-01 17:06:47 +01:00
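The runtime setting referred to above is the `kernel.unprivileged_bpf_disabled` sysctl: 0 allows unprivileged bpf() calls, 1 disables them until reboot, and 2 disables them while still letting an admin write 0 or 1 later (2 is the default when `CONFIG_BPF_UNPRIV_DEFAULT_OFF` is set). A minimal sketch of an admin-side check that reads `/proc/sys/kernel/unprivileged_bpf_disabled`; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Human-readable meaning of kernel.unprivileged_bpf_disabled values. */
static const char *unpriv_bpf_state(int v)
{
	switch (v) {
	case 0:
		return "unprivileged bpf() allowed";
	case 1:
		return "disabled; cannot be re-enabled until reboot";
	case 2:
		return "disabled; admin may re-enable at runtime";
	default:
		return "unknown value";
	}
}

/* Returns the sysctl value, or -1 if the file cannot be read or parsed. */
static int read_unpriv_bpf_disabled(void)
{
	FILE *f = fopen("/proc/sys/kernel/unprivileged_bpf_disabled", "r");
	int v = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &v) != 1)
		v = -1;
	fclose(f);
	return v;
}

/* Print the current state as a quick check. */
static void print_unpriv_bpf_state(void)
{
	int v = read_unpriv_bpf_disabled();

	if (v < 0)
		printf("unprivileged_bpf_disabled: not readable\n");
	else
		printf("unprivileged_bpf_disabled=%d (%s)\n", v, unpriv_bpf_state(v));
}
```

On a kernel built with this default, an admin who needs unprivileged BPF would write 0 to the same file as root.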
Linus Torvalds	ad98a92466	tpmdd updates for Linux v5.16 Merge tag 'tpmdd-next-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm updates from Jarkko Sakkinen: "Only bug fixes" * tag 'tpmdd-next-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: tpm_tis_spi: Add missing SPI ID tpm: fix Atmel TPM crash caused by too frequent queries tpm: Check for integer overflow in tpm2_map_response_body() tpm: tis: Kconfig: Add helper dependency on COMPILE_TEST	2021-11-01 09:02:15 -07:00
Linus Torvalds	49f8275c7d	Memory folios Add memory folios, a new type to represent either order-0 pages or the head page of a compound page. This should be enough infrastructure to support filesystems converting from pages to folios. Merge tag 'folio-5.16' of git://git.infradead.org/users/willy/pagecache Pull memory folios from Matthew Wilcox: "Add memory folios, a new type to represent either order-0 pages or the head page of a compound page. This should be enough infrastructure to support filesystems converting from pages to folios. The point of all this churn is to allow filesystems and the page cache to manage memory in larger chunks than PAGE_SIZE. The original plan was to use compound pages like THP does, but I ran into problems with some functions expecting only a head page while others expect the precise page containing a particular byte. The folio type allows a function to declare that it's expecting only a head page. Almost incidentally, this allows us to remove various calls to VM_BUG_ON(PageTail(page)) and compound_head(). This converts just parts of the core MM and the page cache. For 5.17, we intend to convert various filesystems (XFS and AFS are ready; other filesystems may make it) and also convert more of the MM and page cache to folios. For 5.18, multi-page folios should be ready. The multi-page folios offer some improvement to some workloads. The 80% win is real, but appears to be an artificial benchmark (postgres startup, which isn't a serious workload).
Real workloads (eg building the kernel, running postgres in a steady state, etc) seem to benefit between 0-10%. I haven't heard of any performance losses as a result of this series. Nobody has done any serious performance tuning; I imagine that tweaking the readahead algorithm could provide some more interesting wins. There are also other places where we could choose to create large folios and currently do not, such as writes that are larger than PAGE_SIZE. I'd like to thank all my reviewers who've offered review/ack tags: Christoph Hellwig, David Howells, Jan Kara, Jeff Layton, Johannes Weiner, Kirill A. Shutemov, Michal Hocko, Mike Rapoport, Vlastimil Babka, William Kucharski, Yu Zhao and Zi Yan. I'd also like to thank those who gave feedback I incorporated but haven't offered up review tags for this part of the series: Nick Piggin, Mel Gorman, Ming Lei, Darrick Wong, Ted Ts'o, John Hubbard, Hugh Dickins, and probably a few others who I forget" * tag 'folio-5.16' of git://git.infradead.org/users/willy/pagecache: (90 commits) mm/writeback: Add folio_write_one mm/filemap: Add FGP_STABLE mm/filemap: Add filemap_get_folio mm/filemap: Convert mapping_get_entry to return a folio mm/filemap: Add filemap_add_folio() mm/filemap: Add filemap_alloc_folio mm/page_alloc: Add folio allocation functions mm/lru: Add folio_add_lru() mm/lru: Convert __pagevec_lru_add_fn to take a folio mm: Add folio_evictable() mm/workingset: Convert workingset_refault() to take a folio mm/filemap: Add readahead_folio() mm/filemap: Add folio_mkwrite_check_truncate() mm/filemap: Add i_blocks_per_folio() mm/writeback: Add folio_redirty_for_writepage() mm/writeback: Add folio_account_redirty() mm/writeback: Add folio_clear_dirty_for_io() mm/writeback: Add folio_cancel_dirty() mm/writeback: Add folio_account_cleaned() mm/writeback: Add filemap_dirty_folio() ...	2021-11-01 08:47:59 -07:00
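The head-page idea behind folios can be illustrated with a userspace toy model. Everything below is hypothetical (`struct toy_page`, `struct toy_folio` and the helpers are not kernel types); it only mirrors the concept that any page of a compound allocation leads to its head page, so a function that takes a folio is guaranteed a head page and no longer needs its own `compound_head()` call.

```c
#include <assert.h>
#include <stddef.h>

/* Toy page: every page points at the head page of its allocation. */
struct toy_page {
	struct toy_page *head;
	unsigned int order; /* order of the allocation, copied to every page */
};

/* A toy folio wraps a head page, so it can never refer to a tail page. */
struct toy_folio {
	struct toy_page *page;
};

/* Set up 2^order pages as one allocation (order 0 means a single page). */
static void toy_prep_compound(struct toy_page *pages, unsigned int order)
{
	size_t n = (size_t)1 << order;

	for (size_t i = 0; i < n; i++) {
		pages[i].head = &pages[0];
		pages[i].order = order;
	}
}

/* Analogue of page_folio(): any page maps to the folio of its head page. */
static struct toy_folio toy_page_folio(struct toy_page *p)
{
	struct toy_folio f = { p->head };
	return f;
}

/* Number of base pages covered by the folio. */
static size_t toy_folio_nr_pages(struct toy_folio f)
{
	return (size_t)1 << f.page->order;
}
```

An order-0 page is simply a folio of one page, which matches the commit's description of folios covering "either order-0 pages or the head page of a compound page".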
David S. Miller	d4a07dc5ac	Merge branch 'SMC-tracepoints' Tony Lu says: ==================== Tracepoints for SMC This patch set introduces tracepoints for SMC, including the basic tracepoint code. The tracepoints help us track SMC's behavior with automated tools or BPF tools, with zero overhead when not enabled. Compared with kprobes and other dynamic tools, tracepoints are considered a stable API and have less tracing overhead, with an easy-to-use API. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:39:14 +00:00
Tony Lu	a3a0e81b6f	net/smc: Introduce tracepoint for smcr link down The SMC-R link-down event is important for finding link issues, so we should track it, especially in single-NIC mode, where a link going down means the upper-layer connection will be shut down. The tracepoint lets us find the direct link-down reason in time: not only that the counter increased, but also the location of the code that triggered the event. Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:39:14 +00:00
Tony Lu	aff3083f10	net/smc: Introduce tracepoints for tx and rx msg This introduces two tracepoints for SMC tx and rx messages to help us diagnose data path issues. These two tracepoints don't cover the CORK or MSG_MORE path in tx, just the top half of the data path. Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:39:14 +00:00
Tony Lu	4826260868	net/smc: Introduce tracepoint for fallback This introduces a tracepoint for SMC fallback to TCP, so that we can track which connection falls back and why, and map the clcsock pointer to /proc/net/tcp to find more details about the TCP connection. Compared with kprobes or other dynamic tracing, tracepoints are stable and easy to use. Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Reviewed-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:39:14 +00:00
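Static tracepoints like these are switched on at runtime through tracefs by writing to an event's `enable` file. A minimal sketch, assuming tracefs is mounted at `/sys/kernel/tracing` (older setups use `/sys/kernel/debug/tracing`) and assuming the SMC events live under an `smc` subsystem with names such as `smc_switch_to_fallback`, `smc_tx_sendmsg`, `smc_rx_recvmsg` and `smcr_link_down`; check the event directory on the running kernel before relying on those names. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed tracefs mount point. */
#define TRACEFS "/sys/kernel/tracing"

/* Build the path of the "enable" file for an event in the assumed smc subsystem. */
static int smc_event_enable_path(char *buf, size_t len, const char *event)
{
	int n = snprintf(buf, len, TRACEFS "/events/smc/%s/enable", event);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" to enable or "0" to disable the event; needs root. Returns 0 on success. */
static int smc_event_set(const char *event, int on)
{
	char path[256];
	FILE *f;
	int ok;

	if (smc_event_enable_path(path, sizeof(path), event))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(on ? "1" : "0", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Once enabled, the events can be read from `/sys/kernel/tracing/trace_pipe`; while disabled, the tracepoint sites cost essentially nothing, which is the "zero overhead if not enabled" property mentioned above.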
David S. Miller	6008889121	Merge branch 'amt-driver' Taehee Yoo says: ==================== amt: add initial driver for Automatic Multicast Tunneling (AMT) This is an implementation of AMT (Automatic Multicast Tunneling), RFC 7450. https://datatracker.ietf.org/doc/html/rfc7450 This implementation supports IGMPv2, IGMPv3, MLDv1, MLDv2, and IPv4 underlay. Summary of RFC 7450: The purpose of this protocol is to provide multicast tunneling. The main use case of this protocol is to deliver multicast traffic from a multicast-enabled network to sites that lack multicast connectivity to the source network. There are two roles in the AMT protocol: Gateway and Relay. The main purpose of Gateway mode is to forward multicast listening information (IGMP, MLD) to the source. The main purpose of Relay mode is to forward multicast data to listeners. This multicast traffic (IGMP, MLD, multicast data packets) is tunneled. Listeners are located behind the Gateway endpoint, but the gateway itself can be a listener too. Senders are located behind the Relay endpoint.
 ___________       _________       _______       ________
|           |     |         |     |       |     |        |
| Listeners <-----> Gateway <-----> Relay <-----> Source |
|___________|     |_________|     |_______|     |________|
   IGMP/MLD---------(encap)----------->
      <-------------(decap)--------(encap)------Multicast Data
Usage of AMT interface:
1. Create gateway interface
   ip link add amtg type amt mode gateway local 10.0.0.1 discovery 10.0.0.2 \
   dev gw1_rt gateway_port 2268 relay_port 2268
2. Create Relay interface
   ip link add amtr type amt mode relay local 10.0.0.2 dev relay_rt \
   relay_port 2268 max_tunnels 4
v1 -> v2:
 - Eliminate sparse warnings.
 - Use bool type instead of __be16 for identifying v4/v6 protocol.
v2 -> v3:
 - Fix compile warning due to unused variable.
 - Add missing spinlock comment.
 - Update help message of amt in Kconfig.
v3 -> v4:
 - Split patch.
 - Use CHECKSUM_NONE instead of CHECKSUM_UNNECESSARY.
 - Fix compile error.
v4 -> v5:
 - Remove unnecessary rcu_read_lock().
 - Remove unnecessary amt_change_mtu().
 - Change netlink error message.
 - Add validation for IFLA_AMT_LOCAL_IP and IFLA_AMT_DISCOVERY_IP.
 - Add comments in amt.h.
 - Add missing dev_put() in error path of amt_newlink().
 - Fix typo.
 - Add BUILD_BUG_ON() in amt_smb_cb().
 - Use macro instead of magic values.
 - Use kzalloc() instead of kmalloc().
 - Add selftest script.
v5 -> v6:
 - Reset remote_ip in amt_dev_stop().
v6 -> v7:
 - Fix compile error.
==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:36:09 +00:00
Taehee Yoo	c08e8baea7	selftests: add amt interface selftest script This is a selftest script for the amt interface. It includes a basic forwarding scenario and a torture scenario. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:36:09 +00:00
Taehee Yoo	b75f7095d4	amt: add mld report message handler In the previous patch, the igmp report handler was added. That handler can be used for mld too, so this uses that common code to parse mld report messages. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:36:09 +00:00
Taehee Yoo	bc54e49c14	amt: add multicast(IGMP) report message handler The amt 'Relay' interface manages multicast groups (igmp/mld) and sources. To do so, it must be able to parse igmp/mld report messages. So, this adds the logic for parsing igmp report messages and saves them in its own data structures. struct amt_group_node represents one group (igmp/mld). struct amt_source_node represents one source. The same source can't exist twice in the same group. The same group can exist more than once in the same tunnel because it manages the host address too. The group information is used when forwarding multicast data. If there are no groups in the specific tunnel, Relay doesn't forward it. Although Relay manages sources, it doesn't support the source filtering feature, because the only reason to manage sources is to manage groups more correctly. In the next patch, the MLD part will be added. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:36:08 +00:00
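The group/source rules above can be sketched as a toy model. The names are hypothetical and a fixed-size array stands in for whatever lists or hash tables the driver actually uses; it shows only the two stated invariants: a source appears at most once per group, and data is forwarded only if a matching group exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_SOURCES 16

/* Hypothetical model of one group and its sources (IPv4 addresses as u32). */
struct toy_group {
	uint32_t group_addr;
	uint32_t sources[TOY_MAX_SOURCES];
	size_t nr_sources;
};

static bool toy_group_has_source(const struct toy_group *g, uint32_t src)
{
	for (size_t i = 0; i < g->nr_sources; i++)
		if (g->sources[i] == src)
			return true;
	return false;
}

/* Add a source; duplicates are ignored. Returns false only if the table is full. */
static bool toy_group_add_source(struct toy_group *g, uint32_t src)
{
	if (toy_group_has_source(g, src))
		return true;
	if (g->nr_sources == TOY_MAX_SOURCES)
		return false;
	g->sources[g->nr_sources++] = src;
	return true;
}

/* Relay forwards data for a destination group only if a tunnel has joined it. */
static bool toy_should_forward(const struct toy_group *groups, size_t n,
			       uint32_t dst)
{
	for (size_t i = 0; i < n; i++)
		if (groups[i].group_addr == dst)
			return true;
	return false;
}
```

Note that sources are recorded but never used to filter traffic here, in line with the commit's statement that source filtering is not supported.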
Taehee Yoo	cbc21dc1cf	amt: add data plane of amt interface Before forwarding multicast traffic, the amt interface establishes a tunnel between the gateway and the relay. To establish it, AMT defines several message types, and the message flow looks like this:
                         Gateway                  Relay
                         -------                  -----
                            :        Request        :
                        [1] |           N           |
                            |---------------------->|
                            |   Membership Query    | [2]
                            |   N,MAC,gADDR,gPORT   |
                            |<======================|
                        [3] |   Membership Update   |
                            |  ({G:INCLUDE({S})})   |
                            |======================>|
                            |                       |
       ---------------------:-----------------------:---------------------
      |                     |                       |                     |
      |                     |    Multicast Data     |  IP Packet(S,G)     |
      |                     |      gADDR,gPORT      |<-----------------() |
      |  IP Packet(S,G)     |<======================|                     |
      | ()<-----------------|                       |                     |
      |                     |                       |                     |
       ---------------------:-----------------------:---------------------
                            ~                       ~
                            ~        Request        ~
                        [4] |           N'          |
                            |---------------------->|
                            |   Membership Query    | [5]
                            | N',MAC',gADDR',gPORT' |
                            |<======================|
                        [6] |                       |
                            |       Teardown        |
                            |   N,MAC,gADDR,gPORT   |
                            |---------------------->|
                            |                       | [7]
                            |   Membership Update   |
                            |  ({G:INCLUDE({S})})   |
                            |======================>|
                            |                       |
       ---------------------:-----------------------:---------------------
      |                     |                       |                     |
      |                     |    Multicast Data     |  IP Packet(S,G)     |
      |                     |     gADDR',gPORT'     |<-----------------() |
      |  IP Packet (S,G)    |<======================|                     |
      | ()<-----------------|                       |                     |
      |                     |                       |                     |
       ---------------------:-----------------------:---------------------
                            |                       |
                            :                       :
1. Discovery
   - Sent by Gateway to Relay
   - To find the Relay's unique IP address
2. Advertisement
   - Sent by Relay to Gateway
   - Contains the unique IP address
3. Request
   - Sent by Gateway to Relay
   - Solicits a 'Query' message
4. Query
   - Sent by Relay to Gateway
   - Contains a General Query message
5. Update
   - Sent by Gateway to Relay
   - Contains a report message
6. Multicast Data
   - Sent by Relay to Gateway
   - Encapsulated multicast traffic
7. Teardown
   - Not supported at this time
Except for the Teardown message, it supports all messages. In the next patch, IGMP/MLD logic will be added. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:36:08 +00:00
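The seven AMT message kinds map onto the 4-bit Type field that RFC 7450 places in the low nibble of the first octet of every AMT message; the high nibble is the Version, which must be 0. A minimal parsing sketch: the type values come from the RFC, while the enum and function names are hypothetical and not the driver's own identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* AMT message type values defined by RFC 7450 (names are illustrative). */
enum amt_msg_type {
	AMT_DISCOVERY = 1,
	AMT_ADVERTISEMENT = 2,
	AMT_REQUEST = 3,
	AMT_MEMBERSHIP_QUERY = 4,
	AMT_MEMBERSHIP_UPDATE = 5,
	AMT_MULTICAST_DATA = 6,
	AMT_TEARDOWN = 7,
};

/*
 * Return the message type from the first octet (version in the high
 * nibble, must be 0; type in the low nibble), or -1 if invalid.
 */
static int amt_parse_type(const uint8_t *buf, size_t len)
{
	int type;

	if (len < 1 || (buf[0] >> 4) != 0)
		return -1;
	type = buf[0] & 0x0f;
	if (type < AMT_DISCOVERY || type > AMT_TEARDOWN)
		return -1;
	return type;
}

/* Per the commit message, every type except Teardown is handled. */
static int amt_type_supported(int type)
{
	return type >= AMT_DISCOVERY && type < AMT_TEARDOWN;
}
```

A receiver would typically reject unknown or unsupported types early, before looking at any type-specific fields.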
Taehee Yoo	b9022b53ad	amt: add control plane of amt interface It adds definitions and control plane code for AMT. This is very similar to UDP tunneling interfaces such as gtp, vxlan, etc. In the next patch, data plane code will be added. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:36:08 +00:00
David S. Miller	741948ff60	Merge branch 'netdevsim-device-and-bus' Jakub Kicinski says: ==================== netdevsim: improve separation between device and bus VF config falls strangely in between device and bus responsibilities today. Because of this, bus.c sticks its fingers directly into struct nsim_dev, and we look at nsim_bus_dev in many more places than necessary. Make bus.c contain pure interface code, and move the particulars of the logic (which touch on eswitch, devlink reloads, etc.) to dev.c. Rename the functions at the boundary of the interface to make the separation clearer. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:29:41 +00:00
Jakub Kicinski	a66f64b808	netdevsim: rename 'driver' entry points Rename functions serving as driver entry points from nsim_dev_... to nsim_drv_...; this makes the API boundary between bus and dev clearer. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:29:41 +00:00
Jakub Kicinski	a3353ec325	netdevsim: move max vf config to dev max_vfs is a strange little beast because the file hangs off of nsim's debugfs, but it configures a field in the bus device. Move it to dev.c; let's look at it as if the device driver were imposing a VF limit based on FW info (like pci_sriov_set_totalvfs()). Again, while moving it, refactor the function so that it doesn't pointlessly hold the vfs lock while parsing the input. Wrap the access from the read side in READ_ONCE() to appease concurrency checkers. Do not check whether the return value from snprintf() is negative... Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:29:41 +00:00
Jakub Kicinski	1c401078bc	netdevsim: move details of vf config to dev Since "eswitch" configuration was added, bus.c contains a lot of device details which really belong in dev.c. Restructure the code while moving it. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:29:41 +00:00
Jakub Kicinski	5e388f3dc3	netdevsim: move vfconfig to nsim_dev When netdevsim got split into the faux bus, vfconfig ended up in the bus device (think pci_dev), which is strange because it contains very networky, not to say netdevy, information. Move it to nsim_dev, which is the driver "priv" structure for the device. To make sure we don't race with probe/remove, take the device lock (much like PCI). While at it, remove the NULL-checking of vfconfigs. It appears to be pointless. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:29:41 +00:00
Jakub Kicinski	26c37d89f6	netdevsim: take rtnl_lock when assigning num_vfs Legacy VF NDOs look at num_vfs and then, based on that, index into vfconfig. If we don't take rtnl_lock(), num_vfs may get set to 0 and vfconfig freed/replaced while the NDO is running. We don't need to protect replacing vfconfig since it's only done when num_vfs is 0. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:29:41 +00:00
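The race described above can be modelled in userspace with a pthread mutex standing in for rtnl_lock. All names are hypothetical. The reader checks num_vfs and indexes vfconfigs under the same lock the writer takes, so num_vfs cannot drop to 0 and the array cannot be freed in between. This sketch simplifies by swapping the array under the lock as well; the commit notes that in netdevsim the replacement only happens while num_vfs is 0.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

struct toy_vfconfig {
	int mac_id;
};

/* Hypothetical device state; the mutex plays the role of rtnl_lock. */
struct toy_dev {
	pthread_mutex_t lock;
	unsigned int num_vfs;
	struct toy_vfconfig *vfconfigs;
};

/* Reader (like a legacy VF NDO): bounds check and access under one lock. */
static int toy_get_vf_mac(struct toy_dev *d, unsigned int vf, int *mac_id)
{
	int err = 0;

	pthread_mutex_lock(&d->lock);
	if (vf >= d->num_vfs)
		err = -EINVAL;
	else
		*mac_id = d->vfconfigs[vf].mac_id;
	pthread_mutex_unlock(&d->lock);
	return err;
}

/* Writer: update num_vfs and the array together while holding the lock. */
static int toy_set_num_vfs(struct toy_dev *d, unsigned int n)
{
	struct toy_vfconfig *cfg = n ? calloc(n, sizeof(*cfg)) : NULL;
	struct toy_vfconfig *old;

	if (n && !cfg)
		return -ENOMEM;
	pthread_mutex_lock(&d->lock);
	old = d->vfconfigs;
	d->vfconfigs = cfg;
	d->num_vfs = n;
	pthread_mutex_unlock(&d->lock);
	free(old); /* no reader can still be using it */
	return 0;
}
```

Without the lock in the reader, a concurrent `toy_set_num_vfs(d, 0)` could free the array between the bounds check and the access.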
David S. Miller	1adc58ea23	Merge branch 'devlink-locking' Jakub Kicinski says: ==================== improve ethtool/rtnl vs devlink locking During ethtool netlink development we decided to move some of the commands to devlink. Since we don't want drivers to implement both devlink and ethtool versions of the commands, the ethtool ioctl falls back to calling devlink. Unfortunately, devlink locks must be taken before rtnl_lock. This results in a questionable dev_hold() / rtnl_unlock() / devlink / rtnl_lock() / dev_put() pattern. This method "works", but whether it works depends on the drivers in question not doing much in ethtool_ops->begin / complete, and on the netdev not having needs_free_netdev set. Since commit `437ebfd90a` ("devlink: Count struct devlink consumers") we can hold a reference on a devlink instance and prevent it from going away (sort of like netdev with dev_hold()). We can use this to create a more natural reference nesting where we get a ref on the devlink instance and make the devlink call entirely outside of the rtnl_lock section. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:26:07 +00:00
Jakub Kicinski	1af0a0948e	ethtool: don't drop the rtnl_lock half way thru the ioctl devlink compat code needs to drop rtnl_lock to take devlink->lock to ensure correct lock ordering. This is problematic because we're not strictly guaranteed that the netdev will not disappear after we re-lock. It may open a possibility of nested ->begin / ->complete calls. Instead of calling into devlink under rtnl_lock, take a ref on the devlink instance and make the call after we've dropped rtnl_lock. We (continue to) assume that netdevs have an implicit reference on the devlink returned from ndo_get_devlink_port. Note that ndo_get_devlink_port will now get called under rtnl_lock. That should be fine since none of the drivers seem to be taking serious locks inside ndo_get_devlink_port. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:26:07 +00:00
Jakub Kicinski	46db1b77cd	devlink: expose get/put functions Allow those who hold an implicit reference on a devlink instance to try to take a full ref on it. This will be used from netdev code, which has an implicit ref because of driver call ordering. Note that after recent changes devlink_unregister() may happen before netdev unregister, but devlink_free() should still happen after, so we are safe to try, but we can't just refcount_inc() and assume it's not zero. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:26:07 +00:00
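The "try to take a full ref" idea can be sketched with C11 atomics. The names are hypothetical; the kernel's `refcount_t` API offers `refcount_inc_not_zero()` for this pattern. The increment succeeds only while the count is still non-zero, so an object already on its way to being freed is never resurrected, which is exactly why a plain increment is not enough here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_obj {
	atomic_uint refs;
};

/* Take a reference only if the object is still live (refs > 0). */
static bool toy_try_get(struct toy_obj *o)
{
	unsigned int old = atomic_load(&o->refs);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refs, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the last reference went away. */
static bool toy_put(struct toy_obj *o)
{
	return atomic_fetch_sub(&o->refs, 1) == 1;
}
```

A caller holding only an implicit reference would call `toy_try_get()`, bail out on failure, and otherwise drop its locks, use the object, and finish with `toy_put()`.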
Jakub Kicinski	095cfcfe13	ethtool: handle info/flash data copying outside rtnl_lock We need to increase the lifetime of the data for .get_info and .flash_update beyond their handlers inside rtnl_lock. Allocate a union on the heap and use it instead. Note that we now copy the ethcmd before we look up dev; hopefully there is no crazy user space depending on error codes. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:26:07 +00:00
Jakub Kicinski	f49deaa64a	ethtool: push the rtnl_lock into dev_ethtool() Don't take the lock in net/core/dev_ioctl.c, we'll have things to do outside rtnl_lock soon. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:26:07 +00:00
David S. Miller	c6e03dbe0c	Merge branch 'mana-misc' Dexuan Cui says: ==================== net: mana: some misc patches Patch 1 is a small fix. Patch 2 reports OS info to the PF driver. Before the patch, the req fields were all zeros. Patch 3 fixes and cleans up the error handling of HWC creation failure. Patch 4 adds the callbacks for hibernation/kexec. It's based on patch 3. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:21:49 +00:00
Dexuan Cui	635096a86e	net: mana: Support hibernation and kexec Implement the suspend/resume/shutdown callbacks for hibernation/kexec. Add mana_gd_setup() and mana_gd_cleanup() for some common code, and use them in the mana_gd_* callbacks. Reuse mana_probe/remove() for the hibernation path. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:21:49 +00:00
Dexuan Cui	62ea8b77ed	net: mana: Improve the HWC error handling Currently, when the HWC creation fails, the error handling is flawed, e.g. if mana_hwc_create_channel() -> mana_hwc_establish_channel() fails, the resources acquired in mana_hwc_init_queues() are not released. Enhance mana_hwc_destroy_channel() to do the proper cleanup work and call it accordingly. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:21:49 +00:00
Dexuan Cui	3c37f35735	net: mana: Report OS info to the PF driver The PF driver might use the OS info for statistical purposes. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:21:49 +00:00
Dexuan Cui	6c7ea69653	net: mana: Fix the netdev_err()'s vPort argument in mana_init_port() Use the correct port index rather than 0. Signed-off-by: Dexuan Cui <decui@microsoft.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:21:49 +00:00
David S. Miller	986d2e3da7	Merge branch 'mptcp-selftests' Mat Martineau says: ==================== mptcp: Some selftest improvements Here are a couple of selftest changes for MPTCP. Patch 1 fixes a mistake where the wrong protocol (TCP vs MPTCP) could be requested on the listening socket in some link failure tests. Patch 2 refactors the simultaneous flow tests to improve timing accuracy and give more consistent results. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:19:49 +00:00
Paolo Abeni	b6ab64b074	selftests: mptcp: more stable simult_flows tests Currently the simult_flows.sh self-tests are not very stable, especially when running on slow VMs. The tests measure runtime for transfers on multiple subflows and check that the time is near the theoretical maximum. The current test infra introduces a bit of jitter in test runtime, due to multiple explicit delays. Additionally the runtime is measured by the shell script wrapper. On a slow VM, the script overhead is measurable and subject to relevant jitter. One solution to make the test more stable would be adding more slack to the expected time; that could possibly hide real regressions. Instead move the measurement inside the command doing the transfer, and drop most unneeded sleeps. Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:19:49 +00:00
Geliang Tang	7c909a9804	selftests: mptcp: fix proto type in link_failure tests In listener_ns, we should pass srv_proto argument to mptcp_connect command, not cl_proto. Fixes: `7d1e6f1639` ("selftests: mptcp: add testcase for active-back") Signed-off-by: Geliang Tang <geliang.tang@suse.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:19:49 +00:00
Sukadev Bhattiprolu	6b278c0cb3	ibmvnic: delay complete() If we get CRQ_INIT, we set errno to -EIO and first call complete() to notify the waiter. Then we try to schedule a FAILOVER reset. If this occurs while the adapter is in PROBING state, ibmvnic_reset() changes the error code to EAGAIN and returns without scheduling the FAILOVER. The purpose of setting the error code to EAGAIN is to ask the waiter to retry. But due to the earlier complete() call, the waiter may already have seen the -EIO response and decided not to retry. This can cause intermittent failures when bringing up ibmvnic adapters during boot, especially in kexec/kdump kernels. Defer the complete() call until after scheduling the reset. Also streamline the error code to EAGAIN. Don't see why we need EIO sometimes. All 3 callers of ibmvnic_reset_init() can handle EAGAIN. Fixes: `17c8705838` ("ibmvnic: Return error code if init interrupted by transport event") Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:14:52 +00:00
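The ordering fix can be modelled in userspace with a pthread-based completion. All names are hypothetical, and `toy_schedule_reset()` merely stands in for the reset path refusing to run while the adapter is probing. The point is that the waiter is woken only after the reset attempt, and it always sees a single retry code, -EAGAIN, instead of possibly acting on an earlier -EIO.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical completion: the waiter sleeps until done, then reads err. */
struct toy_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
	int err;
};

static void toy_complete(struct toy_completion *c, int err)
{
	pthread_mutex_lock(&c->lock);
	c->err = err;
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static int toy_wait(struct toy_completion *c)
{
	int err;

	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	err = c->err;
	pthread_mutex_unlock(&c->lock);
	return err;
}

/* Stand-in for the reset scheduler: refuses while the adapter is probing. */
static int toy_schedule_reset(bool probing)
{
	return probing ? -EAGAIN : 0;
}

/* Fixed ordering: attempt the reset first, then wake the waiter once. */
static void toy_handle_crq_init(struct toy_completion *c, bool probing)
{
	(void)toy_schedule_reset(probing);
	toy_complete(c, -EAGAIN); /* single "please retry" code */
}
```

In the buggy ordering, `toy_complete(c, -EIO)` would run before the reset attempt, and a waiter could return with -EIO before the code was changed to -EAGAIN.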
Sukadev Bhattiprolu	6e20d00158	ibmvnic: Process crqs after enabling interrupts Soon after registering a CRQ, it is possible that we get a failover or maybe a CRQ_INIT from the VIOS while interrupts were disabled. Look for any such CRQs after enabling interrupts. Otherwise we can intermittently fail to bring up ibmvnic adapters during boot, especially in kexec/kdump kernels. Fixes: `032c5e8284` ("Driver for IBM System i/p VNIC protocol") Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:14:52 +00:00
Sukadev Bhattiprolu	8878e46fcf	ibmvnic: don't stop queue in xmit If adapter's resetting bit is on, discard the packet but don't stop the transmit queue - instead leave that to the reset code. With this change, it is possible that we may get several calls to ibmvnic_xmit() that simply discard packets and return. But if we stop the queue here, we might end up doing so just after __ibmvnic_open() started the queues (during a hard/soft reset) and before the ->resetting bit was cleared. If that happens, there will be no one to restart queue and transmissions will be blocked indefinitely. This can cause a TIMEOUT reset and with auto priority failover enabled, an unnecessary FAILOVER reset to less favored backing device and then a FAILOVER back to the most favored backing device. If we hit the window repeatedly, we can get stuck in a loop of TIMEOUT, FAILOVER, FAILOVER resets leaving the adapter unusable for extended periods of time. Fixes: `7f5b030830` ("ibmvnic: Free skb's in cases of failure in transmit") Reported-by: Abdul Haleem <abdhalee@in.ibm.com> Reported-by: Vaishnavi Bhat <vaish123@in.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.ibm.com> Reviewed-by: Dany Madden <drt@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:14:52 +00:00
David S. Miller	7be49d242b	Merge branch 'SO_MARK-routing' Jakub Kicinski says: ==================== udp6: allow SO_MARK ctrl msg to affect routing Looks like SO_MARK from cmsg does not affect routing policy. This seems accidental. I opted for net because of the discrepancy between IPv4 and IPv6, but it never worked and doesn't cause crashes.. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:12:48 +00:00
Jakub Kicinski	b0ced8f290	selftests: udp: test for passing SO_MARK as cmsg Before fix: | Case IPv6 rejection returned 0, expected 1 |FAIL - 1/4 cases failed With the fix: | OK Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:12:48 +00:00
Jakub Kicinski	42dcfd850e	udp6: allow SO_MARK ctrl msg to affect routing Commit `c6af0c227a` ("ip: support SO_MARK cmsg") added propagation of SO_MARK from cmsg to skb->mark. For IPv4 and raw sockets the mark also affects route lookup, but in case of IPv6 the flow info is initialized before cmsg is parsed. Fixes: `c6af0c227a` ("ip: support SO_MARK cmsg") Reported-and-tested-by: Xintong Hu <huxintong@fb.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:12:48 +00:00
Yu Xiao	f7536ffb09	nfp: flower: Allow ipv6gretap interface for offloading The tunnel_type check only allows for "netif_is_gretap", but for OVS the port is actually "netif_is_ip6gretap" when setting up GRE for ipv6, which means the offloading request was previously rejected. Therefore, adding "netif_is_ip6gretap" allows ipv6gretap interfaces to be offloaded. Signed-off-by: Yu Xiao <yu.xiao@corigine.com> Signed-off-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:09:55 +00:00
Marek Behún	c07c6e8eb4	net: dsa: populate supported_interfaces member Add a new DSA switch operation, phylink_get_interfaces, which should fill in which PHY_INTERFACE_MODE_* are supported by a given port. Use this before phylink_create() to fill phylink's supported_interfaces member, allowing phylink to determine which PHY_INTERFACE_MODEs are supported. Signed-off-by: Marek Behún <kabel@kernel.org> [tweaked patch and description to add more complete support -- rmk] Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:06:32 +00:00
David S. Miller	ebed1cf5b8	Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2021-10-29 This series contains updates to ice and iavf drivers and virtchnl header file. Brett removes vlan_promisc argument from a function call for ice driver. In the virtchnl header file he removes an unused, reserved define and converts raw value defines to instead use the BIT macro. Marcin adds syncing of MAC addresses when creating switchdev VFs to remove error messages on link up and stops showing buffer information for port representors to remove duplicated entries being displayed for ice driver. Karen introduces a helper to go from pci_dev to iavf_adapter in the iavf driver. Przemyslaw fixes an issue where iavf was attempting to free IRQs before calling disable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:05:20 +00:00
David S. Miller	06f1ecd433	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2021-10-30 Just two minor changes this time: 1) Remove some superfluous header files from xfrm4_tunnel.c From Mianhan Liu. 2) Simplify some error checks in xfrm_input(). From luo penghao. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 13:01:44 +00:00
David S. Miller	894d084434	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Use array_size() in ebtables, from Gustavo A. R. Silva. 2) Attach IPS_ASSURED to internal UDP stream state, reported by Maciej Zenczykowski. 3) Add NFT_META_IFTYPE to match on the interface type either from ingress or egress. 4) Generalize pktinfo->tprot_set to flags field. 5) Allow to match on inner headers / payload data. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 12:59:58 +00:00
David S. Miller	2aec919f8d	Merge tag 'mlx5-updates-2021-10-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2021-10-29 1) Minor trivial refactoring and improvements 2) Check for unsupported parameters fields in SW steering 3) Support TC offload for OVS internal port, from Ariel, see below. Ariel Levkovich says: ===================== Support HW offload of TC rules involving OVS internal port device type as the filter device or the destination device. The support is for flows which explicitly use the internal port as source or destination device as well as indirect offload for flows performing tunnel set or unset via a tunnel device and the internal port is the tunnel overlay device. Since flows with internal port as source port are added as egress rules while redirecting to internal port is done as an ingress redirect, the series introduces the necessary changes in mlx5_core driver to support the new types of flows and actions. ===================== ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2021-11-01 12:53:24 +00:00
Helge Deller	6e866a4628	parisc: Fix set_fixmap() on PA1.x CPUs Fix a kernel crash which happens on PA1.x CPUs while initializing the FTRACE/KPROBE breakpoints. The PTE table entries for the fixmap area were not created correctly. Signed-off-by: Helge Deller <deller@gmx.de> Fixes: `ccfbc68d41` ("parisc: add set_fixmap()/clear_fixmap()") Cc: stable@vger.kernel.org # v5.2+	2021-11-01 12:00:22 +01:00
Yihao Han	1ae8e91e81	parisc: Use swap() to swap values in setup_bootmem() Signed-off-by: Yihao Han <hanyihao@vivo.com> Signed-off-by: Helge Deller <deller@gmx.de>	2021-11-01 11:59:49 +01:00
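Commits b0ced8f290 and 42dcfd850e concern passing SO_MARK as a per-call control message instead of setting it with setsockopt(). The following is a minimal userspace sketch of how a sender might build such a message, assuming Linux with glibc. The `mark_cbuf` union and the `build_mark_msg()` helper are hypothetical names. The aligned control-buffer pattern follows the one shown in cmsg(3).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer large enough for one SO_MARK cmsg, aligned for struct cmsghdr. */
union mark_cbuf {
    char buf[CMSG_SPACE(sizeof(uint32_t))];
    struct cmsghdr align;
};

/*
 * Hypothetical helper: prepare msg so that sendmsg() carries a per-call
 * SO_MARK (a 32-bit value at level SOL_SOCKET) alongside the payload.
 * Returns 0 on success, -1 if the control buffer is too small.
 */
static int build_mark_msg(struct msghdr *msg, struct iovec *iov,
                          union mark_cbuf *cbuf,
                          void *payload, size_t len, uint32_t mark)
{
    memset(msg, 0, sizeof(*msg));      /* no msg_name: assumes a connected socket */
    iov->iov_base = payload;
    iov->iov_len = len;
    msg->msg_iov = iov;
    msg->msg_iovlen = 1;
    msg->msg_control = cbuf->buf;
    msg->msg_controllen = sizeof(cbuf->buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(msg);
    if (cmsg == NULL)
        return -1;
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SO_MARK;
    cmsg->cmsg_len = CMSG_LEN(sizeof(mark));
    memcpy(CMSG_DATA(cmsg), &mark, sizeof(mark));  /* payload is a raw u32 */
    return 0;
}
```

On a connected UDP socket `fd`, `sendmsg(fd, &msg, 0)` would then send the payload with that mark. According to 42dcfd850e, before that fix IPv6 initialized the flow info before parsing the cmsg, so the mark did not reach the route lookup. Setting SO_MARK this way is also expected to require CAP_NET_ADMIN.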
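Commit 1ae8e91e81 switches setup_bootmem() to the kernel's swap() helper. The following is a userspace sketch of a macro with the same shape as the kernel's generic swap(), for illustration only. `struct range` and `sort_ranges()` are hypothetical and do not reproduce the parisc code.

```c
#include <assert.h>

/*
 * Modelled on the kernel's generic swap(): copy the first argument into a
 * temporary of the same type, then exchange. Arguments are evaluated more
 * than once, so they must not have side effects. __typeof__ is the GNU C
 * spelling that also compiles in strict ISO modes; the kernel writes typeof.
 */
#define swap(a, b) \
    do { __typeof__(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0)

/* Hypothetical memory-range record: start address and length in pages. */
struct range {
    unsigned long start;
    unsigned long pages;
};

/* Insertion sort by start address, using swap() on adjacent elements. */
static void sort_ranges(struct range *r, int n)
{
    for (int i = 1; i < n; i++)
        for (int j = i; j > 0 && r[j].start < r[j - 1].start; j--)
            swap(r[j], r[j - 1]);
}
```

Because the temporary takes the type of the first argument, the same macro exchanges both scalars and whole structs. This is what makes it a drop-in replacement for hand-written three-assignment swaps.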

1049500 commits
