Граф коммитов

982360 Коммитов

Автор SHA1 Сообщение Дата
Willem de Bruijn 9bd6b629c3 esp: avoid unneeded kmap_atomic call
esp(6)_output_head uses skb_page_frag_refill to allocate a buffer for
the esp trailer.

It accesses the page with kmap_atomic to handle highmem. But
skb_page_frag_refill can return compound pages, of which
kmap_atomic only maps the first underlying page.

skb_page_frag_refill does not return highmem, because flag
__GFP_HIGHMEM is not set. ESP uses it in the same manner as TCP.
That also does not call kmap_atomic, but directly uses page_address,
in skb_copy_to_page_nocache. Do the same for ESP.

This issue has become easier to trigger with recent kmap local
debugging feature CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP.

Fixes: cac2661c53 ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f6a ("esp6: Avoid skb_cow_data whenever possible")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11 18:20:09 -08:00
Willem de Bruijn 97550f6fa5 net: compound page support in skb_seq_read
skb_seq_read iterates over an skb, returning pointer and length of
the next data range with each call.

It relies on kmap_atomic to access highmem pages when needed.

An skb frag may be backed by a compound page, but kmap_atomic maps
only a single page. There are not enough kmap slots to always map all
pages concurrently.

Instead, if kmap_atomic is needed, iterate over each page.

As this increases the number of calls, avoid this unless needed.
The necessary condition is captured in skb_frag_must_loop.

I tried to make the change as obvious as possible. It should be easy
to verify that nothing changes if skb_frag_must_loop returns false.

Tested:
  On an x86 platform with
    CONFIG_HIGHMEM=y
    CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP=y
    CONFIG_NETFILTER_XT_MATCH_STRING=y

  Run
    ip link set dev lo mtu 1500
    iptables -A OUTPUT -m string --string 'badstring' -algo bm -j ACCEPT
    dd if=/dev/urandom of=in bs=1M count=20
    nc -l -p 8000 > /dev/null &
    nc -w 1 -q 0 localhost 8000 < in

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11 18:20:09 -08:00
Willem de Bruijn 29766bcffa net: support kmap_local forced debugging in skb_frag_foreach
Skb frags may be backed by highmem and/or compound pages. Highmem
pages need kmap_atomic mappings to access. But kmap_atomic maps a
single page, not the entire compound page.

skb_foreach_page iterates over an skb frag, in one step in the common
case, page by page only if kmap_atomic must be called for each page.
The decision logic is captured in skb_frag_must_loop.

CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP extends kmap from highmem to all
pages, to increase code coverage.

Extend skb_frag_must_loop to this new condition.

Link: https://lore.kernel.org/linux-mm/20210106180132.41dc249d@gandalf.local.home/
Fixes: 0e91a0c698 ("mm/highmem: Provide CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP")
Reported-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11 18:20:09 -08:00
Andrey Zhizhikin e56b3d94d9 rndis_host: set proper input size for OID_GEN_PHYSICAL_MEDIUM request
MSFT ActiveSync implementation requires that the size of the response for
incoming query is to be provided in the request input length. Failure to
set the input size proper results in failed request transfer, where the
ActiveSync counterpart reports the NDIS_STATUS_INVALID_LENGTH (0xC0010014L)
error.

Set the input size for OID_GEN_PHYSICAL_MEDIUM query to the expected size
of the response in order for the ActiveSync to properly respond to the
request.

Fixes: 039ee17d1b ("rndis_host: Add RNDIS physical medium checking into generic_rndis_bind()")
Signed-off-by: Andrey Zhizhikin <andrey.zhizhikin@leica-geosystems.com>
Link: https://lore.kernel.org/r/20210108095839.3335-1-andrey.zhizhikin@leica-geosystems.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11 17:54:40 -08:00
Stefan Chulski 6f83802a1a net: mvpp2: Remove Pause and Asym_Pause support
Packet Processor hardware not connected to MAC flow control unit and
cannot support TX flow control.
This patch disable flow control support.

Fixes: 3f518509de ("ethernet: Add new driver for Marvell Armada 375 network unit")
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Link: https://lore.kernel.org/r/1610306582-16641-1-git-send-email-stefanc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11 16:57:54 -08:00
Seb Laveze 938288349c dt-bindings: net: dwmac: fix queue priority documentation
The priority field is not the queue priority (queue priority is fixed)
but a bitmask of priorities assigned to this queue.

In receive, priorities relate to tagged frames priorities.

In transmit, priorities relate to PFC frames.

Signed-off-by: Seb Laveze <sebastien.laveze@nxp.com>
Link: https://lore.kernel.org/r/20210111081406.1348622-1-sebastien.laveze@oss.nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-11 15:18:47 -08:00
Geert Uytterhoeven f97844f9c5 dt-bindings: net: renesas,etheravb: RZ/G2H needs tx-internal-delay-ps
The merge resolution of the interaction of commits 307eea32b2
("dt-bindings: net: renesas,ravb: Add support for r8a774e1 SoC") and
d7adf63311 ("dt-bindings: net: renesas,etheravb: Convert to
json-schema") missed that "tx-internal-delay-ps" should be a required
property on RZ/G2H.

Fixes: 8b0308fe31 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20210105151516.1540653-1-geert+renesas@glider.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-09 19:12:21 -08:00
Jakub Kicinski 26c49f0d10 Merge branch 'mlxsw-core-thermal-control-fixes'
Ido Schimmel says:

====================
mlxsw: core: Thermal control fixes

This series includes two fixes for thermal control in mlxsw.

Patch #1 validates that the alarm temperature threshold read from a
transceiver is above the warning temperature threshold. If not, the
current thresholds are maintained. It was observed that some transceiver
might be unreliable and sometimes report a too low alarm temperature
threshold which would result in thermal shutdown of the system.

Patch #2 increases the temperature threshold above which thermal
shutdown is triggered for the ASIC thermal zone. It is currently too low
and might result in thermal shutdown under perfectly fine operational
conditions.
====================

Link: https://lore.kernel.org/r/20210108145210.1229820-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-09 16:25:14 -08:00
Vadim Pasternak b06ca3d5a4 mlxsw: core: Increase critical threshold for ASIC thermal zone
Increase critical threshold for ASIC thermal zone from 110C to 140C
according to the system hardware requirements. All the supported ASICs
(Spectrum-1, Spectrum-2, Spectrum-3) could be still operational with ASIC
temperature below 140C. With the old critical threshold value system
can perform unjustified shutdown.

All the systems equipped with the above ASICs implement thermal
protection mechanism at firmware level and firmware could decide to
perform system thermal shutdown in case the temperature is below 140C.
So with the new threshold system will not meltdown, while thermal
operating range will be aligned with hardware abilities.

Fixes: 41e760841d ("mlxsw: core: Replace thermal temperature trips with defines")
Fixes: a50c1e3565 ("mlxsw: core: Implement thermal zone")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-09 16:25:10 -08:00
Vadim Pasternak 57726ebe27 mlxsw: core: Add validation of transceiver temperature thresholds
Validate thresholds to avoid a single failure due to some transceiver
unreliability. Ignore the last readouts in case warning temperature is
above alarm temperature, since it can cause unexpected thermal
shutdown. Stay with the previous values and refresh threshold within
the next iteration.

This is a rare scenario, but it was observed at a customer site.

Fixes: 6a79507cfe ("mlxsw: core: Extend thermal module with per QSFP module thermal zones")
Signed-off-by: Vadim Pasternak <vadimp@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-09 16:25:10 -08:00
Hoang Le b774134464 tipc: fix NULL deref in tipc_link_xmit()
The buffer list can have zero skb as following path:
tipc_named_node_up()->tipc_node_xmit()->tipc_link_xmit(), so
we need to check the list before casting an &sk_buff.

Fault report:
 [] tipc: Bulk publication failure
 [] general protection fault, probably for non-canonical [#1] PREEMPT [...]
 [] KASAN: null-ptr-deref in range [0x00000000000000c8-0x00000000000000cf]
 [] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 5.10.0-rc4+ #2
 [] Hardware name: Bochs ..., BIOS Bochs 01/01/2011
 [] RIP: 0010:tipc_link_xmit+0xc1/0x2180
 [] Code: 24 b8 00 00 00 00 4d 39 ec 4c 0f 44 e8 e8 d7 0a 10 f9 48 [...]
 [] RSP: 0018:ffffc90000006ea0 EFLAGS: 00010202
 [] RAX: dffffc0000000000 RBX: ffff8880224da000 RCX: 1ffff11003d3cc0d
 [] RDX: 0000000000000019 RSI: ffffffff886007b9 RDI: 00000000000000c8
 [] RBP: ffffc90000007018 R08: 0000000000000001 R09: fffff52000000ded
 [] R10: 0000000000000003 R11: fffff52000000dec R12: ffffc90000007148
 [] R13: 0000000000000000 R14: 0000000000000000 R15: ffffc90000007018
 [] FS:  0000000000000000(0000) GS:ffff888037400000(0000) knlGS:000[...]
 [] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 [] CR2: 00007fffd2db5000 CR3: 000000002b08f000 CR4: 00000000000006f0

Fixes: af9b028e27 ("tipc: make media xmit call outside node spinlock context")
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
Link: https://lore.kernel.org/r/20210108071337.3598-1-hoang.h.le@dektech.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-09 14:44:47 -08:00
Vadim Fedorenko 3502bd9b57 selftests/tls: fix selftests after adding ChaCha20-Poly1305
TLS selftests where broken because of wrong variable types used.
Fix it by changing u16 -> uint16_t

Fixes: 4f336e88a8 ("selftests/tls: add CHACHA20-POLY1305 to tls selftests")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
Link: https://lore.kernel.org/r/1610141865-7142-1-git-send-email-vfedorenko@novek.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-09 14:26:30 -08:00
Aya Levin b210de4f8c net: ipv6: Validate GSO SKB before finish IPv6 processing
There are cases where GSO segment's length exceeds the egress MTU:
 - Forwarding of a TCP GRO skb, when DF flag is not set.
 - Forwarding of an skb that arrived on a virtualisation interface
   (virtio-net/vhost/tap) with TSO/GSO size set by other network
   stack.
 - Local GSO skb transmitted on an NETIF_F_TSO tunnel stacked over an
   interface with a smaller MTU.
 - Arriving GRO skb (or GSO skb in a virtualised environment) that is
   bridged to a NETIF_F_TSO tunnel stacked over an interface with an
   insufficient MTU.

If so:
 - Consume the SKB and its segments.
 - Issue an ICMP packet with 'Packet Too Big' message containing the
   MTU, allowing the source host to reduce its Path MTU appropriately.

Note: These cases are handled in the same manner in IPv4 output finish.
This patch aligns the behavior of IPv6 and the one of IPv4.

Fixes: 9e50849054 ("netfilter: ipv6: move POSTROUTING invocation before fragmentation")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/r/1610027418-30438-1-git-send-email-ayal@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-09 14:06:32 -08:00
Manish Chopra a2bc221b97 netxen_nic: fix MSI/MSI-x interrupts
For all PCI functions on the netxen_nic adapter, interrupt
mode (INTx or MSI) configuration is dependent on what has
been configured by the PCI function zero in the shared
interrupt register, as these adapters do not support mixed
mode interrupts among the functions of a given adapter.

Logic for setting MSI/MSI-x interrupt mode in the shared interrupt
register based on PCI function id zero check is not appropriate for
all family of netxen adapters, as for some of the netxen family
adapters PCI function zero is not really meant to be probed/loaded
in the host but rather just act as a management function on the device,
which caused all the other PCI functions on the adapter to always use
legacy interrupt (INTx) mode instead of choosing MSI/MSI-x interrupt mode.

This patch replaces that check with port number so that for all
type of adapters driver attempts for MSI/MSI-x interrupt modes.

Fixes: b37eb210c0 ("netxen_nic: Avoid mixed mode interrupts")
Signed-off-by: Manish Chopra <manishc@marvell.com>
Signed-off-by: Igor Russkikh <irusskikh@marvell.com>
Link: https://lore.kernel.org/r/20210107101520.6735-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-09 13:47:57 -08:00
Jakub Kicinski c49243e889 Merge branch 'net-fix-issues-around-register_netdevice-failures'
Jakub Kicinski says:

====================
net: fix issues around register_netdevice() failures

This series attempts to clean up the life cycle of struct
net_device. Dave has added dev->needs_free_netdev in the
past to fix double frees, we can lean on that mechanism
a little more to fix remaining issues with register_netdevice().

This is the next chapter of the saga which already includes:
commit 0e0eee2465 ("net: correct error path in rtnl_newlink()")
commit e51fb15231 ("rtnetlink: fix a memory leak when ->newlink fails")
commit cf124db566 ("net: Fix inconsistent teardown and release of private netdev state.")
commit 93ee31f14f ("[NET]: Fix free_netdev on register_netdev failure.")
commit 814152a89e ("net: fix memleak in register_netdevice()")
commit 10cc514f45 ("net: Fix null de-reference of device refcount")

The immediate problem which gets fixed here is that calling
free_netdev() right after unregister_netdevice() is illegal
because we need to release rtnl_lock first, to let the
unregistration finish. Note that unregister_netdevice() is
just a wrapper of unregister_netdevice_queue(), it only
does half of the job.

Where this limitation becomes most problematic is in failure
modes of register_netdevice(). There is a notifier call right
at the end of it, which lets other subsystems veto the entire
thing. At which point we should really go through a full
unregister_netdevice(), but we can't because callers may
go straight to free_netdev() after the failure, and that's
no bueno (see the previous paragraph).

This set makes free_netdev() more lenient, when device
is still being unregistered free_netdev() will simply set
dev->needs_free_netdev and let the unregister process do
the freeing.

With the free_netdev() problem out of the way failures in
register_netdevice() can make use of net_todo, again.
Users are still expected to call free_netdev() right after
failure but that will only set dev->needs_free_netdev.

To prevent the pathological case of:

 dev->needs_free_netdev = true;
 if (register_netdevice(dev)) {
   rtnl_unlock();
   free_netdev(dev);
 }

make register_netdevice()'s failure clear dev->needs_free_netdev.

Problems described above are only present with register_netdevice() /
unregister_netdevice(). We have two parallel APIs for registration
of devices:
 - those called outside rtnl_lock (register_netdev(), and
   unregister_netdev());
 - and those to be used under rtnl_lock - register_netdevice()
   and unregister_netdevice().
The former is trivial and has no problems. The alternative
approach to fix the latter would be to also separate the
freeing functions - i.e. add free_netdevice(). This has been
implemented (incl. converting all relevant calls in the tree)
but it feels a little unnecessary to put the burden of choosing
the right free_netdev{,ice}() call on the programmer when we
can "just do the right thing" by default.
====================

Link: https://lore.kernel.org/r/20210106184007.1821480-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-08 19:27:44 -08:00
Jakub Kicinski 766b0515d5 net: make sure devices go through netdev_wait_all_refs
If register_netdevice() fails at the very last stage - the
notifier call - some subsystems may have already seen it and
grabbed a reference. struct net_device can't be freed right
away without calling netdev_wait_all_refs().

Now that we have a clean interface in form of dev->needs_free_netdev
and lenient free_netdev() we can undo what commit 93ee31f14f ("[NET]:
Fix free_netdev on register_netdev failure.") has done and complete
the unregistration path by bringing the net_set_todo() call back.

After registration fails user is still expected to explicitly
free the net_device, so make sure ->needs_free_netdev is cleared,
otherwise rolling back the registration will cause the old double
free for callers who release rtnl_lock before the free.

This also solves the problem of priv_destructor not being called
on notifier error.

net_set_todo() will be moved back into unregister_netdevice_queue()
in a follow up.

Reported-by: Hulk Robot <hulkci@huawei.com>
Reported-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-08 19:27:41 -08:00
Jakub Kicinski c269a24ce0 net: make free_netdev() more lenient with unregistering devices
There are two flavors of handling netdev registration:
 - ones called without holding rtnl_lock: register_netdev() and
   unregister_netdev(); and
 - those called with rtnl_lock held: register_netdevice() and
   unregister_netdevice().

While the semantics of the former are pretty clear, the same can't
be said about the latter. The netdev_todo mechanism is utilized to
perform some of the device unregistering tasks and it hooks into
rtnl_unlock() so the locked variants can't actually finish the work.
In general free_netdev() does not mix well with locked calls. Most
drivers operating under rtnl_lock set dev->needs_free_netdev to true
and expect core to make the free_netdev() call some time later.

The part where this becomes most problematic is error paths. There is
no way to unwind the state cleanly after a call to register_netdevice(),
since unreg can't be performed fully without dropping locks.

Make free_netdev() more lenient, and defer the freeing if device
is being unregistered. This allows error paths to simply call
free_netdev() both after register_netdevice() failed, and after
a call to unregister_netdevice() but before dropping rtnl_lock.

Simplify the error paths which are currently doing gymnastics
around free_netdev() handling.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-08 19:27:41 -08:00
Jakub Kicinski 2b446e650b docs: net: explain struct net_device lifetime
Explain the two basic flows of struct net_device's operation.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-08 19:27:41 -08:00
Tom Parkin c1787ffd0d ppp: fix refcount underflow on channel unbridge
When setting up a channel bridge, ppp_bridge_channels sets the
pch->bridge field before taking the associated reference on the bridge
file instance.

This opens up a refcount underflow bug if ppp_bridge_channels called
via. iotcl runs concurrently with ppp_unbridge_channels executing via.
file release.

The bug is triggered by ppp_bridge_channels taking the error path
through the 'err_unset' label.  In this scenario, pch->bridge is set,
but the reference on the bridged channel will not be taken because
the function errors out.  If ppp_unbridge_channels observes pch->bridge
before it is unset by the error path, it will erroneously drop the
reference on the bridged channel and cause a refcount underflow.

To avoid this, ensure that ppp_bridge_channels holds a reference on
each channel in advance of setting the bridge pointers.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Fixes: 4cf476ced4 ("ppp: add PPPIOCBRIDGECHAN and PPPIOCUNBRIDGECHAN ioctls")
Acked-by: Guillaume Nault <gnault@redhat.com>
Link: https://lore.kernel.org/r/20210107181315.3128-1-tparkin@katalix.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-08 19:21:06 -08:00
Baptiste Lepers fd2ddef043 udp: Prevent reuseport_select_sock from reading uninitialized socks
reuse->socks[] is modified concurrently by reuseport_add_sock. To
prevent reading values that have not been fully initialized, only read
the array up until the last known safe index instead of incorrectly
re-reading the last index of the array.

Fixes: acdcecc612 ("udp: correct reuseport selection with connected sockets")
Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20210107051110.12247-1-baptiste.lepers@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-08 19:15:40 -08:00
Dongseok Yi 53475c5dd8 net: fix use-after-free when UDP GRO with shared fraglist
skbs in fraglist could be shared by a BPF filter loaded at TC. If TC
writes, it will call skb_ensure_writable -> pskb_expand_head to create
a private linear section for the head_skb. And then call
skb_clone_fraglist -> skb_get on each skb in the fraglist.

skb_segment_list overwrites part of the skb linear section of each
fragment itself. Even after skb_clone, the frag_skbs share their
linear section with their clone in PF_PACKET.

Both sk_receive_queue of PF_PACKET and PF_INET (or PF_INET6) can have
a link for the same frag_skbs chain. If a new skb (not frags) is
queued to one of the sk_receive_queue, multiple ptypes can see and
release this. It causes use-after-free.

[ 4443.426215] ------------[ cut here ]------------
[ 4443.426222] refcount_t: underflow; use-after-free.
[ 4443.426291] WARNING: CPU: 7 PID: 28161 at lib/refcount.c:190
refcount_dec_and_test_checked+0xa4/0xc8
[ 4443.426726] pstate: 60400005 (nZCv daif +PAN -UAO)
[ 4443.426732] pc : refcount_dec_and_test_checked+0xa4/0xc8
[ 4443.426737] lr : refcount_dec_and_test_checked+0xa0/0xc8
[ 4443.426808] Call trace:
[ 4443.426813]  refcount_dec_and_test_checked+0xa4/0xc8
[ 4443.426823]  skb_release_data+0x144/0x264
[ 4443.426828]  kfree_skb+0x58/0xc4
[ 4443.426832]  skb_queue_purge+0x64/0x9c
[ 4443.426844]  packet_set_ring+0x5f0/0x820
[ 4443.426849]  packet_setsockopt+0x5a4/0xcd0
[ 4443.426853]  __sys_setsockopt+0x188/0x278
[ 4443.426858]  __arm64_sys_setsockopt+0x28/0x38
[ 4443.426869]  el0_svc_common+0xf0/0x1d0
[ 4443.426873]  el0_svc_handler+0x74/0x98
[ 4443.426880]  el0_svc+0x8/0xc

Fixes: 3a1296a38d (net: Support GRO/GSO fraglist chaining.)
Signed-off-by: Dongseok Yi <dseok.yi@samsung.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/1610072918-174177-1-git-send-email-dseok.yi@samsung.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-08 19:14:02 -08:00
Stephan Gerhold afba9dc1f3 net: ipa: modem: add missing SET_NETDEV_DEV() for proper sysfs links
At the moment it is quite hard to identify the network interface
provided by IPA in userspace components: The network interface is
created as virtual device, without any link to the IPA device.
The interface name ("rmnet_ipa%d") is the only indication that the
network interface belongs to IPA, but this is not very reliable.

Add SET_NETDEV_DEV() to associate the network interface with the
IPA parent device. This allows userspace services like ModemManager
to properly identify that this network interface is provided by IPA
and belongs to the modem.

Cc: Alex Elder <elder@kernel.org>
Fixes: a646d6ec90 ("soc: qcom: ipa: modem and microcontroller")
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Link: https://lore.kernel.org/r/20210106100755.56800-1-stephan@gerhold.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-08 18:45:35 -08:00
Linus Torvalds 6279d812ea Networking fixes for 5.11-rc3 (part 2), including fixes from bpf and
can trees.
 
 Current release - always broken:
 
  - can: mcp251xfd: fix Tx/Rx ring buffer driver race conditions
 
  - dsa: hellcreek: fix led_classdev build errors
 
 Previous releases - regressions:
 
  - ipv6: fib: flush exceptions when purging route to avoid netdev
               reference leak
 
  - ip_tunnels: fix pmtu check in nopmtudisc mode
 
  - ip: always refragment ip defragmented packets to avoid MTU issues
        when forwarding through tunnels, correct "packet too big"
        message is prohibitively tricky to generate
 
  - s390/qeth: fix locking for discipline setup / removal and during
               recovery to prevent both deadlocks and races
 
  - mlx5: Use port_num 1 instead of 0 when delete a RoCE address
 
 Previous releases - always broken:
 
  - cdc_ncm: correct overhead calculation in delayed_ndp_size to prevent
             out of bound accesses with Huawei 909s-120 LTE module
 
  - stmmac: dwmac-sun8i: fix suspend/resume:
            - PHY being left powered off
            - MAC syscon configuration being reset
            - reference to the reset controller being improperly dropped
 
  - qrtr: fix null-ptr-deref in qrtr_ns_remove
 
  - can: tcan4x5x: fix bittiming const, use common bittiming from m_can
                   driver
 
  - mlx5e: CT: Use per flow counter when CT flow accounting is enabled
 
  - mlx5e: Fix SWP offsets when vlan inserted by driver
 
 Misc:
 
  - bpf: Fix a task_iter bug caused by a bpf -> net merge conflict
         resolution
 
 And the usual many fixes to various error paths.
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIyBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl/34wsACgkQMUZtbf5S
 IrttVA/1FSZ34bAulkdPAQj/qtL9Y3G/qR33ZUgiNySRaqhRuq9ZB73+Qk84RyWI
 yef586533NXXfwLg/8sS3crwDLAMl62rU8RTH/BF7MdByDAnX89L5czzmZxJGd36
 Ffn81F9oBu2obIPjvXC2TDAr1G8/x4MMNBZZiEPEetWjjlAmCBhK7AjsCRTEquOy
 PF2//97y/qrCo3nFoHNaxUy4IG7qu5WvA9vPGTPk/SZoyJ4HYKGIoRFNl8hnh8E7
 XUTeqC5+KE+3PIfRV6WSwnA5mQ3/LXXjCQFQvHgT4vT7DCUUhAqiqwXfczFy88Jt
 jN5ir3JFbFODF965NkohdvCA/ynhIvPMQUThq5JZd2t8fqVDtOQPR+Ov20Emw/mW
 tD2o4+1vi5GLaFoyWIMBCFZVtoh9cFjne19S0ls90XcF5B3EKwMlRbQt69F6afBx
 n/w+qr4/fRFkrWewKHsfiez/RXM8bYr8QsRxqofq/FVPxOfI+nVC4OyNFrhEYY0R
 NRTEY8/RxOgjfL61c0OjZJ+m2oeBIusWH/BW/9SPzAiqgrIqnNkJhbhXhGfdW6r3
 AMoaS1vittUoc4Vsv/3GIslSQV3xgbQVbuWK0sZEh8K0gKGIy8EDLDd2r1ZkcwN2
 slSh2QMAiOQZaq8UkLuVEF2yAVVGXtPhnL0yuoN0dM1+kPo7zw==
 =YS2c
 -----END PGP SIGNATURE-----

Merge tag 'net-5.11-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull more networking fixes from Jakub Kicinski:
 "Slightly lighter pull request to get back into the Thursday cadence.

  Current release - always broken:

   - can: mcp251xfd: fix Tx/Rx ring buffer driver race conditions

   - dsa: hellcreek: fix led_classdev build errors

  Previous releases - regressions:

   - ipv6: fib: flush exceptions when purging route to avoid netdev
     reference leak

   - ip_tunnels: fix pmtu check in nopmtudisc mode

   - ip: always refragment ip defragmented packets to avoid MTU issues
     when forwarding through tunnels, correct "packet too big" message
     is prohibitively tricky to generate

   - s390/qeth: fix locking for discipline setup / removal and during
     recovery to prevent both deadlocks and races

   - mlx5: Use port_num 1 instead of 0 when delete a RoCE address

  Previous releases - always broken:

   - cdc_ncm: correct overhead calculation in delayed_ndp_size to
     prevent out of bound accesses with Huawei 909s-120 LTE module

   - fix stmmac dwmac-sun8i suspend/resume:
           - PHY being left powered off
           - MAC syscon configuration being reset
           - reference to the reset controller being improperly dropped

   - qrtr: fix null-ptr-deref in qrtr_ns_remove

   - can: tcan4x5x: fix bittiming const, use common bittiming from m_can
     driver

   - mlx5e: CT: Use per flow counter when CT flow accounting is enabled

   - mlx5e: Fix SWP offsets when vlan inserted by driver

  Misc:

   - bpf: Fix a task_iter bug caused by a bpf -> net merge conflict
     resolution

  And the usual many fixes to various error paths"

* tag 'net-5.11-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
  net: dsa: lantiq_gswip: Exclude RMII from modes that report 1 GbE
  s390/qeth: fix L2 header access in qeth_l3_osa_features_check()
  s390/qeth: fix locking for discipline setup / removal
  s390/qeth: fix deadlock during recovery
  selftests: fib_nexthops: Fix wrong mausezahn invocation
  nexthop: Bounce NHA_GATEWAY in FDB nexthop groups
  nexthop: Unlink nexthop group entry in error path
  nexthop: Fix off-by-one error in error path
  octeontx2-af: fix memory leak of lmac and lmac->name
  chtls: Fix chtls resources release sequence
  chtls: Added a check to avoid NULL pointer dereference
  chtls: Replace skb_dequeue with skb_peek
  chtls: Avoid unnecessary freeing of oreq pointer
  chtls: Fix panic when route to peer not configured
  chtls: Remove invalid set_tcb call
  chtls: Fix hardware tid leak
  net: ip: always refragment ip defragmented packets
  net: fix pmtu check in nopmtudisc mode
  selftests: netfilter: add selftest for ipip pmtu discovery with enabled connection tracking
  docs: octeontx2: tune rst markup
  ...
2021-01-08 12:12:30 -08:00
Linus Torvalds ea1c87c156 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes a functional bug in arm/chacha-neon as well as a potential
  buffer overflow in ecdh"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: ecdh - avoid buffer overflow in ecdh_set_secret()
  crypto: arm/chacha-neon - add missing counter increment
2021-01-08 12:05:11 -08:00
Linus Torvalds ef0ba05538 poll: fix performance regression due to out-of-line __put_user()
The kernel test robot reported a -5.8% performance regression on the
"poll2" test of will-it-scale, and bisected it to commit d55564cfc2
("x86: Make __put_user() generate an out-of-line call").

I didn't expect an out-of-line __put_user() to matter, because no normal
core code should use that non-checking legacy version of user access any
more.  But I had overlooked the very odd poll() usage, which does a
__put_user() to update the 'revents' values of the poll array.

Now, Al Viro correctly points out that instead of updating just the
'revents' field, it would be much simpler to just copy the _whole_
pollfd entry, and then we could just use "copy_to_user()" on the whole
array of entries, the same way we use "copy_from_user()" a few lines
earlier to get the original values.

But that is not what we've traditionally done, and I worry that threaded
applications might be concurrently modifying the other fields of the
pollfd array.  So while Al's suggestion is simpler - and perhaps worth
trying in the future - this instead keeps the "just update revents"
model.

To fix the performance regression, use the modern "unsafe_put_user()"
instead of __put_user(), with the proper "user_write_access_begin()"
guarding in place. This improves code generation enormously.

Link: https://lore.kernel.org/lkml/20210107134723.GA28532@xsang-OptiPlex-9020/
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Laight <David.Laight@aculab.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-08 11:06:29 -08:00
Petr Mladek a91bd6223e Revert "init/console: Use ttynull as a fallback when there is no console"
This reverts commit 757055ae8d.

The commit caused that ttynull was used as the default console
on several systems[1][2][3]. As a result, the console was
blank even when a better alternative existed.

It happened when there was no console configured
on the command line and ttynull_init() was the first initcall
calling register_console().

Or it happened when /dev/ did not exist when console_on_rootfs()
was called. It was not able to open /dev/console even though
a console driver was registered. It tried to add ttynull console
but it obviously did not help. But ttynull became the preferred
console and was used by /dev/console when it was available later.

The commit tried to fix a historical problem that have been there
for ages. The primary motivation was the commit 3cffa06aee
("printk/console: Allow to disable console output by using console=""
 or console=null"). It provided a clean solution for a workaround
 that was widely used and worked only by chance.

This revert causes that the console="" or console=null command line
options will again work only by chance. These options will cause that
a particular console will be preferred and the default (tty) ones
will not get enabled. There will be no console registered at
all. As a result there won't be stdin, stdout, and stderr for
the init process. But it worked exactly this way even before.

The proper solution has to fulfill many conditions:

  + Register ttynull only when explicitly required or as
    the ultimate fallback.

  + ttynull should get associated with /dev/console but it must
    not become preferred console when used as a fallback.
    Especially, it must still be possible to replace it
    by a better console later.

Such a change requires clean up of the register_console() code.
Otherwise, it would be even harder to follow. Especially, the use
of has_preferred_console and CON_CONSDEV flag is tricky. The clean
up is risky. The ordering of consoles is not well defined. And
any changes tend to break existing user settings.

Do the revert at the least risky solution for now.

[1] https://lore.kernel.org/linux-kselftest/20201221144302.GR4077@smile.fi.intel.com/
[2] https://lore.kernel.org/lkml/d2a3b3c0-e548-7dd1-730f-59bc5c04e191@synopsys.com/
[3] https://patchwork.ozlabs.org/project/linux-um/patch/20210105120128.10854-1-thomas@m3y3r.de/

Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reported-by: Vineet Gupta <vgupta@synopsys.com>
Reported-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-01-08 11:02:18 -08:00
Jakub Kicinski 220efcf9ca mlx5-fixes-2021-01-07
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl/3bZsACgkQSD+KveBX
 +j4powgAnzKVXUxN/R5IJ8z9kONZEvI+9SgHQ/qvzsHY66zy6xHlCdrmRWYI8V1c
 KuZltBQPVCd1fAAWAzzmkEOpClM1IbS6zd0/OoJFHJfZYdFEZStvw4NrazpRNd6E
 rPUgJpxRFjiLbklZhpfZ6yaoAWez/fXPQbXB+ObOs2m/guesCyZy/T7mBTvvovwA
 gmrWoxb6vApJZtgk3bIQKaaevx5/0UFCWqJIoAgKyCzVKf8QvQ0OLRkt27Cn3B08
 lTjtUjiGe4boxVY8F+bEb79BcnezIb/NlafjoJ1gyCgZ5dvHSrfaopiAxIZhxRSX
 bP/W8cCVZvUngclDs/L+cx3Y3btpNA==
 =0qLp
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2021-01-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5 fixes 2021-01-07

* tag 'mlx5-fixes-2021-01-07' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5e: Fix memleak in mlx5e_create_l2_table_groups
  net/mlx5e: Fix two double free cases
  net/mlx5: Release devlink object if adev fails
  net/mlx5e: ethtool, Fix restriction of autoneg with 56G
  net/mlx5e: In skb build skip setting mark in switchdev mode
  net/mlx5: E-Switch, fix changing vf VLANID
  net/mlx5e: Fix SWP offsets when vlan inserted by driver
  net/mlx5e: CT: Use per flow counter when CT flow accounting is enabled
  net/mlx5: Use port_num 1 instead of 0 when delete a RoCE address
  net/mlx5e: Add missing capability check for uplink follow
  net/mlx5: Check if lag is supported before creating one
====================

Link: https://lore.kernel.org/r/20210107202845.470205-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 19:13:30 -08:00
Aleksander Jan Bajkowski 3545454c78 net: dsa: lantiq_gswip: Exclude RMII from modes that report 1 GbE
Exclude RMII from modes that report 1 GbE support. Reduced MII supports
up to 100 MbE.

Fixes: 14fceff477 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20210107195818.3878-1-olek2@wp.pl
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 19:00:11 -08:00
Jakub Kicinski 286e95eed1 Merge branch 's390-qeth-fixes-2021-01-07'
Julian Wiedmann says:

====================
s390/qeth: fixes 2021-01-07

This brings two locking fixes for the device control path.
Also one fix for a path where our .ndo_features_check() attempts to
access a non-existent L2 header.
====================

Link: https://lore.kernel.org/r/20210107172442.1737-1-jwi@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 18:54:09 -08:00
Julian Wiedmann f9c4845385 s390/qeth: fix L2 header access in qeth_l3_osa_features_check()
ip_finish_output_gso() may call .ndo_features_check() even before the
skb has a L2 header. This conflicts with qeth_get_ip_version()'s attempt
to inspect the L2 header via vlan_eth_hdr().

Switch to vlan_get_protocol(), as already used further down in the
common qeth_features_check() path.

Fixes: f13ade1993 ("s390/qeth: run non-offload L3 traffic over common xmit path")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 18:54:06 -08:00
Julian Wiedmann b41b554c1e s390/qeth: fix locking for discipline setup / removal
Due to insufficient locking, qeth_core_set_online() and
qeth_dev_layer2_store() can run in parallel, both attempting to load &
setup the discipline (and stepping on each other toes along the way).
A similar race can also occur between qeth_core_remove_device() and
qeth_dev_layer2_store().

Access to .discipline is meant to be protected by the discipline_mutex,
so add/expand the locking in qeth_core_remove_device() and
qeth_core_set_online().
Adjust the locking in qeth_l*_remove_device() accordingly, as it's now
handled by the callers in a consistent manner.

Based on an initial patch by Ursula Braun.

Fixes: 9dc48ccc68 ("qeth: serialize sysfs-triggered device configurations")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 18:54:06 -08:00
Julian Wiedmann 0b9902c1fc s390/qeth: fix deadlock during recovery
When qeth_dev_layer2_store() - holding the discipline_mutex - waits
inside qeth_l*_remove_device() for a qeth_do_reset() thread to complete,
we can hit a deadlock if qeth_do_reset() concurrently calls
qeth_set_online() and thus tries to aquire the discipline_mutex.

Move the discipline_mutex locking outside of qeth_set_online() and
qeth_set_offline(), and turn the discipline into a parameter so that
callers understand the dependency.

To fix the deadlock, we can now relax the locking:
As already established, qeth_l*_remove_device() waits for
qeth_do_reset() to complete. So qeth_do_reset() itself is under no risk
of having card->discipline ripped out while it's running, and thus
doesn't need to take the discipline_mutex.

Fixes: 9dc48ccc68 ("qeth: serialize sysfs-triggered device configurations")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 18:54:06 -08:00
Jakub Kicinski d708342748 Merge branch 'nexthop-various-fixes'
Ido Schimmel says:

====================
nexthop: Various fixes

This series contains various fixes for the nexthop code. The bugs were
uncovered during the development of resilient nexthop groups.

Patches #1-#2 fix the error path of nexthop_create_group(). I was not
able to trigger these bugs with current code, but it is possible with
the upcoming resilient nexthop groups code which adds a user
controllable memory allocation further in the function.

Patch #3 fixes wrong validation of netlink attributes.

Patch #4 fixes wrong invocation of mausezahn in a selftest.
====================

Link: https://lore.kernel.org/r/20210107144824.1135691-1-idosch@idosch.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 18:47:21 -08:00
Ido Schimmel a5c9ca76a1 selftests: fib_nexthops: Fix wrong mausezahn invocation
For IPv6 traffic, mausezahn needs to be invoked with '-6'. Otherwise an
error is returned:

 # ip netns exec me mausezahn veth1 -B 2001:db8:101::2 -A 2001:db8:91::1 -c 0 -t tcp "dp=1-1023, flags=syn"
 Failed to set source IPv4 address. Please check if source is set to a valid IPv4 address.
  Invalid command line parameters!

Fixes: 7c741868ce ("selftests: Add torture tests to nexthop tests")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 18:47:19 -08:00
Petr Machata b19218b27f nexthop: Bounce NHA_GATEWAY in FDB nexthop groups
The function nh_check_attr_group() is called to validate nexthop groups.
The intention of that code seems to have been to bounce all attributes
above NHA_GROUP_TYPE except for NHA_FDB. However instead it bounces all
these attributes except when NHA_FDB attribute is present--then it accepts
them.

NHA_FDB validation that takes place before, in rtm_to_nh_config(), already
bounces NHA_OIF, NHA_BLACKHOLE, NHA_ENCAP and NHA_ENCAP_TYPE. Yet further
back, NHA_GROUPS and NHA_MASTER are bounced unconditionally.

But that still leaves NHA_GATEWAY as an attribute that would be accepted in
FDB nexthop groups (with no meaning), so long as it keeps the address
family as unspecified:

 # ip nexthop add id 1 fdb via 127.0.0.1
 # ip nexthop add id 10 fdb via default group 1

The nexthop code is still relatively new and likely not used very broadly,
and the FDB bits are newer still. Even though there is a reproducer out
there, it relies on an improbable gateway arguments "via default", "via
all" or "via any". Given all this, I believe it is OK to reformulate the
condition to do the right thing and bounce NHA_GATEWAY.

Fixes: 38428d6871 ("nexthop: support for fdb ecmp nexthops")
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 18:47:18 -08:00
Ido Schimmel 7b01e53eee nexthop: Unlink nexthop group entry in error path
In case of error, remove the nexthop group entry from the list to which
it was previously added.

Fixes: 430a049190 ("nexthop: Add support for nexthop groups")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 18:47:18 -08:00
Ido Schimmel 07e61a979c nexthop: Fix off-by-one error in error path
A reference was not taken for the current nexthop entry, so do not try
to put it in the error path.

Fixes: 430a049190 ("nexthop: Add support for nexthop groups")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 18:47:18 -08:00
Colin Ian King ac7996d680 octeontx2-af: fix memory leak of lmac and lmac->name
Currently the error return paths don't kfree lmac and lmac->name
leading to some memory leaks.  Fix this by adding two error return
paths that kfree these objects

Addresses-Coverity: ("Resource leak")
Fixes: 1463f382f5 ("octeontx2-af: Add support for CGX link management")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Link: https://lore.kernel.org/r/20210107123916.189748-1-colin.king@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 18:39:04 -08:00
Jakub Kicinski 85bd6055e3 Merge branch 'bug-fixes-for-chtls-driver'
Ayush Sawal says:

====================
Bug fixes for chtls driver

patch 1: Fix hardware tid leak.
patch 2: Remove invalid set_tcb call.
patch 3: Fix panic when route to peer not configured.
patch 4: Avoid unnecessary freeing of oreq pointer.
patch 5: Replace skb_dequeue with skb_peek.
patch 6: Added a check to avoid NULL pointer dereference patch.
patch 7: Fix chtls resources release sequence.
====================

Link: https://lore.kernel.org/r/20210106042912.23512-1-ayush.sawal@chelsio.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 17:06:05 -08:00
Ayush Sawal 15ef6b0e30 chtls: Fix chtls resources release sequence
CPL_ABORT_RPL is sent after releasing the resources by calling
chtls_release_resources(sk); and chtls_conn_done(sk);
eventually causing kernel panic. Fixing it by calling release
in appropriate order.

Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 17:06:02 -08:00
Ayush Sawal eade1e0a4f chtls: Added a check to avoid NULL pointer dereference
In case of server removal lookup_stid() may return NULL pointer, which
is used as listen_ctx. So added a check before accessing this pointer.

Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 17:06:02 -08:00
Ayush Sawal a84b2c0d5f chtls: Replace skb_dequeue with skb_peek
The skb is unlinked twice, one in __skb_dequeue in function
chtls_reset_synq() and another in cleanup_syn_rcv_conn().
So in this patch using skb_peek() instead of __skb_dequeue(),
so that unlink will be handled only in cleanup_syn_rcv_conn().

Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Vinay Kumar Yadav <vinay.yadav@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 17:06:02 -08:00
Ayush Sawal f8d15d29d6 chtls: Avoid unnecessary freeing of oreq pointer
In chtls_pass_accept_request(), removing the chtls_reqsk_free()
call to avoid oreq freeing twice. Here oreq is the pointer to
struct request_sock.

Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 17:06:02 -08:00
Ayush Sawal 5a5fac9966 chtls: Fix panic when route to peer not configured
If route to peer is not configured, we might get non tls
devices from dst_neigh_lookup() which is invalid, adding a
check to avoid it.

Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 17:06:02 -08:00
Ayush Sawal 827d329105 chtls: Remove invalid set_tcb call
At the time of SYN_RECV, connection information is not
initialized at FW, updating tcb flag over uninitialized
connection causes adapter crash. We don't need to
update the flag during SYN_RECV state, so avoid this.

Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 17:06:02 -08:00
Ayush Sawal 717df0f4cd chtls: Fix hardware tid leak
send_abort_rpl() is not calculating cpl_abort_req_rss offset and
ends up sending wrong TID with abort_rpl WR causng tid leaks.
Replaced send_abort_rpl() with chtls_send_abort_rpl() as it is
redundant.

Fixes: cc35c88ae4 ("crypto : chtls - CPL handler definition")
Signed-off-by: Rohit Maheshwari <rohitm@chelsio.com>
Signed-off-by: Ayush Sawal <ayush.sawal@chelsio.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 17:06:01 -08:00
Linus Torvalds c4cc3b1de3 gcc-plugins fix for v5.11-rc3
- Bump c++ standard version for latest GCC versions (Valdis Kletnieks)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl/3lmMACgkQiXL039xt
 wCbVwxAAoPLbLZHjBkG654VWl1YK9tsRIcGomGqCmzsgkO9dBJPj58VPeVBSlnl2
 A58YJdz7m0iq9Tv1UH+fkOW5EziIMIsozho09JZpAAn7hqPw0eQP56EudJudoXIN
 UFtt3C+bqEUCfYpmwhUl1aV2SBAyB6QQZ6nn+J6Lrxq0w6KbYzNTeaKBwnGcDkGN
 RZQpMfY+UJjzAFm17/N1UhyOBR+EfjdN9PDi46omFRikfsP3KmaUVdl05JZ3ONfr
 oN7JGuoNv1PSHPXslqMzgB/8h5DuCASUfLPDCZr8wk9EZ87gcM3xVHHDpQOnLxuU
 V26/YB7IBh8nWmwfpZcfsT7CdGq108JlxyuSGezxNziW3yHLNHMeZ29Arlh2jJ+U
 Z2PunpGTZhBcod7MFobIgLTnnXU9i+4re9Y6soJq1P9g9cfwd3q8YxixFMYvWgyh
 KmtF9eF6a+EOATutC+lLByNnZ5+DisEfGyMiGXEv+cbozT74Dx2H8ISQshW03tym
 iyP94giJAf7DVJ7mMVTG1XlM+Pl6dQC9p51nu5pl5DBM0Ryj8Hunu8605JJ44qIb
 8jroMSV0SoqMXlf2bN5XKLGRSpyrKz59dKjfw/Iu0v+2fd796ZugbRGouR27VekK
 WFQsql6OXsYHgK8uh1pxnZvP2swrjjXhHiLUGwr6e+b2YArrf0E=
 =jtTd
 -----END PGP SIGNATURE-----

Merge tag 'gcc-plugins-v5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull gcc-plugins fix from Kees Cook:
 "Bump c++ standard version for latest GCC versions (Valdis Kletnieks)"

* tag 'gcc-plugins-v5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  gcc-plugins: fix gcc 11 indigestion with plugins...
2021-01-07 16:03:19 -08:00
Jakub Kicinski 0565ff56cd Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2021-01-07

We've added 4 non-merge commits during the last 10 day(s) which contain
a total of 4 files changed, 14 insertions(+), 7 deletions(-).

The main changes are:

1) Fix task_iter bug caused by the merge conflict resolution, from Yonghong.

2) Fix resolve_btfids for multiple type hierarchies, from Jiri.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpftool: Fix compilation failure for net.o with older glibc
  tools/resolve_btfids: Warn when having multiple IDs for single type
  bpf: Fix a task_iter bug caused by a merge conflict resolution
  selftests/bpf: Fix a compile error for BPF_F_BPRM_SECUREEXEC
====================

Link: https://lore.kernel.org/r/20210107221555.64959-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 15:10:27 -08:00
Jakub Kicinski 704a0f858e Merge branch 'net-fix-netfilter-defrag-ip-tunnel-pmtu-blackhole'
Florian Westphal says:

====================
net: fix netfilter defrag/ip tunnel pmtu blackhole

Christian Perle reported a PMTU blackhole due to unexpected interaction
between the ip defragmentation that comes with connection tracking and
ip tunnels.

Unfortunately setting 'nopmtudisc' on the tunnel breaks the test
scenario even without netfilter.

Christinas setup looks like this:
     +--------+       +---------+       +--------+
     |Router A|-------|Wanrouter|-------|Router B|
     |        |.IPIP..|         |..IPIP.|        |
     +--------+       +---------+       +--------+
          /             mtu 1400           \
         /                                  \
 +--------+                                  +--------+
 |Client A|                                  |Client B|
 +--------+                                  +--------+

MTU is 1500 everywhere, except on Router A to Wanrouter and
Wanrouter to Router B.

Router A and Router B use IPIP tunnel interfaces to tunnel traffic
between Client A and Client B over WAN.

Client A sends a 1400 byte UDP datagram to Client B.
This packet gets encapsulated in the IPIP tunnel.

This works, packet is received on client B.

When conntrack (or anything else that forces ip defragmentation) is
enabled on Router A, the packet gets dropped on Router A after
encapsulation because they exceed the link MTU.

Setting the 'nopmtudisc' flag on the IPIP tunnel makes things worse,
no packets pass even in the no-netfilter scenario.

Patch one is a reproducer script for selftest infra.

Patch two is a fix for 'nopmtudisc' behaviour so ip_tunnel will send
an icmp error to Client A.  This allows 'nopmtudisc' tunnel to forward
the UDP datagrams.

Patch three enables ip refragmentation for all reassembled packets, just
like ipv6.
====================

Link: https://lore.kernel.org/r/20210105231523.622-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 14:42:37 -08:00
Florian Westphal bb4cc1a188 net: ip: always refragment ip defragmented packets
Conntrack reassembly records the largest fragment size seen in IPCB.
However, when this gets forwarded/transmitted, fragmentation will only
be forced if one of the fragmented packets had the DF bit set.

In that case, a flag in IPCB will force fragmentation even if the
MTU is large enough.

This should work fine, but this breaks with ip tunnels.
Consider client that sends a UDP datagram of size X to another host.

The client fragments the datagram, so two packets, of size y and z, are
sent. DF bit is not set on any of these packets.

Middlebox netfilter reassembles those packets back to single size-X
packet, before routing decision.

packet-size-vs-mtu checks in ip_forward are irrelevant, because DF bit
isn't set.  At output time, ip refragmentation is skipped as well
because x is still smaller than the mtu of the output device.

If ttransmit device is an ip tunnel, the packet size increases to
x+overhead.

Also, tunnel might be configured to force DF bit on outer header.

In this case, packet will be dropped (exceeds MTU) and an ICMP error is
generated back to sender.

But sender already respects the announced MTU, all the packets that
it sent did fit the announced mtu.

Force refragmentation as per original sizes unconditionally so ip tunnel
will encapsulate the fragments instead.

The only other solution I see is to place ip refragmentation in
the ip_tunnel code to handle this case.

Fixes: d6b915e29f ("ip_fragment: don't forward defragmented DF packet")
Reported-by: Christian Perle <christian.perle@secunet.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 14:42:36 -08:00