Stanislav Fomichev says:
====================
This patch series fixes the existing BPF flow dissector API to
support calling BPF progs from the eth_get_headlen context (the
support itself will be added in bpf-next tree).
The summary of the changes:
* fix VLAN handling in bpf_flow.c, we don't need to peek back and look
at skb->vlan_present; add selftests
* pass and use flow_keys->n_proto instead of skb->protocol
* fix clamping of flow_keys->nhoff for packets with nhoff > 0
* prohibit access to most of the __sk_buff fields from BPF flow
dissector progs; only data/data_end/flow_keys are allowed (all input
is now passed via flow_keys)
* finally, document BPF flow dissector program environment
====================
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Petar Penkov <peterpenkov96@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Short doc on what BPF flow dissector should expect in the input
__sk_buff and flow_keys.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Use whitelist instead of a blacklist and allow only a small set of
fields that might be relevant in the context of flow dissector:
* data
* data_end
* flow_keys
This is required for the eth_get_headlen case where we have only a
chunk of data to dissect (i.e. trying to read the other skb fields
doesn't make sense).
Note, that it is a breaking API change! However, we've provided
flow_keys->n_proto as a substitute for skb->protocol; and there is
no need to manually handle skb->vlan_present. So even if we
break somebody, the migration is trivial. Unfortunately, we can't
support eth_get_headlen use-case without those breaking changes.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Don't allow BPF program to set flow_keys->nhoff to less than initial
value. We currently don't read the value afterwards in anything but
the tests, but it's still a good practice to return consistent
values to the test programs.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This is a preparation for the next commit that would prohibit access to
the most fields of __sk_buff from the BPF programs.
Instead of requiring BPF flow dissector programs to look into skb,
pass all input data in the flow_keys.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
When we tail call PROG(VLAN) from parse_eth_proto we don't need to peek
back to handle vlan proto because we didn't adjust nhoff/thoff yet. Use
flow_keys->n_proto, that we set in parse_eth_proto instead and
properly increment nhoff as well.
Also, always use skb->protocol and don't look at skb->vlan_present.
skb->vlan_present indicates that vlan information is stored out-of-band
in skb->vlan_{tci,proto} and vlan header is already pulled from skb.
That means, skb->vlan_present == true is not relevant for BPF flow
dissector.
Add simple test cases with VLAN tagged frames:
* single vlan for ipv4
* double vlan for ipv6
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
The device type for ip6 tunnels is set to
ARPHRD_TUNNEL6. However, the ip4ip6_err function
is expecting the device type of the tunnel to be
ARPHRD_TUNNEL. Since the device types do not
match, the function exits and the ICMP error
packet is not sent to the originating host. Note
that the device type for IPv4 tunnels is set to
ARPHRD_TUNNEL.
Fix is to expect a tunnel device type of
ARPHRD_TUNNEL6 instead. Now the tunnel device
type matches and the ICMP error packet is sent
to the originating host.
Signed-off-by: Sheena Mira-ato <sheena.mira-ato@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
If dccp_feat_push_change fails, we forget free the mem
which is alloced by kmemdup in dccp_feat_clone_sp_val.
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: e8ef967a54 ("dccp: Registration routines for changing feature values")
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Syzbot report a kernel-infoleak:
BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
Call Trace:
_copy_to_user+0x16b/0x1f0 lib/usercopy.c:32
copy_to_user include/linux/uaccess.h:174 [inline]
sctp_getsockopt_peer_addrs net/sctp/socket.c:5911 [inline]
sctp_getsockopt+0x1668e/0x17f70 net/sctp/socket.c:7562
...
Uninit was stored to memory at:
sctp_transport_init net/sctp/transport.c:61 [inline]
sctp_transport_new+0x16d/0x9a0 net/sctp/transport.c:115
sctp_assoc_add_peer+0x532/0x1f70 net/sctp/associola.c:637
sctp_process_param net/sctp/sm_make_chunk.c:2548 [inline]
sctp_process_init+0x1a1b/0x3ed0 net/sctp/sm_make_chunk.c:2361
...
Bytes 8-15 of 16 are uninitialized
It was caused by that th _pad field (the 8-15 bytes) of a v4 addr (saved in
struct sockaddr_in) wasn't initialized, but directly copied to user memory
in sctp_getsockopt_peer_addrs().
So fix it by calling memset(addr->v4.sin_zero, 0, 8) to initialize _pad of
sockaddr_in before copying it to user memory in sctp_v4_addr_to_user(), as
sctp_v6_addr_to_user() does.
Reported-by: syzbot+86b5c7c236a22616a72f@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Tested-by: Alexander Potapenko <glider@google.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
nfp: flower: fix matching and pushing vlan CFI bit
This patch clears up some confusion around the meaning of bit 12
for FW messages related to VLAN and flower offload.
Pieter says:
It fixes issues with matching, pushing and popping vlan tags.
We replace the vlan CFI bit with a vlan present bit that
indicates the presence of a vlan tag. We also no longer set
the CFI when pushing vlan tags.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
We no longer set CFI when pushing vlan tags, therefore we remove
the CFI bit from push vlan.
Fixes: 1a1e586f54 ("nfp: add basic action capabilities to flower offloads")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace vlan CFI bit with a vlan present bit that indicates the
presence of a vlan tag. Previously the driver incorrectly assumed
that an vlan id of 0 is not matchable, therefore we indicate vlan
presence with a vlan present bit.
Fixes: 5571e8c9f2 ("nfp: extend flower matching capabilities")
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Louis Peens <louis.peens@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When kcm is loaded while many processes try to create a KCM socket, a
crash occurs:
BUG: unable to handle kernel NULL pointer dereference at 000000000000000e
IP: mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
PGD 8000000016ef2067 P4D 8000000016ef2067 PUD 3d6e9067 PMD 0
Oops: 0002 [#1] SMP KASAN PTI
CPU: 0 PID: 7005 Comm: syz-executor.5 Not tainted 4.12.14-396-default #1 SLE15-SP1 (unreleased)
RIP: 0010:mutex_lock+0x27/0x40 kernel/locking/mutex.c:240
RSP: 0018:ffff88000d487a00 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 000000000000000e RCX: 1ffff100082b0719
...
CR2: 000000000000000e CR3: 000000004b1bc003 CR4: 0000000000060ef0
Call Trace:
kcm_create+0x600/0xbf0 [kcm]
__sock_create+0x324/0x750 net/socket.c:1272
...
This is due to race between sock_create and unfinished
register_pernet_device. kcm_create tries to do "net_generic(net,
kcm_net_id)". but kcm_net_id is not initialized yet.
So switch the order of the two to close the race.
This can be reproduced with mutiple processes doing socket(PF_KCM, ...)
and one process doing module removal.
Fixes: ab7ac4eb98 ("kcm: Kernel Connection Multiplexor module")
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni says:
====================
net: sched: fix stats accounting for child NOLOCK qdiscs
Currently, stats accounting for NOLOCK qdisc enslaved to classful (lock)
qdiscs is buggy. Per CPU values are ignored in most places, as a result,
stats dump in the above scenario always report 0 length backlog and parent
backlog len is not updated correctly on NOLOCK qdisc removal.
The first patch address stats dumping, and the second one child qdisc removal.
I'm targeting the net tree as this is a bugfix, but it could be moved to
net-next due to the relatively large diffstat.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The same code to flush qdisc tree and purge the qdisc queue
is duplicated in many places and in most cases it does not
respect NOLOCK qdisc: the global backlog len is used and the
per CPU values are ignored.
This change addresses the above, factoring-out the relevant
code and using the helpers introduced by the previous patch
to fetch the correct backlog len.
Fixes: c5ad119fb6 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Classful qdiscs can't access directly the child qdiscs backlog
length: if such qdisc is NOLOCK, per CPU values should be
accounted instead.
Most qdiscs no not respect the above. As a result, qstats fetching
for most classful qdisc is currently incorrect: if the child qdisc is
NOLOCK, it always reports 0 len backlog.
This change introduces a pair of helpers to safely fetch
both backlog and qlen and use them in stats class dumping
functions, fixing the above issue and cleaning a bit the code.
DRR needs also to access the child qdisc queue length, so it
needs custom handling.
Fixes: c5ad119fb6 ("net: sched: pfifo_fast use skb_array")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It returned always NULL, thus it was never possible to get the filter.
Example:
$ ip link add foo type dummy
$ ip link add bar type dummy
$ tc qdisc add dev foo clsact
$ tc filter add dev foo protocol all pref 1 ingress handle 1234 \
matchall action mirred ingress mirror dev bar
Before the patch:
$ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall
Error: Specified filter handle not found.
We have an error talking to the kernel
After:
$ tc filter get dev foo protocol all pref 1 ingress handle 1234 matchall
filter ingress protocol all pref 1 matchall chain 0 handle 0x4d2
not_in_hw
action order 1: mirred (Ingress Mirror to device bar) pipe
index 1 ref 1 bind 1
CC: Yotam Gigi <yotamg@mellanox.com>
CC: Jiri Pirko <jiri@mellanox.com>
Fixes: fd62d9f5c5 ("net/sched: matchall: Fix configuration race")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Configuration check to accept source route IP options should be made on
the incoming netdevice when the skb->dev is an l3mdev master. The route
lookup for the source route next hop also needs the incoming netdev.
v2->v3:
- Simplify by passing the original netdevice down the stack (per David
Ahern).
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Martin Habets <mhabets@solarflare.com>
Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Acked-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When tcp_sk_init() failed in inet_ctl_sock_create(),
'net->ipv4.tcp_congestion_control' will be left
uninitialized, but tcp_sk_exit() hasn't check for
that.
This patch add checking on 'net->ipv4.tcp_congestion_control'
in tcp_sk_exit() to prevent NULL-ptr dereference.
Fixes: 6670e15244 ("tcp: Namespace-ify sysctl_tcp_default_congestion_control")
Signed-off-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long says:
====================
tipc: a batch of uninit-value fixes for netlink_compat
These issues were all reported by syzbot, and exist since very beginning.
See the details on each patch.
====================
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Syzbot found a crash:
BUG: KMSAN: uninit-value in tipc_nl_compat_name_table_dump+0x54f/0xcd0 net/tipc/netlink_compat.c:872
Call Trace:
tipc_nl_compat_name_table_dump+0x54f/0xcd0 net/tipc/netlink_compat.c:872
__tipc_nl_compat_dumpit+0x59e/0xda0 net/tipc/netlink_compat.c:215
tipc_nl_compat_dumpit+0x63a/0x820 net/tipc/netlink_compat.c:280
tipc_nl_compat_handle net/tipc/netlink_compat.c:1226 [inline]
tipc_nl_compat_recv+0x1b5f/0x2750 net/tipc/netlink_compat.c:1265
genl_family_rcv_msg net/netlink/genetlink.c:601 [inline]
genl_rcv_msg+0x185f/0x1a60 net/netlink/genetlink.c:626
netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
genl_rcv+0x63/0x80 net/netlink/genetlink.c:637
netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1336
netlink_sendmsg+0x127f/0x1300 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:622 [inline]
sock_sendmsg net/socket.c:632 [inline]
Uninit was created at:
__alloc_skb+0x309/0xa20 net/core/skbuff.c:208
alloc_skb include/linux/skbuff.h:1012 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
netlink_sendmsg+0xb82/0x1300 net/netlink/af_netlink.c:1892
sock_sendmsg_nosec net/socket.c:622 [inline]
sock_sendmsg net/socket.c:632 [inline]
It was supposed to be fixed on commit 974cb0e3e7 ("tipc: fix uninit-value
in tipc_nl_compat_name_table_dump") by checking TLV_GET_DATA_LEN(msg->req)
in cmd->header()/tipc_nl_compat_name_table_dump_header(), which is called
ahead of tipc_nl_compat_name_table_dump().
However, tipc_nl_compat_dumpit() doesn't handle the error returned from cmd
header function. It means even when the check added in that fix fails, it
won't stop calling tipc_nl_compat_name_table_dump(), and the issue will be
triggered again.
So this patch is to add the process for the err returned from cmd header
function in tipc_nl_compat_dumpit().
Reported-by: syzbot+3ce8520484b0d4e260a5@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A similar issue as fixed by Patch "tipc: check bearer name with right
length in tipc_nl_compat_bearer_enable" was also found by syzbot in
tipc_nl_compat_link_set().
The length to check with should be 'TLV_GET_DATA_LEN(msg->req) -
offsetof(struct tipc_link_config, name)'.
Reported-by: syzbot+de00a87b8644a582ae79@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Syzbot reported the following crash:
BUG: KMSAN: uninit-value in memchr+0xce/0x110 lib/string.c:961
memchr+0xce/0x110 lib/string.c:961
string_is_valid net/tipc/netlink_compat.c:176 [inline]
tipc_nl_compat_bearer_enable+0x2c4/0x910 net/tipc/netlink_compat.c:401
__tipc_nl_compat_doit net/tipc/netlink_compat.c:321 [inline]
tipc_nl_compat_doit+0x3aa/0xaf0 net/tipc/netlink_compat.c:354
tipc_nl_compat_handle net/tipc/netlink_compat.c:1162 [inline]
tipc_nl_compat_recv+0x1ae7/0x2750 net/tipc/netlink_compat.c:1265
genl_family_rcv_msg net/netlink/genetlink.c:601 [inline]
genl_rcv_msg+0x185f/0x1a60 net/netlink/genetlink.c:626
netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
genl_rcv+0x63/0x80 net/netlink/genetlink.c:637
netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1336
netlink_sendmsg+0x127f/0x1300 net/netlink/af_netlink.c:1917
sock_sendmsg_nosec net/socket.c:622 [inline]
sock_sendmsg net/socket.c:632 [inline]
Uninit was created at:
__alloc_skb+0x309/0xa20 net/core/skbuff.c:208
alloc_skb include/linux/skbuff.h:1012 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
netlink_sendmsg+0xb82/0x1300 net/netlink/af_netlink.c:1892
sock_sendmsg_nosec net/socket.c:622 [inline]
sock_sendmsg net/socket.c:632 [inline]
It was triggered when the bearer name size < TIPC_MAX_BEARER_NAME,
it would check with a wrong len/TLV_GET_DATA_LEN(msg->req), which
also includes priority and disc_domain length.
This patch is to fix it by checking it with a right length:
'TLV_GET_DATA_LEN(msg->req) - offsetof(struct tipc_bearer_config, name)'.
Reported-by: syzbot+8b707430713eb46e1e45@syzkaller.appspotmail.com
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Aaro Koskinen says:
====================
net: stmmac: fix handling of oversized frames
I accidentally had MTU size mismatch (9000 vs. 1500) in my network,
and I noticed I could kill a system using stmmac & 1500 MTU simply
by pinging it with "ping -s 2000 ...".
While testing a fix I encountered also some other issues that need fixing.
I have tested these only with enhanced descriptors, so the normal
descriptor changes need a careful review.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This is log is harmful as it can trigger multiple times per packet. Delete
it.
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packets without the last descriptor set should be dropped early. If we
receive a frame larger than the DMA buffer, the HW will continue using the
next descriptor. Driver mistakes these as individual frames, and sometimes
a truncated frame (without the LD set) may look like a valid packet.
This fixes a strange issue where the system replies to 4098-byte ping
although the MTU/DMA buffer size is set to 4096, and yet at the same
time it's logging an oversized packet.
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we have error bits set, the discard_frame status will get overwritten
by checksum bit checks, which might set the status back to good one.
Fix by checking the COE status only if the frame is good.
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, if we drop a packet, we exit from NAPI loop before the budget
is consumed. In some situations this will make the RX processing stall
e.g. when flood pinging the system with oversized packets, as the
errorneous packets are not dropped efficiently.
If we drop a packet, we should just continue to the next one as long as
the budget allows.
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We always program the maximum DMA buffer size into the receive descriptor,
although the allocated size may be less. E.g. with the default MTU size
we allocate only 1536 bytes. If somebody sends us a bigger frame, then
memory may get corrupted.
Fix by using exact buffer sizes.
Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It was reported that re-introducing ASPM, in combination with RX
interrupt coalescing, results in significantly increased packet
latency, see [0]. Disabling ASPM or RX interrupt coalescing fixes
the issue. Therefore change the driver's default to disable RX
interrupt coalescing. Users still have the option to enable RX
coalescing via ethtool.
[0] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=925496
Fixes: a99790bf5c ("r8169: Reinstate ASPM Support")
Reported-by: Mike Crowe <mac@mcrowe.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
pull-request: bpf 2019-03-29
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Bug fix in BTF deduplication that was mishandling an equivalence
comparison, from Andrii.
2) libbpf Makefile fixes to properly link against libelf for the shared
object and to actually export AF_XDP's xsk.h header, from Björn.
3) Fix use after free in bpf inode eviction, from Daniel.
4) Fix a bug in skb creation out of cpumap redirect, from Jesper.
5) Remove an unnecessary and triggerable WARN_ONCE() in max number
of call stack frames checking in verifier, from Paul.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJcnnD7AAoJEEg/ir3gV/o+nksIAImv3hLoPXJoS/hnHQJV4FtB
U6euKao9PlKzMGCb7mii3WtBcn9ULwxNmWTA6oiJhZl2gp1gS/efAK5RFAis78pe
FKCuFU07antVcJQZ118L2gKSPDHwxuEOHm4d0IqlxzXr6gzldCZD+jSekD+nB59g
7Ehex+wi5ChPz5JqNA9UM5VQ9lO9Mf4yVrXKaZG3wAnT9Y0w2qwvqeNx5xVbXao9
3ugObrK24WVnajOK8Ub1MwREgveTuzccUy5GB4gc6DIBI8oSwKh+3DdqGqwiCM0z
/l5NmxK1ma+YpoKs1p6G7ipyTRielJD4oMbJeoilHfxsFyA1+AOIzdHjBw4Cd/4=
=SQ1L
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2019-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-03-29
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
For -stable v4.11
('net/mlx5: Decrease default mr cache size')
For -stable v4.12
('net/mlx5e: Add a lock on tir list')
For -stable v4.13
('net/mlx5e: Fix error handling when refreshing TIRs')
For -stable v4.18
('net/mlx5e: Update xon formula')
For -stable v4.19
('net: mlx5: Add a missing check on idr_find, free buf')
('net/mlx5e: Update xoff formula')
net-next merge Note:
When merged with net-next the following simple conflict will appear,
drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c
++<<<<<<< HEAD (net)
+ * max_mtu: netdev's max_mtu
++=======
+ * @mtu: device's MTU
++>>>>>>> net-next
To resolve: just replace the line in net-next
* @mtu: device's MTU
to
* @max_mtu: netdev's max_mtu
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Change t4fw_version.h to update latest firmware version
number to 1.23.3.0.
Signed-off-by: Vishal Kulkarni <vishal@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NULL or ZERO_SIZE_PTR will be returned for zero sized memory
request, and derefencing them will lead to a segfault
so it is unnecessory to call vzalloc for zero sized memory
request and not call functions which maybe derefence the
NULL allocated memory
this also fixes a possible memory leak if phy_ethtool_get_stats
returns error, memory should be freed before exit
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Wang Li <wangli39@baidu.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only decrypt_internal() performs zero copy on rx, all paths
which don't hit decrypt_internal() must set zc to false,
otherwise tls_sw_recvmsg() may return 0 causing the application
to believe that that connection got closed.
Currently this happens with device offload when new record
is first read from.
Fixes: d069b780e3 ("tls: Fix tls_device receive")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reported-by: David Beckett <david.beckett@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After queue stopped, the wakeup mechanism may wake it up again
when ring buffer usage is lower than a threshold. This may cause
send path panic on NULL pointer when we stopped all tx queues in
netvsc_detach and start removing the netvsc device.
This patch fix it by adding a tx_disable flag to prevent unwanted
queue wakeup.
Fixes: 7b2ee50c0c ("hv_netvsc: common detach logic")
Reported-by: Mohammed Gamal <mgamal@redhat.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bond expects ethernet hwaddr for its slave, but it can be longer than 6
bytes - infiniband interface for example.
# cat /sys/devices/<skipped>/net/ib0/address
80:00:02:08:fe:80:00:00:00:00:00:00:7c:fe:90:03:00:be:5d:e1
# cat /sys/devices/<skipped>/net/ib0/bonding_slave/perm_hwaddr
80:00:02:08:fe:80
So print full hwaddr in sysfs "bonding_slave/perm_hwaddr" as well.
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver allocates an encap context based on the tunnel properties,
and reuse that context for all flows using the same tunnel properties.
Commit df2ef3bff1 ("net/mlx5e: Add GRE protocol offloading")
introduced another tunnel protocol other than the single VXLAN
previously supported. A flow that uses a tunnel with the same tunnel
properties but with a different tunnel type (GRE vs VXLAN for example)
would mistakenly reuse the previous alocated context, causing the
traffic to be sent with the wrong encapsulation. Fix that by
considering the tunnel type for encap contexts.
Fixes: df2ef3bff1 ("net/mlx5e: Add GRE protocol offloading")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Set xon = xoff - netdev's max_mtu.
netdev's max_mtu will give enough time for the pause frame to
arrive at the sender.
Fixes: 0696d60853 ("net/mlx5e: Receive buffer configuration")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Make sure the struct mlx5_flow_destination is zero before
filling in the field.
Fixes: 8da202b249 ("net/mlx5: E-Switch, Add support for VEPA in legacy mode.")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Traditionally, the PF (Physical Function) which resides on vport 0 was
the E-switch manager. Since the ECPF (Embedded CPU Physical Function),
which resides on vport 0xfffe, was introduced as the E-Switch manager,
the assumption that the E-switch manager is on vport 0 is incorrect.
Since the eswitch code already uses the actual vport value, all we
need is to always set other_vport=1.
Signed-off-by: Omri Kahalon <omrik@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The esw offloads structures share a union with the legacy mode structs.
Reset the offloads struct to zero in init to protect from null
assumptions made by the legacy mode code.
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The capacity of FDB offloading and NIC offloading table are
different, and when allocating the pedit actions, we should
use the correct namespace type.
Fixes: c500c86b0c ("net/mlx5e: support for two independent packet edit actions")
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The esw fdb table has a union of legacy and offloads members.
So if we were in a certain esw mode we could set some memebers and not
set null which is fine as on destroy path and don't care.
But then moving from legacy to switchdev a second time, the cleanup flow
of legacy mode checks if a struct member was in use if it's not null so
we need to make sure to reset the code to null when we init legacy mode.
Fixes: 8da202b249 ("net/mlx5: E-Switch, Add support for VEPA in legacy mode.")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Allow configuration of legacy link-modes even when extended link-modes
are supported. This requires reading of legacy advertisement even when
extended link-modes are supported. Since legacy and extended
advertisement are mutually excluded, wait for empty reply from extended
advertisement before reading legacy advertisement.
Fixes: 6a89737241 ("net/mlx5: ethtool, Add ethtool support for 50Gbps per lane link modes")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Ethtool option set_link_ksettings allows setting of legacy link-modes
or extended link-modes. Refine the decision of which type of link-modes
is set.
Fixes: 6a89737241 ("net/mlx5: ethtool, Add ethtool support for 50Gbps per lane link modes")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>