Fixes the following W=1 kernel build warning(s):
drivers/net/bonding/bond_main.c:329: warning: Function parameter or member 'proto' not described in 'bond_vlan_rx_add_vid'
drivers/net/bonding/bond_main.c:362: warning: Function parameter or member 'proto' not described in 'bond_vlan_rx_kill_vid'
drivers/net/bonding/bond_main.c:964: warning: Function parameter or member 'new_active' not described in 'bond_change_active_slave'
drivers/net/bonding/bond_main.c:964: warning: Excess function parameter 'new' description in 'bond_change_active_slave'
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Thomas Davis <tadavis@lbl.gov>
Cc: netdev@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Renames and missing descriptions.
Fixes the following W=1 kernel build warning(s):
drivers/net/bonding/bond_3ad.c:140: warning: Function parameter or member 'port' not described in '__get_first_agg'
drivers/net/bonding/bond_3ad.c:140: warning: Excess function parameter 'bond' description in '__get_first_agg'
drivers/net/bonding/bond_3ad.c:1655: warning: Function parameter or member 'agg' not described in 'ad_agg_selection_logic'
drivers/net/bonding/bond_3ad.c:1655: warning: Excess function parameter 'aggregator' description in 'ad_agg_selection_logic'
drivers/net/bonding/bond_3ad.c:1817: warning: Function parameter or member 'port' not described in 'ad_initialize_port'
drivers/net/bonding/bond_3ad.c:1817: warning: Excess function parameter 'aggregator' description in 'ad_initialize_port'
drivers/net/bonding/bond_3ad.c:1976: warning: Function parameter or member 'timeout' not described in 'bond_3ad_initiate_agg_selection'
drivers/net/bonding/bond_3ad.c:2274: warning: Function parameter or member 'work' not described in 'bond_3ad_state_machine_handler'
drivers/net/bonding/bond_3ad.c:2274: warning: Excess function parameter 'bond' description in 'bond_3ad_state_machine_handler'
drivers/net/bonding/bond_3ad.c:2508: warning: Function parameter or member 'link' not described in 'bond_3ad_handle_link_change'
drivers/net/bonding/bond_3ad.c:2508: warning: Excess function parameter 'status' description in 'bond_3ad_handle_link_change'
drivers/net/bonding/bond_3ad.c:2566: warning: Function parameter or member 'bond' not described in 'bond_3ad_set_carrier'
drivers/net/bonding/bond_3ad.c:2677: warning: Function parameter or member 'bond' not described in 'bond_3ad_update_lacp_rate'
drivers/net/bonding/bond_3ad.c:1655: warning: Function parameter or member 'agg' not described in 'ad_agg_selection_logic'
drivers/net/bonding/bond_3ad.c:1655: warning: Excess function parameter 'aggregator' description in 'ad_agg_selection_logic'
drivers/net/bonding/bond_3ad.c:1817: warning: Function parameter or member 'port' not described in 'ad_initialize_port'
drivers/net/bonding/bond_3ad.c:1817: warning: Excess function parameter 'aggregator' description in 'ad_initialize_port'
drivers/net/bonding/bond_3ad.c:1976: warning: Function parameter or member 'timeout' not described in 'bond_3ad_initiate_agg_selection'
drivers/net/bonding/bond_3ad.c:2274: warning: Function parameter or member 'work' not described in 'bond_3ad_state_machine_handler'
drivers/net/bonding/bond_3ad.c:2274: warning: Excess function parameter 'bond' description in 'bond_3ad_state_machine_handler'
drivers/net/bonding/bond_3ad.c:2508: warning: Function parameter or member 'link' not described in 'bond_3ad_handle_link_change'
drivers/net/bonding/bond_3ad.c:2508: warning: Excess function parameter 'status' description in 'bond_3ad_handle_link_change'
drivers/net/bonding/bond_3ad.c:2566: warning: Function parameter or member 'bond' not described in 'bond_3ad_set_carrier'
drivers/net/bonding/bond_3ad.c:2677: warning: Function parameter or member 'bond' not described in 'bond_3ad_update_lacp_rate'
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Broadcast mode bonds transmit a copy of all traffic simultaneously out of
all interfaces, so the "speed" of the bond isn't really the aggregate of
all interfaces, but rather, the speed of the slowest active interface.
Also, the type of the speed field is u32, not unsigned long, so adjust
that accordingly, as required to make min() function here without
complaining about mismatching types.
Fixes: bb5b052f75 ("bond: add support to read speed and duplex via ethtool")
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The UDP reuseport conflict was a little bit tricky.
The net-next code, via bpf-next, extracted the reuseport handling
into a helper so that the BPF sk lookup code could invoke it.
At the same time, the logic for reuseport handling of unconnected
sockets changed via commit efc6b6f6c3
which changed the logic to carry on the reuseport result into the
rest of the lookup loop if we do not return immediately.
This requires moving the reuseport_has_conns() logic into the callers.
While we are here, get rid of inline directives as they do not belong
in foo.c files.
The other changes were cases of more straightforward overlapping
modifications.
Signed-off-by: David S. Miller <davem@davemloft.net>
Very similar to commit 544f287b84
("bonding: check error value of register_netdevice() immediately"),
we should immediately check the return value of register_netdevice()
before doing anything else.
Fixes: 005db31d5f ("bonding: set carrier off for devices created through netlink")
Reported-and-tested-by: syzbot+bbc3a11c4da63c1b74d6@syzkaller.appspotmail.com
Cc: Beniamino Galvani <bgalvani@redhat.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bond_ipsec_* helpers don't need RTNL, and can potentially get called
without it being held, so switch from rtnl_dereference() to
rcu_dereference() to access bond struct data.
Lightly tested with xfrm bonding, no problems found, should address the
syzkaller bug referenced below.
Reported-by: syzbot+582c98032903dcc04816@syzkaller.appspotmail.com
CC: Huy Nguyen <huyn@mellanox.com>
CC: Saeed Mahameed <saeedm@mellanox.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Jakub Kicinski <kuba@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: netdev@vger.kernel.org
CC: intel-wired-lan@lists.osuosl.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's possible that device removal happens when the bond is in non-AB mode,
and addition happens in AB mode, so bond_ipsec_del_sa() never gets called,
which leaves security associations in an odd state if bond_ipsec_add_sa()
then gets called after switching the bond into AB. Just call add and
delete universally for all modes to keep things consistent.
However, it's also possible that this code gets called when the system is
shutting down, and the xfrm subsystem has already been disconnected from
the bond device, so we need to do some error-checking and bail, lest we
hit a null ptr deref.
Fixes: a3b658cfb6 ("bonding: allow xfrm offload setup post-module-load")
CC: Huy Nguyen <huyn@mellanox.com>
CC: Saeed Mahameed <saeedm@mellanox.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Jakub Kicinski <kuba@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: netdev@vger.kernel.org
CC: intel-wired-lan@lists.osuosl.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At the moment, bonding xfrm crypto offload can only be set up if the bonding
module is loaded with active-backup mode already set. We need to be able to
make this work with bonds set to AB after the bonding driver has already
been loaded.
So what's done here is:
1) move #define BOND_XFRM_FEATURES to net/bonding.h so it can be used
by both bond_main.c and bond_options.c
2) set BOND_XFRM_FEATURES in bond_dev->hw_features universally, rather than
only when loading in AB mode
3) wire up xfrmdev_ops universally too
4) disable BOND_XFRM_FEATURES in bond_dev->features if not AB
5) exit early (non-AB case) from bond_ipsec_offload_ok, to prevent a
performance hit from traversing into the underlying drivers
5) toggle BOND_XFRM_FEATURES in bond_dev->wanted_features and call
netdev_change_features() from bond_option_mode_set()
In my local testing, I can change bonding modes back and forth on the fly,
have hardware offload work when I'm in AB, and see no performance penalty
to non-AB software encryption, despite having xfrm bits all wired up for
all modes now.
Fixes: 18cb261afd ("bonding: support hardware encryption offload to slaves")
Reported-by: Huy Nguyen <huyn@mellanox.com>
CC: Saeed Mahameed <saeedm@mellanox.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Jakub Kicinski <kuba@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: netdev@vger.kernel.org
CC: intel-wired-lan@lists.osuosl.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clang warns:
drivers/net/bonding/bond_main.c:4657:23: warning: equality comparison
with extraneous parentheses [-Wparentheses-equality]
if ((BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP))
~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
drivers/net/bonding/bond_main.c:4681:23: warning: equality comparison
with extraneous parentheses [-Wparentheses-equality]
if ((BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP))
~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
This warning occurs when a comparision has two sets of parentheses,
which is usually the convention for doing an assignment within an
if statement. Since equality comparisons do not need a second set of
parentheses, remove them to fix the warning.
Fixes: 18cb261afd ("bonding: support hardware encryption offload to slaves")
Link: https://github.com/ClangBuiltLinux/linux/issues/1066
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reported-by: kernelci.org bot <bot@kernelci.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than requiring every hw crypto capable NIC driver to do a check for
slave_dev being set, set real_dev in the xfrm layer and xso init time, and
then override it in the bonding driver as needed. Then NIC drivers can
always use real_dev, and at the same time, we eliminate the use of a
variable name that probably shouldn't have been used in the first place,
particularly given recent current events.
CC: Boris Pismenny <borisp@mellanox.com>
CC: Saeed Mahameed <saeedm@mellanox.com>
CC: Leon Romanovsky <leon@kernel.org>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Jakub Kicinski <kuba@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: netdev@vger.kernel.org
Suggested-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, this support is limited to active-backup mode, as I'm not sure
about the feasilibity of mapping an xfrm_state's offload handle to
multiple hardware devices simultaneously, and we rely on being able to
pass some hints to both the xfrm and NIC driver about whether or not
they're operating on a slave device.
I've tested this atop an Intel x520 device (ixgbe) using libreswan in
transport mode, succesfully achieving ~4.3Gbps throughput with netperf
(more or less identical to throughput on a bare NIC in this system),
as well as successful failover and recovery mid-netperf.
v2: just use CONFIG_XFRM_OFFLOAD for wrapping, isolate more code with it
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Jakub Kicinski <kuba@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: netdev@vger.kernel.org
CC: intel-wired-lan@lists.osuosl.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dynamic key update for addr_list_lock still causes troubles,
for example the following race condition still exists:
CPU 0: CPU 1:
(RCU read lock) (RTNL lock)
dev_mc_seq_show() netdev_update_lockdep_key()
-> lockdep_unregister_key()
-> netif_addr_lock_bh()
because lockdep doesn't provide an API to update it atomically.
Therefore, we have to move it back to static keys and use subclass
for nest locking like before.
In commit 1a33e10e4a ("net: partially revert dynamic lockdep key
changes"), I already reverted most parts of commit ab92d68fc2
("net: core: add generic lockdep keys").
This patch reverts the rest and also part of commit f3b0a18bb6
("net: remove unnecessary variables and callback"). After this
patch, addr_list_lock changes back to using static keys and
subclasses to satisfy lockdep. Thanks to dev->lower_level, we do
not have to change back to ->ndo_get_lock_subclass().
And hopefully this reduces some syzbot lockdep noises too.
Reported-by: syzbot+f3a0e80c34b3fc28ac5e@syzkaller.appspotmail.com
Cc: Taehee Yoo <ap420073@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
xdp_umem.c had overlapping changes between the 64-bit math fix
for the calculation of npgs and the removal of the zerocopy
memory type which got rid of the chunk_size_nohdr member.
The mlx5 Kconfig conflict is a case where we just take the
net-next copy of the Kconfig entry dependency as it takes on
the ESWITCH dependency by one level of indirection which is
what the 'net' conflicting change is trying to ensure.
Signed-off-by: David S. Miller <davem@davemloft.net>
kobject_init_and_add() takes reference even when it fails.
If this function returns an error, kobject_put() must be called to
properly clean up the memory associated with the object. Previous
commit "b8eb718348b8" fixed a similar problem.
Fixes: 07699f9a7c ("bonding: add sysfs /slave dir for bond slave devices.")
Signed-off-by: Qiushi Wu <wu000273@umn.edu>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This merge includes updates to bonding driver needed for the rdma stack,
to avoid conflicts with the RDMA branch.
Maor Gottlieb Says:
====================
Bonding: Add support to get xmit slave
The following series adds support to get the LAG master xmit slave by
introducing new .ndo - ndo_get_xmit_slave. Every LAG module can
implement it and it first implemented in the bond driver.
This is follow-up to the RFC discussion [1].
The main motivation for doing this is for drivers that offload part
of the LAG functionality. For example, Mellanox Connect-X hardware
implements RoCE LAG which selects the TX affinity when the resources
are created and port is remapped when it goes down.
The first part of this patchset introduces the new .ndo and add the
support to the bonding module.
The second part adds support to get the RoCE LAG xmit slave by building
skb of the RoCE packet based on the AH attributes and call to the new
.ndo.
The third part change the mlx5 driver driver to set the QP's affinity
port according to the slave which found by the .ndo.
====================
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently, bonding always returns NETDEV_TX_OK to its caller.
It is worth trying to be more accurate : TCP for instance
can have different recovery strategies if it can have more
precise status, if packet was dropped by slave qdisc.
This is especially important when host is under stress.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit b3e80d44f5
("bonding: fix lockdep warning in bond_get_stats()") the dynamic
key is no longer necessary, as we compute nest level at run-time.
So, we can just remove it to save some lockdep key entries.
Test commands:
ip link add bond0 type bond
ip link add bond1 type bond
ip link set bond0 master bond1
ip link set bond0 nomaster
ip link set bond1 master bond0
Reported-and-tested-by: syzbot+aaa6fa4949cc5d9b7b25@syzkaller.appspotmail.com
Cc: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reverts the folowing commits:
commit 064ff66e2b
"bonding: add missing netdev_update_lockdep_key()"
commit 53d374979e
"net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()"
commit 1f26c0d3d2
"net: fix kernel-doc warning in <linux/netdevice.h>"
commit ab92d68fc2
"net: core: add generic lockdep keys"
but keeps the addr_list_lock_key because we still lock
addr_list_lock nestedly on stack devices, unlikely xmit_lock
this is safe because we don't take addr_list_lock on any fast
path.
Reported-and-tested-by: syzbot+aaa6fa4949cc5d9b7b25@syzkaller.appspotmail.com
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add implementation of ndo_get_xmit_slave. Find the slave by using the
helper function according to the bond mode. If the caller set all_slaves
to true, then it assumes that all slaves are available to transmit.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Keep all slaves in array so it could be used to get the xmit slave
assume all the slaves are active.
The logic to add slave to the array is like the usable slaves, except
that we also add slaves that currently can't transmit - not up or active.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add helper function to get the xmit slave in active-backup mode.
It's only one line function that return the curr_active_slave,
but it will used both in the xmit flow and by the new .ndo to get
the xmit slave.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add helper function to get the xmit slave when bond is in round
robin mode. Change bond_xmit_slave_id to bond_get_slave_by_id, then
the logic for find the next slave for transmit could be used
both by the xmit flow and the .ndo to get the xmit slave.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Both xor and 802.3ad modes use bond_xmit_hash to get the xmit slave.
Export the logic to helper function so it could be used in the
following patches by the .ndo to get the xmit slave.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add two helper functions to get the xmit slave of bond in alb or tlb
mode. Extract the logic of find the xmit slave from the xmit flow
to function. Xmit flow will xmit through this slave and in the
following patches the new .ndo will call to the helper function
to return the xmit slave.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Rename slave_arr to usable_slaves, since we will have two arrays,
one for the usable slaves and the other to all slaves.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
As a preparation for following change that add array of
all slaves, extract code that skip slave to function.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Get rid of linux/vermagic.h includes, so that MODULE_ARCH_VERMAGIC from
the arch header arch/x86/include/asm/module.h won't be redefined.
In file included from ./include/linux/module.h:30,
from drivers/net/ethernet/3com/3c515.c:56:
./arch/x86/include/asm/module.h:73: warning: "MODULE_ARCH_VERMAGIC"
redefined
73 | # define MODULE_ARCH_VERMAGIC MODULE_PROC_FAMILY
|
In file included from drivers/net/ethernet/3com/3c515.c:25:
./include/linux/vermagic.h:28: note: this is the location of the
previous definition
28 | #define MODULE_ARCH_VERMAGIC ""
|
Fixes: 6bba2e89a8 ("net/3com: Delete driver and module versions from 3com drivers")
Co-developed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Shannon Nelson <snelson@pensando.io> # ionic
Acked-by: Sebastian Reichel <sre@kernel.org> # power
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The in-kernel code has already unique version, which is based
on Linus's tag, update the bond driver to be consistent with that
version.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial cleanup, so that all bridge port-specific code can be found in
one go.
CC: Johannes Berg <johannes@sipsolutions.net>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
syzbot managed to send an IPX packet through bond_alb_xmit()
and af_packet and triggered a use-after-free.
First, bond_alb_xmit() was using ipx_hdr() helper to reach
the IPX header, but ipx_hdr() was using the transport offset
instead of the network offset. In the particular syzbot
report transport offset was 0xFFFF
This patch removes ipx_hdr() since it was only (mis)used from bonding.
Then we need to make sure IPv4/IPv6/IPX headers are pulled
in skb->head before dereferencing anything.
BUG: KASAN: use-after-free in bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452
Read of size 2 at addr ffff8801ce56dfff by task syz-executor.2/18108
(if (ipx_hdr(skb)->ipx_checksum != IPX_NO_CHECKSUM) ...)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
[<ffffffff8441fc42>] __dump_stack lib/dump_stack.c:17 [inline]
[<ffffffff8441fc42>] dump_stack+0x14d/0x20b lib/dump_stack.c:53
[<ffffffff81a7dec4>] print_address_description+0x6f/0x20b mm/kasan/report.c:282
[<ffffffff81a7e0ec>] kasan_report_error mm/kasan/report.c:380 [inline]
[<ffffffff81a7e0ec>] kasan_report mm/kasan/report.c:438 [inline]
[<ffffffff81a7e0ec>] kasan_report.cold+0x8c/0x2a0 mm/kasan/report.c:422
[<ffffffff81a7dc4f>] __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:469
[<ffffffff82c8c00a>] bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452
[<ffffffff82c60c74>] __bond_start_xmit drivers/net/bonding/bond_main.c:4199 [inline]
[<ffffffff82c60c74>] bond_start_xmit+0x4f4/0x1570 drivers/net/bonding/bond_main.c:4224
[<ffffffff83baa558>] __netdev_start_xmit include/linux/netdevice.h:4525 [inline]
[<ffffffff83baa558>] netdev_start_xmit include/linux/netdevice.h:4539 [inline]
[<ffffffff83baa558>] xmit_one net/core/dev.c:3611 [inline]
[<ffffffff83baa558>] dev_hard_start_xmit+0x168/0x910 net/core/dev.c:3627
[<ffffffff83bacf35>] __dev_queue_xmit+0x1f55/0x33b0 net/core/dev.c:4238
[<ffffffff83bae3a8>] dev_queue_xmit+0x18/0x20 net/core/dev.c:4278
[<ffffffff84339189>] packet_snd net/packet/af_packet.c:3226 [inline]
[<ffffffff84339189>] packet_sendmsg+0x4919/0x70b0 net/packet/af_packet.c:3252
[<ffffffff83b1ac0c>] sock_sendmsg_nosec net/socket.c:673 [inline]
[<ffffffff83b1ac0c>] sock_sendmsg+0x12c/0x160 net/socket.c:684
[<ffffffff83b1f5a2>] __sys_sendto+0x262/0x380 net/socket.c:1996
[<ffffffff83b1f700>] SYSC_sendto net/socket.c:2008 [inline]
[<ffffffff83b1f700>] SyS_sendto+0x40/0x60 net/socket.c:2004
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the LACP actor/partner state is now part of the uapi, rename the
3ad state defines with LACP prefix. The LACP prefix is preferred over
BOND_3AD as the LACP standard moved to 802.1AX.
Fixes: 826f66b30c ("bonding: move 802.3ad port state flags to uapi")
Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the recent fix in commit 1899bb3251 ("bonding: fix state
transition issue in link monitoring"), the active-backup mode with
miimon initially come-up fine but after a link-failure, both members
transition into backup state.
Following steps to reproduce the scenario (eth1 and eth2 are the
slaves of the bond):
ip link set eth1 up
ip link set eth2 down
sleep 1
ip link set eth2 up
ip link set eth1 down
cat /sys/class/net/eth1/bonding_slave/state
cat /sys/class/net/eth2/bonding_slave/state
Fixes: 1899bb3251 ("bonding: fix state transition issue in link monitoring")
CC: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
The bond slave actor/partner operating state is exported as
bitfield to userspace, which lacks a way to interpret it, e.g.,
iproute2 only prints the state as a number:
ad_actor_oper_port_state 15
For userspace to interpret the bitfield, the bitfield definitions
should be part of the uapi. The bitfield itself is defined in the
802.3ad standard.
This commit moves the 802.3ad bitfield definitions to uapi.
Related iproute2 patches, soon to be posted upstream, use the new uapi
headers to pretty-print bond slave state, e.g., with ip -d link show
ad_actor_oper_port_state_str <active,short_timeout,aggregating,in_sync>
Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
neigh_cleanup() has not been used for seven years, and was a wrong design.
Messing with shared pointer in bond_neigh_init() without proper
memory barriers would at least trigger syzbot complains eventually.
It is time to remove this stuff.
Fixes: b63b70d877 ("IPoIB: Use a private hash table for path lookup in xmit path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A bonding with layer2+3 or layer3+4 hashing uses the IP addresses and the ports
to balance packets between slaves. With some network errors, we receive an ICMP
error packet by the remote host or a router. If sent by a router, the source IP
can differ from the remote host one. Additionally the ICMP protocol has no port
numbers, so a layer3+4 bonding will get a different hash than the previous one.
These two conditions could let the packet go through a different interface than
the other packets of the same flow:
# tcpdump -qltnni veth0 |sed 's/^/0: /' &
# tcpdump -qltnni veth1 |sed 's/^/1: /' &
# hping3 -2 192.168.0.2 -p 9
0: IP 192.168.0.1.2251 > 192.168.0.2.9: UDP, length 0
1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
1: IP 192.168.0.1.2252 > 192.168.0.2.9: UDP, length 0
1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
1: IP 192.168.0.1.2253 > 192.168.0.2.9: UDP, length 0
1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
0: IP 192.168.0.1.2254 > 192.168.0.2.9: UDP, length 0
1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
An ICMP error packet contains the header of the packet which caused the network
error, so inspect it and match the flow against it, so we can send the ICMP via
the same interface of the previous packet in the flow.
Move the IP and port dissect code into a generic function bond_flow_ip() and if
we are dissecting an ICMP error packet, call it again with the adjusted offset.
# hping3 -2 192.168.0.2 -p 9
1: IP 192.168.0.1.1224 > 192.168.0.2.9: UDP, length 0
1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
1: IP 192.168.0.1.1225 > 192.168.0.2.9: UDP, length 0
1: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
0: IP 192.168.0.1.1226 > 192.168.0.2.9: UDP, length 0
0: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
0: IP 192.168.0.1.1227 > 192.168.0.2.9: UDP, length 0
0: IP 192.168.0.2 > 192.168.0.1: ICMP 192.168.0.2 udp port 9 unreachable, length 36
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One conflict in the BPF samples Makefile, some fixes in 'net' whilst
we were converting over to Makefile.target rules in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Since de77ecd4ef ("bonding: improve link-status update in
mii-monitoring"), the bonding driver has utilized two separate variables
to indicate the next link state a particular slave should transition to.
Each is used to communicate to a different portion of the link state
change commit logic; one to the bond_miimon_commit function itself, and
another to the state transition logic.
Unfortunately, the two variables can become unsynchronized,
resulting in incorrect link state transitions within bonding. This can
cause slaves to become stuck in an incorrect link state until a
subsequent carrier state transition.
The issue occurs when a special case in bond_slave_netdev_event
sets slave->link directly to BOND_LINK_FAIL. On the next pass through
bond_miimon_inspect after the slave goes carrier up, the BOND_LINK_FAIL
case will set the proposed next state (link_new_state) to BOND_LINK_UP,
but the new_link to BOND_LINK_DOWN. The setting of the final link state
from new_link comes after that from link_new_state, and so the slave
will end up incorrectly in _DOWN state.
Resolve this by combining the two variables into one.
Reported-by: Aleksei Zakharov <zakharov.a.g@yandex.ru>
Reported-by: Sha Zhang <zhangsha.zhang@huawei.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Fixes: de77ecd4ef ("bonding: improve link-status update in mii-monitoring")
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only slightly tricky merge conflict was the netdevsim because the
mutex locking fix overlapped a lot of driver reload reorganization.
The rest were (relatively) trivial in nature.
Signed-off-by: David S. Miller <davem@davemloft.net>
The bonding uses the L4 ports to balance flows between slaves. As the ICMP
protocol has no ports, those packets are sent all to the same device:
# tcpdump -qltnni veth0 ip |sed 's/^/0: /' &
# tcpdump -qltnni veth1 ip |sed 's/^/1: /' &
# ping -qc1 192.168.0.2
1: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 315, seq 1, length 64
1: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 315, seq 1, length 64
# ping -qc1 192.168.0.2
1: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 316, seq 1, length 64
1: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 316, seq 1, length 64
# ping -qc1 192.168.0.2
1: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 317, seq 1, length 64
1: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 317, seq 1, length 64
But some ICMP packets have an Identifier field which is
used to match packets within sessions, let's use this value in the hash
function to balance these packets between bond slaves:
# ping -qc1 192.168.0.2
0: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 303, seq 1, length 64
0: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 303, seq 1, length 64
# ping -qc1 192.168.0.2
1: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 304, seq 1, length 64
1: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 304, seq 1, length 64
Aso, let's use a flow_dissector_key which defines FLOW_DISSECTOR_KEY_ICMP,
so we can balance pings encapsulated in a tunnel when using mode encap3+4:
# ping -q 192.168.1.2 -c1
0: IP 192.168.0.1 > 192.168.0.2: GREv0, length 102: IP 192.168.1.1 > 192.168.1.2: ICMP echo request, id 585, seq 1, length 64
0: IP 192.168.0.2 > 192.168.0.1: GREv0, length 102: IP 192.168.1.2 > 192.168.1.1: ICMP echo reply, id 585, seq 1, length 64
# ping -q 192.168.1.2 -c1
1: IP 192.168.0.1 > 192.168.0.2: GREv0, length 102: IP 192.168.1.1 > 192.168.1.2: ICMP echo request, id 586, seq 1, length 64
1: IP 192.168.0.2 > 192.168.0.1: GREv0, length 102: IP 192.168.1.2 > 192.168.1.1: ICMP echo reply, id 586, seq 1, length 64
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes variables and callback these are related to the nested
device structure.
devices that can be nested have their own nest_level variable that
represents the depth of nested devices.
In the previous patch, new {lower/upper}_level variables are added and
they replace old private nest_level variable.
So, this patch removes all 'nest_level' variables.
In order to avoid lockdep warning, ->ndo_get_lock_subclass() was added
to get lockdep subclass value, which is actually lower nested depth value.
But now, they use the dynamic lockdep key to avoid lockdep warning instead
of the subclass.
So, this patch removes ->ndo_get_lock_subclass() callback.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All bonding device has same lockdep key and subclass is initialized with
nest_level.
But actual nest_level value can be changed when a lower device is attached.
And at this moment, the subclass should be updated but it seems to be
unsafe.
So this patch makes bonding use dynamic lockdep key instead of the
subclass.
Test commands:
ip link add bond0 type bond
for i in {1..5}
do
let A=$i-1
ip link add bond$i type bond
ip link set bond$i master bond$A
done
ip link set bond5 master bond0
Splat looks like:
[ 307.992912] WARNING: possible recursive locking detected
[ 307.993656] 5.4.0-rc3+ #96 Tainted: G W
[ 307.994367] --------------------------------------------
[ 307.995092] ip/761 is trying to acquire lock:
[ 307.995710] ffff8880513aac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding]
[ 307.997045]
but task is already holding lock:
[ 307.997923] ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding]
[ 307.999215]
other info that might help us debug this:
[ 308.000251] Possible unsafe locking scenario:
[ 308.001137] CPU0
[ 308.001533] ----
[ 308.001915] lock(&(&bond->stats_lock)->rlock#2/2);
[ 308.002609] lock(&(&bond->stats_lock)->rlock#2/2);
[ 308.003302]
*** DEADLOCK ***
[ 308.004310] May be due to missing lock nesting notation
[ 308.005319] 3 locks held by ip/761:
[ 308.005830] #0: ffffffff9fcc42b0 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x466/0x8a0
[ 308.006894] #1: ffff88805fcbac60 (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0xb8/0x500 [bonding]
[ 308.008243] #2: ffffffff9f9219c0 (rcu_read_lock){....}, at: bond_get_stats+0x9f/0x500 [bonding]
[ 308.009422]
stack backtrace:
[ 308.010124] CPU: 0 PID: 761 Comm: ip Tainted: G W 5.4.0-rc3+ #96
[ 308.011097] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 308.012179] Call Trace:
[ 308.012601] dump_stack+0x7c/0xbb
[ 308.013089] __lock_acquire+0x269d/0x3de0
[ 308.013669] ? register_lock_class+0x14d0/0x14d0
[ 308.014318] lock_acquire+0x164/0x3b0
[ 308.014858] ? bond_get_stats+0xb8/0x500 [bonding]
[ 308.015520] _raw_spin_lock_nested+0x2e/0x60
[ 308.016129] ? bond_get_stats+0xb8/0x500 [bonding]
[ 308.017215] bond_get_stats+0xb8/0x500 [bonding]
[ 308.018454] ? bond_arp_rcv+0xf10/0xf10 [bonding]
[ 308.019710] ? rcu_read_lock_held+0x90/0xa0
[ 308.020605] ? rcu_read_lock_sched_held+0xc0/0xc0
[ 308.021286] ? bond_get_stats+0x9f/0x500 [bonding]
[ 308.021953] dev_get_stats+0x1ec/0x270
[ 308.022508] bond_get_stats+0x1d1/0x500 [bonding]
Fixes: d3fff6c443 ("net: add netdev_lockdep_set_classes() helper")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some interface types could be nested.
(VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc..)
These interface types should set lockdep class because, without lockdep
class key, lockdep always warn about unexisting circular locking.
In the current code, these interfaces have their own lockdep class keys and
these manage itself. So that there are so many duplicate code around the
/driver/net and /net/.
This patch adds new generic lockdep keys and some helper functions for it.
This patch does below changes.
a) Add lockdep class keys in struct net_device
- qdisc_running, xmit, addr_list, qdisc_busylock
- these keys are used as dynamic lockdep key.
b) When net_device is being allocated, lockdep keys are registered.
- alloc_netdev_mqs()
c) When net_device is being free'd llockdep keys are unregistered.
- free_netdev()
d) Add generic lockdep key helper function
- netdev_register_lockdep_key()
- netdev_unregister_lockdep_key()
- netdev_update_lockdep_key()
e) Remove unnecessary generic lockdep macro and functions
f) Remove unnecessary lockdep code of each interfaces.
After this patch, each interface modules don't need to maintain
their lockdep keys.
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The debugfs core now will print a message if this function fails, so
don't duplicate that logic. Also, no need to change the code logic if
the call fails either, as no debugfs calls should interrupt normal
kernel code for any reason.
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As commit 30d8177e8a ("bonding: Always enable vlan tx offload")
said, we should always enable bonding's vlan tx offload, pass the
vlan packets to the slave devices with vlan tci, let them to handle
vlan implementation.
Now if encapsulation protocols like VXLAN is used, skb->encapsulation
may be set, then the packet is passed to vlan device which based on
bonding device. However in netif_skb_features(), the check of
hw_enc_features:
if (skb->encapsulation)
features &= dev->hw_enc_features;
clears NETIF_F_HW_VLAN_CTAG_TX/NETIF_F_HW_VLAN_STAG_TX. This results
in same issue in commit 30d8177e8a like this:
vlan_dev_hard_start_xmit
-->dev_queue_xmit
-->validate_xmit_skb
-->netif_skb_features //NETIF_F_HW_VLAN_CTAG_TX is cleared
-->validate_xmit_vlan
-->__vlan_hwaccel_push_inside //skb->tci is cleared
...
--> bond_start_xmit
--> bond_xmit_hash //BOND_XMIT_POLICY_ENCAP34
--> __skb_flow_dissect // nhoff point to IP header
--> case htons(ETH_P_8021Q)
// skb_vlan_tag_present is false, so
vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan),
//vlan point to ip header wrongly
Fixes: b2a103e6d0 ("bonding: convert to ndo_fix_features")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following scenario was encountered during testing of logical
partition mobility on pseries partitions with bonded ibmvnic
adapters in LACP mode.
1. Driver receives a signal that the device has been
swapped, and it needs to reset to initialize the new
device.
2. Driver reports loss of carrier and begins initialization.
3. Bonding driver receives NETDEV_CHANGE notifier and checks
the slave's current speed and duplex settings. Because these
are unknown at the time, the bond sets its link state to
BOND_LINK_FAIL and handles the speed update, clearing
AD_PORT_LACP_ENABLE.
4. Driver finishes recovery and reports that the carrier is on.
5. Bond receives a new notification and checks the speed again.
The speeds are valid but miimon has not altered the link
state yet. AD_PORT_LACP_ENABLE remains off.
Because the slave's link state is still BOND_LINK_FAIL,
no further port checks are made when it recovers. Though
the slave devices are operational and have valid speed
and duplex settings, the bond will not send LACPDU's. The
simplest fix I can see is to force another speed check
in bond_miimon_commit. This way the bond will update
AD_PORT_LACP_ENABLE if needed when transitioning from
BOND_LINK_FAIL to BOND_LINK_UP.
CC: Jarod Wilson <jarod@redhat.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IFLA_BOND_PEER_NOTIF_DELAY was set to the value of downdelay instead
of peer_notif_delay. After this change, the correct value is exported.
Fixes: 07a4ddec3c ("bonding: add an option to specify a delay between peer notifications")
Signed-off-by: Vincent Bernat <vincent@bernat.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, gratuitous ARP/ND packets are sent every `miimon'
milliseconds. This commit allows a user to specify a custom delay
through a new option, `peer_notif_delay'.
Like for `updelay' and `downdelay', this delay should be a multiple of
`miimon' to avoid managing an additional work queue. The configuration
logic is copied from `updelay' and `downdelay'. However, the default
value cannot be set using a module parameter: Netlink or sysfs should
be used to configure this feature.
When setting `miimon' to 100 and `peer_notif_delay' to 500, we can
observe the 500 ms delay is respected:
20:30:19.354693 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
20:30:19.874892 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
20:30:20.394919 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
20:30:20.914963 ARP, Request who-has 203.0.113.10 tell 203.0.113.10, length 28
In bond_mii_monitor(), I have tried to keep the lock logic readable.
The change is due to the fact we cannot rely on a notification to
lower the value of `bond->send_peer_notif' as `NETDEV_NOTIFY_PEERS' is
only triggered once every N times, while we need to decrement the
counter each time.
iproute2 also needs to be updated to be able to specify this new
attribute through `ip link'.
Signed-off-by: Vincent Bernat <vincent@bernat.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
bond_xmit_roundrobin() checks for IGMP packets but it parses
the IP header even before checking skb->protocol.
We should validate the IP header with pskb_may_pull() before
using iph->protocol.
Reported-and-tested-by: syzbot+e5be16aa39ad6e755391@syzkaller.appspotmail.com
Fixes: a2fd940f4c ("bonding: fix broken multicast with round-robin mode")
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The new route handling in ip_mc_finish_output() from 'net' overlapped
with the new support for returning congestion notifications from BPF
programs.
In order to handle this I had to take the dev_loopback_xmit() calls
out of the switch statement.
The aquantia driver conflicts were simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
We build vlan on top of bonding interface, which vlan offload
is off, bond mode is 802.3ad (LACP) and xmit_hash_policy is
BOND_XMIT_POLICY_ENCAP34.
Because vlan tx offload is off, vlan tci is cleared and skb push
the vlan header in validate_xmit_vlan() while sending from vlan
devices. Then in bond_xmit_hash, __skb_flow_dissect() fails to
get information from protocol headers encapsulated within vlan,
because 'nhoff' is points to IP header, so bond hashing is based
on layer 2 info, which fails to distribute packets across slaves.
This patch always enable bonding's vlan tx offload, pass the vlan
packets to the slave devices with vlan tci, let them to handle
vlan implementation.
Fixes: 278339a42a ("bonding: propogate vlan_features to bonding master")
Suggested-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All of these printk instances benefit from having both master and slave
device information included, so convert to using a standardized macro
format and remove redundant information.
Suggested-by: Joe Perches <joe@perches.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All of these printk instances benefit from having both master and slave
device information included, so convert to using a standardized macro
format and remove redundant information.
Suggested-by: Joe Perches <joe@perches.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All of these printk instances benefit from having both master and slave
device information included, so convert to using a standardized macro
format and remove redundant information.
Suggested-by: Joe Perches <joe@perches.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All of these printk instances benefit from having both master and slave
device information included, so convert to using a standardized macro
format and remove redundant information.
Suggested-by: Joe Perches <joe@perches.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Passing the bond name again to debug output when referencing slave is wrong.
We're trying to set the bond's MAC to that of the new_active slave, so adjust
the error message slightly and pass in the slave's name, not the bond's.
Then we're trying to set the MAC on the old active slave, but putting the
new active slave's name in the output. While we're at it, clarify the
error messages so you know which one actually triggered.
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Seeing bonding debug log data along the lines of "event: 5" is a bit spartan,
and often requires a lookup table if you don't remember what every event is.
Make use of netdev_cmd_to_name for an improved debugging experience, so for
the prior example, you'll see: "bond_netdev_event received NETDEV_REGISTER"
instead (both are prefixed with the device for which the event pertains).
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some ISDN files that got removed in net-next had some changes
done in mainline, take the removals.
Signed-off-by: David S. Miller <davem@davemloft.net>
When setting the bonding interface net device features,
the kernel code doesn't address the slaves' MPLS features
and doesn't inherit them.
Therefore, HW offloads that enhance performance such as
checksumming and TSO are disabled for MPLS tagged traffic
flowing via the bonding interface.
The patch add the inheritance of the MPLS features from the
slave devices with a similar logic to setting the bonding device's
VLAN and encapsulation features.
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is another set of reviewed patches that adds SPDX tags to different
kernel files, based on a set of rules that are being used to parse the
comments to try to determine that the license of the file is
"GPL-2.0-or-later" or "GPL-2.0-only". Only the "obvious" versions of
these matches are included here, a number of "non-obvious" variants of
text have been found but those have been postponed for later review and
analysis.
There is also a patch in here to add the proper SPDX header to a bunch
of Kbuild files that we have missed in the past due to new files being
added and forgetting that Kbuild uses two different file names for
Makefiles. This issue was reported by the Kbuild maintainer.
These patches have been out for review on the linux-spdx@vger mailing
list, and while they were created by automatic tools, they were
hand-verified by a bunch of different people, all whom names are on the
patches are reviewers.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXPCHLg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykxyACgql6ktH+Tv8Ho1747kKPiFca1Jq0AoK5HORXI
yB0DSTXYNjMtH41ypnsZ
=x2f8
-----END PGP SIGNATURE-----
Merge tag 'spdx-5.2-rc3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull yet more SPDX updates from Greg KH:
"Here is another set of reviewed patches that adds SPDX tags to
different kernel files, based on a set of rules that are being used to
parse the comments to try to determine that the license of the file is
"GPL-2.0-or-later" or "GPL-2.0-only". Only the "obvious" versions of
these matches are included here, a number of "non-obvious" variants of
text have been found but those have been postponed for later review
and analysis.
There is also a patch in here to add the proper SPDX header to a bunch
of Kbuild files that we have missed in the past due to new files being
added and forgetting that Kbuild uses two different file names for
Makefiles. This issue was reported by the Kbuild maintainer.
These patches have been out for review on the linux-spdx@vger mailing
list, and while they were created by automatic tools, they were
hand-verified by a bunch of different people, all whom names are on
the patches are reviewers"
* tag 'spdx-5.2-rc3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (82 commits)
treewide: Add SPDX license identifier - Kbuild
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 225
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 224
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 223
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 222
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 221
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 220
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 218
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 217
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 216
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 215
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 214
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 213
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 211
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 210
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 209
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 207
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 206
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 203
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 201
...
Pull networking fixes from David Miller:
1) Fix OOPS during nf_tables rule dump, from Florian Westphal.
2) Use after free in ip_vs_in, from Yue Haibing.
3) Fix various kTLS bugs (NULL deref during device removal resync,
netdev notification ignoring, etc.) From Jakub Kicinski.
4) Fix ipv6 redirects with VRF, from David Ahern.
5) Memory leak fix in igmpv3_del_delrec(), from Eric Dumazet.
6) Missing memory allocation failure check in ip6_ra_control(), from
Gen Zhang. And likewise fix ip_ra_control().
7) TX clean budget logic error in aquantia, from Igor Russkikh.
8) SKB leak in llc_build_and_send_ui_pkt(), from Eric Dumazet.
9) Double frees in mlx5, from Parav Pandit.
10) Fix lost MAC address in r8169 during PCI D3, from Heiner Kallweit.
11) Fix botched register access in mvpp2, from Antoine Tenart.
12) Use after free in napi_gro_frags(), from Eric Dumazet.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (89 commits)
net: correct zerocopy refcnt with udp MSG_MORE
ethtool: Check for vlan etype or vlan tci when parsing flow_rule
net: don't clear sock->sk early to avoid trouble in strparser
net-gro: fix use-after-free read in napi_gro_frags()
net: dsa: tag_8021q: Create a stable binary format
net: dsa: tag_8021q: Change order of rx_vid setup
net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value
ipv4: tcp_input: fix stack out of bounds when parsing TCP options.
mlxsw: spectrum: Prevent force of 56G
mlxsw: spectrum_acl: Avoid warning after identical rules insertion
net: dsa: mv88e6xxx: fix handling of upper half of STATS_TYPE_PORT
r8169: fix MAC address being lost in PCI D3
net: core: support XDP generic on stacked devices.
netvsc: unshare skb in VF rx handler
udp: Avoid post-GRO UDP checksum recalculation
net: phy: dp83867: Set up RGMII TX delay
net: phy: dp83867: do not call config_init twice
net: phy: dp83867: increase SGMII autoneg timer duration
net: phy: dp83867: fix speed 10 in sgmii mode
net: phy: marvell10g: report if the PHY fails to boot firmware
...
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 3029 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Once in a while, with just the right timing, 802.3ad slaves will fail to
properly initialize, winding up in a weird state, with a partner system
mac address of 00:00:00:00:00:00. This started happening after a fix to
properly track link_failure_count tracking, where an 802.3ad slave that
reported itself as link up in the miimon code, but wasn't able to get a
valid speed/duplex, started getting set to BOND_LINK_FAIL instead of
BOND_LINK_DOWN. That was the proper thing to do for the general "my link
went down" case, but has created a link initialization race that can put
the interface in this odd state.
The simple fix is to instead set the slave link to BOND_LINK_DOWN again,
if the link has never been up (last_link_up == 0), so the link state
doesn't bounce from BOND_LINK_DOWN to BOND_LINK_FAIL -- it hasn't failed
in this case, it simply hasn't been up yet, and this prevents the
unnecessary state change from DOWN to FAIL and getting stuck in an init
failure w/o a partner mac.
Fixes: ea53abfab9 ("bonding/802.3ad: fix link_failure_count tracking")
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Tested-by: Heesoon Kim <Heesoon.Kim@stratus.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details you
should have received a copy of the gnu general public license along
with this program if not write to the free software foundation inc
59 temple place suite 330 boston ma 02111 1307 usa the full gnu
general public license is included in this distribution in the file
called license
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 4 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190520075211.959886972@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details you
should have received a copy of the gnu general public license along
with this program if not see http www gnu org licenses the full gnu
general public license is included in this distribution in the file
called license
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 3 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Jilayne Lovejoy <opensource@jilayne.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190519154041.052102771@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add SPDX license identifiers to all Make/Kconfig files which:
- Have no license information of any form
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There's currently a problem with toggling arp_validate on and off with an
active-backup bond. At the moment, you can start up a bond, like so:
modprobe bonding mode=1 arp_interval=100 arp_validate=0 arp_ip_targets=192.168.1.1
ip link set bond0 down
echo "ens4f0" > /sys/class/net/bond0/bonding/slaves
echo "ens4f1" > /sys/class/net/bond0/bonding/slaves
ip link set bond0 up
ip addr add 192.168.1.2/24 dev bond0
Pings to 192.168.1.1 work just fine. Now turn on arp_validate:
echo 1 > /sys/class/net/bond0/bonding/arp_validate
Pings to 192.168.1.1 continue to work just fine. Now when you go to turn
arp_validate off again, the link falls flat on it's face:
echo 0 > /sys/class/net/bond0/bonding/arp_validate
dmesg
...
[133191.911987] bond0: Setting arp_validate to none (0)
[133194.257793] bond0: bond_should_notify_peers: slave ens4f0
[133194.258031] bond0: link status definitely down for interface ens4f0, disabling it
[133194.259000] bond0: making interface ens4f1 the new active one
[133197.330130] bond0: link status definitely down for interface ens4f1, disabling it
[133197.331191] bond0: now running without any active interface!
The problem lies in bond_options.c, where passing in arp_validate=0
results in bond->recv_probe getting set to NULL. This flies directly in
the face of commit 3fe68df97c, which says we need to set recv_probe =
bond_arp_recv, even if we're not using arp_validate. Said commit fixed
this in bond_option_arp_interval_set, but missed that we can get to that
same state in bond_option_arp_validate_set as well.
One solution would be to universally set recv_probe = bond_arp_recv here
as well, but I don't think bond_option_arp_validate_set has any business
touching recv_probe at all, and that should be left to the arp_interval
code, so we can just make things much tidier here.
Fixes: 3fe68df97c ("bonding: always set recv_probe to bond_arp_rcv in arp monitor")
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
netlink based interfaces (including recently added ones) are still not
setting it in kernel generated messages. Without the flag, message parsers
not aware of attribute semantics (e.g. wireshark dissector or libmnl's
mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
the structure of their contents.
Unfortunately we cannot just add the flag everywhere as there may be
userspace applications which check nlattr::nla_type directly rather than
through a helper masking out the flags. Therefore the patch renames
nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
are rewritten to use nla_nest_start().
Except for changes in include/net/netlink.h, the patch was generated using
this semantic patch:
@@ expression E1, E2; @@
-nla_nest_start(E1, E2)
+nla_nest_start_noflag(E1, E2)
@@ expression E1, E2; @@
-nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
+nla_nest_start(E1, E2)
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a bond is enslaved to another bond, bond_netdev_event() only
handles the event as if the bond is a master, and skips treating the
bond as a slave.
This leads to a refcount leak on the slave, since we don't remove the
adjacency to its master and the master holds a reference on the slave.
Reproducer:
ip link add bondL type bond
ip link add bondU type bond
ip link set bondL master bondU
ip link del bondL
No "Fixes:" tag, this code is older than git history.
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minor comment merge conflict in mlx5.
Staging driver has a fixup due to the skb->xmit_more changes
in 'net-next', but was removed in 'net'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Bond expects ethernet hwaddr for its slave, but it can be longer than 6
bytes - infiniband interface for example.
# cat /sys/devices/<skipped>/net/ib0/address
80:00:02:08:fe:80:00:00:00:00:00:00:7c:fe:90:03:00:be:5d:e1
# cat /sys/devices/<skipped>/net/ib0/bonding_slave/perm_hwaddr
80:00:02:08:fe:80
So print full hwaddr in sysfs "bonding_slave/perm_hwaddr" as well.
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the previous patch, all the callers of ndo_select_queue()
provide as a 'fallback' argument netdev_pick_tx.
The only exceptions are nested calls to ndo_select_queue(),
which pass down the 'fallback' available in the current scope
- still netdev_pick_tx.
We can drop such argument and replace fallback() invocation with
netdev_pick_tx(). This avoids an indirect call per xmit packet
in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen)
with device drivers implementing such ndo. It also clean the code
a bit.
Tested with ixgbe and CONFIG_FCOE=m
With pktgen using queue xmit:
threads vanilla patched
(kpps) (kpps)
1 2334 2428
2 4166 4278
4 7895 8100
v1 -> v2:
- rebased after helper's name change
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is no longer necessary after eca59f6915 ("net: Remove support for bridge bypass ndos from stacked devices")
Suggested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andy Gospodarek <andy@greyhouse.net>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Three conflicts, one of which, for marvell10g.c is non-trivial and
requires some follow-up from Heiner or someone else.
The issue is that Heiner converted the marvell10g driver over to
use the generic c45 code as much as possible.
However, in 'net' a bug fix appeared which makes sure that a new
local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0
is cleared.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes a subtle PACKET_ORIGDEV regression which was a side
effect of fixes introduced by:
6a9e461f6f bonding: pass link-local packets to bonding master also.
... to:
b89f04c61e bonding: deliver link-local packets with skb->dev set to link that packets arrived on
While 6a9e461f6f restored pre-b89f04c61efe presence of link-local
packets on bonding masters (which is required e.g. by linux bridges
participating in spanning tree or needed for lab-like setups created
with group_fwd_mask) it also caused the originating device
information to be lost due to cloning.
Maciej Żenczykowski proposed another solution that doesn't require
packet cloning and retains original device information - instead of
returning RX_HANDLER_PASS for all link-local packets it's now limited
only to packets from inactive slaves.
At the same time, packets passed to bonding masters retain correct
information about the originating device and PACKET_ORIGDEV can be used
to determine it.
This elegantly solves all issues so far:
- link-local packets that were removed from bonding masters
- LLDP daemons being forced to explicitly bind to slave interfaces
- PACKET_ORIGDEV having no effect on bond interfaces
Fixes: 6a9e461f6f (bonding: pass link-local packets to bonding master also.)
Reported-by: Vincent Bernat <vincent@bernat.ch>
Signed-off-by: Michal Soltys <soltys@ziu.info>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is a little improvement. If user use the
command shown as below, we should print the info [1]
instead of [2]. The eth0 exists actually, and it may
confuse user.
$ echo "eth0" > /sys/class/net/bond4/bonding/slaves
[1] "bond4: no command found in slaves file - use +ifname or -ifname"
[2] "write error: No such device"
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I made a dumb mistake when I summed up the slave stats, obviously slaves
can come and go which would make the master stats unreliable.
Count and export the master stats separately.
Fixes: a258aeacd7 ("bonding: add support for xstats and export 3ad stats")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for extended statistics (xstats) call to the
bonding. The first user would be the 3ad code which counts the following
events:
- LACPDU Rx/Tx
- LACPDU unknown type Rx
- LACPDU illegal Rx
- Marker Rx/Tx
- Marker response Rx/Tx
- Marker unknown type Rx
All of these are exported via netlink as separate attributes to be
easily extensible as we plan to add more in the future.
Similar to how the bridge and other xstats exports, the structure
inside is:
[ IFLA_STATS_LINK_XSTATS ]
-> [ LINK_XSTATS_TYPE_BOND ]
-> [ BOND_XSTATS_3AD ]
-> [ 3ad stats attributes ]
With this structure it's easy to add more stat types later.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Count the following types of 3ad packets per slave:
- rx/tx lacpdu
- rx/tx marker
- rx/tx marker response
- rx illegal lacpdus (right now counted on wrong length)
- rx unknown lacpdu type
- rx unknown marker type
The counters are using atomic64 since this is not fast path.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the received lacpdu is accessed via skb_header_pointer() in
bond_3ad_lacpdu_recv() we no longer need to check for skb->len's length.
If the returned lacpdu pointer is not null that should be enough.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No functional changes, adjust the style of bond_3ad_rx_indication to
prepare it for the stats changes:
- reduce indentation by returning early on wrong length
- remove extra new lines between switch cases
- add marker local variable and use it to reduce line length
- rearrange local variables in reverse xmas tree
- separate final return
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A network device stack with multiple layers of bonding devices can
trigger a false positive lockdep warning. Adding lockdep nest levels
fixes this. Update the level on both enslave and unlink, to avoid the
following series of events ..
ip netns add test
ip netns exec test bash
ip link set dev lo addr 00:11:22:33:44:55
ip link set dev lo down
ip link add dev bond1 type bond
ip link add dev bond2 type bond
ip link set dev lo master bond1
ip link set dev bond1 master bond2
ip link set dev bond1 nomaster
ip link set dev bond2 master bond1
.. from still generating a splat:
[ 193.652127] ======================================================
[ 193.658231] WARNING: possible circular locking dependency detected
[ 193.664350] 4.20.0 #8 Not tainted
[ 193.668310] ------------------------------------------------------
[ 193.674417] ip/15577 is trying to acquire lock:
[ 193.678897] 00000000a40e3b69 (&(&bond->stats_lock)->rlock#3/3){+.+.}, at: bond_get_stats+0x58/0x290
[ 193.687851]
but task is already holding lock:
[ 193.693625] 00000000807b9d9f (&(&bond->stats_lock)->rlock#2/2){+.+.}, at: bond_get_stats+0x58/0x290
[..]
[ 193.851092] lock_acquire+0xa7/0x190
[ 193.855138] _raw_spin_lock_nested+0x2d/0x40
[ 193.859878] bond_get_stats+0x58/0x290
[ 193.864093] dev_get_stats+0x5a/0xc0
[ 193.868140] bond_get_stats+0x105/0x290
[ 193.872444] dev_get_stats+0x5a/0xc0
[ 193.876493] rtnl_fill_stats+0x40/0x130
[ 193.880797] rtnl_fill_ifinfo+0x6c5/0xdc0
[ 193.885271] rtmsg_ifinfo_build_skb+0x86/0xe0
[ 193.890091] rtnetlink_event+0x5b/0xa0
[ 193.894320] raw_notifier_call_chain+0x43/0x60
[ 193.899225] netdev_change_features+0x50/0xa0
[ 193.904044] bond_compute_features.isra.46+0x1ab/0x270
[ 193.909640] bond_enslave+0x141d/0x15b0
[ 193.913946] do_set_master+0x89/0xa0
[ 193.918016] do_setlink+0x37c/0xda0
[ 193.921980] __rtnl_newlink+0x499/0x890
[ 193.926281] rtnl_newlink+0x48/0x70
[ 193.930238] rtnetlink_rcv_msg+0x171/0x4b0
[ 193.934801] netlink_rcv_skb+0xd1/0x110
[ 193.939103] rtnetlink_rcv+0x15/0x20
[ 193.943151] netlink_unicast+0x3b5/0x520
[ 193.947544] netlink_sendmsg+0x2fd/0x3f0
[ 193.951942] sock_sendmsg+0x38/0x50
[ 193.955899] ___sys_sendmsg+0x2ba/0x2d0
[ 193.960205] __x64_sys_sendmsg+0xad/0x100
[ 193.964687] do_syscall_64+0x5a/0x460
[ 193.968823] entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fixes: 7e2556e400 ("bonding: avoid lockdep confusion in bond_get_stats()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two statements that are indented too much by one space each,
fix these.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Give interested parties an opportunity to veto an impending HW address
change.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before NETDEV_CHANGEADDR, bond driver should emit NETDEV_PRE_CHANGEADDR,
and allow consumers to veto the address change. To propagate further the
return code from NETDEV_PRE_CHANGEADDR, give the function that
implements address change a return value.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A follow-up patch will add a notifier type NETDEV_PRE_CHANGEADDR, which
allows vetoing of MAC address changes. One prominent path to that
notification is through dev_set_mac_address(). Therefore give this
function an extack argument, so that it can be packed together with the
notification. Thus a textual reason for rejection (or a warning) can be
communicated back to the user.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>