Cloned patch of Eric Dumazet for bonding.
Some workloads greatly benefit of IFF_XMIT_DST_RELEASE capability
on output net device, avoiding dirtying dst refcount.
team currently disables IFF_XMIT_DST_RELEASE unconditionally.
If all ports have the IFF_XMIT_DST_RELEASE bit set, then
team dev can also have it in its priv_flags.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's done in very similar way this is done in bonding and bridge.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using NLMSG_GOODSIZE results in multiple pages being used as
nlmsg_new() will automatically add the size of the netlink
header to the payload thus exceeding the page limit.
NLMSG_DEFAULT_SIZE takes this into account.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Cc: Jiri Pirko <jpirko@redhat.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Sergey Lapin <slapin@ossfans.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Lauro Ramos Venancio <lauro.venancio@openbossa.org>
Cc: Aloisio Almeida Jr <aloisio.almeida@openbossa.org>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
team_adjust_ops should check for enabled ports, not all ports.
This may lead to division by zero. This patch fixes this.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use rcu_access_pointer and rcu_dereference_protected
to access RCU pointer by updater.
Use RCU_INIT_POINTER for NULL assignment of RCU pointer.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Should be used instead of rcu_dereference, since rcu_read_lock_bh is
held.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When multiple sets are done, event message is generated for each. This
patch accumulates these messages into one.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
currently, when port is created and per-port options are present, there
options are sent to userspace with ifindex of port which userspace does
not know about. Port add message goes right after.
This patch corrects message ordering so userspace would not be confused.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to walk through option instance list and look for ->changed ==
true when called knows exactly what one option instance changed.
Also use lists to pass option instances needed to be present in netlink
message.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
genlmsg_cancel() needs to be called in case nest fails
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
genlmsg_cancel() needs to be called in case nest fails
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds two exported functions. One allows to mark option
instance as changed and the second processes change check and does
transfer of changed options to userspace.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce struct team_option_inst_info and push option instance info
there. It can be then easily passed to gsetter context and used for
feature async option changes.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
That leaves team->mode and all its values valid so no checks would be
needed (for example in team_mode_option_get()).
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes content of hashlist (used to get port struct by
computed index (0...en_port_count-1)). Now the hash list contains only
enabled ports so userspace will be able to say what ports can be used
for tx/rx. This becomes handy when userspace will need to disable ports
which does not belong to active aggregator. By default, newly added port
is enabled.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since team->lock is being held, _rcu variant make no sense.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allows userspace to setup linkup for ports. Default is to take linkup
directly from ethtool state.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add another (hopefully last) option type. Use NLA_FLAG to implement
that.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows to create per-port options. That becomes handy for all
sorts of stuff, for example for userspace driven link-state, 802.3ad
implementation and so on.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces new team mode. It's TX port is selected by
user-set BPF hash function.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For transfering generic binary data (e.g. BPF code), introduce new
binary option type.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These macros contain a hidden goto, and are thus extremely error
prone and make code hard to audit.
Signed-off-by: David S. Miller <davem@davemloft.net>
Use eth_hw_addr_random() instead of calling random_ether_addr()
to set addr_assign_type correctly to NET_ADDR_RANDOM.
Reset the state to NET_ADDR_PERM as soon as the MAC get
changed via .ndo_set_mac_address.
v2: adapt to renamed eth_hw_addr_random()
Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes event message behaviour to send only updated records
instead of whole list. This fixes bug on which userspace receives non-actual
data in case multiple events occur in row.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So far when vlan id was added to team device befor port was added, this
vid was not added to port's vlan filter. Also after removal, vid stayed
in port device's vlan filter. Benefit of new vlan functions to handle
this work.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds wrapper for ndo_vlan_rx_add_vid/ndo_vlan_rx_kill_vid
functions. Check for NETIF_F_HW_VLAN_FILTER feature is done in this
wrapper.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Let caller know the result of adding/removing vlan id to/from vlan
filter.
In some drivers I make those functions to just return 0. But in those
where there is able to see if hw setup went correctly, return value is
set appropriately.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rcu_assign_pointer(ptr, NULL) can be safely replaced by
RCU_INIT_POINTER(ptr, NULL)
(old rcu_assign_pointer() macro was testing the NULL value and could
omit the smp_wmb(), but this had to be removed because of compiler
warnings)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Apparently using variable-length array is not correct
(https://lkml.org/lkml/2011/10/23/25). So remove it.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since multiple team instances are putting defined options into their
option list, during register each option must be cloned before added
into list. This resolves uncool memory corruptions when using multiple
teams.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to have spinlock for this purpose. So convert this to mutex and
avoid current schedule while atomic problems in netlink code.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces new network device called team. It supposes to be
very fast, simple, userspace-driven alternative to existing bonding
driver.
Userspace library called libteam with couple of demo apps is available
here:
https://github.com/jpirko/libteam
Note it's still in its dipers atm.
team<->libteam use generic netlink for communication. That and rtnl
suppose to be the only way to configure team device, no sysfs etc.
Python binding of libteam was recently introduced.
Daemon providing arpmon/miimon active-backup functionality will be
introduced shortly. All what's necessary is already implemented in
kernel team driver.
v7->v8:
- check ndo_ndo_vlan_rx_[add/kill]_vid functions before calling
them.
- use dev_kfree_skb_any() instead of dev_kfree_skb()
v6->v7:
- transmit and receive functions are not checked in hot paths.
That also resolves memory leak on transmit when no port is
present
v5->v6:
- changed couple of _rcu calls to non _rcu ones in non-readers
v4->v5:
- team_change_mtu() uses team->lock while travesing though port
list
- mac address changes are moved completely to jurisdiction of
userspace daemon. This way the daemon can do FOM1, FOM2 and
possibly other weird things with mac addresses.
Only round-robin mode sets up all ports to bond's address then
enslaved.
- Extended Kconfig text
v3->v4:
- remove redundant synchronize_rcu from __team_change_mode()
- revert "set and clear of mode_ops happens per pointer, not per
byte"
- extend comment of function __team_change_mode()
v2->v3:
- team_change_mtu() uses rcu version of list traversal to unwind
- set and clear of mode_ops happens per pointer, not per byte
- port hashlist changed to be embedded into team structure
- error branch in team_port_enter() does cleanup now
- fixed rtln->rtnl
v1->v2:
- modes are made as modules. Makes team more modular and
extendable.
- several commenters' nitpicks found on v1 were fixed
- several other bugs were fixed.
- note I ignored Eric's comment about roundrobin port selector
as Eric's way may be easily implemented as another mode (mode
"random") in future.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>