Instances of struct node are created in the function tipc_disc_rcv()
under the assumption that there is no race between received discovery
messages arriving from the same node. This assumption is wrong.
When we use more than one bearer, it is possible that discovery
messages from the same node arrive at the same moment, resulting in
creation of two instances of struct tipc_node. This may later cause
confusion during link establishment, and may result in one of the links
never becoming activated.
We fix this by making lookup and potential creation of nodes atomic.
Instead of first looking up the node, and in case of failure, create it,
we now start with looking up the node inside node_link_create(), and
return a reference to that one if found. Otherwise, we go ahead and
create the node as we did before.
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During link failover it may happen that the remaining link goes
down while it is still in the process of taking over traffic
from a previously failed link. When this happens, we currently
abort the failover procedure and reset the first failed link to
non-failover mode, so that it will be ready to re-establish
contact with its peer when it comes available.
However, if the first link goes down because its bearer was manually
disabled, it is not enough to reset it; it must also be deleted;
which is supposed to happen when the failover procedure is finished.
Otherwise it will remain a zombie link: attached to the owner node
structure, in mode LINK_STOPPED, and permanently blocking any re-
establishing of the link to the peer via the interface in question.
We fix this by amending the failover abort procedure. Apart from
resetting the link to non-failover state, we test if the link is
also in LINK_STOPPED mode. If so, we delete it, using the conditional
tipc_link_delete() function introduced in the previous commit.
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a bearer is disabled, all pertaining links will be reset and
deleted. However, if there is a second active link towards a killed
link's destination, the delete has to be postponed until the failover
is finished. During this interval, we currently put the link in zombie
mode, i.e., we take it out of traffic, delete its timer, but leave it
attached to the owner node structure until all missing packets have
been received. When this is done, we detach the link from its node
and delete it, assuming that the synchronous timer deletion that was
initiated earlier in a different thread has finished.
This is unsafe, as the failover may finish before del_timer_sync()
has returned in the other thread.
We fix this by adding an atomic reference counter of type kref in
struct tipc_link. The counter keeps track of the references kept
to the link by the owner node and the timer. We then do a conditional
delete, based on the reference counter, both after the failover has
been finished and when the timer expires, if applicable. Whoever
comes last, will actually delete the link. This approach also implies
that we can make the deletion of the timer asynchronous.
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* revert a patch that caused a regression with mesh userspace (Bob)
* fix a number of suspend/resume related races
(from Emmanuel, Luca and myself - we'll look at backporting later)
* add software implementations for new ciphers (Jouni)
* add a new ACPI ID for Broadcom's rfkill (Mika)
* allow using netns FD for wireless (Vadim)
* some other cleanups (various)
-----BEGIN PGP SIGNATURE-----
iQIcBAABCAAGBQJU0NEaAAoJEDBSmw7B7bqr2DAP/3nbk6WB2Lz4Vwi8fh9C4y6X
4ZzN2v7NHaimC+Wpxg3wP2wCdX2VG3ZWwF3yLm6qgVHGnE35RFLxURen1eqNeW77
Yf5gvZ066nCGEQ8l8J6YK9vrLX4qp5c4lyE00bbxpZTA4Qq71SgTg+rmGC1be8uX
vacfaLwfDWffuOOnjBAPfanj7f4AQaUEY2uN1WkBFC7iEeOtPcWVkHAFVIeGjJfQ
vfgQJcwOjgWjYbwZdQQi7Aj+k1Yzda4pg1yEWn3CkZ8zyeCYGs1gk2ovoBgPgXBP
yT9ypIdFsN242VJvy7nkFnCKA8mhKyltMQ1Xjs0Q9lAxWdaq9U+iqt5cUn4jxVIG
T9Vi3PCbx/nVOqcfR81dBTQ3uDU5AyosPsmh2YTxi5lpRBrsjNY2FtaKE0sm2Om3
wRiSPOdPrXBeEnU0KssI9e6euXgS4JQV78Naq85OWZDd2yZ1fT5U2fi8y4drRGlz
rucbbobdVhQch5L4FStPz1uW5pNuJrhekXeZIE8MruTNg2A2oBAK3ApO7hxn68sE
RnbAnkxVLwgedC9042JF5eiS1PDIU46w4e782j/+XskVKMEakqd23iJycx3tmgHZ
cxDi/qKZ2RCE74YsT61o/9ErSVPvCfNPL3+CVov918jidQQg2WHfKm/jFlIxHIxk
4wBP7p2VGlENMgw/R8GJ
=wOaR
-----END PGP SIGNATURE-----
Merge tag 'mac80211-next-for-davem-2015-02-03' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Last round of updates for net-next:
* revert a patch that caused a regression with mesh userspace (Bob)
* fix a number of suspend/resume related races
(from Emmanuel, Luca and myself - we'll look at backporting later)
* add software implementations for new ciphers (Jouni)
* add a new ACPI ID for Broadcom's rfkill (Mika)
* allow using netns FD for wireless (Vadim)
* some other cleanups (various)
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is to use firmware version macros from t4fw_version.h
and also enables 40g T5 adapter.
Signed-off-by: Praveen Madhavan <praveenm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is only an API consolidation and should make things more readable
it replaces var * HZ / 1000 by msecs_to_jiffies(var).
As there is a discrepancy between the code and the comments this is in
a separate patch.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is only an API consolidation and should make things more readable
it replaces var * HZ / 1000 by msecs_to_jiffies(var).
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth-next 2015-02-03
Here's what's likely the last bluetooth-next pull request for 3.20.
Notable changes include:
- xHCI workaround + a new id for the ath3k driver
- Several new ids for the btusb driver
- Support for new Intel Bluetooth controllers
- Minor cleanups to ieee802154 code
- Nested sleep warning fix in socket accept() code path
- Fixes for Out of Band pairing handling
- Support for LE scan restarting for HCI_QUIRK_STRICT_DUPLICATE_FILTER
- Improvements to data we expose through debugfs
- Proper handling of Hardware Error HCI events
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds skb_remcsum_process and skb_gro_remcsum_process to
perform the appropriate adjustments to the skb when receiving
remote checksum offload.
Updated vxlan and gue to use these functions.
Tested: Ran TCP_RR and TCP_STREAM netperf for VXLAN and GUE, did
not see any change in performance.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When 'learned_sync' flag is turned on, the offloaded switch
port syncs learned MAC addresses to bridge's FDB via switchdev notifier
(NETDEV_SWITCH_FDB_ADD). Currently, FDB entries learnt via this mechanism are
wrongly being deleted by bridge aging logic. This patch ensures that FDB
entries synced from offloaded switch ports are not deleted by bridging logic.
Such entries can only be deleted via switchdev notifier
(NETDEV_SWITCH_FDB_DEL).
Signed-off-by: Siva Mannem <siva.mannem.lnx@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Freescale ethernet controllers have the capability to re-assemble fragmented
data into a single ethernet frame. This patch uses this capability and
implements NETIP_F_SG feature into the fs_enet ethernet driver.
On a MPC885, I get 53% performance improvement on a ftp transfer of a 15Mb file:
* Without the patch : 2,8 Mbps
* With the patch : 4,3 Mbps
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
A typical qdisc setup is the following :
bond0 : bonding device, using HTB hierarchy
eth1/eth2 : slaves, multiqueue NIC, using MQ + FQ qdisc
XPS allows to spread packets on specific tx queues, based on the cpu
doing the send.
Problem is that dequeues from bond0 qdisc can happen on random cpus,
due to the fact that qdisc_run() can dequeue a batch of packets.
CPUA -> queue packet P1 on bond0 qdisc, P1->ooo_okay=1
CPUA -> queue packet P2 on bond0 qdisc, P2->ooo_okay=0
CPUB -> dequeue packet P1 from bond0
enqueue packet on eth1/eth2
CPUC -> dequeue packet P2 from bond0
enqueue packet on eth1/eth2 using sk cache (ooo_okay is 0)
get_xps_queue() then might select wrong queue for P1, since current cpu
might be different than CPUA.
P2 might be sent on the old queue (stored in sk->sk_tx_queue_mapping),
if CPUC runs a bit faster (or CPUB spins a bit on qdisc lock)
Effect of this bug is TCP reorders, and more generally not optimal
TX queue placement. (A victim bulk flow can be migrated to the wrong TX
queue for a while)
To fix this, we have to record sender cpu number the first time
dev_queue_xmit() is called for one tx skb.
We can union napi_id (used on receive path) and sender_cpu,
granted we clear sender_cpu in skb_scrub_packet() (credit to Willem for
this union idea)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Markus Elfring says:
====================
netlabel: Deletion of a few unnecessary checks
Further update suggestions were taken into account after patches were applied
from static source code analysis.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The functions "cipso_v4_doi_putdef" and "kfree" could be called in some cases
by the netlbl_mgmt_add_common() function during error handling even if the
passed variables contained still a null pointer.
* This implementation detail could be improved by adjustments for jump labels.
* Let us return immediately after the first failed function call according to
the current Linux coding style convention.
* Let us delete also an unnecessary check for the variable "entry" there.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The cipso_v4_doi_free() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The cipso_v4_doi_putdef() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device tree binding(s) document has fallen out of sync with the
driver code. Update the list of supported devices to reflect current
driver capabilities
Change-Id: I440d8de2ee2d9c3b7b23e69b3da851cab18a4c9a
Signed-off-by: Shruti Kanetkar <Kanetkar.Shruti@gmail.com>
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is yet another Broadcom bluetooth chip with ACPI ID BCM2E40.
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The bnep_get_device function may be triggered by an ioctl just after a
connection has gone down. In such a case the respective L2CAP chan->conn
pointer will get set to NULL (by l2cap_chan_del). This patch adds a
missing NULL check for this case in the bnep_get_device() function.
Reported-by: Patrik Flykt <patrik.flykt@linux.intel.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The kfree() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-By: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladislav Yasevich says:
====================
ipv6: Add lockless UDP send path
This series introduces a lockless UDPv6 send path similar to
what Herbert Xu did for IPv4 a while ago.
There are some difference from IPv4. IPv6 caching for flow
label is a bit different, as well as it requires another cork
cork structure that holds the IPv6 ancillary data.
Please take a look.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currntly, if we are not doing UFO on the packet, all UDP
packets will start with CHECKSUM_NONE and thus perform full
checksum computations in software even if device support
IPv6 checksum offloading.
Let's start start with CHECKSUM_PARTIAL if the device
supports it and we are sending only a single packet at
or below mtu size.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit adds the same functionaliy to IPv6 that
commit 903ab86d19
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue Mar 1 02:36:48 2011 +0000
udp: Add lockless transmit path
added to IPv4.
UDP transmit path can now run without a socket lock,
thus allowing multiple threads to send to a single socket
more efficiently.
This is only used when corking/MSG_MORE is not used.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we can individually construct IPv6 skbs to send, add a
udpv6_send_skb() function to populate the udp header and send the
skb. This allows udp_v6_push_pending_frames() to re-use this
function as well as enables us to add lockless sendmsg() support.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit is very similar to
commit 1c32c5ad6f
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue Mar 1 02:36:47 2011 +0000
inet: Add ip_make_skb and ip_finish_skb
It adds IPv6 version of the helpers ip6_make_skb and ip6_finish_skb.
The job of ip6_make_skb is to collect messages into an ipv6 packet
and poplulate ipv6 eader. The job of ip6_finish_skb is to transmit
the generated skb. Together they replicated the job of
ip6_push_pending_frames() while also provide the capability to be
called independently. This will be needed to add lockless UDP sendmsg
support.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the ability to append data to arbitrary queue. This
will be needed later to implement lockless UDP sends.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull IPv6 cork initialization into its own function that
can be re-used. IPv6 specific cork data did not have an
explicit data structure. This patch creats eone so that
just ipv6 cork data can be as arguemts. Also, since
IPv6 tries to save the flow label into inet_cork_full
tructure, pass the full cork.
Adjust ip6_cork_release() to take cork data structures.
Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Add support for IEEE ets & pfc api.
* Fix bug that resulted in incorrect bandwidth percentage being returned for
CEE peers
* Convert pfc enabled info from firmware format to what dcbnl expects before
returning
Signed-off-by: Anish Bhatt <anish@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ARM has 32-byte cache lines, which according to the comment in
the init registers function seems to work best with the default
value of 0x4800 that is also used on sparc and parisc.
This adds ARM to the same list, to use that default but no
longer warn about it.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Grant Grundler <grundler@parisc-linux.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hip04 ethernet driver causes a new compile-time warning
when built as a loadable module:
WARNING: modpost: missing MODULE_LICENSE() in drivers/net/ethernet/hisilicon/hip04_eth.o
see include/linux/module.h for more information
This adds the license as "GPL", which matches the header of the file.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One deployment requirement of DCTCP is to be able to run
in a DC setting along with TCP traffic. As Glenn Judd's
NSDI'15 paper "Attaining the Promise and Avoiding the Pitfalls
of TCP in the Datacenter" [1] (tba) explains, one way to
solve this on switch side is to split DCTCP and TCP traffic
in two queues per switch port based on the DSCP: one queue
soley intended for DCTCP traffic and one for non-DCTCP traffic.
For the DCTCP queue, there's the marking threshold K as
explained in commit e3118e8359 ("net: tcp: add DCTCP congestion
control algorithm") for RED marking ECT(0) packets with CE.
For the non-DCTCP queue, there's f.e. a classic tail drop queue.
As already explained in e3118e8359, running DCTCP at scale
when not marking SYN/SYN-ACK packets with ECT(0) has severe
consequences as for non-ECT(0) packets, traversing the RED
marking DCTCP queue will result in a severe reduction of
connection probability.
This is due to the DCTCP queue being dominated by ECT(0) traffic
and switches handle non-ECT traffic in the RED marking queue
after passing K as drops, where K is usually a low watermark
in order to leave enough tailroom for bursts. Splitting DCTCP
traffic among several queues (ECN and non-ECN queue) is being
considered a terrible idea in the network community as it
splits single flows across multiple network paths.
Therefore, commit e3118e8359 implements this on Linux as
ECT(0) marked traffic, as we argue that marking all packets
of a DCTCP flow is the only viable solution and also doesn't
speak against the draft.
However, recently, a DCTCP implementation for FreeBSD hit also
their mainline kernel [2]. In order to let them play well
together with Linux' DCTCP, we would need to loosen the
requirement that ECT(0) has to be asserted during the 3WHS as
not implemented in FreeBSD. This simplifies the ECN test and
lets DCTCP work together with FreeBSD.
Joint work with Daniel Borkmann.
[1] https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/judd
[2] 8ad8794452
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn says:
====================
net-timestamp: blinding
Changes
(v2 -> v3)
- rebase only: v2 did not make it to patchwork / netdev
(v1 -> v2)
- fix capability check in patch 2
this could be moved into net/core/sock.c as sk_capable_nouser()
(rfc -> v1)
- dropped patch 4: timestamp batching
due to complexity, as discussed
- dropped patch 5: default mode
because it does not really cover all use cases, as discussed
- added documentation
- minor fix, see patch 2
Two issues were raised during recent timestamping discussions:
1. looping full packets on the error queue exposes packet headers
2. TCP timestamping with retransmissions generates many timestamps
This RFC patchset is an attempt at addressing both without breaking
legacy behavior.
Patch 1 reintroduces the "no payload" timestamp option, which loops
timestamps onto an empty skb. This reduces the pressure on SO_RCVBUF
from looping many timestamps. It does not reduce the number of recv()
calls needed to process them. The timestamp cookie mechanism developed
in http://patchwork.ozlabs.org/patch/427213/ did, but this is
considerably simpler.
Patch 2 then gives administrators the power to block all timestamp
requests that contain data by unprivileged users. I proposed this
earlier as a backward compatible workaround in the discussion of
net-timestamp: pull headers for SOCK_STREAM
http://patchwork.ozlabs.org/patch/414810/
Patch 3 only updates the txtimestamp example to test this option.
Verified that with option '-n', length is zero in all cases and
option '-I' (PKTINFO) stops working.
====================
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Demonstrate how SOF_TIMESTAMPING_OPT_TSONLY can be used and
test the implementation.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tx timestamps are looped onto the error queue on top of an skb. This
mechanism leaks packet headers to processes unless the no-payload
options SOF_TIMESTAMPING_OPT_TSONLY is set.
Add a sysctl that optionally drops looped timestamp with data. This
only affects processes without CAP_NET_RAW.
The policy is checked when timestamps are generated in the stack.
It is possible for timestamps with data to be reported after the
sysctl is set, if these were queued internally earlier.
No vulnerability is immediately known that exploits knowledge
gleaned from packet headers, but it may still be preferable to allow
administrators to lock down this path at the cost of possible
breakage of legacy applications.
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
Changes
(v1 -> v2)
- test socket CAP_NET_RAW instead of capable(CAP_NET_RAW)
(rfc -> v1)
- document the sysctl in Documentation/sysctl/net.txt
- fix access control race: read .._OPT_TSONLY only once,
use same value for permission check and skb generation.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add timestamping option SOF_TIMESTAMPING_OPT_TSONLY. For transmit
timestamps, this loops timestamps on top of empty packets.
Doing so reduces the pressure on SO_RCVBUF. Payload inspection and
cmsg reception (aside from timestamps) are no longer possible. This
works together with a follow on patch that allows administrators to
only allow tx timestamping if it does not loop payload or metadata.
Signed-off-by: Willem de Bruijn <willemb@google.com>
----
Changes (rfc -> v1)
- add documentation
- remove unnecessary skb->len test (thanks to Richard Cochran)
Signed-off-by: David S. Miller <davem@davemloft.net>
This extended return parameters struct conflicts with the new Read Local
OOB Extended Data command definition. To avoid the conflict simply
rename the old "extended" version to the normal one and update the code
appropriately to take into account the two possible response PDU sizes.
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
The Intel Snowfield Peak Bluetooth controllers use a strict scanning
filter policy that filters based on Bluetooth device addresses and
not on RSSI.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
When using LE_SCAN_FILTER_DUP_ENABLE, some controllers would send
advertising report from each LE device only once. That means that we
don't get any updates on RSSI value, and makes Service Discovery very
slow. This patch adds restarting scan when in Service Discovery, and
device with filtered uuid is found, but it's not in RSSI range to send
event yet. This way if device moves into range, we will quickly get RSSI
update.
Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Currently there is no way to restart le scan, and it's needed in
service scan method. The way it work: it disable, and then enable le
scan on controller.
During the restart, we must remember when the scan was started, and
it's duration, to later re-schedule the le_scan_disable work, that was
stopped during the stop scan phase.
Signed-off-by: Jakub Pawlowski <jpawlowski@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Add support for retrieving port level statistics from device.
Hook is added for ethtool's stats functionality. For example,
$ ethtool -S eth3
NIC statistics:
rx_packets: 12
rx_bytes: 2790
rx_dropped: 0
rx_errors: 0
tx_packets: 8
tx_bytes: 728
tx_dropped: 0
tx_errors: 0
Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Roopa Prabhu says:
====================
switchdev offload flags
This patch series introduces new offload flags for switchdev.
Kernel network subsystems can use this flag to accelerate
network functions by offloading to hw.
I expect that there will be need for subsystem specific feature
flag in the future.
This patch series currently only addresses bridge driver link
attribute offloads to hardware.
Looking at the current state of bridge l2 offload in the kernel,
- flag 'self' is the way to directly manage the bridge device in hw via
the ndo_bridge_setlink/ndo_bridge_getlink calls
- flag 'master' is always used to manage the in kernel bridge devices
via the same ndo_bridge_setlink/ndo_bridge_getlink calls
Today these are used separately. The nic offloads use hwmode "vepa/veb" to go
directly to hw with the "self" flag.
At this point i am trying not to introduce any new user facing flags/attributes.
In the model where we want the kernel bridging to be accelerated with
hardware, we very much want the bridge driver to be involved.
In this proposal,
- The offload flag/bit helps switch asic drivers to indicate that they
accelerate the kernel networking objects/functions
- The user does not have to specify a new flag to do so. A bridge created with
switch asic ports will be accelerated if the switch driver supports it.
- The user can continue to directly manage l2 in nics (ixgbe) using the
existing hwmode/self flags
- It also does not stop users from using the 'self' flag to talk to the
switch asic driver directly
- Involving the bridge driver makes sure the add/del notifications to user
space go out after both kernel and hardware are programmed
(To selectively offload bridge port attributes,
example learning in hw only etc, we can introduce offload bits for
per bridge port flag attribute as in my previous patch
https://patchwork.ozlabs.org/patch/413211/. I have not included that in this
series)
v2
- try a different name for the offload flag/bit
- tries to solve the stacked netdev case by traversing the lowerdev
list to reach the switch port
v3 -
- Tested with bond as bridge port for the stacked device case.
Includes a bond_fix_features change to not ignore the
NETIF_F_HW_NETFUNC_OFFLOAD flag
- Some checkpatch fixes
v4 -
- rename flag to NETIF_F_HW_SWITCH_OFFLOAD
- add ndo_bridge_setlink/dellink handlers in bond and team drivers as
suggested by jiri.
- introduce default ndo_dflt_netdev_switch_port_bridge_setlink/dellink
handlers that masters can use to call offload api on lowerdevs.
====================
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Currently ndo_bridge_setlink and ndo_bridge_dellink handlers point
to the default switchdev handlers
This follows my bonding driver changes.
I have only compile tested this patch. However similar
bonding code has been tested.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We want bond to pick up the offload flag if any of its slaves have it.
NETIF_F_HW_SWITCH_OFFLOAD flag is added to the mask, so that
netdev_increment_features does not ignore it.
This also adds ndo_bridge_setlink and ndo_bridge_dellink handlers.
These currently point to the default handlers provided by the
switchdev api.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch sets the NETIF_F_HW_SWITCH_OFFLOAD feature flag on rocker ports
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support to set/del bridge port attributes in hardware from
the bridge driver.
With this, when the user sends a bridge setlink message with no flags or
master flags set,
- the bridge driver ndo_bridge_setlink handler sets settings in the kernel
- calls the swicthdev api to propagate the attrs to the switchdev
hardware
You can still use the self flag to go to the switch hw or switch port
driver directly.
With this, it also makes sure a notification goes out only after the
attributes are set both in the kernel and hw.
The patch calls switchdev api only if BRIDGE_FLAGS_SELF is not set.
This is because the offload cases with BRIDGE_FLAGS_SELF are handled in
the caller (in rtnetlink.c).
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds two new api's netdev_switch_port_bridge_setlink
and netdev_switch_port_bridge_dellink to offload bridge port attributes
to switch port
(The names of the apis look odd with 'switch_port_bridge',
but am more inclined to change the prefix of the api to something else.
Will take any suggestions).
The api's look at the NETIF_F_HW_SWITCH_OFFLOAD feature flag to
pass bridge port attributes to the port device.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bridge flags are needed inside ndo_bridge_setlink/dellink handlers to
avoid another call to parse IFLA_AF_SPEC inside these handlers
This is used later in this series
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a high level feature flag for all switch asic offloads
switch drivers set this flag on switch ports. Logical devices like
bridge, bonds, vxlans can inherit this flag from their slaves/ports.
The patch also adds the flag to NETIF_F_ONE_FOR_ALL, so that it gets
propagated to the upperdevices (bridges and bonds).
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- In tx_hard_error_bump_tc interrupt, tc should be bumped only when current
device instance is in DMA threshold mode. Check per device xstats.threshold
other than global tc.
- Set per device xstats.threshold to SF_DMA_MODE when current device
instance is set to SF mode.
v2-changes:
- fix ident style
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>