Automatically generate flow labels for IPv6 packets on transmit.
The flow label is computed based on skb_get_hash. The flow label will
only automatically be set when it is zero otherwise (i.e. flow label
manager hasn't set one). This supports the transmit side functionality
of RFC 6438.
Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
functionality per socket.
By default, auto flowlabels are disabled to avoid possible conflicts
with flow label manager, however if this feature proves useful we
may want to enable it by default.
It should also be noted that FreeBSD has already implemented automatic
flow labels (including the sysctl and socket option). In FreeBSD,
automatic flow labels default to enabled.
Performance impact:
Running super_netperf with 200 flows for TCP_RR and UDP_RR for
IPv6. Note that in UDP case, __skb_get_hash will be called for
every packet with explains slight regression. In the TCP case
the hash is saved in the socket so there is no regression.
Automatic flow labels disabled:
TCP_RR:
86.53% CPU utilization
127/195/322 90/95/99% latencies
1.40498e+06 tps
UDP_RR:
90.70% CPU utilization
118/168/243 90/95/99% latencies
1.50309e+06 tps
Automatic flow labels enabled:
TCP_RR:
85.90% CPU utilization
128/199/337 90/95/99% latencies
1.40051e+06
UDP_RR
92.61% CPU utilization
115/164/236 90/95/99% latencies
1.4687e+06
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the receive side to support RFC 6438 which is to
use the flow label as an ECMP hash. If an IPv6 flow label is set
in a packet we can use this as input for computing an L4-hash. There
should be no need to parse any transport headers in this case.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In vxlan and OVS vport-vxlan call common function to get source port
for a UDP tunnel. Removed vxlan_src_port since the functionality is
now in udp_flow_src_port.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds udp_flow_src_port function which is intended to be
a common function that UDP tunnel implementations call to set the source
port. The source port is chosen so that a hash over the outer headers
(IP addresses and UDP ports) acts as suitable hash for the flow of the
encapsulated packet. In this manner, UDP encapsulation works with RSS
and ECMP based wrt the inner flow.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Call standard function to get a packet hash instead of taking this from
skb->sk->sk_hash or only using skb->protocol.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For a connected socket we can precompute the flow hash for setting
in skb->hash on output. This is a performance advantage over
calculating the skb->hash for every packet on the connection. The
computation is done using the common hash algorithm to be consistent
with computations done for packets of the connection in other states
where thers is no socket (e.g. time-wait, syn-recv, syn-cookies).
This patch adds sk_txhash to the sock structure. inet_set_txhash and
ip6_set_txhash functions are added which are called from points in
TCP and UDP where socket moves to established state.
skb_set_hash_from_sk is a function which sets skb->hash from the
sock txhash value. This is called in UDP and TCP transmit path when
transmitting within the context of a socket.
Tested: ran super_netperf with 200 TCP_RR streams over a vxlan
interface (in this case skb_get_hash called on every TX packet to
create a UDP source port).
Before fix:
95.02% CPU utilization
154/256/505 90/95/99% latencies
1.13042e+06 tps
Time in functions:
0.28% skb_flow_dissect
0.21% __skb_get_hash
After fix:
94.95% CPU utilization
156/254/485 90/95/99% latencies
1.15447e+06
Neither __skb_get_hash nor skb_flow_dissect appear in perf
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the hash computation located in __skb_get_hash to be a separate
function which takes flow_keys as input. This will allow flow hash
computation in other contexts where we only have addresses and ports.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli says:
====================
net: systemport: PM and Wake-on-LAN support
This patchset brings Power Management and Wake-on-LAN support to the
Broadcom SYSTEM PORT driver.
S2 and S3 modes are supported, while we only support Wake-on-LAN using
MagicPackets for now
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Support for Wake-on-LAN using Magic Packet with or without SecureOn
password is implemented doing the following:
- setting the password to the relevant UniMAC registers
- flagging the device as a wakeup source for the system, as well as
its Wake-on-LAN interrupt
- prepare the hardware for entering WoL mode
- enabling the MPD interrupt to wake us
The Device Tree binding documentation is also reflected to specify the
third optional Wake-on-LAN interrupt line.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This boolean tells us whether we are using the RXCHK hardware block,
so use a variable name that reflects that. RXCHK might be used in the
future to implement Wake-on-LAN using ARP or unicast packets.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement the hardware recommended suspend/resume procedure for
SYSTEMPORT. We leverage the previous factoring work such that we can
logically break all suspend/resume operations into disctint RX and TX
code paths.
When the system enters S3, we will loose all register contents, so
make sure that we correctly re-program all the hardware and software
views of the RX & TX rings as well.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Factor common code that either enables or disables the network
interface with the networking stack. We are going to reuse these
functions for suspend/resume callbacks.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Quite often we need to enable either the transmitter or the receiver
bits in UMAC_CMD, use umac_enable_set() to do that for us.
This is a preliminary change to introduce suspend/resume support in the
driver.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixed the coding style issues reported by checkpatch.pl
following issues fixed:
CHECK: Alignment should match open parenthesis
WARNING: line over 80 characters
CHECK: Blank lines aren't necessary before a close brace '}'
WARNING: networking block comments don't use an empty /* line, use /* Comment...
WARNING: Missing a blank line after declarations
WARNING: networking block comments start with * on subsequent lines
CHECK: braces {} should be used on all arms of this statement
Signed-off-by: Varka Bhadram <varkab@cdac.in>
Tested-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes the prototype of the do_one_broadcast() method so that it will return void.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Erik Hugne says:
====================
tipc: link state processing improvements
Message delivery is separated from the link state processing, and
we fix a bug in receive-path triggered acks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Link state acks triggered from the receive path is done before
the last received packet have been processed by the link layer.
The effect of this is that the last received packet will not be
included in the ack. This causes problems if the link window is
set to TIPC_MIN_LINK_WIN, where the ack interval will be equal to
the link tolerance, and the link enters a stop-and-go behavior.
We move the ack logic to after link state processing, just before
the packet is delivered to higher layers.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Carl Sigurjonsson <carl.sigurjonsson@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a cosmetic change, separating message delivery from the
link state processing.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Always store in snt_synack the time at which the server received the
first client SYN and attempted to send the first SYNACK.
Recent commit aa27fc501 ("tcp: tcp_v[46]_conn_request: fix snt_synack
initialization") resolved an inconsistency between IPv4 and IPv6 in
the initialization of snt_synack. This commit brings back the idea
from 843f4a55e (tcp: use tcp_v4_send_synack on first SYN-ACK), which
was going for the original behavior of snt_synack from the commit
where it was added in 9ad7c049f0 ("tcp: RFC2988bis + taking RTT
sample from 3WHS for the passive open side") in v3.1.
In addition to being simpler (and probably a tiny bit faster),
unconditionally storing the time of the first SYNACK attempt has been
useful because it allows calculating a performance metric quantifying
how long it took to establish a passive TCP connection.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Cc: Octavian Purdila <octavian.purdila@intel.com>
Cc: Jerry Chu <hkchu@google.com>
Acked-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ondrej Zary says:
====================
tlan: Link handling improvements and Olicom fixes
This patch series improves link handling in tlan driver, allowing the
cable to be (un)plugged anytime and NetworkManager to work properly.
Also there are some bugfixes related to Olicom OC-2326 card.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When using internal 10 Mbps PHY, isolate the external PHY from MII bus.
External PHY must be kept powered up because it passes TX from tlan chip to
network.
This fixes weird link-loss problems under load with OC-2326 card at 10 Mbps.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
pci_disable_device() is called in _suspend but there's no corresponding
pci_enable_device() in _resume.
This causes "disabling already-disabled device" warning on 2nd suspend.
Add pci_enable_device() call to _resume to fix this problem.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In tlan_reset_adapter, we disable internal PHY when an external one is used.
On cards which use internal PHY in 10 Mbps mode, we enable it later when
setting 10 Mbps mode but it does not really work (PHY fails to reset).
Leave it enabled instead.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a timeout to prevent infinite loop waiting for PHY to reset.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reduce the autonegotiation poll interval from 8 seconds to 2.
This greatly reduces the time needed to detect link presence,
especially on Olicom cards at 10 Mbps (two autonegoatiations required).
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove excess printks when the link is down.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When link is lost on a card which uses internal PHY for 10 Mbit speeds,
restart autonegotiation to allow switching between 10 and 100 Mbps speeds.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Olicom OC-2325 and OC-2326 cards have the MAC address byte-swapped in EEPROM.
Byte-swap the MAC address if it's located at offset 0xF8.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add basic ethtool support to tlan driver:
- driver info - link detect (this allows NetworkManager to detect carrier)
- EEPROM read
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable old link monitoring code and modify it:
- control LINK LED
- use separate timer so it does not interfere with ACT LED
Tested with Olicom OC-2326.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Olicom OC-2325 and OC-2326 ethernet cards have an activity LED but it does not
work with tlan driver as it's not enabled. Enable it.
Tested with OC-2326.
Signed-off-by: Ondrej Zary <linux@rainbow-software.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
kasprintf combines kmalloc and sprintf, and takes care of the size
calculation itself.
The semantic patch that makes this change is as follows:
// <smpl>
@@
expression a,flag;
expression list args;
statement S;
@@
a =
- \(kmalloc\|kzalloc\)(...,flag)
+ kasprintf(flag,args)
<... when != a
if (a == NULL || ...) S
...>
- sprintf(a,args);
// </smpl>
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Sørensen says:
====================
Add ptp vlan support
This patch series adds functionality for running ptp/ieee1588 over vlan.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows applications to enable hardware timestamping without being aware
of it being a vlan device and figuring out the real device.
Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This extends the ptp bpf to also match ptp over ip over vlan packets. The ptp
classes are changed to orthogonal bitfields representing version, transport
and vlan values to simplify matching.
Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace two switch statements enumerating all valid ptp classes with an if
statement matching for not PTP_CLASS_NONE.
Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
Misc SCTP updates
Daniel Borkmann (2):
net: sctp: improve timer slack calculation for transport HBs
net: sctp: only warn in proc_sctp_do_alpha_beta if write
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Only warn if the value is written to alpha or beta. We don't care
emitting a one-time warning when only reading it.
Reported-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
RFC4960, section 8.3 says:
On an idle destination address that is allowed to heartbeat,
it is recommended that a HEARTBEAT chunk is sent once per RTO
of that destination address plus the protocol parameter
'HB.interval', with jittering of +/- 50% of the RTO value,
and exponential backoff of the RTO if the previous HEARTBEAT
is unanswered.
Currently, we calculate jitter via sctp_jitter() function first,
and then add its result to the current RTO for the new timeout:
TMO = RTO + (RAND() % RTO) - (RTO / 2)
`------------------------^-=> sctp_jitter()
Instead, we can just simplify all this by directly calculating:
TMO = (RTO / 2) + (RAND() % RTO)
With the help of prandom_u32_max(), we don't need to open code
our own global PRNG, but can instead just make use of the per
CPU implementation of prandom with better quality numbers. Also,
we can now spare us the conditional for divide by zero check
since no div or mod operation needs to be used. Note that
prandom_u32_max() won't emit the same result as a mod operation,
but we really don't care here as we only want to have a random
number scaled into RTO interval.
Note, exponential RTO backoff is handeled elsewhere, namely in
sctp_do_8_2_transport_strike().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla says:
====================
be2net: patch set
v2 change: merged 2 lines into one in patch 4
Patch 1 refactors be_cmd_get_profile_config() routine to reduce
code duplication by using the be_cmd_notify_wait() routine, instead
of using a separate variant of the code for MBOX and MCCQ.
Patch 2 introduces the required FW-cmd code in the PF to query
RSS support on a VF. This is in preparation for patch 3.
Patch 3 adds support for the PF driver to re-configure the resource
distribution in FW based on the number of VFs enabled by the user. When
the user is not interested in enabling VFs, all resources of a port are
set-aside for the PF. If less than maximum number of VFs are enabled, then
each VF gets a better share of the resources and can now enable RSS (if
the interface supports it.)
Patch 4 is a minor fix to re-enable HW vlan filtering as soon as the number
of vlans programmed is within the HW limit.
Please consider applying to net-next tree. Thanks!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
While adding vlans, when the HW limit of vlan filters is reached, the
driver enables vlan promiscuous mode.
Similarily, while removing vlans, the driver must re-enable HW filtering
as soon as the number of vlan filters is within the HW limit.
Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If SR-IOV is enabled in the adapter, the FW distributes queue resources
evenly across the PF and it's VFs. If the user is not interested in enabling
VFs, the queues set aside for VFs are wasted.
This patch adds support for the PF driver to re-configure the resource
distribution in FW based on the number of VFs enabled by the user.
This also allows for supporting RSS queues on VFs, when less number of VFs
are enabled per PF. When maximum number of VFs are enabled, each VF typically
gets only one RXQ.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PF driver must query the FW for VF's interface capabilities
to know if the VF is RSS capable or not.
This patch is in preparation for enabling RSS on VFs on Skyhawk-R.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix be_cmd_get_profile_cmd() to use be_cmd_notify_wait() routine,
which uses MBOX if MCCQ has not been created. Doing this reduces
code duplication; we don't need the _mbox/_mccq() variants anymore.
Signed-off-by: Vasundhara Volam <vasundhara.volam@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix checkpatch warning:
WARNING: kfree(NULL) is safe this check is probably not required
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since consume_skb() (and hence dev_kfree_skb() macro) checks the passed pointer
for NULL, there's no need to check for NULL before invoking dev_kfree_skb().
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Harish Patil says:
====================
qlcnic: Enhance Tx timeout debug data collection.
The following set of patches are for enhancing Tx timeout debug collection
- Collect a firmware dump on first Tx timeout if netif_msg_tx_err() is set
- Log Receive and Status ring info on Tx timeout, in addition to Tx ring info
- Log additional Tx ring info if netif_msg_tx_err() is set
- Update driver version to 5.3.61
Please apply this series to net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
- Collect a firmware dump on first Tx timeout if netif_msg_tx_err() is set
- Log Receive and Status ring info on Tx timeout, in addition to Tx ring info
- Log additional Tx ring info if netif_msg_tx_err() is set
Signed-off-by: Harish Patil <harish.patil@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>