The function get_netvsc_net_device had conditional locking. This was
unnecessary, incorrect, but harmless. It was unnecessary since the
code is only called from netlink netdev event callback where RTNL
is always acquired before the callbacks are run. It was incorrect
because of use of trylock and then continuing.
Fix by replacing with proper assertion.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing code uses busy retry when unable to send out receive
completions due to full ring buffer. It also gives up retrying after limit
is reached, and causes receive buffer slots not being recycled.
This patch implements batching of receive completions. It also prevents
dropping receive completions due to full ring buffer.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minor overlapping changes for both merge conflicts.
Resolution work done by Stephen Rothwell was used
as a reference.
Signed-off-by: David S. Miller <davem@davemloft.net>
Bonding driver sets IFF_BONDING on both master (the bonding device) and
slave (the real NIC) devices and in netvsc_netdev_event() we want to skip
master devices only. Currently, there is an uncertainty when a slave
interface is removed: if bonding module comes first in netdev_chain it
clears IFF_BONDING flag on the netdev and netvsc_netdev_event() correctly
handles NETDEV_UNREGISTER event, but in case netvsc comes first on the
chain it sees the device with IFF_BONDING still attached and skips it. As
we still hold vf_netdev pointer to the device we crash on the next inject.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We're not guaranteed to see NETDEV_REGISTER/NETDEV_UNREGISTER notifications
only once per VF but we increase/decrease module refcount unconditionally.
Check vf_netdev to make sure we don't take/release it twice. We presume
that only one VF per netvsc device may exist.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We reset vf_inject on VF going down (netvsc_vf_down()) but we don't on
VF removal (netvsc_unregister_vf()) so vf_inject stays 'true' while
vf_netdev is already NULL and we're trying to inject packets into NULL
net device in netvsc_recv_callback() causing kernel to crash.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here is a deadlock scenario:
- netvsc_vf_up() schedules netvsc_notify_peers() work and quits.
- netvsc_vf_down() runs before netvsc_notify_peers() gets executed. As it
is being executed from netdev notifier chain we hold rtnl lock when we
get here.
- we enter while (atomic_read(&net_device_ctx->vf_use_cnt) != 0) loop and
wait till netvsc_notify_peers() drops vf_use_cnt.
- netvsc_notify_peers() starts on some other CPU but netdev_notify_peers()
will hang on rtnl_lock().
- deadlock!
Instead of introducing additional synchronization I suggest we drop
gwrk.dwrk completely and call NETDEV_NOTIFY_PEERS directly. As we're
acting under rtnl lock this is legitimate.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct netvsc_device is not suitable for storing VF information as this
structure is being destroyed on MTU change / set channel operation (see
rndis_filter_device_remove()). Move all VF related stuff to struct
net_device_context which is persistent.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On Hyper-V host 2016 and later, VMs gets an event message of the physical
link speed when vSwitch is changed. This patch handles this message, so
the updated link speed can be reported by ethtool.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The physical link speed value will be reported by ethtool command.
The real speed is available from Windows 2016 host or later.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added a condition to avoid bonding devices with same MAC registering
as VF.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the new APIs for eliminating a copy on the receive path. These new APIs also
help in minimizing the number of memory barriers we end up issuing (in the
ringbuffer code) since we can better control when we want to expose the ring
state to the host.
The patch is being resent to address earlier email issues.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I'm hitting 5 second timeout in rndis_filter_set_rss_param() while setting
RSS parameters for the device. When this happens we end up returning
-ETIMEDOUT from the function and rndis_filter_device_add() falls back to
setting
net_device->max_chn = 1;
net_device->num_chn = 1;
net_device->num_sc_offered = 0;
but after a moment the rndis request succeeds and subchannels start to
appear. netvsc_sc_open() does unconditional nvscdev->num_sc_offered-- and
it becomes U32_MAX-1. Consequent rndis_filter_device_remove() will hang
while waiting for all U32_MAX-1 subchannels to appear and this is not
going to happen.
The immediate issue could be solved by adding num_sc_offered > 0 check to
netvsc_sc_open() but we're getting out of sync with the host and it's not
easy to adjust things later, e.g. in this particular case we'll be creating
queues without a user request for it and races are expected. Same applies
to other parts of the driver which have the same completion timeout.
Following the trend in drivers/hv/* code I suggest we remove all these
timeouts completely. As a guest we can always trust the host we're running
on and if the host screws things up there is no easy way to recover anyway.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only caller rndis_filter_device_add() has 'struct net_device' pointer
already.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We unpack 'struct net_device' in netvsc_set_mac_addr() to get to
'struct hv_device' pointer which we use in rndis_filter_set_device_mac()
to get back to 'struct net_device'.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Both rndis_filter_open()/rndis_filter_close() use struct hv_device to
reach to struct netvsc_device only and all callers have it already.
While on it, rename net_device to nvdev in rndis_filter_open() as
net_device is misleading.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make it easier to get 'struct netvsc_device' from 'struct net_device' and
'struct hv_device' by introducing inline helpers.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net_device_ctx is assigned in the very beginning of the function and 'net'
pointer doesn't change.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added a condition to avoid vlan devices with same MAC registering
as VF.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Crash in netvsc_send() is observed when netvsc device is re-created on
mtu change/set channels. The crash is caused by dereferencing of NULL
channel pointer which comes from chn_table. The root cause is a mixture
of two facts:
- we set nvdev pointer in net_device_context in alloc_net_device()
before we populate chn_table.
- we populate chn_table[0] only.
The issue could be papered over by checking channel != NULL in
netvsc_send() but populating the whole chn_table and writing the
nvdev pointer afterwards seems more appropriate.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When netvsc device is removed during mtu change or channels setup we get
into troubles as both paths are trying to remove the device. Synchronize
them with start_remove flag and rtnl lock.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Simplify netvsvc pointer graph by getting rid of the redundant ndev
pointer. We can always get a pointer to struct net_device from somewhere
else.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have the following structures keeping netvsc adapter state:
- struct net_device
- struct net_device_context
- struct netvsc_device
- struct rndis_device
- struct hv_device
and there are pointers/dependencies between them:
- struct net_device_context is contained in struct net_device
- struct hv_device has driver_data pointer which points to
'struct net_device' OR 'struct netvsc_device' depending on driver's
state (!).
- struct net_device_context has a pointer to 'struct hv_device'.
- struct netvsc_device has pointers to 'struct hv_device' and
'struct net_device_context'.
- struct rndis_device has a pointer to 'struct netvsc_device'.
Different functions get different structures as parameters and use these
pointers for traveling. The problem is (in addition to keeping in mind
this complex graph) that some of these structures (struct netvsc_device
and struct rndis_device) are being removed and re-created on mtu change
(as we implement it as re-creation of hyper-v device) so our travel using
these pointers is dangerous.
Simplify this to a the following:
- add struct netvsc_device pointer to struct net_device_context (which is
a part of struct net_device and thus never disappears)
- remove struct hv_device and struct net_device_context pointers from
struct netvsc_device
- replace pointer to 'struct netvsc_device' with pointer to
'struct net_device'.
- always keep 'struct net_device' in hv_device driver_data.
We'll end up with the following 'circular' structure:
net_device:
[net_device_context] -> netvsc_device -> rndis_device -> net_device
-> hv_device -> net_device
On MTU change we'll be removing the 'netvsc_device -> rndis_device'
branch and re-creating it making the synchronization easier.
There is one additional redundant pointer left, it is struct net_device
link in struct netvsc_device, it is going to be removed in a separate
commit.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
netvsc_link_change() can race with netvsc_change_mtu() or
netvsc_set_channels() as these functions destroy struct netvsc_device and
rndis filter. Use start_remove flag for syncronization. As
netvsc_change_mtu()/netvsc_set_channels() are called with rtnl lock held
we need to take it before checking start_remove value in
netvsc_link_change().
Reported-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct netvsc_device is destroyed on mtu change so keeping the
protection flag there is not a good idea. Move it to struct
net_device_context which is preserved.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RNDIS_STATUS_NETWORK_CHANGE event is handled as two "half events" --
media disconnect & connect. The second half should be added to the list
head, not to the tail. So all events are processed in normal order.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support VF drivers on Hyper-V. On Hyper-V, each VF instance presented to
the guest has an associated synthetic interface that shares the MAC address
with the VF instance. Typically these are bonded together to support
live migration. By default, the host delivers all the incoming packets
on the synthetic interface. Once the VF is up, we need to explicitly switch
the data path on the host to divert traffic onto the VF interface. Even after
switching the data path, broadcast and multicast packets are always delivered
on the synthetic interface and these will have to be injected back onto the
VF interface (if VF is up).
This patch implements the necessary support in netvsc to support Linux
VF drivers.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reorder the code in netvsc_sc_open(), so num_sc_offered is only decremented
after vmbus_open() is called. This avoid pontential race of removing device
before all channels are setup.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The VRSS_CHANNEL_MAX is the max number of channels supported by Hyper-V
hosts. We use it for the related array sizes instead of using NR_CPUS,
which may be set to several thousands.
This patch reduces possible memory allocation failures.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct netvsc_device is freed in rndis_filter_device_remove(). So we save
the nvdev->num_chn into a temp variable for later usage.
(Please also include this patch into stable branch.)
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During hot add, vmbus_device_register() is called from vmbus_onoffer(), on
the same workqueue as the subchannel offer message work-queue, so
subchannel offer won't be processed until the vmbus_device_register()/...
/netvsc_probe() is done.
Also, vmbus_device_register() is called with channel_mutex locked, which
prevents subchannel processing too. So the "waiting for sub-channel
processing" will not success in hot add case. But, in usual module loading,
the netvsc_probe() is called from different code path, and doesn't fail.
This patch resolves the deadlock during NIC hot-add, and speeds up NIC
loading time.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows the user to set and retrieve speed and duplex of the
hv_netvsc device via ethtool.
Example:
$ ethtool eth0
Settings for eth0:
...
Speed: Unknown!
Duplex: Unknown! (255)
...
$ ethtool -s eth0 speed 1000 duplex full
$ ethtool eth0
Settings for eth0:
...
Speed: 1000Mb/s
Duplex: Full
...
This is based on patches by Roopa Prabhu and Nikolay Aleksandrov.
Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/phy/bcm7xxx.c
drivers/net/phy/marvell.c
drivers/net/vxlan.c
All three conflicts were cases of simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable skb_tx_timestamp in hyperv netvsc.
Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit c0eb454034 ("hv_netvsc: Don't ask for additional head room in the
skb") got rid of needed_headroom setting for the driver. With the change I
hit the following issue trying to use ptkgen module:
[ 57.522021] kernel BUG at net/core/skbuff.c:1128!
[ 57.522021] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
...
[ 58.721068] Call Trace:
[ 58.721068] [<ffffffffa0144e86>] netvsc_start_xmit+0x4c6/0x8e0 [hv_netvsc]
...
[ 58.721068] [<ffffffffa02f87fc>] ? pktgen_finalize_skb+0x25c/0x2a0 [pktgen]
[ 58.721068] [<ffffffff814f5760>] ? __netdev_alloc_skb+0xc0/0x100
[ 58.721068] [<ffffffffa02f9907>] pktgen_thread_worker+0x257/0x1920 [pktgen]
Basically, we're calling skb_cow_head(skb, RNDIS_AND_PPI_SIZE) and crash on
if (skb_shared(skb))
BUG();
We probably need to restore needed_headroom setting (but shrunk to
RNDIS_AND_PPI_SIZE as we don't need more) to request the required headroom
space. In theory, it should not give us performance penalty.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. Adding NETIF_F_TSO6 feature flag;
2. Adding NETIF_F_HW_CSUM. NETIF_F_IPV6_CSUM and NETIF_F_IP_CSUM are
being deprecated;
3. Cleanup the coding style of flag assignment by using macro.
Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since eliminating send_completion_tid from struct hv_netvsc_packet, we
haven't add proper book keeping for the skb of the batched packet. This
patch fixes this issue and allows the previous skb is properly freed.
Otherwise, a panic may happen.
Thanks to Simon Xiao <sixiao@microsoft.com> for bisecting and analysis.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent changes to 'struct flow_keys' (e.g commit d34af823ff ("net: Add
VLAN ID to flow_keys")) introduced a performance regression in netvsc
driver. Is problem is, however, not the above mentioned commit but the
fact that netvsc_set_hash() function did some assumptions on the struct
flow_keys data layout and this is wrong.
Get rid of netvsc_set_hash() by switching to skb_get_hash(). This change
will also imply switching to Jenkins hash from the currently used Toeplitz
but it seems there is no good excuse for Toeplitz to stay.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 2a04ae8acb ("hv_netvsc: remove locking in netvsc_send()"), the
locking for MSD (Multi-Send Data) field was removed. This could cause a
race condition between RNDIS control messages and data packets processing,
because these two types of traffic are not synchronized.
This patch fixes this issue by sending control messages out directly
without reading MSD field.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate vlan_tci from struct hv_netvsc_packet.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate status from struct hv_netvsc_packet.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate xmit_more from struct hv_netvsc_packet.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate completion_func from struct hv_netvsc_packet.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate is_data_pkt from struct hv_netvsc_packet.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate send_completion_tid from struct hv_netvsc_packet.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate page_buf from struct hv_netvsc_packet.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packet scheduler guarantees there won't be multiple senders for the same
queue and as we use q_idx for multi_send_data the spinlock is redundant.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rndis header is 116 bytes big and can be placed in the default
head room that will be available in the skb. Since the netvsc packet
is less than 48 bytes, we can use the skb control buffer
for the netvsc packet. With these changes we don't need to
ask for additional head room.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate send_completion_ctx from struct hv_netvsc_packet.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate send_completion from struct hv_netvsc_packet.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminatte the data field from struct hv_netvsc_packet.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate rndis_msg pointer from hv_netvsc_packet structure.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eliminate the channel field in hv_netvsc_packet structure.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rearrange the elements of struct hv_negtvsc_packet for optimal layout -
eliminate unnecessary padding.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As part of reducing the size of the hv_netvsc_packet, resize some of the
variables based on their usage.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are several issues in hv_netvsc driver with regards to link status
change handling:
- RNDIS_STATUS_NETWORK_CHANGE results in calling userspace helper doing
'/etc/init.d/network restart' and this is inappropriate and broken for
many reasons.
- link_watch infrastructure only sends one notification per second and
in case of e.g. paired disconnect/connect events we get only one
notification with last status. This makes it impossible to handle such
situations in userspace.
Redo link status changes handling in the following way:
- Create a list of reconfig events in network device context.
- On a reconfig event add it to the list of events and schedule
netvsc_link_change().
- In netvsc_link_change() ensure 2-second delay between link status
changes.
- Handle RNDIS_STATUS_NETWORK_CHANGE as a paired disconnect/connect event.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The flags argument will allow control of the dissection process (for
instance whether to parse beyond L3).
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This enables the use of ethtool --set-channels devname combined N to
change the number of vRSS queues. Separate rx, tx, and other parameters
are not supported. The maximum is rsscap.num_recv_que. It passes the
given value to rndis_filter_device_add through the device_info->num_chn
field.
If the procedure fails, it attempts to recover to the prior state. If
the recovery fails, it logs an error and aborts.
Current num_chn is saved and restored when changing the MTU.
Signed-off-by: Andrew Schwartzmeyer <andschwa@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Uses device_info->num_chn to pass user provided number of vRSS
queues (from ethtool --set-channels) to rndis_filter_device_add. If
nonzero and less than the maximum, set net_device->num_chn to the given
value; else default to prior algorithm.
Always initialize struct device_info to 0, otherwise not all its fields
are guaranteed to be 0, which is necessary when checking if num_chn has
been purposefully set.
Signed-off-by: Andrew Schwartzmeyer <andschwa@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds data structures and handlers for messages related
to SRIOV Virtual Function.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code returns from probe without waiting for the proper handling
of subchannels that may be requested. If the netvsc driver were to be rapidly
loaded/unloaded, we can trigger a panic as the unload will be tearing
down state that may not have been fully setup yet. We fix this issue by making
sure that we return from the probe call only after ensuring that the
sub-channel offers in flight are properly handled.
Reviewed-and-tested-by: Haiyang Zhang <haiyangz@microsoft.com
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current change mtu call only stops tx before removing RNDIS filter.
In case ringbufer is not empty, the rndis_filter_device_remove() may
hang on removing the buffers.
This patch adds close of RNDIS filter before removing it, also a
gradual waiting loop until the ring is empty. The change_mtu hang
issue under heavy traffic is solved by this patch.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When packet encapsulation is in use, the MTU needs to be reduced for
headroom reservation.
The existing code takes the updated MTU value only from the host side.
But vSwitch extensions, such as Open vSwitch, require the flexibility
to change the MTU to different values from within a guest during the
lifecycle of a vNIC, when the encapsulation protocol is changed. The
patch supports this kind of MTU changes.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking updates from David Miller:
1) Add TX fast path in mac80211, from Johannes Berg.
2) Add TSO/GRO support to ibmveth, from Thomas Falcon
3) Move away from cached routes in ipv6, just like ipv4, from Martin
KaFai Lau.
4) Lots of new rhashtable tests, from Thomas Graf.
5) Run ingress qdisc lockless, from Alexei Starovoitov.
6) Allow servers to fetch TCP packet headers for SYN packets of new
connections, for fingerprinting. From Eric Dumazet.
7) Add mode parameter to pktgen, for testing receive. From Alexei
Starovoitov.
8) Cache access optimizations via simplifications of build_skb(), from
Alexander Duyck.
9) Move page frag allocator under mm/, also from Alexander.
10) Add xmit_more support to hv_netvsc, from KY Srinivasan.
11) Add a counter guard in case we try to perform endless reclassify
loops in the packet scheduler.
12) Extern flow dissector to be programmable and use it in new "Flower"
classifier. From Jiri Pirko.
13) AF_PACKET fanout rollover fixes, performance improvements, and new
statistics. From Willem de Bruijn.
14) Add netdev driver for GENEVE tunnels, from John W Linville.
15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso.
16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet.
17) Add an ECN retry fallback for the initial TCP handshake, from Daniel
Borkmann.
18) Add tail call support to BPF, from Alexei Starovoitov.
19) Add several pktgen helper scripts, from Jesper Dangaard Brouer.
20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa.
21) Favor even port numbers for allocation to connect() requests, and
odd port numbers for bind(0), in an effort to help avoid
ip_local_port_range exhaustion. From Eric Dumazet.
22) Add Cavium ThunderX driver, from Sunil Goutham.
23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata,
from Alexei Starovoitov.
24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai.
25) Double TCP Small Queues default to 256K to accomodate situations
like the XEN driver and wireless aggregation. From Wei Liu.
26) Add more entropy inputs to flow dissector, from Tom Herbert.
27) Add CDG congestion control algorithm to TCP, from Kenneth Klette
Jonassen.
28) Convert ipset over to RCU locking, from Jozsef Kadlecsik.
29) Track and act upon link status of ipv4 route nexthops, from Andy
Gospodarek.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits)
bridge: vlan: flush the dynamically learned entries on port vlan delete
bridge: multicast: add a comment to br_port_state_selection about blocking state
net: inet_diag: export IPV6_V6ONLY sockopt
stmmac: troubleshoot unexpected bits in des0 & des1
net: ipv4 sysctl option to ignore routes when nexthop link is down
net: track link-status of ipv4 nexthops
net: switchdev: ignore unsupported bridge flags
net: Cavium: Fix MAC address setting in shutdown state
drivers: net: xgene: fix for ACPI support without ACPI
ip: report the original address of ICMP messages
net/mlx5e: Prefetch skb data on RX
net/mlx5e: Pop cq outside mlx5e_get_cqe
net/mlx5e: Remove mlx5e_cq.sqrq back-pointer
net/mlx5e: Remove extra spaces
net/mlx5e: Avoid TX CQE generation if more xmit packets expected
net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion
net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq()
net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them
net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues
net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device
...
Nothing in <asm/io.h> uses anything from <linux/vmalloc.h>, so
remove it from there and fix up the resulting build problems
triggered on x86 {64|32}-bit {def|allmod|allno}configs.
The breakages were triggering in places where x86 builds relied
on vmalloc() facilities but did not include <linux/vmalloc.h>
explicitly and relied on the implicit inclusion via <asm/io.h>.
Also add:
- <linux/init.h> to <linux/io.h>
- <asm/pgtable_types> to <asm/io.h>
... which were two other implicit header file dependencies.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
[ Tidied up the changelog. ]
Acked-by: David Miller <davem@davemloft.net>
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Vinod Koul <vinod.koul@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Anton Vorontsov <anton@enomsg.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Colin Cross <ccross@android.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: James E.J. Bottomley <JBottomley@odin.com>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Kristen Carlson Accardi <kristen@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Suma Ramars <sramars@cisco.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Allocate the receive bufer from the NUMA node assigned to the primary
channel.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current algorithm for deciding on the number of VRSS channels is
not optimal since we open up the min of number of CPUs online and the
number of VRSS channels the host is offering. So on a 32 VCPU guest
we could potentially open 32 VRSS subchannels. Experimentation has
shown that it is best to limit the number of VRSS channels to the number
of CPUs within a NUMA node.
Here is the new algorithm for deciding on the number of sub-channels we
would open up:
1) Pick the minimum of what the host is offering and what the driver
in the guest is specifying as the default value.
2) Pick the minimum of (1) and the numbers of CPUs in the NUMA
node the primary channel is bound to.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the struct netvsc_stats has a member s_sync
of type u64_stats_sync.
This definition will break kernel build as the macro
netdev_alloc_pcpu_stats requires this member name to be syncp.
(see netdev_alloc_pcpu_stats definition in ./include/linux/netdevice.h)
This patch changes netvsc_stats's member name from s_sync to syncp to fix
the build break.
Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current code does not lock anything when calculating the TX and RX stats.
As a result, the RX and TX data reported by ifconfig are not accuracy in a
system with high network throughput and multiple CPUs (in my test,
RX/TX = 83% between 2 HyperV VM nodes which have 8 vCPUs and 40G Ethernet).
This patch fixed the above issue by using per_cpu stats.
netvsc_get_stats64() summarizes TX and RX data by iterating over all CPUs
to get their respective stats.
This v2 patch addressed David's comments on the cleanup path when
netdev_alloc_pcpu_stats() failed.
Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Four minor merge conflicts:
1) qca_spi.c renamed the local variable used for the SPI device
from spi_device to spi, meanwhile the spi_set_drvdata() call
got moved further up in the probe function.
2) Two changes were both adding new members to codel params
structure, and thus we had overlapping changes to the
initializer function.
3) 'net' was making a fix to sk_release_kernel() which is
completely removed in 'net-next'.
4) In net_namespace.c, the rtnl_net_fill() call for GET operations
had the command value fixed, meanwhile 'net-next' adjusted the
argument signature a bit.
This also matches example merge resolutions posted by Stephen
Rothwell over the past two days.
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on the information given to this driver (via the xmit_more skb flag),
we can defer signaling the host if more packets are on the way. This will help
make the host more efficient since it can potentially process a larger batch of
packets. Implement this optimization.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With commit b56fc3c536 ("hv_netvsc: Fix a bug in netvsc_start_xmit()"),
skb variable is no longer used in netvsc_send. Remove variable and dead
code that depended on it.
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit b08cc79155 eliminated memory
allocation in the packet send path:
"hv_netvsc: Eliminate memory allocation in the packet send path
The network protocol used to communicate with the host is the remote ndis (rndis)
protocol. We need to decorate each outgoing packet with a rndis header and
additional rndis state (rndis per-packet state). To manage this state, we
currently allocate memory in the transmit path. Eliminate this allocation by
requesting additional head room in the skb."
This commit introduced a bug since it did not account for the case if the skb
was cloned. Fix this bug.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. Introduce netif-msg to netvsc to control debug logging output
and keep msg_enable in netvsc_device_context so that it is
kept persistently.
2. Only call dump_rndis_message() when NETIF_MSG_RX_ERR or above
is specified in netvsc module debug param.
In non-debug mode, in current code, dump_rndis_message() will not
dump anything but it still initialize some local variables and
process the switch logic which is unnecessary, especially in
high network throughput situation.
Signed-off-by: Simon Xiao <sixiao@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If remaining space in a send buffer slot is too small for the whole message,
we only copy the RNDIS header and PPI data into send buffer, so we can batch
one more packet each time. It reduces the vmbus per-message overhead.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In netvsc_start_xmit() we can handle packets which are scattered around not
more than MAX_PAGE_BUFFER_COUNT-2 pages. It is, however, easy to create a
packet which is not big in size but occupies more pages (e.g. if it uses frags
on compound pages boundaries). When we drop such packet it cases sender to try
resending it but in most cases it will try resending the same packet which will
also get dropped, this will cause the particular connection to stick. To solve
the issue we can try linearizing skb.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
... which validly uses dev_kfree_skb_any() instead of dev_kfree_skb().
Setting ret to -EFAULT and -ENOMEM have no real meaning here (we need to set
it to anything but -EAGAIN) as we drop the packet and return NETDEV_TX_OK
anyway.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the two places changed, we now use netvsc_xmit_completion() which properly
frees hv_netvsc_packet in or not in skb headroom.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sum of RNDIS msg and PPI struct sizes is used in multiple places, so we define
a macro for them.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The network protocol used to communicate with the host is the remote ndis (rndis)
protocol. We need to decorate each outgoing packet with a rndis header and
additional rndis state (rndis per-packet state). To manage this state, we
currently allocate memory in the transmit path. Eliminate this allocation by
requesting additional head room in the skb.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for embedding the rndis state and other packet state into
the skb, cleanup the test for freeing the skb.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The vmbus_are_subchannels_present() also involves opening the channels, which
may be too early at this point. Checking for subchannels is not necessary here.
So this patch removes it. Subchannels will be opened when offer messages arrive.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this patch, we can send out multiple RNDIS data packets in one send buffer
slot and one VMBus message. It reduces the overhead associated with VMBus messages.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for reporting the actual and maximum combined channels
count of the hv_netvsc driver via 'ethtool --show-channels'.
This required adding 'max_chn' to 'struct netvsc_device', and assigning
it 'rsscap.num_recv_que' in 'rndis_filter_device_add'. Now we can access
the combined maximum channel count via 'struct netvsc_device' in the
ethtool callback.
Signed-off-by: Andrew Schwartzmeyer <andrew@schwartzmeyer.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
this patch fixes following sparse warnings:
netvsc.c:688:5: warning: symbol 'netvsc_copy_to_send_buf' was not declared. Should it be static?
rndis_filter.c:627:5: warning: symbol 'rndis_filter_set_offload_params' was not declared. Should it be static?
rndis_filter.c:702:5: warning: symbol 'rndis_filter_set_rss_param' was not declared. Should it be static?
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/vxlan.c
drivers/vhost/net.c
include/linux/if_vlan.h
net/core/dev.c
The net/core/dev.c conflict was the overlap of one commit marking an
existing function static whilst another was adding a new function.
In the include/linux/if_vlan.h case, the type used for a local
variable was changed in 'net', whereas the function got rewritten
to fix a stacked vlan bug in 'net-next'.
In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
overlapped with an endainness fix for VHOST 1.0 in 'net'.
In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
in 'net-next' whereas in 'net' there was a bug fix to pass in the
correct network namespace pointer in calls to this function.
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing code frees the skb in EAGAIN case, in which the skb will be
retried from upper layer and used again.
Also, the existing code doesn't free send buffer slot in error case, because
there is no completion message for unsent packets.
This patch fixes these problems.
(Please also include this patch for stable trees. Thanks!)
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
return type of wait_for_completion_timeout is unsigned long not int, this
patch just fixes up the declarations.
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
The changed names are union fields with the same size, so the existing code
still works. But, we now update these variables to the correct names.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds proper handling of the vNIC hot removal event, which includes
a rescind-channel-offer message from the host side that triggers vNIC close and
removal. In this case, the notices to the host during close and removal is not
necessary because the channel is rescinded. This patch blocks these unnecessary
messages, and lets vNIC removal process complete normally.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The vfree() function performs also input parameter validation.
Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the host uses packet encapsulation feature, the MTU may be reduced by the
host due to headroom reservation for encapsulation. This patch handles this
new MTU value.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will allow the workload spreading via vRSS for IPv6.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
total_data_buflen is used by netvsc_send() to decide if a packet can be put
into send buffer. It should also include the size of RNDIS message before the
Ethernet frame. Otherwise, a messge with total size bigger than send_section_size
may be copied into the send buffer, and cause data corruption.
[Request to include this patch to the Stable branches]
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case that the IP header has optional field at the end, this patch will
get the port numbers after that field, and compute the hash. The general
parser skb_flow_dissect() is used here.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the packet is successfully sent, we should not touch the packet
as it may have been freed. This patch is based on the work done by
Long Li <longli@microsoft.com>.
David, please queue this up for stable.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/usb/r8152.c
net/netfilter/nfnetlink.c
Both r8152 and nfnetlink conflicts were simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
After the packet is successfully sent, we should not touch the skb
as it may have been freed. This patch is based on the work done by
Long Li <longli@microsoft.com>.
In this version of the patch I have fixed issues pointed out by David.
David, please queue this up for stable.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Long Li <longli@microsoft.com>
Tested-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We try to call free_netvsc_device(net_device) when "net_device" is NULL.
It leads to an Oops.
Fixes: f90251c8a6 ('hyperv: Increase the buffer length for netvsc_channel_cb()')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the buffer is too small for a packet from VMBus, a bigger buffer will be
allocated in netvsc_channel_cb() and retry reading the packet from VMBus.
Increasing this buffer size will reduce the retry overhead.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
WS2008R2 is a supported platform and it turns out that the maximum sendbuf
size that ws2008R2 can support is only 15MB. Make the necessary
adjustment.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Intel did some benchmarking on our network throughput when Linux on Hyper-V
is as used as a gateway. This fix gave us almost a 1 Gbps additional throughput
on about 5Gbps base throughput we hadi, prior to increasing the sendbuf size.
The sendbuf mechanism is a copy based transport that we have which is clearly
more optimal than the copy-free page flipping mechanism (for small packets).
In the forwarding scenario, we deal only with MTU sized packets,
and increasing the size of the senbuf area gave us the additional performance.
For what it is worth, Windows guests on Hyper-V, I am told use similar sendbuf
size as well.
The exact value of sendbuf I think is less important than the fact that it needs
to be larger than what Linux can allocate as physically contiguous memory.
Thus the change over to allocating via vmalloc().
We currently allocate 16MB receive buffer and we use vmalloc there for allocation.
Also the low level channel code has already been modified to deal with physically
dis-contiguous memory in the ringbuffer setup.
Based on experimentation Intel did, they say there was some improvement in throughput
as the sendbuf size was increased up to 16MB and there was no effect on throughput
beyond 16MB. Thus I have chosen 16MB here.
Increasing the sendbuf value makes a material difference in small packet handling
In this version of the patch, based on David's feedback, I have added
additional details in the commit log.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix to return -ENOMEM from the kalloc error handling
case instead of 0.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to have at least a netconsole to debug kernel issues on
Windows Azure this patch implements netpoll support.
Sending packets is easy, netvsc_start_xmit() does already everything
needed.
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix checkpatch warning:
WARNING: kfree(NULL) is safe this check is probably not required
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RNDIS_STATUS_NETWORK_CHANGE event is received after the Hyper-V host
sleep or hibernation. We refresh network at this time.
MS-TFS: 135162
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
c25aaf814a63: "hyperv: Enable sendbuf mechanism on the send path" added
some teardown code that looks like it was copied from the recieve path
above, but missed a variable name replacement.
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It passes the hash value as the RNDIS Per-packet info to the Hyper-V host,
so that the send completion notices can be spread across multiple channels.
MS-TFS: 140273
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: get rid of SET_ETHTOOL_OPS
Dave Miller mentioned he'd like to see SET_ETHTOOL_OPS gone.
This does that.
Mostly done via coccinelle script:
@@
struct ethtool_ops *ops;
struct net_device *dev;
@@
- SET_ETHTOOL_OPS(dev, ops);
+ dev->ethtool_ops = ops;
Compile tested only, but I'd seriously wonder if this broke anything.
Suggested-by: Dave Miller <davem@davemloft.net>
Signed-off-by: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de>
Acked-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/altera/altera_sgdma.c
net/netlink/af_netlink.c
net/sched/cls_api.c
net/sched/sch_api.c
The netlink conflict dealt with moving to netlink_capable() and
netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations
in non-init namespaces. These were simple transformations from
netlink_capable to netlink_ns_capable.
The Altera driver conflict was simply code removal overlapping some
void pointer cast cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
This change ensures the driver can be built successfully without the
CONFIG_SYSFS flag.
MS-TFS: 182270
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do checksum offload only if the client of the driver wants checksum to be
offloaded.
In V1 version of this patch, I addressed comments from
Stephen Hemminger <stephen@networkplumber.org> and
Eric Dumazet <eric.dumazet@gmail.com>.
In this version of the patch I have addressed comments from
David Miller.
This patch fixes a bug that is exposed in gateway scenarios.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We send packets using a copy-free mechanism (this is the Guest to Host transport
via VMBUS). While this is obviously optimal for large packets,
it may not be optimal for small packets. Hyper-V host supports
a second mechanism for sending packets that is "copy based". We implement that
mechanism in this patch.
In this version of the patch I have addressed a comment from David Miller.
With this patch (and all of the other offload and VRSS patches), we are now able
to almost saturate a 10G interface between Linux VMs on Hyper-V
on different hosts - close to 9 Gbps as measured via iperf.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The union contains only one member now, so we use the variables in it directly.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removed recv_pkt_list and lock, and updated related code, so that
the locking overhead is reduced especially when multiple channels
are in use.
The recv_pkt_list isn't actually necessary because the packets are
processed sequentially in each channel. It has been replaced by a
local variable, and the related lock for this list is also removed.
The is_data_pkt field is not used in receive path, so its assignment
is cleaned up.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This feature allows multiple channels to be used by each virtual NIC.
It is available on Hyper-V host 2012 R2.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ws2008r2 does not support UDP checksum offload. Thus, we cannnot turn on
UDP offload in the host. Also, on ws2012 and ws2012 r2, there appear to be
an issue with UDP checksum offload.
Fix this issue by computing the UDP checksum in the Hyper-V driver.
Based on Dave Miller's comments, in this version, I have COWed the skb
before modifying the UDP header (the checksum field).
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ws2008R2 supports ndis_version 6.1 and 6.1 is the minimal version required
for various offloads. Negotiate ndis_version 6.1 when on ws2008r2.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An outgoing packet can potentially need per-packet information for
all the offloads and VLAN tagging. Fix this issue.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/usb/r8152.c
drivers/net/xen-netback/netback.c
Both the r8152 and netback conflicts were simple overlapping
changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to a bug in the Hyper-V host verion 2008R2, we need to use a slightly smaller
receive buffer size, otherwise the buffer will not be accepted by the legacy hosts.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable segmentation offload.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable send side checksum offload.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable receive side checksum offload.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prior to enabling guest side offloads, enable the offloads on the host.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for enabling offloads, cleanup the send path.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cleanup the code and enable scatter gather I/O.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It moves the state setting for query into rndis_filter_receive_response().
All callbacks including query-complete and status-callback are synchronized
by channel->inbound_lock. This prevents pentential race between them.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It auto negotiates the highest NetVSP version supported by both guest and host.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/bonding/bond_3ad.h
drivers/net/bonding/bond_main.c
Two minor conflicts in bonding, both of which were overlapping
changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of the buffer allocation in the receive path for normal packets.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the receive path a little more efficient by parameterizing the
required state rather than re-establishing that state.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This structure is redundant; get rid of it make the code little more efficient -
get rid of the unnecessary indirection.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without this patch, the "cat /sys/class/net/ethN/operstate" shows
"unknown", and "ethtool ethN" shows "Link detected: yes", when VM
boots up with or without vNIC connected.
This patch fixed the problem.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This will allow us to use bigger receive buffer, and prevent allocation failure
due to fragmented memory.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
net/ipv6/ip6_tunnel.c
net/ipv6/ip6_vti.c
ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.
qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.
Signed-off-by: David S. Miller <davem@davemloft.net>
Moving the register_netdev to the end of probe to prevent
possible open call happens before NetVSP is connected.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/intel/i40e/i40e_main.c
drivers/net/macvtap.c
Both minor merge hassles, simple overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
There's a possible deadlock if we flush the peers notifying work during setting
mtu:
[ 22.991149] ======================================================
[ 22.991173] [ INFO: possible circular locking dependency detected ]
[ 22.991198] 3.10.0-54.0.1.el7.x86_64.debug #1 Not tainted
[ 22.991219] -------------------------------------------------------
[ 22.991243] ip/974 is trying to acquire lock:
[ 22.991261] ((&(&net_device_ctx->dwork)->work)){+.+.+.}, at: [<ffffffff8108af95>] flush_work+0x5/0x2e0
[ 22.991307]
but task is already holding lock:
[ 22.991330] (rtnl_mutex){+.+.+.}, at: [<ffffffff81539deb>] rtnetlink_rcv+0x1b/0x40
[ 22.991367]
which lock already depends on the new lock.
[ 22.991398]
the existing dependency chain (in reverse order) is:
[ 22.991426]
-> #1 (rtnl_mutex){+.+.+.}:
[ 22.991449] [<ffffffff810dfdd9>] __lock_acquire+0xb19/0x1260
[ 22.991477] [<ffffffff810e0d12>] lock_acquire+0xa2/0x1f0
[ 22.991501] [<ffffffff81673659>] mutex_lock_nested+0x89/0x4f0
[ 22.991529] [<ffffffff815392b7>] rtnl_lock+0x17/0x20
[ 22.991552] [<ffffffff815230b2>] netdev_notify_peers+0x12/0x30
[ 22.991579] [<ffffffffa0340212>] netvsc_send_garp+0x22/0x30 [hv_netvsc]
[ 22.991610] [<ffffffff8108d251>] process_one_work+0x211/0x6e0
[ 22.991637] [<ffffffff8108d83b>] worker_thread+0x11b/0x3a0
[ 22.991663] [<ffffffff81095e5d>] kthread+0xed/0x100
[ 22.991686] [<ffffffff81681c6c>] ret_from_fork+0x7c/0xb0
[ 22.991715]
-> #0 ((&(&net_device_ctx->dwork)->work)){+.+.+.}:
[ 22.991715] [<ffffffff810de817>] check_prevs_add+0x967/0x970
[ 22.991715] [<ffffffff810dfdd9>] __lock_acquire+0xb19/0x1260
[ 22.991715] [<ffffffff810e0d12>] lock_acquire+0xa2/0x1f0
[ 22.991715] [<ffffffff8108afde>] flush_work+0x4e/0x2e0
[ 22.991715] [<ffffffff8108e1b5>] __cancel_work_timer+0x95/0x130
[ 22.991715] [<ffffffff8108e303>] cancel_delayed_work_sync+0x13/0x20
[ 22.991715] [<ffffffffa03404e4>] netvsc_change_mtu+0x84/0x200 [hv_netvsc]
[ 22.991715] [<ffffffff815233d4>] dev_set_mtu+0x34/0x80
[ 22.991715] [<ffffffff8153bc2a>] do_setlink+0x23a/0xa00
[ 22.991715] [<ffffffff8153d054>] rtnl_newlink+0x394/0x5e0
[ 22.991715] [<ffffffff81539eac>] rtnetlink_rcv_msg+0x9c/0x260
[ 22.991715] [<ffffffff8155cdd9>] netlink_rcv_skb+0xa9/0xc0
[ 22.991715] [<ffffffff81539dfa>] rtnetlink_rcv+0x2a/0x40
[ 22.991715] [<ffffffff8155c41d>] netlink_unicast+0xdd/0x190
[ 22.991715] [<ffffffff8155c807>] netlink_sendmsg+0x337/0x750
[ 22.991715] [<ffffffff8150d219>] sock_sendmsg+0x99/0xd0
[ 22.991715] [<ffffffff8150d63e>] ___sys_sendmsg+0x39e/0x3b0
[ 22.991715] [<ffffffff8150eba2>] __sys_sendmsg+0x42/0x80
[ 22.991715] [<ffffffff8150ebf2>] SyS_sendmsg+0x12/0x20
[ 22.991715] [<ffffffff81681d19>] system_call_fastpath+0x16/0x1b
This is because we hold the rtnl_lock() before ndo_change_mtu() and try to flush
the work in netvsc_change_mtu(), in the mean time, netdev_notify_peers() may be
called from worker and also trying to hold the rtnl_lock. This will lead the
flush won't succeed forever. Solve this by not canceling and flushing the work,
this is safe because the transmission done by NETDEV_NOTIFY_PEERS was
synchronized with the netif_tx_disable() called by netvsc_change_mtu().
Reported-by: Yaju Cao <yacao@redhat.com>
Tested-by: Yaju Cao <yacao@redhat.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several files refer to an old address for the Free Software Foundation
in the file header comment. Resolve by replacing the address with
the URL <http://www.gnu.org/licenses/> so that we do not have to keep
updating the header comments anytime the address changes.
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Veaceslav Falico <vfalico@redhat.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Paul Mackerras <paulus@samba.org>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove HV_DRV_VERSION, it has no meaning for upstream drivers.
Initially it was supposed to show the "Linux Integration Services"
version, now it is not in sync anymore with the out-of-tree drivers
available from the MSFT website.
The only place where a version string is still required is the KVP
command "IntegrationServicesVersion" which is handled by
tools/hv/hv_kvp_daemon.c. To satisfy such KVP request from the host pass
the current string to the daemon during KVP userland registration.
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
SG mode is not currently supported by netvsc, so remove this flag for now.
Otherwise, it will be unconditionally enabled by commit ec5f061564
"Kill link between CSUM and SG features"
Previously, the SG feature is disabled because CSUM is not set here.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should call __vlan_hwaccel_put_tag() only if the packet
comes from vlan, otherwise VLAN_TAG_PRESENT will always be
added.
Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the recent addition of 8021AD, we need to set the new field vlan_proto in
sk_buff. Otherwise, it will trigger BUG() call in vlan_proto_idx().
This patch fixes the problem.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixed: warning: cast from pointer to integer of different size
The Hyper-V hosts always use 64 bit request id. The guests can have 32 or 64
bit pointers which equal to the ulong type size. So we cast it to ulong type.
And, assigning 32bit integer to 64 bit variable works fine.
The VMBus returns the same id in the completion packet. But the value has no
effect on the host side.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/emulex/benet/be_main.c
drivers/net/ethernet/intel/igb/igb_main.c
drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c
include/net/scm.h
net/batman-adv/routing.c
net/ipv4/tcp_input.c
The e{uid,gid} --> {uid,gid} credentials fix conflicted with the
cleanup in net-next to now pass cred structs around.
The be2net driver had a bug fix in 'net' that overlapped with the VLAN
interface changes by Patrick McHardy in net-next.
An IGB conflict existed because in 'net' the build_skb() support was
reverted, and in 'net-next' there was a comment style fix within that
code.
Several batman-adv conflicts were resolved by making sure that all
calls to batadv_is_my_mac() are changed to have a new bat_priv first
argument.
Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO
rewrite in 'net-next', mostly overlapping changes.
Thanks to Stephen Rothwell and Antonio Quartulli for help with several
of these merge resolutions.
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename the hardware VLAN acceleration features to include "CTAG" to indicate
that they only support CTAGs. Follow up patches will introduce 802.1ad
server provider tagging (STAGs) and require the distinction for hardware not
supporting acclerating both.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some cases, the VM_PKT_COMP message can arrive later than RNDIS completion
message, which will free the packet memory. This may cause panic due to access
to freed memory in netvsc_send_completion().
This patch fixes this problem by removing rndis_filter_send_request_completion()
from the code path. The function was a no-op.
Reported-by: Long Li <longli@microsoft.com>
Tested-by: Long Li <longli@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The warning about local_bh_enable inside IRQ happens when disconnecting a
virtual NIC.
The reason for the warning is -- netif_tx_disable() is called when the NIC
is disconnected. And it's called within irq context. netif_tx_disable() calls
local_bh_enable() which displays warning if in irq.
The fix is to remove the unnecessary netif_tx_disable & wake_queue() in the
netvsc_linkstatus_callback().
Reported-by: Richard Genoud <richard.genoud@gmail.com>
Tested-by: Long Li <longli@microsoft.com>
Tested-by: Richard Genoud <richard.genoud@gmail.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Here's the big char/misc driver patches for 3.9-rc1.
Nothing major here, just lots of different driver updates (mei, hyperv, ipack,
extcon, vmci, etc.).
All of these have been in the linux-next tree for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iEYEABECAAYFAlEmZJgACgkQMUfUDdst+ymhZgCgo2dn37r9uMCwgTSpxSq92Je5
x8kAnRF1UnD6ZvySRIlLUBV5LW1YgFnK
=i5HH
-----END PGP SIGNATURE-----
Merge tag 'char-misc-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver patches from Greg Kroah-Hartman:
"Here's the big char/misc driver patches for 3.9-rc1.
Nothing major here, just lots of different driver updates (mei,
hyperv, ipack, extcon, vmci, etc.).
All of these have been in the linux-next tree for a while."
* tag 'char-misc-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (209 commits)
w1: w1_therm: Add force-pullup option for "broken" sensors
w1: ds2482: Added 1-Wire pull-up support to the driver
vme: add missing put_device() after device_register() fails
extcon: max8997: Use workqueue to check cable state after completing boot of platform
extcon: max8997: Set default UART/USB path on probe
extcon: max8997: Consolidate duplicate code for checking ADC/CHG cable type
extcon: max8997: Set default of ADC debounce time during initialization
extcon: max8997: Remove duplicate code related to set H/W line path
extcon: max8997: Move defined constant to header file
extcon: max77693: Make max77693_extcon_cable static
extcon: max8997: Remove unreachable code
extcon: max8997: Make max8997_extcon_cable static
extcon: max77693: Remove unnecessary goto statement to improve readability
extcon: max77693: Convert to devm_input_allocate_device()
extcon: gpio: Rename filename of extcon-gpio.c according to kernel naming style
CREDITS: update email and address of Harald Hoyer
extcon: arizona: Use MICDET for final microphone identification
extcon: arizona: Always take the first HPDET reading as the final one
extcon: arizona: Clear _trig_sts bits after jack detection
extcon: arizona: Don't HPDET magic when headphones are enabled
...
Bring in the 'net' tree so that we can get some ipv4/ipv6 bug
fixes that some net-next work will build upon.
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the consolidated GUID definitions in the Hyper-V network driver.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch fixed wrong mac length, it should be ETH_ALEN,
also replaced the hardcode 6 in hyperv_net.h
Signed-off-by: Amos Kong <kongjianjun@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use strlcpy where possible to ensure the string is \0 terminated.
Use always sizeof(string) instead of 32, ETHTOOL_BUSINFO_LEN
and custom defines.
Use snprintf instead of sprint.
Remove unnecessary inits of ->fw_version
Remove unnecessary inits of drvinfo struct.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
This message indicates an error returned from the host when changing MAC address.
Reported-by: Michal Kubecek <mkubecek@suse.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Checked with Windows networking team, there is only one RNDIS message
in each netvsc packet.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some response messages, there may be some extended info after the
message.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The existing code always reports NVSP_STAT_SUCCESS. This patch adds the
mechanism to report failure when it happens.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The receive code path doesn't use the page buffer, so remove the
extra allocated space here.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To prevent possible data corruption in RNDIS requests, add another
page buffer if the request message crossed page boundary.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Return ETIMEDOUT when the reply message is not received in time.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
According to RNDIS specs, Windows sets this size to
0x4000. I use the same value here.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I believe net/core/dev.c is a better place for netif_notify_peers(),
because other net event notify functions also stay in this file.
And rename it to netdev_notify_peers().
Cc: David S. Miller <davem@davemloft.net>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to wait for send_completion msg before put_rndis_request() at
the end of rndis_filter_halt_device(). Otherwise, netvsc_send_completion()
may reference freed memory which is overwritten, and cause panic.
Reported-by: Long Li <longli@microsoft.com>
Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It prevents ring_size being set to a too small value.
Reported-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for setting synthetic NIC MAC address from within Linux
guests. Before using this feature, the option "spoofing of MAC address"
should be enabled at the Hyper-V manager / Settings of the synthetic
NIC.
Thanks to Kin Cho <kcho@infoblox.com> for the initial implementation and
tests. And, thanks to Long Li <longli@microsoft.com> for the debugging
works.
Reported-and-tested-by: Kin Cho <kcho@infoblox.com>
Reported-by: Long Li <longli@microsoft.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding casts of objects to the same type is unnecessary
and confusing for a human reader.
For example, this cast:
int y;
int *p = (int *)&y;
I used the coccinelle script below to find and remove these
unnecessary casts. I manually removed the conversions this
script produces of casts with __force, __iomem and __user.
@@
type T;
T *p;
@@
- (T *)p
+ p
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the busy-waiting/udelay to wait_event on outstanding sends.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch the hyperv filter and rndis gadget driver to use the same command
enumerators as the other drivers and delete the surplus command codes.
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RNDIS status codes are redefined with much stranged ifdeffery
and only one of these codes was used in the hyperv driver, and
there it is very clearly referring to the RNDIS variant, not some
other status. So clarify this by explictly using the RNDIS_*
prefixed status code in the hyperv drivera and delete the
duplicate defines.
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As a first step to consolidate the RNDIS implementations, break out
a common file with all the #defines and move it to <linux/rndis.h>.
This also deletes the immediate duplicated defines in the
<linux/rndis.h> file that yields a lot of compilation warnings.
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix merge between commit 3adadc08cc ("net ax25: Reorder ax25_exit to
remove races") and commit 0ca7a4c87d ("net ax25: Simplify and
cleanup the ax25 sysctl handling")
The former moved around the sysctl register/unregister calls, the
later simply removed them.
With help from Stephen Rothwell.
Signed-off-by: David S. Miller <davem@davemloft.net>
Although the network interface is down, the RX packets number which
could be observed by ifconfig may keep on increasing.
This is because the WORK scheduled in netvsc_set_multicast_list()
may be executed after netvsc_close(). That means the rndis filter
may be re-enabled by do_set_multicast() even if it was closed by
netvsc_close().
By canceling possible WORK before close the rndis filter, the issue
could be never happened.
Signed-off-by: Wenqi Ma <wenqi_ma@trendmicro.com.cn>
Reviewed-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the existing code, we only stop queue when the ringbuffer is full,
so the current packet has to be dropped or retried from upper layer.
This patch stops the tx queue when available ringbuffer is below
the low watermark. So the ringbuffer still has small amount of space
available for the current packet. This will reduce the overhead of
retries on sending.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of dropping the packet, we keep the skb buffer, and return
NETDEV_TX_BUSY to let upper layer retry send. This will not cause
endless loop, because the host is taking data away from ring buffer,
and we have called the stop_queue before returning NETDEV_TX_BUSY.
The stop_queue was called in the function netvsc_send() in file
netvsc.c, then it returns to rndis_filter_send(), which returns to
netvsc_start_xmit() in file netvsc_drv.c. So the NETDEV_TX_BUSY is
indeed returned AFTER queue is stopped.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A driver start_xmit() method cannot free skb and return NETDEV_TX_BUSY,
since caller is going to reuse freed skb.
This is mostly a revert of commit bf769375c (staging: hv: fix the return
status of netvsc_start_xmit())
In fact netif_tx_stop_queue() / netif_stop_queue() is needed before
returning NETDEV_TX_BUSY or you can trigger a ksoftirqd fatal loop.
In case of memory allocation error, only safe way is to drop the packet
and return NETDEV_TX_OK
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With this feature, a Linux guest can now configure multiple vlans through
a single synthetic NIC on Win8 Hyper-V host.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Limiting the memcpy to be the sizeof(struct rndis_message) can truncate
the message if there are Per-Packet-Info or Out-of-Band data.
In my earlier patch (commit 45326342), the unnecessary kmap_atomic and
kunmap_atomic surrounding this memcpy have been removed because the memory
in the receive buffer is always mapped. This memcpy is not necessary
either. To fix the bug, I removed the memcpy.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Olaf Hering <olaf@aepfle.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The memory has been allocated by kzalloc, so it's unnecessary to memset
again.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The first assignment to variable "net" is wrong, but overridden by the
latter assignments. So the bug isn't manifested.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a possible data corruption if an RNDIS message goes beyond page
boundary in the sending code path. This patch fixes the problem.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For code path not on the xmit, use netif_tx_disable() instead of
netif_stop_queue() to ensure other CPUs are not doing xmit.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The packets with size larger than 1452 will be dropped by bridge
which with two hyperv netdevice ports. This cause by hyperv netvsc
driver always copy the trailer padding to the data packet, and then
the skb received from netdevice may include wrong skb->len (20 bytes
larger than the real size normally). The captured packet may like
this:
Ethernet II, Src: Microsof_00:00:07 (00:15:5d:00:00:07),
Dst: HewlettP_00:00:4e (00:1f:29:00:00:4e)
Destination: HewlettP_e6:00:4e (00:1f:29:00:00:4e)
Source: Microsof_f6:6d:07 (00:15:5d:f6:6d:07)
Type: IP (0x0800)
Trailer: 1415161718191A1B1C1D1E1F20212223
Frame check sequence: 0x24252627 [incorrect, should be 0x7c2e5a5e]
The following command help to reproduction it, and the ping ICMP
packets will be dropped by bridge.
$ ping ip -s 1453
This patch fixed it by removing the trailer padding from the data
packet.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
skb->len after call eth_type_trans() does not include the ether
header size, but rx_bytes should account it.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
do_set_multicast() may not free the memory malloc in
netvsc_set_multicast_list().
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the user set the MTU up to 65536 for Linux guests running on
Hyper-V 2008 R2 or later.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Automatically negotiate the highest protocol version mutually recognized by
both host and guest.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
__get_free_pages() doesn't return HI memory, so the memory is always mapped.
kmap_atomic() is not necessary here. This patch removes the kmap_atomic()
calls and related code for locking and page manipulation.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The ring buffer is only used to pass meta data for outbound packets. The
actual payload is accessed by DMA from the host. So the stop/wake queue
mechanism based on counting and comparing number of pages sent v.s. number
of pages in the ring buffer is wrong. Also, there is a race condition in
the stop/wake queue calls, which can stop xmit queue forever.
The new stop/wake queue mechanism is based on the actual bytes used by
outbound packets in the ring buffer. The check for number of outstanding
sends after stop queue prevents the race condition that can cause wake
queue happening earlier than stop queue.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reported-by: Long Li <longli@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add code to accept promiscuous mode setting, and pass it to
RNDIS filter.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
hv_netvsc has been reviewed on netdev mailing list on 6/09/2011.
All recommended changes have been made. We are requesting to move
it out of staging area.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: KY Srinivasan <kys@microsoft.com>
Signed-off-by: Mike Sterling <Mike.Sterling@microsoft.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>