The netif_device_detach() conditionally stops all tx queues if the queues
are running. There is no need to call netif_tx_stop_all_queues() again.
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 406f42fa0d ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers. We will make
netdev->dev_addr a const.
Make sure local references to netdev->dev_addr are constant.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use dev_addr_set() instead of writing directly to netdev->dev_addr
in various misc and old drivers.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
With all vmxnet3 version 6 changes incorporated in the vmxnet3 driver,
the driver can configure emulation to run at vmxnet3 version 6, provided
the emulation advertises support for version 6.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch increases the maximum configurable mtu to 9190
to accommodate jumbo packets of overlay traffic.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As vmxnet3 supports IP/TCP/UDP RSS, this patch sets appropriate
hash type based on the type of RSS performed.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With version 6, vmxnet3 relaxes the restriction on queues to
be power of two. This is helpful in cases (Edge VM) where
vcpus are less than 8 and device requires more than 4 queues.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, vmxnet3 supports maximum of 8 Tx/Rx queues. With increase
in number of vcpus on a VM, to achieve better performance and utilize
idle vcpus, we need to increase the max number of queues supported.
This patch enhances vmxnet3 to support maximum of 32 Tx/Rx queues.
Increasing the Rx queues also increases the probability of distrubuting
the traffic from different flows to different queues with RSS.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vmxnet3 is currently at version 4 and this patch initiates the
preparation to accommodate changes for upto version 6. Introduced
utility macros for vmxnet3 version 6 comparison and update Copyright
information.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
buf_info structures in RX & TX queues are private driver data that
do not need to be visible to the device. Although there is physical
address and length in the queue descriptor that points to these
structures, their layout is not standardized, and device never looks
at them.
So lets allocate these structures in non-DMA-able memory, and fill
physical address as all-ones and length as zero in the queue
descriptor.
That should alleviate worries brought by Martin Radev in
https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210104/022829.html
that malicious vmxnet3 device could subvert SVM/TDX guarantees.
Signed-off-by: Petr Vandrovec <petr@vmware.com>
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit dacce2be33 ("vmxnet3: add geneve and vxlan tunnel offload
support") added support for encapsulation offload. However, the inner
offload capability is to be restrictued to UDP tunnels.
This patch fixes the issue for non-udp tunnels by adding features
check capability and filtering appropriate features for non-udp tunnels.
Fixes: dacce2be33 ("vmxnet3: add geneve and vxlan tunnel offload support")
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit dacce2be33 ("vmxnet3: add geneve and vxlan tunnel offload
support") added support for encapsulation offload. However, while
calculating tcp hdr length, it does not take into account if the
packet is encapsulated or not.
This patch fixes this issue by using correct reference for inner
tcp header.
Fixes: dacce2be33 ("vmxnet3: add geneve and vxlan tunnel offload support")
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'Commit dacce2be33 ("vmxnet3: add geneve and vxlan tunnel offload
support")' added support for encapsulation offload. However, while
preparing inner tso packet, it uses reference to outer ip headers.
This patch fixes this issue by using correct reference for inner
headers.
Fixes: dacce2be33 ("vmxnet3: add geneve and vxlan tunnel offload support")
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With all vmxnet3 version 4 changes incorporated in the vmxnet3 driver,
the driver can configure emulation to run at vmxnet3 version 4, provided
the emulation advertises support for version 4.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vmxnet3 version 3 device supports checksum/TSO offload. Thus, vNIC to
pNIC traffic can leverage hardware checksum/TSO offloads. However,
vmxnet3 does not support checksum/TSO offload for Geneve/VXLAN
encapsulated packets. Thus, for a vNIC configured with an overlay, the
guest stack must first segment the inner packet, compute the inner
checksum for each segment and encapsulate each segment before
transmitting the packet via the vNIC. This results in significant
performance penalty.
This patch will enhance vmxnet3 to support Geneve/VXLAN TSO as well as
checksum offload.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With vmxnet3 version 4, the emulation supports multiqueue(RSS) for
UDP and ESP traffic. A guest can enable/disable RSS for UDP/ESP over
IPv4/IPv6 by issuing commands introduced in this patch. ESP ipv6 is
not yet supported in this patch.
This patch implements get_rss_hash_opts and set_rss_hash_opts
methods to allow querying and configuring different Rx flow hash
configurations.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vmxnet3 is currently at version 3 and this patch initiates the
preparation to accommodate changes for version 4. Introduced utility
macros for vmxnet3 version 4 comparison and update Copyright
information.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use new helper tcp_v6_gso_csum_prep in additional network drivers.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use accessor functions for skb fragment's page_offset instead
of direct references, in preparation for bvec conversion.
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unifying the skb_frag and bio_vec, use the fine
accessors which already exist and use skb_frag_t instead of
struct skb_frag_struct.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 518a2f1925
("dma-mapping: zero memory returned from dma_alloc_*"),
dma_alloc_coherent has already zeroed the memory.
So memset is not needed.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, when rx csum is disabled, vmxnet3 driver does not turn
off lro, which can cause performance issues if user does not turn off
lro explicitly. This patch adds fix_features support which is used to
turn off LRO whenever RXCSUM is disabled.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Rishi Mehta <rmehta@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ifa_list is protected by rcu, yet code doesn't reflect this.
Add the __rcu annotations and fix up all places that are now reported by
sparse.
I've done this in the same commit to not add intermediate patches that
result in new warnings.
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
We already need to zero out memory for dma_alloc_coherent(), as such
using dma_zalloc_coherent() is superflous. Phase it out.
This change was generated with the following Coccinelle SmPL patch:
@ replace_dma_zalloc_coherent @
expression dev, size, data, handle, flags;
@@
-dma_zalloc_coherent(dev, size, handle, flags)
+dma_alloc_coherent(dev, size, handle, flags)
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
[hch: re-ran the script on the latest tree]
Signed-off-by: Christoph Hellwig <hch@lst.de>
S390 bpf_jit.S is removed in net-next and had changes in 'net',
since that code isn't used any more take the removal.
TLS data structures split the TX and RX components in 'net-next',
put the new struct members from the bug fix in 'net' into the RX
part.
The 'net-next' tree had some reworking of how the ERSPAN code works in
the GRE tunneling code, overlapping with a one-line headroom
calculation fix in 'net'.
Overlapping changes in __sock_map_ctx_update_elem(), keep the bits
that read the prog members via READ_ONCE() into local variables
before using them.
Signed-off-by: David S. Miller <davem@davemloft.net>
As documented in Documentation/timers/timers-howto.txt,
replace msleep(1) with usleep_range().
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The gen bits must be read first from (resp. written last to) DMA memory.
The proper way to enforce this on Linux is to call dma_rmb() (resp.
dma_wmb()).
Signed-off-by: Regis Duchesne <hpreg@vmware.com>
Acked-by: Ronak Doshi <doshir@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The DMA mask must be set before, not after, the first DMA map operation, or
the first DMA map operation could in theory fail on some systems.
Fixes: b0eb57cb97 ("VMXNET3: Add support for virtual IOMMU")
Signed-off-by: Regis Duchesne <hpreg@vmware.com>
Acked-by: Ronak Doshi <doshir@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vmxnet3_get_hdr_len() is used to calculate the header length which in
turn is used to calculate the gso_size for skb. When rxvlan offload is
disabled, vlan tag is present in the header and the function references
ip header from sizeof(ethhdr) and leads to incorrect pointer reference.
This patch fixes this issue by taking sizeof(vlan_ethhdr) into account
if vlan tag is present and correctly references the ip hdr.
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Guolin Yang <gyang@vmware.com>
Acked-by: Louis Luo <llouis@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'Commit 45dac1d6ea ("vmxnet3: Changes for vmxnet3 adapter version 2
(fwd)")' introduced a flag "lro" in structure vmxnet3_adapter which is
used to indicate whether LRO is enabled or not. However, the patch
did not set the flag and hence it was never exercised.
So, when LRO is enabled, it resulted in poor TCP performance due to
delayed acks. This issue is seen with packets which are larger than
the mss getting a delayed ack rather than an immediate ack, thus
resulting in high latency.
This patch removes the lro flag and directly uses device features
against NETIF_F_LRO to check if lro is enabled.
Fixes: 45dac1d6ea ("vmxnet3: Changes for vmxnet3 adapter version 2 (fwd)")
Reported-by: Rachel Lunnon <rachel_lunnon@stormagic.com>
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The field txNumDeferred is used by the driver to keep track of the number
of packets it has pushed to the emulation. The driver increments it on
pushing the packet to the emulation and the emulation resets it to 0 at
the end of the transmit.
There is a possibility of a race either when (a) ESX is under heavy load or
(b) workload inside VM is of low packet rate.
This race results in xmit hangs when network coalescing is disabled. This
change creates a local copy of txNumDeferred and uses it to perform ring
arithmetic.
Reported-by: Noriho Tanaka <ntanaka@vmware.com>
Signed-off-by: Ronak Doshi <doshir@vmware.com>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pointer rq is being initialized but this value is never read, it
is being updated inside a for-loop. Remove the initialization and
move it into the scope of the for-loop.
Cleans up clang warning:
drivers/net/vmxnet3/vmxnet3_drv.c:2763:27: warning: Value stored
to 'rq' during its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
with the introduction of commit
b0eb57cb97, it appears that rq->buf_info
is improperly handled. While it is heap allocated when an rx queue is
setup, and freed when torn down, an old line of code in
vmxnet3_rq_destroy was not properly removed, leading to rq->buf_info[0]
being set to NULL prior to its being freed, causing a memory leak, which
eventually exhausts the system on repeated create/destroy operations
(for example, when the mtu of a vmxnet3 interface is changed
frequently.
Fix is pretty straight forward, just move the NULL set to after the
free.
Tested by myself with successful results
Applies to net, and should likely be queued for stable, please
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-By: boyang@redhat.com
CC: boyang@redhat.com
CC: Shrikrishna Khare <skhare@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: David S. Miller <davem@davemloft.net>
Acked-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are several paths in vmxnet3, where settings changes cause the
adapter to be brought down and back up (vmxnet3_set_ringparam among
them). Should part of the reset operation fail, these paths call
vmxnet3_force_close, which enables all napi instances prior to calling
dev_close (with the expectation that vmxnet3_close will then properly
disable them again). However, vmxnet3_force_close neglects to clear
VMXNET3_STATE_BIT_QUIESCED prior to calling dev_close. As a result
vmxnet3_quiesce_dev (called from vmxnet3_close), returns early, and
leaves all the napi instances in a enabled state while the device itself
is closed. If a device in this state is activated again, napi_enable
will be called on already enabled napi_instances, leading to a BUG halt.
The fix is to simply enausre that the QUIESCED bit is cleared in
vmxnet3_force_close to allow quesence to be completed properly on close.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Shrikrishna Khare <skhare@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
napi_complete_done() allows to opt-in for gro_flush_timeout,
added back in linux-3.19, commit 3b47d30396
("net: gro: add a per device gro flush timer")
This allows for more efficient GRO aggregation without
sacrifying latencies.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mostly simple overlapping changes.
For example, David Ahern's adjacency list revamp in 'net-next'
conflicted with an adjacency list traversal bug fix in 'net'.
Signed-off-by: David S. Miller <davem@davemloft.net>
hyperv_net:
- set min/max_mtu, per Haiyang, after rndis_filter_device_add
virtio_net:
- set min/max_mtu
- remove virtnet_change_mtu
vmxnet3:
- set min/max_mtu
xen-netback:
- min_mtu = 0, max_mtu = 65517
xen-netfront:
- min_mtu = 0, max_mtu = 65535
unisys/visor:
- clean up defines a little to not clash with network core or add
redundat definitions
CC: netdev@vger.kernel.org
CC: virtualization@lists.linux-foundation.org
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: "Michael S. Tsirkin" <mst@redhat.com>
CC: Shrikrishna Khare <skhare@vmware.com>
CC: "VMware, Inc." <pv-drivers@vmware.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Paul Durrant <paul.durrant@citrix.com>
CC: David Kershner <david.kershner@unisys.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vmxnet3_set_mc() checks new_table_pa returned by dma_map_single()
with dma_mapping_error(), but even there it assumes zero is invalid pa
(it assumes dma_mapping_error(...,0) returns true if new_table is NULL).
The patch adds an explicit variable to track status of new_table_pa.
Found by Linux Driver Verification project (linuxtesting.org).
v2: use "bool" and "true"/"false" for boolean variables.
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
vmxnet3_reset_work() expects tx queues to be stopped (via
vmxnet3_quiesce_dev -> netif_tx_disable). However, this races with the
netif_wake_queue() call in netif_tx_timeout() such that the driver's
start_xmit routine may be called unexpectedly, triggering one of the BUG_ON
in vmxnet3_map_pkt with a stack trace like this:
RIP: 0010:[<ffffffffa00cf4bc>] vmxnet3_map_pkt+0x3ac/0x4c0 [vmxnet3]
[<ffffffffa00cf7e0>] vmxnet3_tq_xmit+0x210/0x4e0 [vmxnet3]
[<ffffffff813ab144>] dev_hard_start_xmit+0x2e4/0x4c0
[<ffffffff813c956e>] sch_direct_xmit+0x17e/0x1e0
[<ffffffff813c96a7>] __qdisc_run+0xd7/0x130
[<ffffffff813a6a7a>] net_tx_action+0x10a/0x200
[<ffffffff810691df>] __do_softirq+0x11f/0x260
[<ffffffff81472fdc>] call_softirq+0x1c/0x30
[<ffffffff81004695>] do_softirq+0x65/0xa0
[<ffffffff81069b89>] local_bh_enable_ip+0x99/0xa0
[<ffffffffa031ff36>] destroy_conntrack+0x96/0x110 [nf_conntrack]
[<ffffffff813d65e2>] nf_conntrack_destroy+0x12/0x20
[<ffffffff8139c6d5>] skb_release_head_state+0xb5/0xf0
[<ffffffff8139d299>] skb_release_all+0x9/0x20
[<ffffffff8139cfe9>] __kfree_skb+0x9/0x90
[<ffffffffa00d0069>] vmxnet3_quiesce_dev+0x209/0x340 [vmxnet3]
[<ffffffffa00d020a>] vmxnet3_reset_work+0x6a/0xa0 [vmxnet3]
[<ffffffff8107d7cc>] process_one_work+0x16c/0x350
[<ffffffff810804fa>] worker_thread+0x17a/0x410
[<ffffffff810848c6>] kthread+0x96/0xa0
[<ffffffff81472ee4>] kernel_thread_helper+0x4/0x10
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following sparse warning:
drivers/net/vmxnet3/vmxnet3_drv.c:1645:1: warning:
symbol 'vmxnet3_rq_destroy_all_rxdataring' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
'Commit 3c8b3efc06 ("vmxnet3: allow variable length transmit data ring
buffer")' changed the size of the buffers in the tx data ring from a
fixed size of 128 bytes to a variable size.
However, while copying data to the data ring, vmxnet3_copy_hdr continues
to carry the old code that assumes fixed buffer size of 128. This patch
fixes it by adding correct offset based on the actual data ring buffer
size.
Signed-off-by: Guolin Yang <gyang@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With all vmxnet3 version 3 changes incorporated in the vmxnet3 driver,
the driver can configure emulation to run at vmxnet3 version 3, provided
the emulation advertises support for version 3.
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The emulation supports a variety of coalescing modes viz. disabled
(no coalescing), adaptive, static (number of packets to batch before
raising an interrupt), rate based (number of interrupts per second).
This patch implements get_coalesce and set_coalesce methods to allow
querying and configuring different coalescing modes.
Signed-off-by: Keyong Sun <sunk@vmware.com>
Signed-off-by: Manoj Tammali <tammalim@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vmxnet3 driver preallocates buffers for receiving packets and posts the
buffers to the emulation. In order to deliver a received packet to the
guest, the emulation must map buffer(s) and copy the packet into it.
To avoid this memory mapping overhead, this patch introduces the receive
data ring - a set of small sized buffers that are always mapped by
the emulation. If a packet fits into the receive data ring buffer, the
emulation delivers the packet via the receive data ring (which must be
copied by the guest driver), or else the usual receive path is used.
Receive Data Ring buffer length is configurable via ethtool -G ethX rx-mini
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vmxnet3 driver supports transmit data ring viz. a set of fixed size
buffers used by the driver to copy packet headers. Small packets that
fit these buffers are copied into these buffers entirely.
Currently this buffer size of fixed at 128 bytes. This patch extends
transmit data ring implementation to allow variable length transmit
data ring buffers. The length of the buffer is read from the emulation
during initialization.
Signed-off-by: Sriram Rangarajan <rangarajans@vmware.com>
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
vmxnet3 is currently at version 2, but some command definitions from
previous vmxnet3 versions are missing. Add those definitions before
moving to version 3.
Also, introduce utility macros for vmxnet3 version comparison and update
Copyright information and Maintained by.
Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>