Previously, VF updates its link status every second by send query command
to PF in periodic service task. If link stats of PF is changed, VF may
need at most one second to update its link status.
To reduce delay of link status between PF and VFs, PF actively push its
link status to VFs when its link status is updated. And to let VF know
PF supports this new feature, the link status changed mailbox command
adds one bit to indicate it.
Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Atheros AR8031 and AR8033 expose different registers for SGMII/Fiber
as well as the copper side of the PHY depending on the BT_BX_REG_SEL bit
in the chip configure register.
The driver assumes the copper side is selected on probe, but this might
not be the case depending which page was last selected by the
bootloader. Notably, Ubiquiti UniFi bootloaders show this behavior.
Select the copper page when probing to circumvent this.
Signed-off-by: David Bauer <mail@david-bauer.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
100GbE Intel Wired LAN Driver Updates 2021-04-14
This series contains updates to ice driver only.
Bruce changes and removes open coded values to instead use existing
kernel defines and suppresses false cppcheck issues.
Ani adds new VSI states to track netdev allocation and registration. He
also removes leading underscores in the ice_pf_state enum.
Jesse refactors ITR by introducing helpers to reduce duplicated code and
structures to simplify checking of ITR mode. He also triggers a software
interrupt when exiting napi poll or busy-poll to ensure all work is
processed. Modifies /proc/iomem to display driver name instead of PCI
address. He also changes the checks of vsi->type to use a local variable
in ice_vsi_rebuild() and removes an unneeded struct member.
Jake replaces the driver's adaptive interrupt moderation algorithm to
use the kernel's DIM library implementation.
Scott reworks module reads to reduce the number of reads needed and
remove excessive increment of QSFP page.
Brett sets the vsi->vf_id to invalid for non-VF VSIs.
Paul removes the return value from ice_vsi_manage_rss_lut() as it's not
communicating anything critical. He also reduces the scope of a
variable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The scope of this variable can be reduced so do that.
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
We were saving the return value from ice_vsi_manage_rss_lut(), but
the errors from that function are not critical so change it to
return void and remove the code that saved the value.
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Silence false errors, warnings and style issues reported by cppcheck.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Currently the vsi->vf_id is set only for ICE_VSI_VF and it's left as 0
for all other VSI types. This is confusing and could be problematic
since 0 is a valid vf_id. Fix this by always setting non VF VSI types to
ICE_INVAL_VFID.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The only time you can ever have a rq_last_status is if
a firmware event was somehow reporting a status on the receive
queue, which are generally firmware initiated events or
mailbox messages from a VF. Mostly this struct member was unused.
Fix this problem by still printing the value of the field in a debug
print, but don't store the value forever in a struct, potentially
creating opportunities for callers to use the wrong struct member.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Do a minor refactor on ice_vsi_rebuild to use a local
variable to store vsi->type.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The driver previously printed it's PCI address in
the name field for the pci resource, which when displayed
via /proc/iomem, would print the same thing twice.
It's more useful for debugging to see the driver name, as
most other modules do.
Here's a diff of before and after this change:
99100000-991fffff : 0000:3b:00.1
9a000000-a04fffff : PCI Bus 0000:3b
9a000000-9bffffff : 0000:3b:00.1
- 9a000000-9bffffff : 0000:3b:00.1
+ 9a000000-9bffffff : ice
9c000000-9dffffff : 0000:3b:00.0
- 9c000000-9dffffff : 0000:3b:00.0
+ 9c000000-9dffffff : ice
9e000000-9effffff : 0000:3b:00.1
9f000000-9fffffff : 0000:3b:00.0
a0000000-a000ffff : 0000:3b:00.1
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There was an excessive increment of the QSFP page, which is
now fixed. Additionally, this new update now reads 8 bytes
at a time and will retry each request if the module/bus is
busy.
Also, prevent reading from upper pages if module does not
support those pages.
Signed-off-by: Scott W Taylor <scott.w.taylor@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Use a dedicated bitfield in order to both increase
the amount of checking around the length of ITR writes
as well as simplify the checks of dynamic mode.
Basically unpack the "high bit means dynamic" logic
into bitfields.
Also, remove some unused ITR defines.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The driver would occasionally miss that there were outstanding
descriptors to clean when exiting busy/napi poll. This issue has
been in the code since the introduction of the ice driver.
Attempt to "catch" any remaining work by triggering a software
interrupt when exiting napi poll or busy-poll. This will not
cause extra interrupts in the case of normal execution.
This issue was found when running sfnt-pingpong, with busy
poll enabled, and typically with larger I/O sizes like > 8192,
the program would occasionally report > 1 second maximums
to complete a ping pong.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The ice driver has support for adaptive interrupt moderation, an
algorithm for tuning the interrupt rate dynamically. This algorithm
is based on various assumptions about ring size, socket buffer size,
link speed, SKB overhead, ethernet frame overhead and more.
The Linux kernel has support for a dynamic interrupt moderation
algorithm known as "dimlib". Replace the custom driver-specific
implementation of dynamic interrupt moderation with the kernel's
algorithm.
The Intel hardware has a different hardware implementation than the
originators of the dimlib code had to work with, which requires the
driver to use a slightly different set of inputs for the actual
moderation values, while getting all the advice from dimlib of
better/worse, shift left or right.
The change made for this implementation is to use a pair of values
for each of the 5 "slots" that the dimlib moderation expects, and
the driver will program those pairs when dimlib recommends a slot to
use. The currently implementation uses two tables, one for receive
and one for transmit, and the pairs of values in each slot set the
maximum delay of an interrupt and a maximum number of interrupts per
second (both expressed in microseconds).
There are two separate kinds of bugs fixed by using DIMLIB, one is
UDP single stream send was too slow, and the other is that 8K
ping-pong was going to the most aggressive moderation and has much
too high latency.
The overall result of using DIMLIB is that we meet or exceed our
performance expectations set based on the old algorithm.
Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Introduce several new helpers for writing ITR and GLINT_RATE
registers, and refactor the code calling them. This resulted
in removal of several duplicate functions and rolled a bunch
of simple code back into the calling routines.
In particular this removes some code that was doing both
a store and a set in a helper function, which seems better
done as separate tasks in the caller (and generally takes
less lines of code even with a tiny bit of repetition).
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add two new VSI states, one to track if a netdev for the VSI has been
allocated and the other to track if the netdev has been registered.
Call unregister_netdev/free_netdev only when the corresponding state
bits are set.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Remove the leading underscores in enum ice_pf_state. This is not really
communicating anything and is unnecessary. No functional change.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The well-known IANA protocol port 3260 (iscsi-target 0x0cbc) and the
ether-types 0x8906 (ETH_P_FCOE) and 0x8914 (ETH_P_FIP) are already defined
in kernel header files. Use those definitions instead of open-coding the
same.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
-----BEGIN PGP SIGNATURE-----
iQFHBAABCgAxFiEEK3kIWJt9yTYMP3ehqclaivrt76kFAmB25i8THG1rbEBwZW5n
dXRyb25peC5kZQAKCRCpyVqK+u3vqY30CAC27BCWO82Gs3H55ygl9aahnQQHGbVu
/dgZ3YwvpwmKbLSOwE4n9TgnidGfZmyVXWVbA/XKnc4OBjCpwV5Y/VbNWhTdPDjh
4Z9VzXWDbFeq+jzrJW8KYXfJ0552oxcTr7BkGmdcVDLqG/g0/iPqyECSVGrmc9A5
770G8RRiJo8hkUFjcQ8NnkiLL+bWnggWNuFeGzo4AwqJjiJAzpHPk3dpr1rS11BP
xKW/QdMewZyXhlTvdlI/RMNODjehf8TNeuIsWP6zqKnyFpQad05ltuN+Ti2EhJn0
LHj9W/JB5wYh/Y5iQZq7JYW2aR41j8O+EvLMc4wV34dDsT880TJJGqqy
=Cmp9
-----END PGP SIGNATURE-----
Merge tag 'linux-can-next-for-5.13-20210414' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2021-04-14
this is a pull request of a single patch for net-next/master.
Vincent Mailhol's patch fixes a NULL pointer dereference when handling
error frames in the etas_es58x driver, which has been added in the
previous PR.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
af_packet fanout uses RCU rules to ensure f->arr elements
are not dismantled before RCU grace period.
However, it lacks rcu accessors to make sure KCSAN and other tools
wont detect data races. Stupid compilers could also play games.
Fixes: dc99f60069 ("packet: Add fanout support.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: "Gong, Sishuai" <sishuai@purdue.edu>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some Ethernet switches might only be able to support disabling multicast
snooping globally, which is an issue for example when several bridges
span the same physical device and request contradictory settings.
Propagate the return value of br_mc_disabled_update() such that this
limitation is transmitted correctly to user-space.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx5 core and netdev driver updates
1) E-Switch updates from Parav,
1.1) Devlink parameter to control mlx5 metadata enablement for E-Switch
1.2) Trivial cleanups for E-Switch code
1.3) Dynamically allocate vport steering namespaces only when required
2) From Jianbo, Use variably sized data structures for Software steering
3) Several minor cleanups
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmB3LmAACgkQSD+KveBX
+j6pfAf+MX75+oMpkrSdyLu6ZXIlc4aFWzJmbzngZud+Du1OrUZY3GSwT65Lq6rV
ZJ7YAuqwbRB3JzxcWO4u+yEhLoWaY/G/Hyns6OKvIeXZzKhBAQTLU5GrYppVkS9f
1xTpbDbaOlQ0qi63NfQgTEK+XKzP8EFwt53UXZsSmq7tTzZJ85bLRRdGU5HeYBgD
SS6q+7jdZZ8PCClpfK8qZ7L6JlEXPG91HKpRMWGz2FHZhN7tFRk5PGDiF3hOpXOn
0fPlg4dQ6u0/fFwOoSkUoSpzR5tSm1mjByTT0MYo/sQx+lQsFAJfXkDiHNdjKBHF
3wLpMzOkhay1iJvYYeK887Kts14LsQ==
=sgu9
-----END PGP SIGNATURE-----
Merge tag 'mlx5-updates-2021-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2021-04-13
mlx5 core and netdev driver updates
1) E-Switch updates from Parav,
1.1) Devlink parameter to control mlx5 metadata enablement for E-Switch
1.2) Trivial cleanups for E-Switch code
1.3) Dynamically allocate vport steering namespaces only when required
2) From Jianbo, Use variably sized data structures for Software steering
3) Several minor cleanups
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Normally, the bootloader will already initialize the MAC address
registers of the ENETC and the driver will just use them or generate a
random one, if it is not initialized.
Add a new way to provide the MAC address: via device tree. Besides the
usual 'mac-address' property, there is also the possibility to fetch it
via a NVMEM provider. The sl28 board stores the MAC address in the SPI
NOR flash OTP region. Having this will allow linux to fetch the MAC
address from there without being dependent on the bootloader.
No in-tree boards have the device tree properties set, thus for these,
this is a no-op.
Signed-off-by: Michael Walle <michael@walle.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following coccicheck warning:
./drivers/net/ethernet/sfc/enum.h:80:7-28: duplicated argument to |
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
the commit 1ddc3229ad ("skbuff: remove some unnecessary operation
in skb_segment_list()") introduces an issue very similar to the
one already fixed by commit 53475c5dd8 ("net: fix use-after-free when
UDP GRO with shared fraglist").
If the GSO skb goes though skb_clone() and pskb_expand_head() before
entering skb_segment_list(), the latter will unshare the frag_list
skbs and will release the old list. With the reverted commit in place,
when skb_segment_list() completes, skb->next points to the just
released list, and later on the kernel will hit UaF.
Note that since commit e0e3070a9b ("udp: properly complete L4 GRO
over UDP tunnel packet") the critical scenario can be reproduced also
receiving UDP over vxlan traffic with:
NIC (NETIF_F_GRO_FRAGLIST enabled) -> vxlan -> UDP sink
Attaching a packet socket to the NIC will cause skb_clone() and the
tunnel decapsulation will call pskb_expand_head().
Fixes: 1ddc3229ad ("skbuff: remove some unnecessary operation in skb_segment_list()")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following clang warning:
drivers/atm/idt77252.c:1787:1: warning: unused function
'idt77252_fbq_level' [-Wunused-function].
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2021-04-14
Not much this time:
1) Simplification of some variable calculations in esp4 and esp6.
From Jiapeng Chong and Junlin Yang.
2) Fix a clang Wformat warning in esp6 and ah6.
From Arnd Bergmann.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for the [g|s]et_pauseparam ethtool ops. It considers
that the chip doesn't support pause frame use in jumbo mode.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tony Nguyen says:
====================
10GbE Intel Wired LAN Driver Updates 2021-04-13
This series contains updates to ixgbe and ixgbevf driver.
Jostar Yang adds support for BCM54616s PHY for ixgbe.
Chen Lin removes an unused function pointer for ixgbe and ixgbevf.
Bhaskar Chowdhury fixes a typo in ixgbe.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The Synopsis MAC controller supports auxiliary snapshot feature that
allows user to store a snapshot of the system time based on an external
event.
This patch add supports to the above mentioned feature. Users will be
able to triggered capturing the time snapshot from user-space using
application such as testptp or any other applications that uses the
PTP_EXTTS_REQUEST ioctl request.
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Tan Tee Min <tee.min.tan@intel.com>
Co-developed-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Bornyakov says:
====================
net: phy: marvell-88x2222: a couple of improvements
First, there are some SFP modules that only uses RX_LOS for link
indication. Add check that link is operational before actual read of
line-side status.
Second, it is invalid to set 10G speed without autonegotiation,
according to phy_ethtool_ksettings_set(). Implement switching between
10GBase-R and 1000Base-X/SGMII if autonegotiation can't complete but
there is signal in line.
Changelog:
v1 -> v2:
* make checking that link is operational more friendly for
trancievers without SFP cages.
* split swapping 1G/10G modes into non-functional and functional
commits for the sake of easier review.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Setting 10G without autonegotiation is invalid according to
phy_ethtool_ksettings_set(). Thus, we need to set it during
autonegotiation.
If 1G autonegotiation can't complete for quite a time, but there is
signal in line, switch line interface type to 10GBase-R, if supported,
in hope for link to be established.
And vice versa. If 10GBase-R link can't be established for quite a time,
and autonegotiation is enabled, and there is signal in line, switch line
interface type to appropriate 1G mode, i.e. 1000Base-X or SGMII, if
supported.
Signed-off-by: Ivan Bornyakov <i.bornyakov@metrotek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
No functional changes, just move read link status routines below
autonegotiation configuration to make future functional changes more
distinct.
Signed-off-by: Ivan Bornyakov <i.bornyakov@metrotek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some SFP modules uses RX_LOS for link indication. In such cases link
will be always up, even without cable connected. RX_LOS changes will
trigger link_up()/link_down() upstream operations. Thus, check that SFP
link is operational before actual read link status.
If there is no SFP cage connected to the tranciever, check only PMD
Recieve Signal Detect register.
Signed-off-by: Ivan Bornyakov <i.bornyakov@metrotek.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow to create an RQ which is not registered as an XDP RQ. For example:
the trap-RQ doesn't register as an XDP RQ.
Fixes: 869c5f9262 ("net/mlx5e: Generalize open RQ")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
There should be no spaces at the start of the line.
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
void function return statements are not generally useful.
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
There should be a blank lines after declarations.
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
The bit-wise and of the action field with MLX5_ACCEL_ESP_ACTION_DECRYPT
is incorrect as MLX5_ACCEL_ESP_ACTION_DECRYPT is zero and not intended
to be a bit-flag. Fix this by using the == operator as was originally
intended.
Addresses-Coverity: ("Logically dead code")
Fixes: 7dfee4b1d7 ("net/mlx5: IPsec, Refactor SA handle creation and destruction")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
mlx5dr_action is a generally used data structure, and there is an
union for different types of actions in it. The size of mlx5dr_action
is about 72 bytes, but for those actions with fewer fields, most of
the allocated memory is wasted.
Remove this union, and mlx5dr_action becomes a generic action header.
Then actions are dynamically allocated with needed memory, the data
for each action is stored right after the header.
Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
SF's hardware function id is already stored in mlx5_sf. Reuse it,
instead of querying the hw table.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
At many places in the code, device pointer is directly available. Make
use of it, instead of accessing it from the table.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently eswitch flow steering (FS) namespace of vport's ingress and
egress ACL are enabled when FS layer is initialized. This is done even
when eswitch is diabled. This demands that total eswitch ports to be
known to FS layer without eswitch in use.
Given the FS core is not dependent on eswitch, make namespace init and
cleanup routines as helper routines to be invoked only when eswitch is
needed.
With this change, ingress and egress ACL namespaces are created only
when eswitch legacy/offloads mode is enabled.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently eswitch offers two modes. Legacy and offloads.
Offloads code is already in its own file eswitch_offloads.c
However eswitch.c contains the eswitch legacy code and common
infrastructure code.
To enable future extensions and to better manage generic common eswitch
infrastructure code, move the legacy code to its own legacy.c file.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Convert ESW_ALLOWED macro to a helper routine so that it can be used in
other eswitch files.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Make cleanup sequence mirror of init sequence for cleaning up reps
and freeing vports.
Also when reps initialization fails, there is no need to perform reps
cleanup.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Vport number is 16-bit field in hardware. Make it u16.
Move location of vport in the structure so that it reduces a hole
in the structure.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
With vhca events, SF state is queried through the VHCA events. Device no
longer expects SF bitmap in the query eswitch functions command.
Hence, remove it to simplify the code.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Currently each packet inserted in eswitch is tagged with a internal
metadata to indicate source vport. Metadata tagging is not always
needed. Metadata insertion is needed for multi-port RoCE, failover
between representors and stacked devices. In many other cases,
metadata enablement is not needed.
Metadata insertion slows down the packet processing rate of the E-switch
when it is in switchdev mode.
Below table show performance gain with metadata disabled for VXLAN
offload rules in both SMFS and DMFS steering mode on ConnectX-5 device.
----------------------------------------------
| steering | metadata | pkt size | rx pps |
| mode | | | (million) |
----------------------------------------------
| smfs | disabled | 128Bytes | 42 |
----------------------------------------------
| smfs | enabled | 128Bytes | 36 |
----------------------------------------------
| dmfs | disabled | 128Bytes | 42 |
----------------------------------------------
| dmfs | enabled | 128Bytes | 36 |
----------------------------------------------
Hence, allow user to disable metadata using driver specific devlink
parameter. Metadata setting of the eswitch is applicable only for the
switchdev mode.
Example to show and disable metadata before changing eswitch mode:
$ devlink dev param show pci/0000:06:00.0 name esw_port_metadata
pci/0000:06:00.0:
name esw_port_metadata type driver-specific
values:
cmode runtime value true
$ devlink dev param set pci/0000:06:00.0 \
name esw_port_metadata value false cmode runtime
$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
---
changelog:
v1->v2:
- added performance numbers in commit log
- updated commit log and documentation for switchdev mode
- added explicit note on when user can disable metadata in
documentation