Граф коммитов

1072 Коммитов

Автор SHA1 Сообщение Дата
Aya Levin 0bc73ad46a net/mlx5e: Mutually exclude RX-FCS and RX-port-timestamp
Due to current HW arch limitations, RX-FCS (scattering FCS frame field
to software) and RX-port-timestamp (improved timestamp accuracy on the
receive side) can't work together.
RX-port-timestamp is not controlled by the user and it is enabled by
default when supported by the HW/FW.
This patch sets RX-port-timestamp opposite to RX-FCS configuration.

Fixes: 102722fc68 ("net/mlx5e: Add support for RXFCS feature flag")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-12 13:51:59 -07:00
Dmytro Linkin 1ae258f8b3 net/mlx5: E-switch, Introduce rate limiting groups API
Extend eswitch API with rate limiting groups:

- Define new struct mlx5_esw_rate_group that is used to hold all
  internal group data.

- Implement functions that allow creation, destruction and cleanup of
  groups.

- Assign all vports to internal unlimited zero group by default.

This commit lays the groundwork for group rate limiting by implementing
devlink_ops->rate_node_{new|del}() callbacks to support creating and
deleting groups through devlink rate node objects. APIs that allows
setting rates and adding/removing members are implemented in following
patches.

Co-developed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Huy Nguyen <huyn@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-19 21:50:40 -07:00
Jakub Kicinski f444fea789 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/ptp/Kconfig:
  55c8fca1da ("ptp_pch: Restore dependency on PCI")
  e5f3155267 ("ethernet: fix PTP_1588_CLOCK dependencies")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-19 18:09:18 -07:00
Linus Torvalds 94e95d5899 virtio,vhost,vdpa: bugfixes
Fixes in virtio,vhost,vdpa drivers.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmEZ7l0PHG1zdEByZWRo
 YXQuY29tAAoJECgfDbjSjVRpdoQH/ir3ycgBco4pgDlpo0EQzw1eqwuyT69L7fvE
 hlEZOqYOE39WvRTbFZLdrk4RQsULuu4x1vBr9AZ/qV2kHIHUlIGkuNqXnJiihsZE
 bHGzKV7XGuzwRFXzCEmzTCDo6SFICVpqN9sb+tKMEsb/qiANi22OuDuDqffHldOH
 wYmw6BaHPdj+w1+w6PYW8R/M0A9yaI7HngfBxt9OiVCYXNK2QQDiUWOsAaxmshSt
 wsTDSwz4T6rRn/chztWC4JxlpossWJ7zywJexPKW02PSBqOV+z6irPkr7Ku3MhJ7
 T2OjLzSSub1R+ikuQikZWKY67mvr45fnWFglUsPtO7H4f6biDeA=
 =06n4
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio fixes from Michael Tsirkin:
 "Fixes in virtio, vhost, and vdpa drivers"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vdpa/mlx5: Fix queue type selection logic
  vdpa/mlx5: Avoid destroying MR on empty iotlb
  tools/virtio: fix build
  virtio_ring: pull in spinlock header
  vringh: pull in spinlock header
  virtio-blk: Add validation for block size in config space
  vringh: Use wiov->used to check for read/write desc order
  virtio_vdpa: reject invalid vq indices
  vdpa: Add documentation for vdpa_alloc_device() macro
  vDPA/ifcvf: Fix return value check for vdpa_alloc_device()
  vp_vdpa: Fix return value check for vdpa_alloc_device()
  vdpa_sim: Fix return value check for vdpa_alloc_device()
  vhost: Fix the calculation in vhost_overflow()
  vhost-vdpa: Fix integer overflow in vhost_vdpa_process_iotlb_update()
  virtio_pci: Support surprise removal of virtio pci device
  virtio: Protect vqs list access
  virtio: Keep vring_del_virtqueue() mirror of VQ create
  virtio: Improve vq->broken access to avoid any compiler optimization
2021-08-16 06:16:25 -10:00
Jakub Kicinski f4083a752a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.h
  9e26680733 ("bnxt_en: Update firmware call to retrieve TX PTP timestamp")
  9e518f2580 ("bnxt_en: 1PPS functions to configure TSIO pins")
  099fdeda65 ("bnxt_en: Event handler for PPS events")

kernel/bpf/helpers.c
include/linux/bpf-cgroup.h
  a2baf4e8bb ("bpf: Fix potentially incorrect results with bpf_get_local_storage()")
  c7603cfa04 ("bpf: Add ambient BPF runtime context stored in current")

drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
  5957cc557d ("net/mlx5: Set all field of mlx5_irq before inserting it to the xarray")
  2d0b41a376 ("net/mlx5: Refcount mlx5_irq with integer")

MAINTAINERS
  7b637cd52f ("MAINTAINERS: fix Microchip CAN BUS Analyzer Tool entry typo")
  7d901a1e87 ("net: phy: add Maxlinear GPY115/21x/24x driver")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-13 06:41:22 -07:00
Parav Pandit 48f02eef7f net/mlx5: Allocate individual capability
Currently mlx5_core_dev contains array of capabilities. It contains 19
valid capabilities of the device, 2 reserved entries and 12 holes.
Due to this for 14 unused entries, mlx5_core_dev allocates 14 * 8K = 112K
bytes of memory which is never used. Due to this mlx5_core_dev structure
size is 270Kbytes odd. This allocation further aligns to next power of 2
to 512Kbytes.

By skipping non-existent entries,
(a) 112Kbyte is saved,
(b) mlx5_core_dev reduces to 8KB with alignment
(c) 350KB saved in alignment

In future individual capability allocation can be used to skip its
allocation when such capability is disabled at the device level. This
patch prepares mlx5_core_dev to hold capability using a pointer instead
of inline array.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11 11:14:33 -07:00
Parav Pandit 5958a6fad6 net/mlx5: Reorganize current and maximal capabilities to be per-type
In the current code, the current and maximal capabilities are
maintained in separate arrays which are both per type. In order to
allow the creation of such a basic structure as a dynamically
allocated array, we move curr and max fields to a unified
structure so that specific capabilities can be allocated as one unit.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11 11:14:32 -07:00
Leon Romanovsky 8e792700b9 net/mlx5: Delete impossible dev->state checks
New mlx5_core device structure is allocated through devlink_alloc
with\ kzalloc and that ensures that all fields are equal to zero
and it includes ->state too.

That means that checks of that field in the mlx5_init_one() is
completely redundant, because that function is called only once
in the begging of mlx5_core_dev lifetime.

PCI:
 .probe()
  -> probe_one()
   -> mlx5_init_one()

The recovery flow can't run at that time or before it, because relevant
work initialized later in mlx5_init_once().

Such initialization flow ensures that dev->state can't be
MLX5_DEVICE_STATE_UNINITIALIZED at all, so remove such impossible
checks.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11 11:14:30 -07:00
Cai Huoqing 39c538d644 net/mlx5: Fix typo in comments
Fix typo:
*vectores  ==> vectors
*realeased  ==> released
*erros  ==> errors
*namepsace  ==> namespace
*trafic  ==> traffic
*proccessed  ==> processed
*retore  ==> restore
*Currenlty  ==> Currently
*crated  ==> created
*chane  ==> change
*cannnot  ==> cannot
*usuallly  ==> usually
*failes  ==> fails
*importent  ==> important
*reenabled  ==> re-enabled
*alocation  ==> allocation
*recived  ==> received
*tanslation  ==> translation

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-11 11:14:30 -07:00
Eli Cohen 879753c816 vdpa/mlx5: Fix queue type selection logic
get_queue_type() comments that splict virtqueue is preferred, however,
the actual logic preferred packed virtqueues. Since firmware has not
supported packed virtqueues we ended up using split virtqueues as was
desired.

Since we do not advertise support for packed virtqueues, we add a check
to verify split virtqueues are indeed supported.

Fixes: 1a86b377aa ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210811053759.66752-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-08-11 06:44:43 -04:00
Jakub Kicinski ebd0d30cc5 Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
Saeed Mahameed says:

====================
pull-request: mlx5-next 2020-08-9

This pulls mlx5-next branch which includes patches already reviewed on
net-next and rdma mailing lists.

1) mlx5 single E-Switch FDB for lag

2) IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq

3) Add DCS caps & fields support

[1] https://patchwork.kernel.org/project/netdevbpf/cover/20210803231959.26513-1-saeed@kernel.org/

[2] 0e3364dab7.1626609184.git.leonro@nvidia.com/

[3] 55e1d69bef.1624258894.git.leonro@nvidia.com/

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Lag, Create shared FDB when in switchdev mode
  net/mlx5: E-Switch, add logic to enable shared FDB
  net/mlx5: Lag, move lag destruction to a workqueue
  net/mlx5: Lag, properly lock eswitch if needed
  net/mlx5: Add send to vport rules on paired device
  net/mlx5: E-Switch, Add event callback for representors
  net/mlx5e: Use shared mappings for restoring from metadata
  net/mlx5e: Add an option to create a shared mapping
  net/mlx5: E-Switch, set flow source for send to uplink rule
  RDMA/mlx5: Add shared FDB support
  {net, RDMA}/mlx5: Extend send to vport rules
  RDMA/mlx5: Fill port info based on the relevant eswitch
  net/mlx5: Lag, add initial logic for shared FDB
  net/mlx5: Return mdev from eswitch
  IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
  net/mlx5: Add DCS caps & fields support
====================

Link: https://lore.kernel.org/r/20210809202522.316930-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-10 13:19:17 -07:00
Shay Drory 563476ae0c net/mlx5: Synchronize correct IRQ when destroying CQ
The CQ destroy is performed based on the IRQ number that is stored in
cq->irqn. That number wasn't set explicitly during CQ creation and as
expected some of the API users of mlx5_core_create_cq() forgot to update
it.

This caused to wrong synchronization call of the wrong IRQ with a number
0 instead of the real one.

As a fix, set the IRQ number directly in the mlx5_core_create_cq() and
update all users accordingly.

Fixes: 1a86b377aa ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Fixes: ef1659ade3 ("IB/mlx5: Add DEVX support for CQ events")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-09 20:57:00 -07:00
Mark Bloch c8e6a9e6d6 net/mlx5: E-Switch, Add event callback for representors
This callback will allow to notify representors about relevant events
when in OFFLOADS mode. In downstream patches, this will be used to notify
about PAIR/UNPAIR devcom events.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:49:24 -07:00
Mark Bloch 979bf468fc {net, RDMA}/mlx5: Extend send to vport rules
In shared FDB there is only one eswitch which is active and it receives
traffic from all representors and all vports in the HCA.

While the Ethernet representor will always reside on its native PF
the IB representor will not. Extend send to vport rule creation to
support such flows. Need to account for source vport that sends the
traffic (on which the representors resides) and the target eswitch
the traffic which reach.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:49:24 -07:00
Mark Bloch af8c0e25f2 net/mlx5: Lag, add initial logic for shared FDB
As shared FDB requires changes in two subsystems first expose the needed
core functions so the RDMA side can be changed.

mlx5_lag_is_master(): return true if a given mlx5 device is the lag master.
mlx5_lag_is_shared_fdb(): Returns true if the lag mode is shared FDB.
mlx5_lag_get_peer_mdev(): Return the peer mdev in lag.

The mentioned functions will be used by downstream patches in order
to add support for shared FDB for the RDMA side.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:49:24 -07:00
Mark Bloch 97a8a8c1f9 net/mlx5: Return mdev from eswitch
Export a function so users can retrieve the mellanox device that manages
the eswitch from the eswitch device.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-05 13:49:23 -07:00
Maor Gottlieb 371cf74e78 net/mlx5: Move TTC logic to fs_ttc
Now that TTC logic is not dependent on mlx5e structs, move it to
lib/fs_ttc.c so it could be used other part of the mlx5 driver.

Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-08-02 19:26:26 -07:00
Maxim Mikityanskiy 26ab7b3845 net/mlx5e: Block LRO if firmware asks for tunneled LRO
This commit does a cleanup in LRO configuration.

LRO is a parameter of an RQ, but its state is changed by modifying a TIR
related to the RQ.

The current status: LRO for tunneled packets is not supported in the
driver, inner TIRs may enable LRO on creation, but LRO status of inner
TIRs isn't changed in mlx5e_modify_tirs_lro(). This is inconsistent, but
as long as the firmware doesn't declare support for tunneled LRO, it
works, because the same RQs are shared between the inner and outer TIRs.

This commit does two fixes:

1. If the firmware has the tunneled LRO capability, LRO is blocked
altogether, because it's not possible to block it for inner TIRs only,
when the same RQs are shared between inner and outer TIRs, and the
driver won't be able to handle tunneled LRO traffic.

2. mlx5e_modify_tirs_lro() is patched to modify LRO state for all TIRs,
including inner ones, because all TIRs related to an RQ should agree on
their LRO state.

Fixes: 7b3722fa9e ("net/mlx5e: Support RSS for GRE tunneled packets")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-07-26 09:50:37 -07:00
Tal Gilboa 616d576934 IB/mlx5: Rename is_apu_thread_cq function to is_apu_cq
is_apu_thread_cq() used to detect CQs which are attached to APU
threads. This was extended to support other elements as well,
so the function was renamed to is_apu_cq().

c_eqn_or_apu_element was extended from 8 bits to 32 bits, which wan't
reflected when the APU support was first introduced.

Acked-by: Michael S. Tsirkin <mst@redhat.com> # vdpa
Signed-off-by: Tal Gilboa <talgi@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-07-25 11:39:04 +03:00
Lior Nahmanson 96cd2dd65b net/mlx5: Add DCS caps & fields support
This fields will be needed when adding a support for DCS offload

max_dci_stream_channels - maximum DCI stream channels supported per DCI.
max_dci_errored_streams - maximum DCI error stream channels
supported per DCI before a DCI move to error state.

Signed-off-by: Lior Nahmanson <liorna@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-07-18 13:36:13 +03:00
Eli Cohen e13cd45d35 vdpa/mlx5: Support creating resources with uid == 0
Currently all resources must be created with uid != 0 which is essential
when userspace processes are allocating virtquueue resources. Since this
is a kernel implementation, it is perfectly legal to open resources with
uid == 0.

In case firmware supports, avoid allocating user context.

Signed-off-by: Eli Cohen <elic@nvidia.com>
Link: https://lore.kernel.org/r/20210531160404.31368-1-elic@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
2021-07-03 04:50:56 -04:00
Linus Torvalds e04360a2ea RDMA v5.14 merge window Pull Request
This PR contains a replacement driver for Intel iWarp hardware. This new
 driver supports the old ethernet hardware and also newer chips that can do
 ROCE. Otherwise this contains the typical mix of patches:
 
 - Driver updates and cleanups for bnxt_re, cxgb4, mlx4, and mlx5
 
 - Many static checker driven code clean ups, including a wide refcount_t
   conversion
 
 - Several series for the hns driver, more HIP09 HW capabilities, migration
   to new HW register manipulators, and code cleanups
 
 - Minor fixes and improvements  in srp, rts, and cm
 
 - Improvements throughout for sysfs related code to use DEVICE_ATTR_*,
   make the ib_port sysfs first-class, and overall use sysfs APIs properly
 
 - Intel's new irdma driver replacing i40iw
 
 - rxe general clean ups and Memory Window support
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmDcunQACgkQOG33FX4g
 mxqSBA//dsZi/UzpzgU+YqyMFmUp04wd2/iCYzOcCViNPQZCyCARbGaMXI4kMa4s
 8dM5xU76OnCuNSnXHaIwvHC3CdN9GUm08j9eWY7syvAiKtXCjzv7qmCVfBw35UyK
 IXKfXh57toTSSAIfxw8yKc97QgaDSJ2zQ34fXkoE0AvTlfyN6pHQe9ef/Ca0ejS4
 awUGYVG/oilLXrEHcSSAv5UoX6hOUje6jqqRgp5jmZTI3g7SlIPL8mWgXBkHAYmd
 kDX7lBd09CKo2bmR071/kF6xUzvbCg1tmeE6lZze7gE+aKlBkZcvCBe1RAh3sBzK
 ysLfON5GGw5qnkMaY8j5h3sgWvi3qTTEW+jCAmmVi/6z4PF47mvmVVn+/pZc3y2e
 PqH43cunhwS0KuoUJ5Sd48J/UvabrvdbCNZrjCGCpt45EF4VwKxYMh74Bf0ABEQS
 i9eKR/+wyHG6Uv1U37fIXsqa8yUttl9aV/s88s8irn4xhG8ygBLZgeVQNeGUfvdV
 1W0XLEjRmKFezC1FhiPOz7CLIgL3BfSU1V+S7p0Gvb6ijZqyZTfRUaWbaD3KJpRT
 1kwzE4qp6IbJMEqgQH/lq0xBzvzF48FPvBslX5kwlm0phQRrMCwMVIafutpu395q
 ySeStEvsTVfz/JUHL3ZaEJyTRjAvPL0lXLH80XUpgWk9GzsksOM=
 =wyqt
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This contains a replacement driver for Intel iWarp hardware. This new
  driver supports the old ethernet hardware and also newer chips that
  can do ROCE.

  Other than that, this contains the typical mix of patches:

   - Driver updates and cleanups for bnxt_re, cxgb4, mlx4, and mlx5

   - Many static checker driven code clean ups, including a wide
     refcount_t conversion

   - Several series for the hns driver, more HIP09 HW capabilities,
     migration to new HW register manipulators, and code cleanups

   - Minor fixes and improvements in srp, rts, and cm

   - Improvements throughout for sysfs related code to use
     DEVICE_ATTR_*, make the ib_port sysfs first-class, and overall use
     sysfs APIs properly

   - Intel's new irdma driver replacing i40iw

   - rxe general clean ups and Memory Window support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (211 commits)
  RDMA/core: Always release restrack object
  RDMA/mlx5: Don't access NULL-cleared mpi pointer
  RDMA/irdma: Fix potential overflow expression in irdma_prm_get_pbles
  RDMA/irdma: Check contents of user-space irdma_mem_reg_req object
  RDMA/rxe: Missing unlock on error in get_srq_wqe()
  RDMA/cma: Fix rdma_resolve_route() memory leak
  RDMA/core/sa_query: Remove unused argument
  RDMA/cma: Fix incorrect Packet Lifetime calculation
  RDMA/cma: Protect RMW with qp_mutex
  RDMA/cma: Remove unnecessary INIT->INIT transition
  RDMA/hns: Add window selection field of congestion control
  RDMA/hfi1: Remove use of kmap()
  RDMA/irdma: Remove use of kmap()
  RDMA/bnxt_re: Fix uninitialized struct bit field rsvd1
  IB/isert: Align target max I/O size to initiator size
  RDMA/hns: Fix incorrect vlan enable bit in QPC
  MAINTAINERS: Update Broadcom RDMA maintainers
  RDMA/irdma: Use the queried port attributes
  RDMA/rxe: Fix redundant skb_put_zero
  RDMA/rxe: Fix extra copy in prepare_ack_packet
  ...
2021-07-01 14:54:03 -07:00
Yevgeny Kliteynik 1ab6dc35e9 net/mlx5: DR, Add support for flow sampler offload
Add SW steering support for sFlow / flow sampler action.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-26 00:31:15 -07:00
Jason Gunthorpe 2833c977c3 Merge branch 'mlx5_realtime_ts' into rdma.git for-next
Aharon Landau says:

====================
In case device supports only real-time timestamp, the kernel will fail to
create QP despite rdma-core requested such timestamp type.

It is because device returns free-running timestamp, and the conversion
from free-running to real-time is performed in the user space.

This series fixes it, by returning real-time timestamp.
====================

* mlx5_realtime_ts:
  RDMA/mlx5: Support real-time timestamp directly from the device
  RDMA/mlx5: Refactor get_ts_format functions to simplify code

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-22 15:08:39 -03:00
Aharon Landau 9a1ac95a59 RDMA/mlx5: Refactor get_ts_format functions to simplify code
QPC, SQC and RQC timestamp formats and capabilities are always equal
because they represent general hardware support. So instead of code
duplication, let's merge them into general enum and logic.

Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-06-22 09:35:16 +03:00
Jakub Kicinski adc2e56ebe Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Trivial conflicts in net/can/isotp.c and
tools/testing/selftests/net/mptcp/mptcp_connect.sh

scaled_ppm_to_ppb() was moved from drivers/ptp/ptp_clock.c
to include/linux/ptp_clock_kernel.h in -next so re-apply
the fix there.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-06-18 19:47:02 -07:00
Dmytro Linkin a5ae8fc905 net/mlx5e: Don't create devices during unload flow
Running devlink reload command for port in switchdev mode cause
resources to corrupt: driver can't release allocated EQ and reclaim
memory pages, because "rdma" auxiliary device had add CQs which blocks
EQ from deletion.
Erroneous sequence happens during reload-down phase, and is following:

1. detach device - suspends auxiliary devices which support it, destroys
   others. During this step "eth-rep" and "rdma-rep" are destroyed,
   "eth" - suspended.
2. disable SRIOV - moves device to legacy mode; as part of disablement -
   rescans drivers. This step adds "rdma" auxiliary device.
3. destroy EQ table - <failure>.

Driver shouldn't create any device during unload flows. To handle that
implement MLX5_PRIV_FLAGS_DETACH flag, set it on device detach and unset
on device attach. If flag is set do no-op on drivers rescan.

Fixes: a925b5e309 ("net/mlx5: Register mlx5 devices to auxiliary virtual bus")
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-16 15:36:47 -07:00
Shay Drory 3af26495a2 net/mlx5: Enlarge interrupt field in CREATE_EQ
FW is now supporting more than 256 MSI-X per PF (up to 2K).
Hence, enlarge interrupt field in CREATE_EQ to make use of the new
MSI-X's.

Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14 20:58:00 -07:00
Leon Romanovsky e4e3f24b82 net/mlx5: Provide cpumask at EQ creation phase
The users of EQ are running their code on different CPUs and with
various affinity patterns. Move the cpumask setting close to their
actual usage.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-14 20:57:57 -07:00
Vlad Buslov 19e9bfa044 net/mlx5: Bridge, add offload infrastructure
Create new files bridge.{c|h} in en/rep directory that implement bridge
interaction with representor netdevices and handle required
events/notifications, bridge.{c|h} in esw directory that implement all
necessary eswitch offloading infrastructure and works on vport/eswitch
level. Provide new kconfig MLX5_BRIDGE which is automatically selected when
both kernel bridge and mlx5 eswitch configs are enabled.

Provide basic infrastructure for bridge offloads:

- struct mlx5_esw_bridge_offloads - per-eswitch bridge offload structure
that encapsulates generic bridge-offloads data (notifier blocks, ingress
flow table/group, etc.) that is created/deleted on enable/disable eswitch
offloads.

- struct mlx5_esw_bridge - per-bridge structure that encapsulates
per-bridge data (reference counter, FDB, egress flow table/group, etc.)
that is created when first eswitch represetor is attached to new bridge and
deleted when last representor is removed from the bridge as a result of
NETDEV_CHANGEUPPER event.

The bridge tables are created with new priority FDB_BR_OFFLOAD in FDB
namespace. The new priority is between tc-miss and slow path priorities.
Priority consist of two levels: the ingress table that is global per
eswitch and matches incoming packets by src_mac/vid and redirects them to
next level (egress table) that is chosen according to ingress port bridge
membership and matches on dst_mac/vid in order to redirect packet to vport
according to the following diagram:

                +
                |
      +---------v----------+
      |                    |
      |   FDB_TC_OFFLOAD   |
      |                    |
      +---------+----------+
                |
                |
      +---------v----------+
      |                    |
      |   FDB_FT_OFFLOAD   |
      |                    |
      +---------+----------+
                |
                |
      +---------v----------+
      |                    |
      |    FDB_TC_MISS     |
      |                    |
      +---------+----------+
                |
+--------------------------------------+
|               |                      |
|        +------+                      |
|        |                             |
| +------v--------+   FDB_BR_OFFLOAD   |
| | INGRESS_TABLE |                    |
| +------+---+----+                    |
|        |   |      match              |
|        |   +---------+               |
|        |             |               |    +-------+
|        |     +-------v-------+ match |    |       |
|        |     | EGRESS_TABLE  +------------> vport |
|        |     +-------+-------+       |    |       |
|        |             |               |    +-------+
|        |    miss     |               |
|        +------+------+               |
|               |                      |
+--------------------------------------+
                |
                |
      +---------v----------+
      |                    |
      |   FDB_SLOW_PATH    |
      |                    |
      +---------+----------+
                |
                v

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-09 18:36:09 -07:00
Vlad Buslov ec3be8873d net/mlx5: Create TC-miss priority and table
In order to adhere to kernel software datapath model bridge offloads must
come after TC and NF FDBs. Following patches in this series add new FDB
priority for bridge after FDB_FT_OFFLOAD. However, since netfilter offload
is implemented with unmanaged tables, its miss path is not automatically
connected to next priority and requires the code to manually connect with
slow table. To keep bridge offloads encapsulated and not mix it with
eswitch offloads, create a new FDB_TC_MISS priority between FDB_FT_OFFLOAD
and FDB_SLOW_PATH:

          +
          |
+---------v----------+
|                    |
|   FDB_TC_OFFLOAD   |
|                    |
+---------+----------+
          |
          |
          |
+---------v----------+
|                    |
|   FDB_FT_OFFLOAD   |
|                    |
+---------+----------+
          |
          |
          |
+---------v----------+
|                    |
|    FDB_TC_MISS     |
|                    |
+---------+----------+
          |
          |
          |
+---------v----------+
|                    |
|   FDB_SLOW_PATH    |
|                    |
+---------+----------+
          |
          v

Initialize the new priority with single default empty managed table and use
the table as TC/NF miss patch instead of slow table. This approach allows
bridge offloads to be created as new FDB namespace priority between
FDB_TC_MISS and FDB_SLOW_PATH without exposing its internal tables to any
other modules since miss path of managed TC-miss table is automatically
wired to next priority.

Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-09 18:36:08 -07:00
Yevgeny Kliteynik 3f3f05ab88 net/mlx5: Added new parameters to reformat context
Adding new reformat context type (INSERT_HEADER) requires adding two new
parameters to reformat context - reformat_param_0 and reformat_param_1.
As defined by HW spec, these parameters have different meaning for
different reformat context type.

The first parameter (reformat_param_0) is not new to HW spec, but it
wasn't used by any of the supported reformats. The second parameter
(reformat_param_1) is new to the HW spec - it was added to allow
supporting INSERT_HEADER.

For NSERT_HEADER, reformat_param_0 indicates the header used to
reference the location of the inserted header, and reformat_param_1
indicates the offset of the inserted header from the reference point
defined by reformat_param_0.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-09 18:36:07 -07:00
Yevgeny Kliteynik 67133eaa93 net/mlx5: mlx5_ifc support for header insert/remove
Add support for HCA caps 2 that contains capabilities for the new
insert/remove header actions.

Added the required definitions for supporting the new reformat type:
added packet reformat parameters, reformat anchors and definitions
to allow copy/set into the inserted EMD (Embedded MetaData) tag.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Jianbo Liu <jianbol@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-09 18:36:06 -07:00
Dima Chumak a3e5fd9314 net/mlx5e: Fix page reclaim for dead peer hairpin
When adding a hairpin flow, a firmware-side send queue is created for
the peer net device, which claims some host memory pages for its
internal ring buffer. If the peer net device is removed/unbound before
the hairpin flow is deleted, then the send queue is not destroyed which
leads to a stack trace on pci device remove:

[ 748.005230] mlx5_core 0000:08:00.2: wait_func:1094:(pid 12985): MANAGE_PAGES(0x108) timeout. Will cause a leak of a command resource
[ 748.005231] mlx5_core 0000:08:00.2: reclaim_pages:514:(pid 12985): failed reclaiming pages: err -110
[ 748.001835] mlx5_core 0000:08:00.2: mlx5_reclaim_root_pages:653:(pid 12985): failed reclaiming pages (-110) for func id 0x0
[ 748.002171] ------------[ cut here ]------------
[ 748.001177] FW pages counter is 4 after reclaiming all pages
[ 748.001186] WARNING: CPU: 1 PID: 12985 at drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c:685 mlx5_reclaim_startup_pages+0x34b/0x460 [mlx5_core]                      [  +0.002771] Modules linked in: cls_flower mlx5_ib mlx5_core ptp pps_core act_mirred sch_ingress openvswitch nsh xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_umad ib_ipoib iw_cm ib_cm ib_uverbs ib_core overlay fuse [last unloaded: pps_core]
[ 748.007225] CPU: 1 PID: 12985 Comm: tee Not tainted 5.12.0+ #1
[ 748.001376] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 748.002315] RIP: 0010:mlx5_reclaim_startup_pages+0x34b/0x460 [mlx5_core]
[ 748.001679] Code: 28 00 00 00 0f 85 22 01 00 00 48 81 c4 b0 00 00 00 31 c0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 c7 c7 40 cc 19 a1 e8 9f 71 0e e2 <0f> 0b e9 30 ff ff ff 48 c7 c7 a0 cc 19 a1 e8 8c 71 0e e2 0f 0b e9
[ 748.003781] RSP: 0018:ffff88815220faf8 EFLAGS: 00010286
[ 748.001149] RAX: 0000000000000000 RBX: ffff8881b4900280 RCX: 0000000000000000
[ 748.001445] RDX: 0000000000000027 RSI: 0000000000000004 RDI: ffffed102a441f51
[ 748.001614] RBP: 00000000000032b9 R08: 0000000000000001 R09: ffffed1054a15ee8
[ 748.001446] R10: ffff8882a50af73b R11: ffffed1054a15ee7 R12: fffffbfff07c1e30
[ 748.001447] R13: dffffc0000000000 R14: ffff8881b492cba8 R15: 0000000000000000
[ 748.001429] FS:  00007f58bd08b580(0000) GS:ffff8882a5080000(0000) knlGS:0000000000000000
[ 748.001695] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 748.001309] CR2: 000055a026351740 CR3: 00000001d3b48006 CR4: 0000000000370ea0
[ 748.001506] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 748.001483] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 748.001654] Call Trace:
[ 748.000576]  ? mlx5_satisfy_startup_pages+0x290/0x290 [mlx5_core]
[ 748.001416]  ? mlx5_cmd_teardown_hca+0xa2/0xd0 [mlx5_core]
[ 748.001354]  ? mlx5_cmd_init_hca+0x280/0x280 [mlx5_core]
[ 748.001203]  mlx5_function_teardown+0x30/0x60 [mlx5_core]
[ 748.001275]  mlx5_uninit_one+0xa7/0xc0 [mlx5_core]
[ 748.001200]  remove_one+0x5f/0xc0 [mlx5_core]
[ 748.001075]  pci_device_remove+0x9f/0x1d0
[ 748.000833]  device_release_driver_internal+0x1e0/0x490
[ 748.001207]  unbind_store+0x19f/0x200
[ 748.000942]  ? sysfs_file_ops+0x170/0x170
[ 748.001000]  kernfs_fop_write_iter+0x2bc/0x450
[ 748.000970]  new_sync_write+0x373/0x610
[ 748.001124]  ? new_sync_read+0x600/0x600
[ 748.001057]  ? lock_acquire+0x4d6/0x700
[ 748.000908]  ? lockdep_hardirqs_on_prepare+0x400/0x400
[ 748.001126]  ? fd_install+0x1c9/0x4d0
[ 748.000951]  vfs_write+0x4d0/0x800
[ 748.000804]  ksys_write+0xf9/0x1d0
[ 748.000868]  ? __x64_sys_read+0xb0/0xb0
[ 748.000811]  ? filp_open+0x50/0x50
[ 748.000919]  ? syscall_enter_from_user_mode+0x1d/0x50
[ 748.001223]  do_syscall_64+0x3f/0x80
[ 748.000892]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 748.001026] RIP: 0033:0x7f58bcfb22f7
[ 748.000944] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 748.003925] RSP: 002b:00007fffd7f2aaa8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 748.001732] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f58bcfb22f7
[ 748.001426] RDX: 000000000000000d RSI: 00007fffd7f2abc0 RDI: 0000000000000003
[ 748.001746] RBP: 00007fffd7f2abc0 R08: 0000000000000000 R09: 0000000000000001
[ 748.001631] R10: 00000000000001b6 R11: 0000000000000246 R12: 000000000000000d
[ 748.001537] R13: 00005597ac2c24a0 R14: 000000000000000d R15: 00007f58bd084700
[ 748.001564] irq event stamp: 0
[ 748.000787] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[ 748.001399] hardirqs last disabled at (0): [<ffffffff813132cf>] copy_process+0x146f/0x5eb0
[ 748.001854] softirqs last  enabled at (0): [<ffffffff8131330e>] copy_process+0x14ae/0x5eb0
[ 748.013431] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 748.001492] ---[ end trace a6fabd773d1c51ae ]---

Fix by destroying the send queue of a hairpin peer net device that is
being removed/unbound, which returns the allocated ring buffer pages to
the host.

Fixes: 4d8fcf216c ("net/mlx5e: Avoid unbounded peer devices when unpairing TC hairpin rules")
Signed-off-by: Dima Chumak <dchumak@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-09 17:20:03 -07:00
David S. Miller 126285651b Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net
Bug fixes overlapping feature additions and refactoring, mostly.

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-07 13:01:52 -07:00
Yevgeny Kliteynik 216214c64a net/mlx5: DR, Create multi-destination flow table with level less than 64
Flow table that contains flow pointing to multiple flow tables or multiple
TIRs must have a level lower than 64. In our case it applies to muli-
destination flow table.
Fix the level of the created table to comply with HW Spec definitions, and
still make sure that its level lower than SW-owned tables, so that it
would be possible to point from the multi-destination FW table to SW
tables.

Fixes: 34583beea4 ("net/mlx5: DR, Create multi-destination table for SW-steering use")
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Reviewed-by: Alex Vesker <valex@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-06-01 18:30:21 -07:00
Jakub Kicinski af9207adb6 mlx5-updates-2021-05-26
Misc update for mlx5 driver,
 
 1) Clean up patches for lag and SF
 
 2) Reserve bit 31 in steering register C1 for IPSec offload usage
 
 3) Move steering tables pool logic into the steering core and
   increase the maximum table size to 2G entries when software steering
   is enabled.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmCv6vAACgkQSD+KveBX
 +j6qnAgAz0eKWKCsFCqlXGIgF1cg3FrGR5W2Zi5euriHhHwNqnZof3AIMkzcXjLL
 wBlPjWk3YLfBaBNPTziz6EJuGl1vZZxuSdc7bqsNnl0srujRtQFu3JyerdgXEXNL
 W2NxjSTiVwu8lq2qlYauQvcE0v+JrB/LMe9tvq1UQ2v9FtBMMhs9hGUSCro2huwj
 XYF0m0ve89+mYlm6/m0SIUpPVdMiIhm4+coO1wibk7+8jn6+ZT6EJbbZvjc9eQg7
 ZKr8f/TpfmvHToG8LPOc6HqHzRiHlp3Yzsft+xm54r082n4F/noGhL+Hqvvj1aTj
 C6Ip5N7VkzT+erMLMrjIbrmEP94cyQ==
 =torZ
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2021-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2021-05-26

Misc update for mlx5 driver,

1) Clean up patches for lag and SF

2) Reserve bit 31 in steering register C1 for IPSec offload usage

3) Move steering tables pool logic into the steering core and
  increase the maximum table size to 2G entries when software steering
  is enabled.

* tag 'mlx5-updates-2021-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: Fix lag port remapping logic
  net/mlx5: Use boolean arithmetic to evaluate roce_lag
  net/mlx5: Remove unnecessary spin lock protection
  net/mlx5: Cap the maximum flow group size to 16M entries
  net/mlx5: DR, Set max table size to 2G entries
  net/mlx5: Move chains ft pool to be used by all firmware steering
  net/mlx5: Move table size calculation to steering cmd layer
  net/mlx5: Add case for FS_FT_NIC_TX FT in MLX5_CAP_FLOWTABLE_TYPE
  net/mlx5: DR, Remove unused field of send_ring struct
  net/mlx5e: RX, Remove unnecessary check in RX CQE compression handling
  net/mlx5e: IPsec/rep_tc: Fix rep_tc_update_skb drops IPsec packet
  net/mlx5e: TC: Reserved bit 31 of REG_C1 for IPsec offload
  net/mlx5e: TC: Use bit counts for register mapping
  net/mlx5: CT: Avoid reusing modify header context for natted entries
  net/mlx5e: CT, Remove newline from ct_dbg call
====================

Link: https://lore.kernel.org/r/20210527185624.694304-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-05-27 17:14:23 -07:00
Paul Blakey 4a98544d18 net/mlx5: Move chains ft pool to be used by all firmware steering
Firmware FT pool is per device, but the software tracking of this pool
only services fs_chains users, and if another layer takes a flow table,
the pool will not be updated, and fs_chains will fail creating a flow
table, with no recovery till the flow table is returned.

Move FT pool to be global per device, and stored at the cmd level,
so all layers can use it.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-05-27 11:54:38 -07:00
Huy Nguyen b973cf3245 net/mlx5e: TC: Reserved bit 31 of REG_C1 for IPsec offload
Currently ASAP features fully utilize all the bits of the CQE's flow tag
and ft_metadata field. The flow tag field cannot be used because the
flow table tagging in FTE does not allow partial write.

We agree to reserve bit 31 of CQE's ft_metadata for IPsec to avoid
ASAP CT from dropping IPsec offloaded packet

Here is the new bit layout of REG_C1. Tunnel option id is reduced to
11 bits:
< IPSEC MARKER (1) | ESW_TUN_ID(12) | ESW_TUN_OPTS(11) | ESW_ZONE_ID(8) >

Signed-off-by: Huy Nguyen <huyn@nvidia.com>
Signed-off-by: Raed Salem <raeds@nvidia.com>
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Paul Blakey <paulb@nvidia.com>
2021-05-27 11:54:36 -07:00
Eli Cohen 7c9f131f36 {net,vdpa}/mlx5: Configure interface MAC into mpfs L2 table
net/mlx5: Expose MPFS configuration API

MPFS is the multi physical function switch that bridges traffic between
the physical port and any physical functions associated with it. The
driver is required to add or remove MAC entries to properly forward
incoming traffic to the correct physical function.

We export the API to control MPFS so that other drivers, such as
mlx5_vdpa are able to add MAC addresses of their network interfaces.

The MAC address of the vdpa interface must be configured into the MPFS L2
address. Failing to do so could cause, in some NIC configurations, failure
to forward packets to the vdpa network device instance.

Fix this by adding calls to update the MPFS table.

CC: <mst@redhat.com>
CC: <jasowang@redhat.com>
CC: <virtualization@lists.linux-foundation.org>
Fixes: 1a86b377aa ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Signed-off-by: Eli Cohen <elic@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-05-18 23:01:48 -07:00
Maor Gottlieb 3410fbcd47 {net, RDMA}/mlx5: Fix override of log_max_qp by other device
mlx5_core_dev holds pointer to static profile, hence when the
log_max_qp of the profile is override by some device, then it
effect all other mlx5 devices that share the same profile.
Fix it by having a profile instance for every mlx5 device.

Fixes: 883371c453 ("net/mlx5: Check FW limitations on log_max_qp before setting it")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-05-18 23:01:19 -07:00
Linus Torvalds f34b2cf178 RDMA merge window pull request
This is significantly bug fixes and general cleanups. The noteworthy new
 features are fairly small:
 
 - XRC support for HNS and improves RQ operations
 
 - Bug fixes and updates for hns, mlx5, bnxt_re, hfi1, i40iw, rxe, siw and
   qib
 
 - Quite a few general cleanups on spelling, error handling, static checker
   detections, etc
 
 - Increase the number of device ports supported beyond 255. High port
   count software switches now exist
 
 - Several bug fixes for rtrs
 
 - mlx5 Device Memory support for host controlled atomics
 
 - Report SRQ tables through to rdma-tool
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmCMMHEACgkQOG33FX4g
 mxri3Q//RAgIExCGHebQ9xkptZHVyTLLJMpiMl2cqk3ZVRdDZ7QdiQjIqY2KqlUK
 nxBj7EXJeX6rV5a1xqCcOO1gBetB28TSwnCNE2ZqrXP5B59ISW8D052IWza3UkUz
 WmHLARxHQlyKBWA4+ZAgfoUGL0NmWA8QPf56t/RK/3/OsuYnGzcnWmmFbt8XKFcH
 NtO3KC45mKWDqqG0A0XRrLbEQz/ElO3OuPBqlBKgB3ZgGPzgsOUTOGkm1tCcZ89L
 /pvZGB7SklKZdCX8TxdpVGd9h0zHl8pqh1yEzvTA1ypNAYSUId2mvZXluU8J5yJl
 FLk7E1IxE5050FNEc7T5uZdUVntulYiqL2558coRI34l5w26pKGjIMxw/nTB8hg8
 4ZfBtKVemIG6yzW5Up6iBpK7qWYpvLWVShwYAWhbNsjN7JGzJuh1gJnjbmYgyz2P
 RTMU9wjFPLL2wZxg4LDHACVJNBb82j6KKuE+kZWpk11ro7INw9+7YwRuTo7/ezxC
 BwXKu8wF4igwSigV55jM+WnGXLhxdC3qmx/2cbtWyLM/PzdRL96tM0RWW5v8/Nv7
 teFhkt+f3RVqcfYH5K1qCXy3UFrxG6bxFSvcHHSBx2bdIrqhuTY5FqszAYImeW2j
 iHoyIsuSuGu79HQgOzAQZsEyksWi6OYDvA9Q9VBoPP4bJ3DOAa4=
 =vsXA
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This is significantly bug fixes and general cleanups. The noteworthy
  new features are fairly small:

   - XRC support for HNS and improves RQ operations

   - Bug fixes and updates for hns, mlx5, bnxt_re, hfi1, i40iw, rxe, siw
     and qib

   - Quite a few general cleanups on spelling, error handling, static
     checker detections, etc

   - Increase the number of device ports supported beyond 255. High port
     count software switches now exist

   - Several bug fixes for rtrs

   - mlx5 Device Memory support for host controlled atomics

   - Report SRQ tables through to rdma-tool"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (145 commits)
  IB/qib: Remove redundant assignment to ret
  RDMA/nldev: Add copy-on-fork attribute to get sys command
  RDMA/bnxt_re: Fix a double free in bnxt_qplib_alloc_res
  RDMA/siw: Fix a use after free in siw_alloc_mr
  IB/hfi1: Remove redundant variable rcd
  RDMA/nldev: Add QP numbers to SRQ information
  RDMA/nldev: Return SRQ information
  RDMA/restrack: Add support to get resource tracking for SRQ
  RDMA/nldev: Return context information
  RDMA/core: Add CM to restrack after successful attachment to a device
  RDMA/cma: Skip device which doesn't support CM
  RDMA/rxe: Fix a bug in rxe_fill_ip_info()
  RDMA/mlx5: Expose private query port
  RDMA/mlx4: Remove an unused variable
  RDMA/mlx5: Fix type assignment for ICM DM
  IB/mlx5: Set right RoCE l3 type and roce version while deleting GID
  RDMA/i40iw: Fix error unwinding when i40iw_hmc_sd_one fails
  RDMA/cxgb4: add missing qpid increment
  IB/ipoib: Remove unnecessary struct declaration
  RDMA/bnxt_re: Get rid of custom module reference counting
  ...
2021-05-01 09:15:05 -07:00
Parav Pandit 9f8c7100c8 net/mlx5: E-Switch, Prepare to return total vports from eswitch struct
Total vports are already stored during eswitch initialization. Instead
of calculating everytime, read directly from eswitch.

Additionally, host PF's SF vport information is available using
QUERY_HCA_CAP command. It is not available through HCA_CAP of the
eswitch manager PF.
Hence, this patch prepares the return total eswitch vport count from the
existing eswitch struct.

This further helps to keep eswitch port counting macros and logic within
eswitch.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-24 00:58:43 -07:00
Parav Pandit 06ec5acc77 net/mlx5: E-Switch, Return eswitch max ports when eswitch is supported
mlx5_eswitch_get_total_vports() doesn't honor MLX5_ESWICH Kconfig flag.

When MLX5_ESWITCH is disabled, FS layer continues to initialize eswitch
specific ACL namespaces.
Instead, start honoring MLX5_ESWITCH flag and perform vport specific
initialization only when vport count is non zero.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-24 00:58:40 -07:00
Yevgeny Kliteynik aeacb52a8d net/mlx5: DR, Add support for isolate_vl_tc QP
When using SW steering, rule insertion rate depends on the RDMA RC QP
performance used for writing to the ICM. During stress this QP is competing
on the HW resources with all the other QPs that are used to send data.
To protect SW steering QP's performance in such cases, we set this QP to
use isolated VL. The VL number is reserved by FW and is not exposed to the
driver.
Support for this QP on isolated VL exists only when both force-loopback and
isolate_vl_tc capabilities are set.

Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-19 20:17:46 -07:00
Yevgeny Kliteynik 7304d603a5 net/mlx5: DR, Add support for force-loopback QP
When supported by the device, SW steering RoCE RC QP that is used to
write/read to/from ICM will be created with force-loopback attribute.
Such QP doesn't require GID index upon creation.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-19 20:17:43 -07:00
Yevgeny Kliteynik 704cfecdd0 net/mlx5: mlx5_ifc updates for flex parser
Added the required definitions for supporting more protocols by flex parsers
(GTP-U, Geneve TLV options), and for using the right flex parser that was
configured for this protocol.

Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-19 20:17:22 -07:00
Tariq Toukan 6980ffa0c5 net/mlx5e: RX, Add checks for calculated Striding RQ attributes
Striding RQ attributes below are mutually dependent. An unaware
change to one might take the others out of the valid range derived
by the HW caps:
- The MPWQE size in bytes
- The number of strides in a MPWQE
- The stride size

Add checks to verify they are valid and comply to the HW spec
and SW assumptions/requirements.
This is not a fix, no particular issue exists today.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-19 20:17:09 -07:00
Moshe Tal 36830159ac net/mlx5: Add register layout to support extended link state
Add needed structure layouts and defines for pddr register
(Port Diagnostics Database Register) and the troublshooting page.

This will be used to get extended link state from the monitor opcode
bits.

Signed-off-by: Moshe Tal <moshet@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-16 11:48:25 -07:00
Parav Pandit 6308a5f06b net/mlx5: E-Switch, Make vport number u16
Vport number is 16-bit field in hardware. Make it u16.

Move location of vport in the structure so that it reduces a hole
in the structure.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-04-14 11:02:32 -07:00