Commit graph

737377 commits

Author SHA1 Message Date
Parav Pandit 218b9e3eb8 RDMA/cma: Move rdma_cm_state to cma_priv.h
rdma_cm_state enum is internal to rdma_cm kernel module.
It is not required to expose state enums to ULP modules.
So let's keep its scope limited to the rdma_cm module, in the cma_priv.h file.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29 13:54:21 -06:00
Parav Pandit fd59015d68 IB/addr: Constify dst_entry pointer
Make the dst_entry pointer a const struct dst_entry * to improve code
readability and to make sure that the dst structure fields are not modified by
the various functions that use it.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29 13:54:20 -06:00
Jason Gunthorpe 6f57c933a4 RDMA: Use u64_to_user_ptr everywhere
This is already used in many places, get the rest of them too, only
to make the code a bit clearer & simpler.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29 13:42:29 -06:00
Leon Romanovsky 5b2cc79de8 RDMA/nldev: Provide netdevice name and index
Export the net device name and index to easily find connection
between IB devices and relevant net devices.

We also updated the comment regarding the devices without FW.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29 13:32:40 -06:00
Zhu Yanjun 99dae69025 IB/rxe: optimize mcast recv process
In mcast recv process, the function skb_clone is used. In fact,
the refcount can be increased to replace cloning a new skb since
the original skb will not be modified before it is freed.

This can make the performance better and save the memory.
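
The idea of sharing one buffer instead of cloning it can be sketched in a few lines. This is a toy model, not the rxe code: the `Buffer` class and the `deliver_to_group` helper are hypothetical names, and the sketch assumes, as the message does, that no receiver modifies the buffer before it is freed.

```python
class Buffer:
    """Toy packet buffer with a reference count (hypothetical, not sk_buff)."""

    def __init__(self, payload: bytes):
        self.payload = payload
        self.refcount = 1

    def get(self) -> "Buffer":
        # Share the same buffer: only the counter changes, nothing is copied.
        self.refcount += 1
        return self

    def put(self) -> None:
        # Drop one reference; release the payload when the last one is gone.
        self.refcount -= 1
        if self.refcount == 0:
            self.payload = b""


def deliver_to_group(buf: Buffer, receivers: int) -> list:
    """Hand the same read-only buffer to every receiver of a multicast group."""
    return [buf.get() for _ in range(receivers)]
```

Each receiver calls `put()` when it is done, so the payload is released exactly once, after the last holder drops its reference.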

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29 13:25:22 -06:00
Colin Ian King a343e3f89e qedr: Fix spelling mistake: "hanlde" -> "handle"
Trivial fix to spelling mistake in DP_ERR message text

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-29 13:19:23 -06:00
Steve Wise 2253fc0caa RDMA/CMA: Add rdma_port_space to UAPI
Since the rdma_port_space enum is being passed between user and kernel for
user cm_id setup, we need it in a UAPI header.  So add it to
rdma_user_cm.h.

This also fixes the cm_id restrack changes which pass up the port space
value via the RDMA_NLDEV_ATTR_RES_PS attribute.

Fixes: 00313983cd ("RDMA/nldev: provide detailed CM_ID information")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-28 20:50:45 -06:00
Steve Wise 1b90d3002e RDMA/CMA: remove RDMA_PS_SDP
This is no longer supported, so remove it.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-28 16:23:01 -06:00
Parav Pandit 190fb9c4d1 IB/core: Refer to RoCE port property to decide building cache
IB core maintains the GID cache entries for the GID table.
This cache table has to be maintained regardless of HCA's
support of GID table.
For IB and iWarp ports, cache is created by querying the HCA.
For RoCE cache is created based on netdev events.

Therefore just refer to the RoCE port property of the {device, port} to
decide whether to build cache by querying HCA or from netdev events.
There is no need to check whether the HCA supports a GID table or not.

ib_cache_update() referred to RoCE attribute before validating
port. Though in all current callers port is valid, it is incorrect
to query RoCE port property before validating the port. Therefore,
rdma_protocol_roce() check is done after rdma_is_port_valid() verifies
that port is valid.
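
The ordering argued for here (validate the port first, then consult its properties) can be illustrated with a small model. The `Device` class and both helpers are hypothetical; they only stand in for the roles of rdma_is_port_valid() and rdma_protocol_roce() and are not the kernel implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical device: ports are numbered start_port..end_port inclusive."""
    start_port: int
    end_port: int
    roce_ports: set = field(default_factory=set)


def is_port_valid(dev: Device, port: int) -> bool:
    return dev.start_port <= port <= dev.end_port


def cache_source(dev: Device, port: int) -> str:
    # Validate first; per-port properties are meaningless for a bad port.
    if not is_port_valid(dev, port):
        raise ValueError(f"invalid port {port}")
    # Only the link type decides where the GID cache comes from.
    return "netdev events" if port in dev.roce_ports else "HCA query"
```

With this shape there is a single place where a port number is checked, and the property lookup never runs on an out-of-range port.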

Fixes: 115b68aa6e ("IB/ocrdma: Removed GID add/del null routines")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 16:22:12 -06:00
Parav Pandit 22d24f75a1 IB/core: Search GID only for IB link layer
Even though the API is only used by the IPoIB driver, it's incorrect to refer
to the RoCE GID table property to search for a GID.

Look for only IB link layer to search for the GID.

Fixes: dbb12562f7 ("IB/{core, ipoib}: Simplify ib_find_gid to search only for IB link layer")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 16:22:12 -06:00
Parav Pandit 4ab7cb4bf3 IB/core: Refer to RoCE port property instead of GID table property
ib_find_gid_by_filter() searches GID with filter only for RoCE link
layer regardless of HCA's support for GID table.
Therefore, the right way to look up is to compare the RoCE port property and not
the GID table property.

Fixes: 99b27e3b5d ("IB/cache: Add ib_find_gid_by_filter cache API")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 16:22:12 -06:00
Parav Pandit 3401857ea3 IB/core: Generate GID change event regardless of RoCE GID table property
Due to following reasons, GID table event is generated regardless of GID
table property.

1. GID table cache is maintained at ib core layer regardless of link layer.
2. GID change event has no relation with IB link layer.
3. GID change event also doesn't depend on whether HCA supports GID table
or not.

Fixes: f3906bd360 ("IB/core: Refactor GID cache's ib_dispatch_event")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 16:22:12 -06:00
Parav Pandit 97c45c2c28 IB/cm: Block processing alternate path handling RoCE Rx cm messages
Due to below reasons, it is better to not support alternate path receive
messages for RoCE in near term.

1. Alternate path for RoCE is not supported at rdmacm layer.
2. It is not supported in uverbs/core layer for RoCE.
3. Alternate path for IPv6 for a link-local address cannot resolve the route
deterministically without a valid incoming interface id, whose use case
makes sense only with dual port mode.
4. init_av_from_path while processing LAP messages for IB and RoCE can
lead to adding a duplicate entry of the AV into the port list, which leads to
list corruption.
5. rdma-core, a well-known userspace implementation, has removed
support for libucm, which uses the ucm.ko module, the only module that
can trigger alternate path related messages.
6. ucm kernel module is requested to be removed from the IB core in
patch [1].

[1] https://patchwork.kernel.org/patch/10268503/

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 16:22:12 -06:00
Mark Bloch e945130b52 IB/core: Protect against concurrent access to hardware stats
Currently access to hardware stats buffer isn't protected, this can
result in multiple writes and reads at the same time to the same
memory location. This can lead to providing an incorrect value to
the user. Add a mutex to protect against it.
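
As a rough illustration of the fix, the sketch below guards a shared counter buffer with one lock for both writers and readers. It is a userspace analogy with hypothetical names (`HwStats`, `refresh`, `snapshot`), not the IB core code.

```python
import threading


class HwStats:
    """Hypothetical stand-in for a per-port hardware stats buffer."""

    def __init__(self, num_counters: int):
        self._lock = threading.Lock()
        self._values = [0] * num_counters

    def refresh(self, read_counter) -> None:
        # Writer: refill the whole buffer while holding the lock.
        with self._lock:
            for i in range(len(self._values)):
                self._values[i] = read_counter(i)

    def snapshot(self) -> list:
        # Reader: copy under the same lock, so a half-updated
        # buffer is never observed.
        with self._lock:
            return list(self._values)
```

Because the update and the read take the same lock, a reader sees either the old set of values or the new one, never a mix of both.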

Fixes: b40f4757da ("IB/core: Make device counter infrastructure dynamic")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 15:07:21 -06:00
Majd Dibbiny c8d75a980f IB/mlx5: Respect new UMR capabilities
In some firmware configurations, UMR usage from Virtual Functions is restricted.
This information is published to the driver using new capability bits.

Avoid using UMRs in these cases and use the Firmware slow-path flow to create
mkeys and populate them with Virtual to Physical address translation.

Older drivers that do not have this patch, will end up using memory keys that
aren't populated with Virtual to Physical address translation that is done
part of the UMR work.

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 14:43:10 -06:00
Majd Dibbiny ea8af0d2f2 IB/mlx5: Enable ECN capable bits for UD RoCE v2 QPs
When working with RC QPs, the FW sets the ECN capable bits for all
the RoCE v2 packets. On the other hand, for UD QPs, the driver needs
to set the ECN capable bits in the Address Handler since the HW
generates each packet according to the Address Handler and not
the QP context.

If ECN is not enabled in NIC or switch, these bits are ignored.

Fixes: 2811ba51b0 ("IB/mlx5: Add RoCE fields to Address Vector")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 14:43:10 -06:00
Matan Barak be23fb9a2c IB/uverbs: UAPI pointers should use __aligned_u64 type
The ioctl() UAPIs are meant to be used by both user-space
and kernel ioctl() handlers.

Mostly, these UAPI structs tend to consist of simple types, but
sometimes user-space pointers may be passed between user-space and
kernel. We would like to avoid dereferencing a user-space pointer in
the kernel, thus - we always define RDMA_UAPI_PTR as a __aligned_u64
type.

Fixes: 1f7ff9d5d3 ('IB/uverbs: Move to new headers and make naming consistent')
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 14:43:10 -06:00
Jason Gunthorpe 819b60286e Merge branch '32compat'
The design of the uAPI had intended all structs to share the same layout on 32
and 64 bit compiles. Unfortunately over the years some errors have crept in.

This series fixes all the incompatibilities. It goes along with a userspace
rdma-core series that causes the providers to use these structs directly and
then does various self-checks on the command formation.

Those checks were combined with output from pahole on 32 and 64 bit compiles
to confirm that the structure layouts are the same.

This series does not make implicit padding explicit, as long as the implicit
padding is the same on 32 and 64 bit compiles.

Finally, the issue is put to rest by using __aligned_u64 in the uapi headers,
if new code copies that type, and is checked in userspace, it is unlikely we
will see problems in future.
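
To make the 32/64 layout problem concrete, here is a small layout calculator. It is only a sketch: the two-field struct is made up, and the alignment tables encode the assumption that a plain 64-bit integer is 4-byte aligned inside structs on i386 but 8-byte aligned on x86-64, while __aligned_u64 is 8-byte aligned on both.

```python
# Per-ABI alignment of each type (assumption stated in the text above).
ALIGN_I386 = {"u32": 4, "u64": 4, "aligned_u64": 8}
ALIGN_X86_64 = {"u32": 4, "u64": 8, "aligned_u64": 8}
SIZE = {"u32": 4, "u64": 8, "aligned_u64": 8}


def round_up(value: int, align: int) -> int:
    return (value + align - 1) // align * align


def layout(fields, align_of):
    """Return ({field: offset}, struct_size) using C struct layout rules."""
    offset, struct_align, offsets = 0, 1, {}
    for name, ctype in fields:
        align = align_of[ctype]
        offset = round_up(offset, align)      # implicit padding before the field
        offsets[name] = offset
        offset += SIZE[ctype]
        struct_align = max(struct_align, align)
    return offsets, round_up(offset, struct_align)  # tail padding


# Hypothetical uAPI struct: a 32-bit handle followed by a 64-bit address.
plain = [("handle", "u32"), ("addr", "u64")]
fixed = [("handle", "u32"), ("addr", "aligned_u64")]
```

Running `layout(plain, ...)` against the two tables gives different offsets for `addr`, while `layout(fixed, ...)` gives the same answer for both, which is the property the series is after.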

There are two patches that break the ABI for a 32 bit kernel, one for rxe and
one for mlx4. Both patches have notes, but the overall feeling from Doug and I
is that providing compat is just too difficult and not necessary since there
is no real user of a 32 bit userspace and 32 bit kernel for various good
reasons.

The 32 bit userspace / 64 bit kernel case however does seem to have some real
users and does need to work as designed.

* 32compat:
  RDMA: Change all uapi headers to use __aligned_u64 instead of __u64
  RDMA/rxe: Fix uABI structure layouts for 32/64 compat
  RDMA/mlx4: Fix uABI structure layouts for 32/64 compat
  RDMA/qedr: Fix uABI structure layouts for 32/64 compat
  RDMA/ucma: Fix uABI structure layouts for 32/64 compat
  RDMA: Remove minor pahole differences between 32/64
2018-03-27 14:32:49 -06:00
Jason Gunthorpe 26b9906612 RDMA: Change all uapi headers to use __aligned_u64 instead of __u64
The new auditing standard for the subsystem will be to only use
__aligned_u64 in uapi headers to try and prevent 32/64 compat bugs
from existing in the future.

Changing all existing usage will help ensure new developers copy the
right idea.

The before/after of this patch was tested using pahole on 32 and 64
bit compiles to confirm it has no change in the structure layout, so
this patch is a NOP.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 14:25:09 -06:00
Jason Gunthorpe f2e9bfac13 RDMA/rxe: Fix uABI structure layouts for 32/64 compat
With 32 bit compilation several of the fields become misaligned here.
Fixing this is an ABI break for 32 bit rxe and it is in well used
portions of the rxe ABI.

To handle this we bump the ABI version, as expected. However the user
space driver doesn't handle it properly today, so all existing user
space continues to work.

Updated userspace will start to require the necessary kernel version.

We don't expect there to be any 32 bit users of rxe. Most likely cases,
such as ARM 32 already generally don't work because rxe does not handle
the CPU cache properly on its shared with userspace pages.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 14:25:09 -06:00
Jason Gunthorpe 366380a0c8 RDMA/mlx4: Fix uABI structure layouts for 32/64 compat
rss_caps in struct mlx4_uverbs_ex_query_device_resp is misaligned on
32 bit compared to 64 bit, add explicit padding.

The rss caps were introduced recently and are very rarely used in user
space, mainly for DPDK.

We don't expect there to be a real 32 bit user, so this change is done
without compat considerations.

Fixes: 09d208b258 ("IB/mlx4: Add report for RSS capabilities by vendor channel")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 14:25:09 -06:00
Jason Gunthorpe 71e80a4781 RDMA/qedr: Fix uABI structure layouts for 32/64 compat
struct qedr_alloc_ucontext_resp is a different length in 32 and 64
bit compiles due to implicit compiler padding.

The structs alloc_pd_uresp, create_cq_uresp and create_qp_uresp are
not padded by the compiler, but in user space the compiler pads them
due to the way the core and driver structs are concatenated. Make
this padding explicit and consistent for future sanity.

The kernel driver can already handle the user buffer being smaller
than required and copies correctly, so no compat or ABI break happens
from introducing the explicit padding.

Acked-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 14:25:09 -06:00
Jason Gunthorpe 611cb92b08 RDMA/ucma: Fix uABI structure layouts for 32/64 compat
The rdma_ucm_event_resp is a different length on 32 and 64 bit compiles.

The kernel requires it to be the expected length or longer so 32 bit
builds running on a 64 bit kernel will not work.

Retain full compat by having all kernels accept a struct with or without
the trailing reserved field.
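
One way to picture "accept a struct with or without the trailing reserved field" is a length check like the one below. The sizes are placeholders chosen for illustration and `bytes_to_copy` is a hypothetical helper; the real structure sizes and the exact kernel check are not taken from this message.

```python
RESP_CORE_LEN = 24      # hypothetical size without the trailing reserved field
RESP_RESERVED_LEN = 4   # hypothetical size of the trailing reserved field
RESP_FULL_LEN = RESP_CORE_LEN + RESP_RESERVED_LEN


def bytes_to_copy(user_buf_len: int) -> int:
    """Return how many response bytes to copy out to the caller."""
    # The shorter form (no reserved field) is the minimum that is accepted.
    if user_buf_len < RESP_CORE_LEN:
        raise ValueError("user buffer too small for the response")
    # Never write past the caller's buffer, nor past the full struct.
    return min(user_buf_len, RESP_FULL_LEN)
```

Under these assumptions both the shorter and the longer caller are served by the same code path, and neither gets more bytes than it asked for.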

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 14:25:08 -06:00
Jason Gunthorpe 38b48808b9 RDMA: Remove minor pahole differences between 32/64
To help automatic detection we want pahole to report the same struct
layouts for 32 and 64 bit compiles. These cases are all implicit
padding added at the end of embedded structs as part of a union.

The added reserved fields have no impact on the ABI.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-27 14:25:08 -06:00
Jason Gunthorpe f64705b871 RDMA/ocrdma: Fix structure layout for ocrdma_alloc_pd
The udata's for alloc_pd cannot contain u64s due to alignment
constraints. Switch the two never-used u64's to arrays of u32 to reduce
the required struct alignment to 4 bytes.

These reserved fields are totally unnecessary, never written and never
read.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-23 14:53:29 -06:00
Steve Wise f215a3d244 iw_cxgb4: Add ib_device->get_netdev support
This is useful to rdma ULPs.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-23 11:12:51 -06:00
Parav Pandit 114cc9c4b1 IB/cma: Resolve route only while receiving CM requests
Currently, a CM request for RoCE follows this flow:
rdma_create_id()
rdma_resolve_addr()
rdma_resolve_route()
For RC QPs:
rdma_connect()
->cma_connect_ib()
  ->ib_send_cm_req()
    ->cm_init_av_by_path()
      ->ib_init_ah_attr_from_path()
For UD QPs:
rdma_connect()
->cma_resolve_ib_udp()
  ->ib_send_cm_sidr_req()
    ->cm_init_av_by_path()
      ->ib_init_ah_attr_from_path()

In both the flows, route is already resolved before sending CM requests.
Therefore, code is refactored to avoid resolving route second time in
ib_cm layer.
ib_init_ah_attr_from_path() is extended to resolve route when it is not
yet resolved for RoCE link layer. This is achieved by caller setting
route_resolved field in path record whenever it has route already
resolved.
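
The effect of the route_resolved flag can be shown with a toy model. The `PathRecord` class, the `resolve_route` helper and the `resolve_calls` log are hypothetical; the function name merely echoes ib_init_ah_attr_from_path() and is not its implementation.

```python
from dataclasses import dataclass


@dataclass
class PathRecord:
    """Hypothetical, trimmed-down path record."""
    dest: str
    route_resolved: bool = False
    route: str = ""


resolve_calls = []  # records every (expensive) route resolution


def resolve_route(rec: PathRecord) -> None:
    resolve_calls.append(rec.dest)
    rec.route = "route-to-" + rec.dest
    rec.route_resolved = True


def init_ah_attr_from_path(rec: PathRecord) -> dict:
    # Resolve only when the caller has not already done so.
    if not rec.route_resolved:
        resolve_route(rec)
    return {"dest": rec.dest, "route": rec.route}
```

A sender that already called `resolve_route()` pays for it once, while a receiver that starts from an unresolved record still gets a route when the attributes are built.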

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-23 10:58:05 -06:00
Parav Pandit 98f1f4e0ed IB/core: Refer to RoCE port property instead of GID table property
ib_query_gid() in commit [1] refers to RoCE GID table capability of
the HCA using rdma_cap_roce_gid_table().
ib_core maintains the GID table cache regardless of the HCA provider
drivers capability to maintain RoCE GID table.
Therefore, whether to return a GID table entry from the software cache or
from HCA should be done based on whether the port is RoCE or not.

[1] commit 03db3a2d81 ("IB/core: Add RoCE GID table management")

Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-22 12:42:49 -06:00
Leon Romanovsky 03286030ac RDMA/restrack: Remove ambiguity in resource track clean logic
The restrack clean routine had simple, but powerful WARN_ON check
to see if all resources are cleared prior to releasing device.

The WARN_ON check performed very well, but the lack of information about
which device caused the resource leak, the object type and its origin
made debugging fun and challenging at the same time.

The fact that all dumps were the same because restrack_clean() is
called in dealloc() didn't help either.

So let's fix the spelling error and convert WARN_ON to be more debug
friendly. The dmesg cut below gives an example of how the output
will look for the case fixed in patch [1]

[  438.421372] restrack: ------------[ cut here ]------------
[  438.423448] restrack: BUG: RESTRACK detected leak of resources on mlx5_2
[  438.425600] restrack: Kernel PD object allocated by mlx5_ib is not freed
[  438.427753] restrack: Kernel CQ object allocated by mlx5_ib is not freed
[  438.429660] restrack: ------------[ cut here ]------------

[1] https://patchwork.kernel.org/patch/10298695/

Cc: Michal Kalderon <Michal.Kalderon@cavium.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-22 12:42:48 -06:00
Yixian Liu e95955773d RDMA/hns: Fix cq record doorbell enable in kernel
Upon detecting both kernel and user space support record doorbell,
the kernel needs to enable this capability in hardware by db_en,
and it should take place before cq context configuration in
hns_roce_cq_alloc. Currently, db_en is configured after cq alloc
and db_map_user has a similar problem.

Reported-by: Xiping Zhang <zhangxiping3@huawei.com>
Fixes: 9b44703d0a ("RDMA/hns: Support cq record doorbell for the user space")
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-22 12:42:48 -06:00
Jason Gunthorpe 761fc376c9 RDMA/cxgb3: Use structs to describe the uABI instead of opencoding
Open coding a loose value is not acceptable for describing the uABI in
RDMA. Provide the missing struct.

Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-22 12:42:48 -06:00
Sinan Kaya 97d82a48d7 IB/mlx4: Eliminate duplicate barriers on weakly-ordered archs
Code includes wmb() followed by writel(). writel() already has a barrier on
some architectures like arm64.

This ends up with the CPU observing two barriers back to back before executing
the register write.

Since the code already has an explicit barrier call, change writel() to
writel_relaxed().

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-21 13:51:41 -06:00
Matan Barak 185899ee8d IB/uverbs: Enable ioctl() uAPI by default for new verbs
Enable the ioctl() uAPI for IB by default if the standard write()
uAPI (INFINIBAND_USER_ACCESS) is enabled. Verbs that are
also available under the old write() uAPI are put inside a new
INFINIBAND_EXP_LEGACY_VERBS_NEW_UAPI Kconfig.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Matan Barak 3d64addd43 IB/uverbs: Add macros to simplify adding driver specific attributes
Previously, adding driver specific attributes required drivers to
declare all the hierarchy - object tree, object, methods and the
attributes themselves. A common use case is adding a few attributes to
an existing common method.
In order to simplify the driver's code, we add some macros to do all
these declarations automatically.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Matan Barak 41b2a71fc8 IB/uverbs: Move ioctl path of create_cq and destroy_cq to a new file
Currently, all objects are declared in uverbs_std_types. This could lead
to a huge file once we implement all objects, methods and handlers.
Moving each object to its own file to keep the files smaller and more
readable. uverbs_std_types.c will only contain the parsing tree
definition and objects without any methods.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Matan Barak dfb1395573 IB/uverbs: Expose parsing tree of all common objects to providers
The ioctl() based uverbs is based on merging feature trees. This teaches
the generic parser how to parse methods according to the provider's
support. In order to support merging with the common objects, exporting
the common-object-tree to the provider drivers.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Matan Barak c66db31113 IB/uverbs: Safely extend existing attributes
Previously, we've used UVERBS_ATTR_SPEC_F_MIN_SZ for extending existing
attributes. The behavior of this flag was the kernel accepts anything
bigger than the minimum size it specified. This is unsafe, since in
order to safely extend an attribute, we need to make sure unknown size
is zeroed. Replacing UVERBS_ATTR_SPEC_F_MIN_SZ with
UVERBS_ATTR_SPEC_F_MIN_SZ_OR_ZERO, which essentially checks that the
unknown size is zero. In addition, attributes are now decorated with
UVERBS_ATTR_TYPE and UVERBS_ATTR_STRUCT, so we can provide the minimum
and known length.

Users of this flag need to use the copy_from_or_zero functions/macros.
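
The "minimum size or zero" rule can be sketched as follows. This models only the semantics described in this message: `copy_attr_or_zero` is a hypothetical helper, not the kernel's copy_from_or_zero macros, and it assumes `min_len <= known_len`.

```python
def copy_attr_or_zero(user_data: bytes, min_len: int, known_len: int) -> bytes:
    """Return exactly known_len bytes, or raise if the input is unsafe."""
    if len(user_data) < min_len:
        raise ValueError("attribute shorter than the minimum size")
    # Bytes beyond what this side understands must all be zero; otherwise
    # the caller is asking for something that would be silently ignored.
    if any(user_data[known_len:]):
        raise ValueError("non-zero data beyond the known size")
    # A shorter input is zero-extended up to the known size.
    return user_data[:known_len].ljust(known_len, b"\x00")
```

An older caller sending a short attribute and a newer caller sending a longer, zero-tailed one both end up with the same `known_len` bytes on the receiving side.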

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Matan Barak 1f07e08fab IB/uverbs: Enable compact representation of uverbs_attr_spec
Downstream patches extend uverbs_attr_spec with new fields.
In order to save space, we move the type and flags fields to
the various attribute flavors contained in the union.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Matan Barak 0ede73bc01 IB/uverbs: Extend uverbs_ioctl header with driver_id
Extending uverbs_ioctl header with driver_id and another reserved
field. driver_id should be used in order to identify the driver.
Since every driver could have its own parsing tree, this is necessary
for strace support.
Downstream patches take off the EXPERIMENTAL flag from the ioctl() IB
support and thus we add some reserved fields for future usage.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Matan Barak 1f7ff9d5d3 IB/uverbs: Move to new headers and make naming consistent
Use macros to make names consistent in ioctl() uAPI:
The ioctl() uAPI works with object-method hierarchy. The method part
also states which handler should be executed when this method is called
from user-space. Therefore, we need to tie method, method's id, method's
handler and the object owning this method together.
Previously, this was done through explicit developer chosen names.
This makes grepping the code harder. Changing the method's name,
method's handler and object's name to be automatically generated based
on the ids.

The headers are split in a way so that they can be included and used by
user-space. One header strictly contains structures that are used
directly by user-space applications, where another header is used for
internal library (i.e. libibverbs) to form the ioctl() commands.
Other header simply contains the required general command structure.

Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 14:45:17 -06:00
Bart Van Assche b470c154c6 IB/srp: Disallow duplicate RDMA/CM connections
According to the SRP standard the INITIATOR and TARGET PORT IDENTIFIER
fields from the login request specify the I_T nexus. Whether or not an
SRP target closes an existing connection for an I_T nexus when a login
request is received depends on the value of the MULTICHANNEL field in
the login request. The SRP initiator derives the value of the
INITIATOR and TARGET PORT IDENTIFIER fields from the .id_ext,
.ioc_guid, .initiator_ext .sgid members of the srp_target_port
structure. This means that the .rdma_cm.dst check must be removed from
srp_conn_unique(). This patch avoids that for target ports that have
multiple addresses, e.g. an IPv4 and an IPv6 address, and if a
connection is established to both target port addresses, that the
initiator logs in alternatingly every 10 seconds to the other target
port address. An SRP target must namely terminate all but one
connections for a given I_T nexus if the MULTICHANNEL field has not
been set in the login request.

Fixes: 19f313438c ("IB/srp: Add RDMA/CM support")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 13:54:50 -06:00
Bodong Wang 61147f391a IB/mlx5: Packet packing enhancement for RAW QP
Enable RAW QP to be able to configure burst control by modify_qp. By
using burst control with rate limiting, user can achieve best
performance and accuracy. The burst control information is passed by
user through udata.

This patch also reports burst control capability for mlx5 related
hardware; burst control is only marked as supported when both
packet_pacing_burst_bound and packet_pacing_typical_size are
supported.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 11:55:13 -06:00
Bodong Wang 05d3ac978e net/mlx5: Packet pacing enhancement
Add two new parameters: max_burst_sz and typical_pkt_size (both
in bytes) to rate limit configurations.

max_burst_sz: The device will schedule bursts of packets for an
SQ connected to this rate, smaller than or equal to this value.
Value 0x0 indicates packet bursts will be limited to the device
defaults. This field should be used if bursts of packets must be
strictly kept under a certain value.

typical_pkt_size: When the rate limit is intended for a stream of
similar packets, stating the typical packet size can improve the
accuracy of the rate limiter. The expected packet size will be
the same for all SQs associated with the same rate limit index.

The Ethernet driver is updated according to this change, but these two
parameters will be kept as 0 due to the lack of a proper way to get the
configuration from user space, which would require changing the
ndo_set_tx_maxrate interface.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 11:54:41 -06:00
Yixian Liu df7e404258 RDMA/hns: Fix init resp when alloc ucontext
The data in resp will be copied from kernel to userspace, thus it needs to
be initialized to zeros to avoid copying uninited stack memory.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: e088a685ea ("RDMA/hns: Support rq record doorbell for the user space")
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 11:41:41 -06:00
Parav Pandit b19744e965 IB/core: Remove unimplemented ib_peek_cq
The ib_peek_cq() verb doesn't seem to be implemented in the current code.
There is some past reference to it at [1] about it being unimplemented.

Lot of user documentation created out of kdoc refers to this
unimplemented API. Therefore, remove unimplemented API.

[1] http://lists.openfabrics.org/pipermail/ofw/2008-May/002465.html
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 11:41:41 -06:00
Parav Pandit 6d5b2047fe IB/core: Use rdma_is_port_valid()
Use rdma_is_port_valid() which performs port validity check instead of
open coding the same check.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 11:41:40 -06:00
Jason Gunthorpe 958d2c1ba3 RDMA/bnxt: Fix structure layout for bnxt_re_pd_resp
What is going on here is a bit subtle, in the kernel there is no
problem because the struct is copied using copy_from_user, so it
can safely have an 8 byte alignment, however in userspace it must
be constructed by concatenation with the ib_uverbs_alloc_pd_resp
struct. This is due to the required memory layout to execute the
command.

Since ib_uverbs_alloc_pd_resp is only 4 bytes long, this causes
misalignment, and the user space will experience an unexpected padding.
Currently it works around this via pointer maths.

Make everything more robust by having the compiler reduce the alignment
of the struct to 4. The userspace has assertions to ensure this
works properly in all situations.
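
The unexpected padding comes down to simple arithmetic, shown here as a sketch. `padding_before` is a hypothetical helper, and the only figure taken from the message is the 4-byte length of the core response struct.

```python
CORE_RESP_LEN = 4  # length of the core alloc_pd response, per the message


def padding_before(offset: int, align: int) -> int:
    """Bytes inserted so that a member placed at `offset` meets `align`."""
    return (-offset) % align


# Driver struct appended right after the 4-byte core struct:
pad_with_align_8 = padding_before(CORE_RESP_LEN, 8)  # 8-byte aligned driver struct
pad_with_align_4 = padding_before(CORE_RESP_LEN, 4)  # alignment reduced to 4
```

With an 8-byte aligned driver struct the concatenation gains a 4-byte hole after the core part; with 4-byte alignment the two parts sit back to back.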

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 11:41:40 -06:00
Honggang Li 7672ed33c4 IB/mlx5: Set the default active rate and width to QDR and 4X
Before commit f1b65df5a2 ("IB/mlx5: Add support for active_width and
active_speed in RoCE"), the mlx5_ib driver set the default active_width
and active_speed to IB_WIDTH_4X and IB_SPEED_QDR.

When a RoCE port is down, it does not negotiate the active width with
the remote side, so the active width is zero. When running userspace
ibstat to view the port status, ibstat will panic as it reads an
invalid width from the sysfs file.

This patch restores the original behavior.
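
A minimal sketch of the "defaults first, override on success" pattern described above. The demo_* names, the link_up flag, and the function signature are hypothetical, and the enum values are illustrative only; this is not the mlx5_ib code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants; zero is deliberately not a valid value. */
enum { DEMO_WIDTH_4X = 2 };
enum { DEMO_SPEED_QDR = 4 };

struct demo_port_attr {
	uint8_t active_width;
	uint8_t active_speed;
};

/*
 * Fill in port attributes. The defaults are written before anything
 * else, so a port that is down (nothing negotiated) never reports zero.
 */
static void demo_query_port(struct demo_port_attr *attr, bool link_up,
			    uint8_t negotiated_width, uint8_t negotiated_speed)
{
	assert(attr);

	attr->active_width = DEMO_WIDTH_4X;
	attr->active_speed = DEMO_SPEED_QDR;

	if (!link_up)
		return; /* keep the defaults */

	attr->active_width = negotiated_width;
	attr->active_speed = negotiated_speed;
}
```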

Fixes: f1b65df5a2 ("IB/mlx5: Add support for active_width and active_speed in RoCE").
Signed-off-by: Honggang Li <honli@redhat.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Reviewed-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 11:39:47 -06:00
Honggang Li 311d0da974 IB/core: Set speed string to SDR for invalid active rates
Before commit f1b65df5a2 ("IB/mlx5: Add support for active_width and
active_speed in RoCE"), the mlx5_ib driver set default active_width and
active_speed to IB_WIDTH_4X and IB_SPEED_QDR.

Now, the active_width and active_speed are zero if the RoCE port
is in DOWN state. The speed string should be set to " SDR" instead of
a blank string when active_speed is zero.
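
The fallback can be sketched as a switch whose default branch shares the SDR case. The demo_* names and the reduced set of speeds are assumptions made for illustration; this is not the sysfs code itself.

```c
#include <assert.h>

/* Illustrative speed codes; zero means "nothing negotiated". */
enum demo_speed {
	DEMO_SPEED_SDR = 1,
	DEMO_SPEED_DDR = 2,
	DEMO_SPEED_QDR = 4,
};

static_assert(DEMO_SPEED_SDR != 0, "zero is reserved for an invalid rate");

/* Map a speed code to its display suffix; unknown codes fall back to SDR. */
static const char *demo_speed_str(unsigned int active_speed)
{
	switch (active_speed) {
	case DEMO_SPEED_DDR:
		return " DDR";
	case DEMO_SPEED_QDR:
		return " QDR";
	case DEMO_SPEED_SDR:
	default: /* includes 0, reported while the port is down */
		return " SDR";
	}
}
```

Because the default label is grouped with the SDR case, a reader of the resulting string never sees an empty speed field.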

Signed-off-by: Honggang Li <honli@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-19 11:39:47 -06:00
Leon Romanovsky 7d9a935e16 RDMA/restrack: Don't rely on uninitialized variable in restrack_add flow
The restrack code relies on object structures being zeroed at the
allocation stage. The mlx4 CQ was not allocated with kzalloc, which
caused the following crash.

[  137.392209] general protection fault: 0000 [#1] SMP KASAN PTI
[  137.392972] CPU: 0 PID: 622 Comm: ibv_rc_pingpong Tainted: G        W        4.16.0-rc1-00099-g00313983cda6 #11
[  137.395079] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[  137.396866] RIP: 0010:rdma_restrack_del+0xc8/0xf0
[  137.397762] RSP: 0018:ffff8801b54e7968 EFLAGS: 00010206
[  137.399008] RAX: 0000000000000000 RBX: ffff8801d8bcbae8 RCX: ffffffffb82314df
[  137.400055] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: 70696b533d454741
[  137.401103] RBP: ffff8801d90c07a0 R08: ffff8801d8bcbb00 R09: 0000000000000000
[  137.402470] R10: 0000000000000001 R11: ffffed0036a9cf52 R12: ffff8801d90c0ad0
[  137.403318] R13: ffff8801d853fb20 R14: ffff8801d8bcbb28 R15: 0000000000000014
[  137.404736] FS:  00007fb415d43740(0000) GS:ffff8801e5c00000(0000) knlGS:0000000000000000
[  137.406074] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  137.407101] CR2: 00007fb41557df20 CR3: 00000001b580c001 CR4: 00000000003606b0
[  137.408308] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  137.409352] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  137.410385] Call Trace:
[  137.411058]  ib_destroy_cq+0x23/0x60
[  137.411460]  uverbs_free_cq+0x37/0xa0
[  137.412040]  remove_commit_idr_uobject+0x38/0xf0
[  137.413042]  _rdma_remove_commit_uobject+0x5c/0x160
[  137.413782]  ? lookup_get_idr_uobject+0x39/0x50
[  137.414737]  rdma_remove_commit_uobject+0x3b/0x70
[  137.415742]  ib_uverbs_destroy_cq+0x114/0x1d0
[  137.416260]  ? ib_uverbs_req_notify_cq+0x160/0x160
[  137.417073]  ? kernel_text_address+0x5c/0x90
[  137.417805]  ? __kernel_text_address+0xe/0x30
[  137.418766]  ? unwind_get_return_address+0x2f/0x50
[  137.419558]  ib_uverbs_write+0x453/0x6a0
[  137.420220]  ? show_ibdev+0x90/0x90
[  137.420653]  ? __kasan_slab_free+0x136/0x180
[  137.421155]  ? kmem_cache_free+0x78/0x1e0
[  137.422192]  ? remove_vma+0x83/0x90
[  137.422614]  ? do_munmap+0x447/0x6c0
[  137.423045]  ? vm_munmap+0xb0/0x100
[  137.423481]  ? SyS_munmap+0x1d/0x30
[  137.424120]  ? do_syscall_64+0xeb/0x250
[  137.424984]  ? entry_SYSCALL_64_after_hwframe+0x21/0x86
[  137.425611]  ? lru_add_drain_all+0x270/0x270
[  137.426116]  ? lru_add_drain_cpu+0xa3/0x170
[  137.426616]  ? lru_add_drain+0x11/0x20
[  137.427058]  ? free_pages_and_swap_cache+0xa6/0x120
[  137.427672]  ? tlb_flush_mmu_free+0x78/0x90
[  137.428168]  ? arch_tlb_finish_mmu+0x6d/0xb0
[  137.428680]  __vfs_write+0xc4/0x350
[  137.430917]  ? kernel_read+0xa0/0xa0
[  137.432758]  ? remove_vma+0x90/0x90
[  137.434781]  ? __kasan_slab_free+0x14b/0x180
[  137.437486]  ? remove_vma+0x83/0x90
[  137.439836]  ? kmem_cache_free+0x78/0x1e0
[  137.442195]  ? percpu_counter_add_batch+0x1d/0x90
[  137.444389]  vfs_write+0xf7/0x280
[  137.446030]  SyS_write+0xa1/0x120
[  137.447867]  ? SyS_read+0x120/0x120
[  137.449670]  ? mm_fault_error+0x180/0x180
[  137.451539]  ? _cond_resched+0x16/0x50
[  137.453697]  ? SyS_read+0x120/0x120
[  137.455883]  do_syscall_64+0xeb/0x250
[  137.457686]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[  137.459595] RIP: 0033:0x7fb415637b94
[  137.461315] RSP: 002b:00007ffdebea7d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  137.463879] RAX: ffffffffffffffda RBX: 00005565022d1bd0 RCX: 00007fb415637b94
[  137.466519] RDX: 0000000000000018 RSI: 00007ffdebea7da0 RDI: 0000000000000003
[  137.469543] RBP: 00007ffdebea7d98 R08: 0000000000000000 R09: 00005565022d40c0
[  137.472479] R10: 00000000000009cf R11: 0000000000000246 R12: 00005565022d2520
[  137.475125] R13: 00000000000003e8 R14: 0000000000000000 R15: 00007ffdebea7fd0
[  137.477760] Code: f7 e8 dd 0d 0b ff 48 c7 43 40 00 00 00 00 48 89 df e8 0d 0b 0b ff 48 8d 7b 28 c6 03 00 e8 41 0d 0b ff 48 8b 7b 28 48 85 ff 74 06 <f0> ff 4f 48 74 10 5b 48 89 ef 5d 41 5c 41 5d 41 5e e9 32 b0 ee
[  137.483375] RIP: rdma_restrack_del+0xc8/0xf0 RSP: ffff8801b54e7968
[  137.486436] ---[ end trace 81835a1ea6722eed ]---
[  137.488566] Kernel panic - not syncing: Fatal exception
[  137.491162] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
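
The dependency on zeroed allocations described above can be illustrated with a user-space analogy. This is a sketch under stated assumptions: calloc stands in for kzalloc, and the demo_* structs, the valid flag, and the refcount pointer are hypothetical, not the restrack implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical tracking entry embedded in a larger object. */
struct demo_res_entry {
	bool valid;         /* set only once the entry has been registered */
	int *task_refcount; /* dereferenced on teardown if the entry is valid */
};

struct demo_cq {
	int cqe;
	struct demo_res_entry res;
};

/*
 * Teardown path: trusts res->valid. With a non-zeroed allocation this
 * flag and the pointer hold garbage, which is the failure mode the
 * commit message describes.
 */
static bool demo_restrack_del(struct demo_res_entry *res)
{
	if (!res->valid)
		return false; /* never registered, nothing to undo */

	if (res->task_refcount)
		(*res->task_refcount)--;
	res->valid = false;
	return true;
}

/* Zeroing allocator: every field of the embedded entry starts as 0. */
static struct demo_cq *demo_alloc_cq(void)
{
	return calloc(1, sizeof(struct demo_cq));
}
```

With the zeroing allocator, tearing down an object that was never registered is a harmless no-op; with malloc the same call would read indeterminate memory.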

Fixes: 00313983cd ("RDMA/nldev: provide detailed CM_ID information")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-03-16 16:35:25 -06:00