Reserve additional space for CDBs that contain more than sixteen bytes
and set the add_cdb_len field for such CDBs as required. From the SRP
standard: "The ADDITIONAL CDB LENGTH field contains the length in
dwords of the ADDITIONAL CDB field."
Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch avoids that a warning is reported when building with W=1.
Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add constants and data structures to support immediate data. These
changes conform to SRP2r04.
Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch moves all constants that come from the SRP standard into the
include/scsi/srp.h header file.
Cc: Sergey Gorenko <sergeygo@mellanox.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Most SCSI drivers want to enable "clustering", that is merging of
segments so that they might span more than a single page. Remove the
ENABLE_CLUSTERING define, and require drivers to explicitly set
DISABLE_CLUSTERING to disable this feature.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Make all the required change to start use the ib_device_ops structure.
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Several conflicts, seemingly all over the place.
I used Stephen Rothwell's sample resolutions for many of these, if not
just to double check my own work, so definitely the credit largely
goes to him.
The NFP conflict consisted of a bug fix (moving operations
past the rhashtable operation) while chaning the initial
argument in the function call in the moved code.
The net/dsa/master.c conflict had to do with a bug fix intermixing of
making dsa_master_set_mtu() static with the fixing of the tagging
attribute location.
cls_flower had a conflict because the dup reject fix from Or
overlapped with the addition of port range classifiction.
__set_phy_supported()'s conflict was relatively easy to resolve
because Andrew fixed it in both trees, so it was just a matter
of taking the net-next copy. Or at least I think it was :-)
Joe Stringer's fix to the handling of netns id 0 in bpf_sk_lookup()
intermixed with changes on how the sdif and caller_net are calculated
in these code paths in net-next.
The remaining BPF conflicts were largely about the addition of the
__bpf_md_ptr stuff in 'net' overlapping with adjustments and additions
to the relevant data structure where the MD pointer macros are used.
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to pass extack together with NETDEV_PRE_UP notifications, it's
necessary to route the extack to __dev_open() from diverse (possibly
indirect) callers. One prominent API through which the notification is
invoked is dev_change_flags().
Therefore extend dev_change_flags() with and extra extack argument and
update all users. Most of the calls end up just encoding NULL, but
several sites (VLAN, ipvlan, VRF, rtnetlink) do have extack available.
Since the function declaration line is changed anyway, name the other
function arguments to placate checkpatch.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
iscsi_target_mod is the only LIO fabric where fabric_ops.name differs from
the fabric_ops.fabric_name string. fabric_ops.name is used when matching
target/$fabric ConfigFS create paths, so rename it .fabric_alias and
fallback to target/$fabric vs .fabric_name comparison if .fabric_alias
isn't initialised. iscsi_target_mod is the only fabric module to set
.fabric_alias . All other fabric modules rely on .fabric_name matching and
can drop the duplicate string.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
All fabrics return a const string. In all cases *except* iSCSI the
get_fabric_name() string matches fabric_ops.name.
Both fabric_ops.get_fabric_name() and fabric_ops.name are user-facing, with
the former being used for PR/ALUA state and the latter for ConfigFS
(config/target/$name), so we unfortunately need to keep both strings around
for now. Replace the useless .get_fabric_name() accessor function with a
const string fabric_name member variable.
Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since the function always returns 0 make it void.
Reported-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is no need to have the 'struct se_portal_group *tpg' variable static
since new value always be assigned before use.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
If for some reason we failed to query the mr status, we need to make sure
to provide sufficient information for an ambiguous error (guard error on
sector 0).
Fixes: 0a7a08ad6f ("IB/iser: Implement check_protection")
Cc: <stable@vger.kernel.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This has been a smaller cycle with many of the commits being smallish code
fixes and improvements across the drivers.
- Driver updates for bnxt_re, cxgb4, hfi1, hns, mlx5, nes, qedr, and rxe
- Memory window support in hns
- mlx5 user API 'flow mutate/steering' allows accessing the full packet
mangling and matching machinery from user space
- Support inter-working with verbs API calls in the 'devx' mlx5 user API, and
provide options to use devx with less privilege
- Modernize the use of syfs and the device interface to use attribute groups
and cdev properly for uverbs, and clean up some of the core code's device list
management
- More progress on net namespaces for RDMA devices
- Consolidate driver BAR mmapping support into core code helpers and rework
how RDMA holds poitners to mm_struct for get_user_pages cases
- First pass to use 'dev_name' instead of ib_device->name
- Device renaming for RDMA devices
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlvR7dUACgkQOG33FX4g
mxojiw//a9GU5kq4IZ3LNAEio/3Ql/NHRF0uie5tSzJgipRJA1Ln9zW0Cm1S/ms1
VCmaSJ3l3q3GC4i3tIlsZSIIkN5qtjv/FsT/i+TZwSJYx9BDpPbzWtG6Mp4PSDj0
v3xzklFCN5HMOmEcjkNmyZw3VjHOt2Iw2mKjqvGbI9imCPLOYnw+WQaZLmMWMH6p
GL0HDbAopN5Lv8ireWd8pOhPLVbSb12cWM1crx+yHOS3q8YNWjIXGiZr/QkOPtPr
cymSXB8yuITJ7gnjbs/GxZHg6rxU0knC/Ck8hE7FqqYYHgytTklOXDE2ef1J2lFe
1VmotD+nTsCir0mZWSdcRrszEk7tzaZT7n1oWggKvWySDB6qaH0II8vWumJchQnN
pElIQn/WDgpekIqplamNqXJnKnDXZJpEVA01OHHDN4MNSc+Ad08hQy4FyFzpB6/G
jv9TnDMfGC6ma9pr1ipOXyCgCa2pHYEUCaYxUqRA0O/4ATVl7/PplqT0rqtJ6hKg
o/hmaVCawIFOUKD87/bo7Em2HBs3xNwE/c5ggbsQElLYeydrgPrZfrPfjkshv5K3
eIKDb+HPyis0is1aiF7m/bz1hSIYZp0bQhuKCdzLRjZobwCm5WDPhtuuAWb7vYVw
GSLCJWyet+bLyZxynNOt67gKm9je9lt8YTr5nilz49KeDytspK0=
=pacJ
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"This has been a smaller cycle with many of the commits being smallish
code fixes and improvements across the drivers.
- Driver updates for bnxt_re, cxgb4, hfi1, hns, mlx5, nes, qedr, and
rxe
- Memory window support in hns
- mlx5 user API 'flow mutate/steering' allows accessing the full
packet mangling and matching machinery from user space
- Support inter-working with verbs API calls in the 'devx' mlx5 user
API, and provide options to use devx with less privilege
- Modernize the use of syfs and the device interface to use attribute
groups and cdev properly for uverbs, and clean up some of the core
code's device list management
- More progress on net namespaces for RDMA devices
- Consolidate driver BAR mmapping support into core code helpers and
rework how RDMA holds poitners to mm_struct for get_user_pages
cases
- First pass to use 'dev_name' instead of ib_device->name
- Device renaming for RDMA devices"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (242 commits)
IB/mlx5: Add support for extended atomic operations
RDMA/core: Fix comment for hw stats init for port == 0
RDMA/core: Refactor ib_register_device() function
RDMA/core: Fix unwinding flow in case of error to register device
ib_srp: Remove WARN_ON in srp_terminate_io()
IB/mlx5: Allow scatter to CQE without global signaled WRs
IB/mlx5: Verify that driver supports user flags
IB/mlx5: Support scatter to CQE for DC transport type
RDMA/drivers: Use core provided API for registering device attributes
RDMA/core: Allow existing drivers to set one sysfs group per device
IB/rxe: Remove unnecessary enum values
RDMA/umad: Use kernel API to allocate umad indexes
RDMA/uverbs: Use kernel API to allocate uverbs indexes
RDMA/core: Increase total number of RDMA ports across all devices
IB/mlx4: Add port and TID to MAD debug print
IB/mlx4: Enable debug print of SMPs
RDMA/core: Rename ports_parent to ports_kobj
RDMA/core: Do not expose unsupported counters
IB/mlx4: Refer to the device kobject instead of ports_parent
RDMA/nldev: Allow IB device rename through RDMA netlink
...
The WARN_ON() is pointless as the rport is placed in SDEV_TRANSPORT_OFFLINE
at that time, so no new commands can be submitted via srp_queuecommand()
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.com>
Acked-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
IPCB should be cleared before icmp_send, since it may contain data from
previous layers and the data could be misinterpreted as ip header options,
which later caused the ihl to be set to an invalid value and resulted in
the following stack corruption:
[ 1083.031512] ib0: packet len 57824 (> 2048) too long to send, dropping
[ 1083.031843] ib0: packet len 37904 (> 2048) too long to send, dropping
[ 1083.032004] ib0: packet len 4040 (> 2048) too long to send, dropping
[ 1083.032253] ib0: packet len 63800 (> 2048) too long to send, dropping
[ 1083.032481] ib0: packet len 23960 (> 2048) too long to send, dropping
[ 1083.033149] ib0: packet len 63800 (> 2048) too long to send, dropping
[ 1083.033439] ib0: packet len 63800 (> 2048) too long to send, dropping
[ 1083.033700] ib0: packet len 63800 (> 2048) too long to send, dropping
[ 1083.034124] ib0: packet len 63800 (> 2048) too long to send, dropping
[ 1083.034387] ==================================================================
[ 1083.034602] BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0xf08/0x1310
[ 1083.034798] Write of size 4 at addr ffff880353457c5f by task kworker/u16:0/7
[ 1083.034990]
[ 1083.035104] CPU: 7 PID: 7 Comm: kworker/u16:0 Tainted: G O 4.19.0-rc5+ #1
[ 1083.035316] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
[ 1083.035573] Workqueue: ipoib_wq ipoib_cm_skb_reap [ib_ipoib]
[ 1083.035750] Call Trace:
[ 1083.035888] dump_stack+0x9a/0xeb
[ 1083.036031] print_address_description+0xe3/0x2e0
[ 1083.036213] kasan_report+0x18a/0x2e0
[ 1083.036356] ? __ip_options_echo+0xf08/0x1310
[ 1083.036522] __ip_options_echo+0xf08/0x1310
[ 1083.036688] icmp_send+0x7b9/0x1cd0
[ 1083.036843] ? icmp_route_lookup.constprop.9+0x1070/0x1070
[ 1083.037018] ? netif_schedule_queue+0x5/0x200
[ 1083.037180] ? debug_show_all_locks+0x310/0x310
[ 1083.037341] ? rcu_dynticks_curr_cpu_in_eqs+0x85/0x120
[ 1083.037519] ? debug_locks_off+0x11/0x80
[ 1083.037673] ? debug_check_no_obj_freed+0x207/0x4c6
[ 1083.037841] ? check_flags.part.27+0x450/0x450
[ 1083.037995] ? debug_check_no_obj_freed+0xc3/0x4c6
[ 1083.038169] ? debug_locks_off+0x11/0x80
[ 1083.038318] ? skb_dequeue+0x10e/0x1a0
[ 1083.038476] ? ipoib_cm_skb_reap+0x2b5/0x650 [ib_ipoib]
[ 1083.038642] ? netif_schedule_queue+0xa8/0x200
[ 1083.038820] ? ipoib_cm_skb_reap+0x544/0x650 [ib_ipoib]
[ 1083.038996] ipoib_cm_skb_reap+0x544/0x650 [ib_ipoib]
[ 1083.039174] process_one_work+0x912/0x1830
[ 1083.039336] ? wq_pool_ids_show+0x310/0x310
[ 1083.039491] ? lock_acquire+0x145/0x3a0
[ 1083.042312] worker_thread+0x87/0xbb0
[ 1083.045099] ? process_one_work+0x1830/0x1830
[ 1083.047865] kthread+0x322/0x3e0
[ 1083.050624] ? kthread_create_worker_on_cpu+0xc0/0xc0
[ 1083.053354] ret_from_fork+0x3a/0x50
For instance __ip_options_echo is failing to proceed with invalid srr and
optlen passed from another layer via IPCB
[ 762.139568] IPv4: __ip_options_echo rr=0 ts=0 srr=43 cipso=0
[ 762.139720] IPv4: ip_options_build: IPCB 00000000f3cd969e opt 000000002ccb3533
[ 762.139838] IPv4: __ip_options_echo in srr: optlen 197 soffset 84
[ 762.139852] IPv4: ip_options_build srr=0 is_frag=0 rr_needaddr=0 ts_needaddr=0 ts_needtime=0 rr=0 ts=0
[ 762.140269] ==================================================================
[ 762.140713] IPv4: __ip_options_echo rr=0 ts=0 srr=0 cipso=0
[ 762.141078] BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0x12ec/0x1680
[ 762.141087] Write of size 4 at addr ffff880353457c7f by task kworker/u16:0/7
Signed-off-by: Denis Drozdov <denisd@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
From git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
This is required to resolve dependencies of the next series of RDMA
patches.
The code motion conflicts in drivers/infiniband/core/cache.c were
resolved.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
IPoIB netlink support was broken by the below commit since integrating
the rdma_netdev support relies on an allocation flow for netdevs that
was controlled by the ipoib driver while netdev's rtnl_newlink
implementation assumes that the netdev will be allocated by netlink.
Such situation leads to crash in __ipoib_device_add, once trying to
reuse netlink device.
This patch fixes the kernel oops for both mlx4 and mlx5
devices triggered by the following command:
Fixes: cd565b4b51 ("IB/IPoIB: Support acceleration options callbacks")
Signed-off-by: Denis Drozdov <denisd@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
netdev has several interfaces that expect to call alloc_netdev_mqs from
the core code, with the driver only providing the arguments. This is
incompatible with the rdma_netdev interface that returns the netdev
directly.
Thus re-organize the API used by ipoib so that the verbs core code calls
alloc_netdev_mqs for the driver. This is done by allowing the drivers to
provide the allocation parameters via a 'get_params' callback and then
initializing an allocated netdev as a second step.
Fixes: cd565b4b51 ("IB/IPoIB: Support acceleration options callbacks")
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Denis Drozdov <denisd@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The dlid assignment made by looking into the u_ucast_dlid array does not
do an explicit check for the size of the array. The code path to arrive at
def_port, the index value is long and complicated so its best to just have
an explicit check here.
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In case target remote invalidates bogus rkey and signature is not used,
pi_ctx is NULL deref.
The commit also fails the connection on bogus remote invalidation.
Fixes: 59caaed7a7 ("IB/iser: Support the remote invalidation exception")
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
These return the same thing but dev_name is a more conventional use of the
kernel API.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Use different loop variables for the inner and outer loop. This avoids
that an infinite loop occurs if there are more RDMA channels than
target->req_ring_size.
Fixes: d92c0da71a ("IB/srp: Add multichannel support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Some tools may currently be using only the deprecated attribute;
let's print an elaborate and clear deprecation notice to kmsg.
To do that, we have to replace the whole sysfs file, since we inherit
the original one from netdev.
Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Some InfiniBand network devices have multiple ports on the same PCI
function. This initializes the `dev_port' sysfs field of those
network interfaces with their port number.
Prior to this the kernel erroneously used the `dev_id' sysfs
field of those network interfaces to convey the port number to userspace.
The use of `dev_id' was considered correct until Linux 3.15,
when another field, `dev_port', was defined for this particular
purpose and `dev_id' was reserved for distinguishing stacked ifaces
(e.g: VLANs) with the same hardware address as their parent device.
Similar fixes to net/mlx4_en and many other drivers, which started
exporting this information through `dev_id' before 3.15, were accepted
into the kernel 4 years ago.
See 76a066f2a2 (`net/mlx4_en: Expose port number through sysfs').
Signed-off-by: Arseny Maslennikov <ar@cs.msu.ru>
Signed-off-by: Doug Ledford <dledford@redhat.com>
It is illegal to change MTU to a value lower than the minimum MTU
stated in ethernet spec. In addition to that we need to add 4 bytes
for encapsulation header (IPOIB_ENCAP_LEN).
Before "ifconfig ib0 mtu 0" command, succeeds while it obviously shouldn't.
Signed-off-by: Muhammad Sammar <muhammads@mellanox.com>
Reviewed-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
For dependencies, branch based on rdma.git 'for-rc' of
https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/
Pull 'uverbs_dev_cleanups' from Leon Romanovsky:
====================
Reuse the char device code interfaces to simplify ib_uverbs_device
creation and destruction. As part of this series, we are sending fix to
cleanup path, which was discovered during internal review,
The fix definitely can go to -rc, but it means that this series will be
dependent on rdma-rc.
====================
* branch 'uverbs_dev_cleanups':
RDMA/uverbs: Use device.groups to initialize device attributes
RDMA/uverbs: Use cdev_device_add() instead of cdev_add()
RDMA/core: Depend on device_add() to add device attributes
RDMA/uverbs: Fix error cleanup path of ib_uverbs_add_one()
Resolved conflict in ib_device_unregister_sysfs()
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
WARN_ON() already contains an unlikely(), so it's not necessary to wrap it
into another.
Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Inside of start_xmit() the call to check if the connection is up and the
queueing of the packets for later transmission is not atomic which leaves
a window where cm_rep_handler can run, set the connection up, dequeue
pending packets and leave the subsequently queued packets by start_xmit()
sitting on neigh->queue until they're dropped when the connection is torn
down. This only applies to connected mode. These dropped packets can
really upset TCP, for example, and cause multi-minute delays in
transmission for open connections.
Here's the code in start_xmit where we check to see if the connection is
up:
if (ipoib_cm_get(neigh)) {
if (ipoib_cm_up(neigh)) {
ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
goto unref;
}
}
The race occurs if cm_rep_handler execution occurs after the above
connection check (specifically if it gets to the point where it acquires
priv->lock to dequeue pending skb's) but before the below code snippet in
start_xmit where packets are queued.
if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
push_pseudo_header(skb, phdr->hwaddr);
spin_lock_irqsave(&priv->lock, flags);
__skb_queue_tail(&neigh->queue, skb);
spin_unlock_irqrestore(&priv->lock, flags);
} else {
++dev->stats.tx_dropped;
dev_kfree_skb_any(skb);
}
The patch acquires the netif tx lock in cm_rep_handler for the section
where it sets the connection up and dequeues and retransmits deferred
skb's.
Fixes: 839fcaba35 ("IPoIB: Connected mode experimental support")
Cc: stable@vger.kernel.org
Signed-off-by: Aaron Knister <aaron.s.knister@nasa.gov>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
rdma.git merge resolution for the 4.19 merge window
Conflicts:
drivers/infiniband/core/rdma_core.c
- Use the rdma code and revise with the new spelling for
atomic_fetch_add_unless
drivers/nvme/host/rdma.c
- Replace max_sge with max_send_sge in new blk code
drivers/nvme/target/rdma.c
- Use the blk code and revise to use NULL for ib_post_recv when
appropriate
- Replace max_sge with max_recv_sge in new blk code
net/rds/ib_send.c
- Use the net code and revise to use NULL for ib_post_recv when
appropriate
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This is mostly updates to the usual drivers: mpt3sas, lpfc, qla2xxx,
hisi_sas, smartpqi, megaraid_sas, arcmsr. In addition, with the
continuing absence of Nic we have target updates for tcmu and target
core (all with reviews and acks). The biggest observable change is
going to be that we're (again) trying to switch to mulitqueue as the
default (a user can still override the setting on the kernel command
line). Other major core stuff is the removal of the remaining
Microchannel drivers, an update of the internal timers and some
reworks of completion and result handling.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCW3R3niYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishauRAP4yfBKK
dbxF81c/Bxi/Stk16FWkOOrjs4CizwmnMcpM5wD/UmM9o6ebDzaYpZgA8wIl7X/N
o/JckEZZpIp+5NySZNc=
=ggLB
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly updates to the usual drivers: mpt3sas, lpfc, qla2xxx,
hisi_sas, smartpqi, megaraid_sas, arcmsr.
In addition, with the continuing absence of Nic we have target updates
for tcmu and target core (all with reviews and acks).
The biggest observable change is going to be that we're (again) trying
to switch to mulitqueue as the default (a user can still override the
setting on the kernel command line).
Other major core stuff is the removal of the remaining Microchannel
drivers, an update of the internal timers and some reworks of
completion and result handling"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (203 commits)
scsi: core: use blk_mq_run_hw_queues in scsi_kick_queue
scsi: ufs: remove unnecessary query(DM) UPIU trace
scsi: qla2xxx: Fix issue reported by static checker for qla2x00_els_dcmd2_sp_done()
scsi: aacraid: Spelling fix in comment
scsi: mpt3sas: Fix calltrace observed while running IO & reset
scsi: aic94xx: fix an error code in aic94xx_init()
scsi: st: remove redundant pointer STbuffer
scsi: qla2xxx: Update driver version to 10.00.00.08-k
scsi: qla2xxx: Migrate NVME N2N handling into state machine
scsi: qla2xxx: Save frame payload size from ICB
scsi: qla2xxx: Fix stalled relogin
scsi: qla2xxx: Fix race between switch cmd completion and timeout
scsi: qla2xxx: Fix Management Server NPort handle reservation logic
scsi: qla2xxx: Flush mailbox commands on chip reset
scsi: qla2xxx: Fix unintended Logout
scsi: qla2xxx: Fix session state stuck in Get Port DB
scsi: qla2xxx: Fix redundant fc_rport registration
scsi: qla2xxx: Silent erroneous message
scsi: qla2xxx: Prevent sysfs access when chip is down
scsi: qla2xxx: Add longer window for chip reset
...
Pull networking updates from David Miller:
"Highlights:
- Gustavo A. R. Silva keeps working on the implicit switch fallthru
changes.
- Support 802.11ax High-Efficiency wireless in cfg80211 et al, From
Luca Coelho.
- Re-enable ASPM in r8169, from Kai-Heng Feng.
- Add virtual XFRM interfaces, which avoids all of the limitations of
existing IPSEC tunnels. From Steffen Klassert.
- Convert GRO over to use a hash table, so that when we have many
flows active we don't traverse a long list during accumluation.
- Many new self tests for routing, TC, tunnels, etc. Too many
contributors to mention them all, but I'm really happy to keep
seeing this stuff.
- Hardware timestamping support for dpaa_eth/fsl-fman from Yangbo Lu.
- Lots of cleanups and fixes in L2TP code from Guillaume Nault.
- Add IPSEC offload support to netdevsim, from Shannon Nelson.
- Add support for slotting with non-uniform distribution to netem
packet scheduler, from Yousuk Seung.
- Add UDP GSO support to mlx5e, from Boris Pismenny.
- Support offloading of Team LAG in NFP, from John Hurley.
- Allow to configure TX queue selection based upon RX queue, from
Amritha Nambiar.
- Support ethtool ring size configuration in aquantia, from Anton
Mikaev.
- Support DSCP and flowlabel per-transport in SCTP, from Xin Long.
- Support list based batching and stack traversal of SKBs, this is
very exciting work. From Edward Cree.
- Busyloop optimizations in vhost_net, from Toshiaki Makita.
- Introduce the ETF qdisc, which allows time based transmissions. IGB
can offload this in hardware. From Vinicius Costa Gomes.
- Add parameter support to devlink, from Moshe Shemesh.
- Several multiplication and division optimizations for BPF JIT in
nfp driver, from Jiong Wang.
- Lots of prepatory work to make more of the packet scheduler layer
lockless, when possible, from Vlad Buslov.
- Add ACK filter and NAT awareness to sch_cake packet scheduler, from
Toke Høiland-Jørgensen.
- Support regions and region snapshots in devlink, from Alex Vesker.
- Allow to attach XDP programs to both HW and SW at the same time on
a given device, with initial support in nfp. From Jakub Kicinski.
- Add TLS RX offload and support in mlx5, from Ilya Lesokhin.
- Use PHYLIB in r8169 driver, from Heiner Kallweit.
- All sorts of changes to support Spectrum 2 in mlxsw driver, from
Ido Schimmel.
- PTP support in mv88e6xxx DSA driver, from Andrew Lunn.
- Make TCP_USER_TIMEOUT socket option more accurate, from Jon
Maxwell.
- Support for templates in packet scheduler classifier, from Jiri
Pirko.
- IPV6 support in RDS, from Ka-Cheong Poon.
- Native tproxy support in nf_tables, from Máté Eckl.
- Maintain IP fragment queue in an rbtree, but optimize properly for
in-order frags. From Peter Oskolkov.
- Improvde handling of ACKs on hole repairs, from Yuchung Cheng"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1996 commits)
bpf: test: fix spelling mistake "REUSEEPORT" -> "REUSEPORT"
hv/netvsc: Fix NULL dereference at single queue mode fallback
net: filter: mark expected switch fall-through
xen-netfront: fix warn message as irq device name has '/'
cxgb4: Add new T5 PCI device ids 0x50af and 0x50b0
net: dsa: mv88e6xxx: missing unlock on error path
rds: fix building with IPV6=m
inet/connection_sock: prefer _THIS_IP_ to current_text_addr
net: dsa: mv88e6xxx: bitwise vs logical bug
net: sock_diag: Fix spectre v1 gadget in __sock_diag_cmd()
ieee802154: hwsim: using right kind of iteration
net: hns3: Add vlan filter setting by ethtool command -K
net: hns3: Set tx ring' tc info when netdev is up
net: hns3: Remove tx ring BD len register in hns3_enet
net: hns3: Fix desc num set to default when setting channel
net: hns3: Fix for phy link issue when using marvell phy driver
net: hns3: Fix for information of phydev lost problem when down/up
net: hns3: Fix for command format parsing error in hclge_is_all_function_id_zero
net: hns3: Add support for serdes loopback selftest
bnxt_en: take coredump_record structure off stack
...
Move all the checking for pkey and other validity to the __ipoib_vlan_add
function. This removes the last difference from the control flow
of the __ipoib_vlan_add to make the overall design simpler to
understand.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
This fixes a bug in the netlink path where the vlan_rwsem was not
held around __ipoib_vlan_add causing the child_intfs to be manipulated
unsafely.
In the process this greatly simplifies the vlan_rwsem write side locking
to only cover a single non-sleeping statement.
This also further increases the safety of the removal ordering by holding
the netdev of the parent while the child is active to ensure most bugs
become either an oops on a NULL priv or a deadlock on the netdev refcount.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Switching to priv_destructor and needs_free_netdev created a subtle
ordering problem in ipoib_remove_one.
Now that unregister_netdev frees the netdev and priv we must ensure that
the children are unregistered before trying to unregister the parent,
or child unregister will use after free.
The solution is to unregister the children, then parent, in the same batch
all while holding the rtnl_lock. This closes all the races where a new
child could have been added and ensures proper ordering.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
This mutex was introduced to deal with the deadlock formed by calling
unregister_netdev from within the sysfs callback of a netdev.
Now that we have priv_destructor and needs_free_netdev we can switch
to the more targeted solution of running the unregister from a
work queue. This avoids the deadlock and gets rid of the mutex.
The next patch in the series needs this mutex eliminated to create
atomicity of unregisteration.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Now that the unregister_netdev flow for IPoIB no longer relies on external
code we can now introduce the use of priv_destructor and
needs_free_netdev.
The rdma_netdev flow is switched to use the netdev common priv_destructor
instead of the special free_rdma_netdev and the IPOIB ULP adjusted:
- priv_destructor needs to switch to point to the ULP's destructor
which will then call the rdma_ndev's in the right order
- We need to be careful around the error unwind of register_netdev
as it sometimes calls priv_destructor on failure
- ULPs need to use ndo_init/uninit to ensure proper ordering
of failures around register_netdev
Switching to priv_destructor is a necessary pre-requisite to using
the rtnl new_link mechanism.
The VNIC user for rdma_netdev should also be revised, but that is left for
another patch.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Denis Drozdov <denisd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Now that we have a proper ndo_uninit, move code that naturally pairs
with the ndo_uninit into ndo_init. This allows the netdev core to natually
handle ordering.
This fixes the situation where register_netdev can fail before calling
ndo_init, in which case it wouldn't call ndo_uninit either.
Also move a bunch of duplicated init code that is shared between child
and parent for clarity. Now the child and parent register functions look
very similar.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Currently uninit is sometimes done twice in error flows, and is sprinkled
a bit all over the place.
Improve the clarity of the design by moving all uninit only into
ndo_uinit.
Some duplication is removed:
- Sometimes IPOIB_STOP_NEIGH_GC was done before unregister, but
this duplicates the process in ipoib_neigh_hash_init
- Flushing priv->wq was sometimes done before unregister,
but that duplicates what has been done in ndo_uninit
Uniniting the IB event queue must remain before unregister_netdev as it
requires the RTNL lock to be dropped, this is moved to a helper to make
that flow really clear and remove some duplication in error flows.
If register_netdev fails (and ndo_init is NULL) then it almost always
calls ndo_uninit, which lets us remove all the extra code from the error
unwinds. The next patch in the series will close the 'almost always' hole
by pairing a proper ndo_init with ndo_uninit.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The neigh_reap_task is self restarting, but so long as we call
cancel_delayed_work_sync() it will be guaranteed to not be running and
never start again. Thus we don't need to have the racy
IPOIB_STOP_NEIGH_GC bit, or the confusing mismatch of places sometimes
calling flush_workqueue after the cancel.
This fixes a situation where the GC work could have been left running
in some rare situations.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This essentially duplicates the netdev's reg_state, so just use that
directly. The reg_state is updated under the rntl_lock, and all places
using GOING_DOWN already acquire the rtnl_lock so checking is safe.
Since the only place we use GOING_DOWN is for the parent device this
does not fix any bugs, but it is a step to tidy up the unregister flow
so that after later patches the flow is uniform and sane.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
This converts the drivers that called transport_deregister_session_configfs
and then immediately called transport_deregister_session to use
target_remove_session.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Chris Boot <bootc@bootc.net>
Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Cc: Michael Cyr <mikecyr@linux.vnet.ibm.com>
Cc: <qla2xxx-upstream@qlogic.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Rename target_alloc_session to target_setup_session to avoid confusion with
the other transport session allocation function that only allocates the
session and because the target_alloc_session does so much more. It
allocates the session, sets up the nacl and registers the session.
The next patch will then add a remove function to match the setup in this
one, so it should make sense for all drivers, except iscsi, to just call
those 2 functions to setup and remove a session.
iscsi will continue to be the odd driver.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Chris Boot <bootc@bootc.net>
Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Cc: Michael Cyr <mikecyr@linux.vnet.ibm.com>
Cc: <qla2xxx-upstream@qlogic.com>
Cc: Johannes Thumshirn <jth@kernel.org>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The change of ipoib_ah data structure with adding "valid" flag and
checks of ah->valid in ipoib_start_xmit affected multicast packet flow.
Since the multicast flow doesn't invoke path_rec_start, "ah->valid" flag
remains unset, so that ipoib_start_xmit end up with neigh_refresh_path
instead of sending the packet using neigh.
"ah->valid" has to be set in multicast send flow. As a result IPoIB
starts sending packets via neigh immediately and eliminates 60sec delay
of neigh keep alive interval.
The typical example of this issue are two sequential arpings:
arping 11.134.208.9 -> got response (mcast_send)
arping 11.134.208.9 -> no response (ah->valid = 0)
Fixes: fa9391dbad ("RDMA/ipoib: Update paths on CLIENT_REREG/SM_CHANGE events")
Signed-off-by: Denis Drozdov <denisd@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The proper return code is "-EOPNOTSUPP" when the modify_cq() callback is
not supported, all drivers should generate this and all users should check
for it when detecting not supported functionality.
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com> (for mlx5)
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Constify several pointers such as path_rec, ib_cm_event and listen_id
pointers in several functions.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Make sure to check for "-EOPNOTSUPP" instead of "-ENOSYS" which is the
return code from ib_create_srq() in case that it not supported.
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Since the next patch will change the return type of these functions into a
const pointer and since the iSER driver modifies the work request these
functions return a pointer two, inline two work request conversion
function calls. This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Currently this function is implemented in the scsi layer, but it's
actual place should be the block layer since T10-PI is a general
data integrity feature that is used in the nvme protocol as well.
Suggested-by: Christoph Hellwig <hch@lst.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL
as third argument to ib_post_(send|recv|srq_recv)().
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Fix to return a negative error code from the ipoib_neigh_hash_init()
error handling case instead of 0, as done elsewhere in this function.
Fixes: 515ed4f3aa ("IB/IPoIB: Separate control and data related initializations")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This table by default takes 32KiB which is 3rd memory order. Meanwhile,
this memory is not aimed for DMA operation and could be safely allocated
by vmalloc.
Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
If a login request was received through the RDMA/CM and if an error occurs
during login, clear rdma_cm_id->context instead of ib_cm_id->context.
Fixes: 63cf1a902c ("IB/srpt: Add RDMA/CM support")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Once a target session has been allocated, if an error occurs, the session
must be freed. Since it is not safe to call blocking code from the context
of an connection manager callback, trigger target session release in this
case by calling srpt_close_ch().
Fixes: db7683d7de ("IB/srpt: Fix login-related race conditions")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
An array of pointers to SRPT contexts in ib_device is over 30KiB even
in default case, in which an amount of contexts is 4095. The patch
is intended to weed out large contigous allocation for non-DMA memory.
Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The proper return code is -EOPNOTSUPP and not -ENOSYS when the function
isn't supported, also make sure to return the right error code
from ipoib_transport_dev_init() when ipoib_cm_dev_init() is supported.
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch makes it so that instead of passing a void pointer as the
accel_priv we instead pass a net_device pointer as sb_dev. Making this
change allows us to pass the subordinate device through to the fallback
function eventually so that we can keep the actual code in the
ndo_select_queue call as focused on possible on the exception cases.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Make sure to use sizeof(...) instead of sizeof ... which is more
preferred.
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This commit replaces all the unsigned definitions in favour of 'unsigned
int' which is preferred.
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Use min_t() macro to avoid the casting when using min() macro, also fix
the type of "length" and "wc->byte_len" to be "unsigned int" and
"u32" which is the right type for each one of them.
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Remove the driver version and release date information because such
information is not relevant for an upstream driver. See also commit
e1267b0124 ("RDMA: Remove useless MODULE_VERSION").
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Since most target drivers do not use the second fabric_make_tpg() argument
("group") and since it is trivial to derive the group pointer from the wwn
pointer, do not pass the group pointer to fabric_make_tpg().
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Cc: Felipe Balbi <felipe.balbi@linux.intel.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since there are adapters that have four ports, increase the size of
the srpt_device.port[] array. This patch avoids that the following
warning is hit with quad port Chelsio adapters:
WARN_ON(sdev->device->phys_port_cnt > ARRAY_SIZE(sdev->port));
Reported-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
We need to set can_queue earlier than when enabling the scsi host.
in a blk-mq enabled environment, the tagset allocation is taken
from can_queue which cannot be modified later. Also, pass an updated
.can_queue to iscsi_session_setup to have enough iscsi tasks allocated
in the session kfifo.
Reported-by: Karandeep Chahal <karandeepchahal@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
While processing a path record entry in CM messages the associated GID
attribute is now also supplied.
Currently for RoCE a netdevice's net namespace pointer and ifindex are
stored in path record entry. Both of these fields of the netdev can change
anytime while processing CM messages. Additionally storing net namespace
without holding reference will lead to use-after-free crash. Therefore it
is removed. Netdevice information for RoCE is instead provided via
referenced gid attribute in ib_cm requests.
Such a design leads to a situation where the kernel can crash when the net
pointer becomes invalid. However today it is always initialized to
init_net, which cannot become invalid. In order to support processing
packets in any arbitrary namespace of the received packet, it is necessary
to avoid such conditions.
This patch removes the dependency on the net pointer and ifindex; instead
it will rely on SGID attribute which contains a pointer to netdev.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Use the sgid and other information from the path record to figure out the
sgid_attrs.
Store the selected table entry in the sgid_attr for everything else to
use.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
This patch replaces the ib_device_attr.max_sge with max_send_sge and
max_recv_sge. It allows ulps to take advantage of devices that have very
different send and recv sge depths. For example cxgb4 has a max_recv_sge
of 4, yet a max_send_sge of 16. Splitting out these attributes allows
much more efficient use of the SQ for cxgb4 with ulps that use the RDMA_RW
API. Consider a large RDMA WRITE that has 16 scattergather entries.
With max_sge of 4, the ulp would send 4 WRITE WRs, but with max_sge of
16, it can be done with 1 WRITE WR.
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
If the gid_attr argument is NULL then the functions behave identically to
rdma_query_gid. ib_query_gid just calls ib_get_cached_gid, so everything
can be consolidated to one function.
Now that all callers either use rdma_query_gid() or ib_get_cached_gid(),
ib_query_gid() API is removed.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This has been a quiet cycle for RDMA, the big bulk is the usual smallish
driver updates and bug fixes. About four new uAPI related things. Not as much
Szykaller patches this time, the bugs it finds are getting harder to fix.
- More work cleaning up the RDMA CM code
- Usual driver bug fixes and cleanups for qedr, qib, hfi1, hns, i40iw, iw_cxgb4, mlx5, rxe
- Driver specific resource tracking and reporting via netlink
- Continued work for name space support from Parav
- MPLS support for the verbs flow steering uAPI
- A few tricky IPoIB fixes improving robustness
- HFI1 driver support for the '16B' management packet format
- Some auditing to not print kernel pointers via %llx or similar
- Mark the entire 'UCM' user-space interface as BROKEN with the intent to remove it
entirely. The user space side of this was long ago replaced with RDMA-CM and
syzkaller is finding bugs in the residual UCM interface nobody wishes to fix because
nobody uses it.
- Purge more bogus BUG_ON's from Leon
- 'flow counters' verbs uAPI
- T10 fixups for iser/isert, these are Acked by Martin but going through the RDMA
tree due to dependencies
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCgAGBQJbGEcPAAoJEDht9xV+IJsarBMQAIsAFOizycF0kQfDtvz1yHyV
YjkT3NA71379DsDsCOezVKqZ6RtXdQncJoqqEG1FuNKiXh/rShR3rk9XmdBwUCTq
mIY0ySiQggdeSIJclROiBuzLE3F/KIIkY3jwM80DzT9GUEbnVuvAMt4M56X48Xo8
RpFc13/1tY09ZLBVjInlfmCpRWyNgNccDBDywB/5hF5KCFR/BG/vkp4W0yzksKiU
7M/rZYyxQbtwSfe/ZXp7NrtwOpkpn7vmhED59YgKRZWhqnHF9KKmV+K1FN+BKdXJ
V1KKJ2RQINg9bbLJ7H2JPdQ9EipvgAjUJKKBoD+XWnoVJahp6X2PjX351R/h4Lo5
TH+0XwuCZ2EdjRxhnm3YE+rU10mDY9/UUi1xkJf9vf0r25h6Fgt6sMnN0QBpqkTh
euRZnPyiFeo1b+hCXJfKqkQ6An+F3zes5zvVf59l0yfVNLVmHdlz0lzKLf/RPk+t
U+YZKxfmHA+mwNhMXtKx7rKVDrko+uRHjaX2rPTEvZ0PXE7lMzFMdBWYgzP6sx/b
4c55NiJMDAGTyLCxSc7ziGgdL9Lpo/pRZJtFOHqzkDg8jd7fb07ID7bMPbSa05y0
BU5VpC8yEOYRpOEFbkJSPtHc0Q8cMCv/q1VcMuuhKXYnfSho3TWvtOSQIjUoU/q0
8T6TXYi2yF+f+vZBTFlV
=Mb8m
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"This has been a quiet cycle for RDMA, the big bulk is the usual
smallish driver updates and bug fixes. About four new uAPI related
things. Not as much Szykaller patches this time, the bugs it finds are
getting harder to fix.
Summary:
- More work cleaning up the RDMA CM code
- Usual driver bug fixes and cleanups for qedr, qib, hfi1, hns,
i40iw, iw_cxgb4, mlx5, rxe
- Driver specific resource tracking and reporting via netlink
- Continued work for name space support from Parav
- MPLS support for the verbs flow steering uAPI
- A few tricky IPoIB fixes improving robustness
- HFI1 driver support for the '16B' management packet format
- Some auditing to not print kernel pointers via %llx or similar
- Mark the entire 'UCM' user-space interface as BROKEN with the
intent to remove it entirely. The user space side of this was long
ago replaced with RDMA-CM and syzkaller is finding bugs in the
residual UCM interface nobody wishes to fix because nobody uses it.
- Purge more bogus BUG_ON's from Leon
- 'flow counters' verbs uAPI
- T10 fixups for iser/isert, these are Acked by Martin but going
through the RDMA tree due to dependencies"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (138 commits)
RDMA/mlx5: Update SPDX tags to show proper license
RDMA/restrack: Change SPDX tag to properly reflect license
IB/hfi1: Fix comment on default hdr entry size
IB/hfi1: Rename exp_lock to exp_mutex
IB/hfi1: Add bypass register defines and replace blind constants
IB/hfi1: Remove unused variable
IB/hfi1: Ensure VL index is within bounds
IB/hfi1: Fix user context tail allocation for DMA_RTAIL
IB/hns: Use zeroing memory allocator instead of allocator/memset
infiniband: fix a possible use-after-free bug
iw_cxgb4: add INFINIBAND_ADDR_TRANS dependency
IB/isert: use T10-PI check mask definitions from core layer
IB/iser: use T10-PI check mask definitions from core layer
RDMA/core: introduce check masks for T10-PI offload
IB/isert: fix T10-pi check mask setting
IB/mlx5: Add counters read support
IB/mlx5: Add flow counters read support
IB/mlx5: Add flow counters binding support
IB/mlx5: Add counters create and destroy support
IB/uverbs: Add support for flow counters
...
No reason to use hard-coded protection information checks in ib_isert
driver. Use check masks from RDMA core driver.
Also, while we here, reduce the number of instructions made for setting
the check mask (no need to do bitwise or with 0 since we zero the mask
in the beginning of the function).
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
No reason to re-define protection information check in ib_iser driver.
Use check masks from RDMA core driver.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
A copy/paste bug (probably) caused setting of an app_tag check mask
in case where a ref_tag check was needed.
Fixes: 38a2d0d429 ("IB/isert: convert to the generic RDMA READ/WRITE API")
Fixes: 9e961ae73c ("IB/isert: Support T10-PI protected transactions")
Cc: stable@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Filling in the padding slot in the bpf structure as a bug fix in 'ne'
overlapped with actually using that padding area for something in
'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
The iSER driver reduces max_sectors. For example, if you load the
ib_iser module with max_sectors=1024, you will see that
/sys/class/block/<bdev>/queue/max_hw_sectors_kb is 508. It is an
incorrect value. The expected value is (max_sectors * sector_size) /
1024 = 512.
Reducing of max_sectors can cause performance degradation due to
unnecessary splitting of IO requests.
The number of pages per MR has been fixed here, so there is no longer
any need to reduce max_sectors.
Fixes: 9c674815d3 ("IB/iser: Fix max_sectors calculation")
Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com>
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Update mlx4 to support user MR creation against read-only memory, previously
it required the memory to be writable.
Based on rdma for-rc due to dependencies.
* mr_fix: (2 commits)
IB/mlx4: Mark user MR as writable if actual virtual memory is writable
IB/core: Make testing MR flags for writability a static inline function
Several subsystems depend on INFINIBAND_ADDR_TRANS, which in turn depends
on INFINIBAND. However, when with CONFIG_INIFIBAND=m, this leads to a
link error when another driver using it is built-in. The
INFINIBAND_ADDR_TRANS dependency is insufficient here as this is
a 'bool' symbol that does not force anything to be a module in turn.
fs/cifs/smbdirect.o: In function `smbd_disconnect_rdma_work':
smbdirect.c:(.text+0x1e4): undefined reference to `rdma_disconnect'
net/9p/trans_rdma.o: In function `rdma_request':
trans_rdma.c:(.text+0x7bc): undefined reference to `rdma_disconnect'
net/9p/trans_rdma.o: In function `rdma_destroy_trans':
trans_rdma.c:(.text+0x830): undefined reference to `ib_destroy_qp'
trans_rdma.c:(.text+0x858): undefined reference to `ib_dealloc_pd'
Fixes: 9533b292a7 ("IB: remove redundant INFINIBAND kconfig dependencies")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In unicast_arp_send function there is an inconsistency in error handling
of path_rec_start call. If path_rec_start is called because of an absent
ah field, skb will be dropped. But if it is called on a creation of a
new path, or if the path is invalid, skb will be added to the tail of
path queue. In case of a new path it will be dropped on path_free, but
in case of invalid path it can stay in the queue forever.
This patch unifies the behavior, dropping skb in all cases
of path_rec_start failure.
Signed-off-by: Evgenii Smirnov <evgenii.smirnov@profitbricks.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
We do a light flush on CLIENT_REREG and SM_CHANGE events. This goes
through and marks paths invalid. But we weren't always checking for this
validity when we needed to, and so we could keep using a path marked
invalid. What's more, once we establish a path with a valid ah, we put
a pointer to the ah in the neigh struct directly, so even if we mark the
path as invalid, as long as the neigh has a direct pointer to the ah, it
keeps using the old, outdated ah.
To fix this we do several things.
1) Put the valid flag in the ah instead of the path struct, so when we
put the ah pointer directly in the neigh struct, we can easily check the
validity of the ah on send events.
2) Check the neigh->ah and neigh->ah->valid elements in the needed
places, and if we have an ah, but it's invalid, then invoke a refresh of
the ah.
3) Fix the various places that check for path, but didn't check for
path->valid (now path->ah && path->ah->valid).
Reported-by: Evgenii Smirnov <evgenii.smirnov@profitbricks.com>
Fixes: ee1e2c82c2 ("IPoIB: Refresh paths instead of flushing them on SM change events")
Signed-off-by: Doug Ledford <dledford@redhat.com>
In ipoib_mcast_restart_task() the netif_addr_lock() is invoked prior
local_irq_save(). netif_addr_lock() should not be invoked in interrupt disabled
section, only in BH disabled sections.
The priv->lock is always acquired with disabled interrupts. The only place
where netif_addr_lock() and priv->lock nest ist ipoib_mcast_restart_task().
Drop the local_irq_save() and acquire priv->lock with spin_lock_irq() inside
the netif_addr locked section. It's safe to do so because the caller is either
a worker function or __ipoib_ib_dev_flush() which are both calling with
interrupts enabled (and since BH is enabled here, too so
netif_addr_lock_bh() needs to be used).
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The bpf syscall and selftests conflicts were trivial
overlapping changes.
The r8169 change involved moving the added mdelay from 'net' into a
different function.
A TLS close bug fix overlapped with the splitting of the TLS state
into separate TX and RX parts. I just expanded the tests in the bug
fix from "ctx->conf == X" into "ctx->tx_conf == X && ctx->rx_conf
== X".
Signed-off-by: David S. Miller <davem@davemloft.net>
INFINIBAND_ADDR_TRANS depends on INFINIBAND. So there's no need for
options which depend INFINIBAND_ADDR_TRANS to also depend on INFINIBAND.
Remove the unnecessary INFINIBAND depends.
Signed-off-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch is meant to clean up how the opa_vnic is obtaining entropy from
Tx packets.
The code as it was written was claiming to get 16 bits of hash, but from
what I can tell it was only ever actually getting 14 bits as it was limited
to 0 - (2^15 - 1). It then was folding the result to get a 8 bit value for
entropy.
Instead of throwing away all that input I am cutting out the middle man and
instead having the code call skb_get_hash directly and then folding the 32
bit value into a 8 bit value using a pair of shifts and XOR operations.
Execution wise this new approach should provide more entropy and be faster
since we are bypassing the reciprocal multiplication to reduce the 32b
value to 16b and instead just using a shift/XOR combination.
In addition we can drop the unneeded adapter value from the call to get the
entropy since the netdev itself isn't even needed.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t',
which is a typedef for an enum type, but the implementation in this
driver returns an 'int'.
Fix this by returning 'netdev_tx_t' in this driver too.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
INFINIBAND_SRP code depends on INFINIBAND_ADDR_TRANS provided symbols.
So declare the kconfig dependency. This is necessary to allow for
enabling INFINIBAND without INFINIBAND_ADDR_TRANS.
Signed-off-by: Greg Thelen <gthelen@google.com>
Cc: Tarick Bedeir <tarick@google.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
INFINIBAND_SRPT code depends on INFINIBAND_ADDR_TRANS provided symbols.
So declare the kconfig dependency. This is necessary to allow for
enabling INFINIBAND without INFINIBAND_ADDR_TRANS.
Signed-off-by: Greg Thelen <gthelen@google.com>
Cc: Tarick Bedeir <tarick@google.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This structure is not needed since the introduction of commit
'c42687784b9a ("IB/ipoib: Scatter-Gather support in connected mode")'
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
According to the SRP standard the INITIATOR and TARGET PORT IDENTIFIER
fields from the login request specify the I_T nexus. Whether or not an
SRP target closes an existing connection for an I_T nexus when a login
request is received depends on the value of the MULTICHANNEL field in
the login request. The SRP initiator derives the value of the
INITIATOR and TARGET PORT IDENTIFIER fields from the .id_ext,
.ioc_guid, .initiator_ext .sgid members of the srp_target_port
structure. This means that the .rdma_cm.dst check must be removed from
srp_conn_unique(). This patch avoids that for target ports that have
multiple addresses, e.g. an IPv4 and an IPv6 address, and if a
connection is established to both target port addresses, that the
initiator logs in alternatingly every 10 seconds to the other target
port address. An SRP target must namely terminate all but one
connections for a given I_T nexus if the MULTICHANNEL field has not
been set in the login request.
Fixes: 19f313438c ("IB/srp: Add RDMA/CM support")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
ib_find_gid() is only used by IPoIB driver. For IB link layer, GID table
entries are not based on netdevice. Netdevice parameter is unused here.
Therefore, it is removed.
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
gcc-4.4.4 has issues with initialization of anonymous unions:
drivers/infiniband/ulp/srpt/ib_srpt.c: In function 'srpt_zerolength_write':
drivers/infiniband/ulp/srpt/ib_srpt.c:854: error: unknown field 'wr_cqe' specified in initializer
drivers/infiniband/ulp/srpt/ib_srpt.c:854: warning: initialization makes integer from pointer without a cast
Work aound this.
Fixes: 2a78cb4db4 ("IB/srpt: Fix an out-of-bounds stack access in srpt_zerolength_write()")
Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Split IPv6 addresses at the colon that separates the IPv6 address
and the port number instead of at a colon in the middle of the IPv6
address. Check whether the IPv6 address is surrounded with square
brackets.
Fixes: 19f313438c ("IB/srp: Add RDMA/CM support")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add a parameter for configuring the port on which the ib_srpt driver
listens for incoming RDMA/CM connections, namely
/sys/kernel/config/target/srpt/discovery_auth/rdma_cm_port. The default
value for this parameter is 0 which means "do not listen for incoming
RDMA/CM connections". Add RDMA/CM support to all code that handles
connection state changes. Modify srpt_init_nodeacl() such that ACLs can
be configured for IPv4 and IPv6 addresses.
Note: incoming connection requests are only accepted for ports that
have been enabled. See also the "if (!sport->enabled)" code in the
connection request handler. See also the following configfs attribute:
/sys/kernel/config/target/srpt/$port/$port/enable.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
If a HCA supports the SG_GAPS_REG feature then fewer memory regions
are required per command. This patch reduces the number of memory
regions that is allocated per SRP session.
Signed-off-by: Sergey Gorenko <sergeygo@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Except for a minor log message change, this patch does not change
any functionality. For the introduction of %pIS, see also commit
1067964305 ("lib: vsprintf: add IPv4/v6 generic %p[Ii]S[pfs]
format specifier").
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The caller of srp_ib_lookup_path() is responsible for holding a reference
on the SCSI host. That means that commit 8a0d18c621 was not necessary.
Hence revert it.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Before commit e494f6a728 ("[SCSI] improved eh timeout handler") it
did not really matter whether or not abort handlers like srp_abort()
called .scsi_done() when returning another value than SUCCESS. Since
that commit however this matters. Hence only call .scsi_done() when
returning SUCCESS.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is a 14 patch series waiting to come into for-next that has a
dependecy on code submitted into this kernel's for-rc series. So, merge
the for-rc branch into the current for-next in order to make the patch
series apply cleanly.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ensure that cv_end is equal to ibdev->num_comp_vectors for the
NUMA node with the highest index. This patch improves spreading
of RDMA channels over completion vectors and thereby improves
performance, especially on systems with only a single NUMA node.
This patch drops support for the comp_vector login parameter by
ignoring the value of that parameter since I have not found a
good way to combine support for that parameter and automatic
spreading of RDMA channels over completion vectors.
Fixes: d92c0da71a ("IB/srp: Add multichannel support")
Reported-by: Alexander Schmid <alex@modula-shop-systems.de>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Alexander Schmid <alex@modula-shop-systems.de>
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
netdev_wait_allrefs() could rebroadcast NETDEV_UNREGISTER event
multiple times until all refs are gone, which will result in calling
ipoib_delete_debug_files multiple times and printing a warning.
Remove the WARN_ONCE since checks of NULL pointers before calling
debugfs_remove are not needed.
Fixes: 771a525840 ("IB/IPoIB: ibX: failed to create mcg debug file")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This is the mindless scripted replacement of kernel use of POLL*
variables as described by Al, done by this script:
for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
done
with de-mangling cleanups yet to come.
NOTE! On almost all architectures, the EPOLL* constants have the same
values as the POLL* constants do. But they keyword here is "almost".
For various bad reasons they aren't the same, and epoll() doesn't
actually work quite correctly in some cases due to this on Sparc et al.
The next patch from Al will sort out the final differences, and we
should be all done.
Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
On reboot SM can program port pkey table before ipoib registered its
event handler, which could result in missing pkey event and leave root
interface with initial pkey value from index 0.
Since OPA port starts with invalid pkey in index 0, root interface will
fail to initialize and stay down with no-carrier flag.
For IB ipoib interface may end up with pkey different from value
opensm put in pkey table idx 0, resulting in connectivity issues
(different mcast groups, for example).
Close the window by calling event handler after registration
to make sure ipoib pkey is in sync with port pkey table.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJabj6pAAoJEHm+PkMAQRiGs8cIAJQFkCWnbz86e3vG4DuWhyA8
CMGHCQdUOxxFGa/ixhIiuetbC0x+JVHAjV2FwVYbAQfaZB3pfw2iR1ncQxpAP1AI
oLU9vBEqTmwKMPc9CM5rRfnLFWpGcGwUNzgPdxD5yYqGDtcM8K840mF6NdkYe5AN
xU8rv1wlcFPF4A5pvHCH0pvVmK4VxlVFk/2H67TFdxBs4PyJOnSBnf+bcGWgsKO6
hC8XIVtcKCH2GfFxt5d0Vgc5QXJEpX1zn2mtCa1MwYRjN2plgYfD84ha0xE7J0B0
oqV/wnjKXDsmrgVpncr3txd4+zKJFNkdNRE4eLAIupHo2XHTG4HvDJ5dBY2NhGU=
=sOml
-----END PGP SIGNATURE-----
Merge tag v4.15 of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
To resolve conflicts in:
drivers/infiniband/hw/mlx5/main.c
drivers/infiniband/hw/mlx5/qp.c
From patches merged into the -rc cycle. The conflict resolution matches
what linux-next has been carrying.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The script "checkpatch.pl" pointed information out like the following.
WARNING: quoted string split across lines
Thus fix the affected source code places.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The variable "tx_desc" will be set to an appropriate pointer a bit later.
Thus omit the explicit initialisation at the beginning.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Omit an extra message for a memory allocation failure in this function.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Some dst_ops (e.g. md_dst_ops)) doesn't set this handler. It may result to:
"BUG: unable to handle kernel NULL pointer dereference at (null)"
Let's add a helper to check if update_pmtu is available before calling it.
Fixes: 52a589d51f ("geneve: update skb dst pmtu on tx path")
Fixes: a93bf0ff44 ("vxlan: update skb dst pmtu on tx path")
CC: Roman Kapl <code@rkapl.cz>
CC: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Combination of CONFIG_DEBUG_OBJECTS_RCU_HEAD=y and
CONFIG_INFINIBAND_SRPT=m produces the following build error.
ERROR: "init_rcu_head" [drivers/infiniband/ulp/srpt/ib_srpt.ko] undefined!
make[1]: *** [scripts/Makefile.modpost:92: __modpost] Error 1
make: *** [Makefile:1216: modules] Error 2
The reason to it that init_rcu_head() is not exported and not supposed
to be used in modules. It is needed for dynamic initialization of
statically allocated rcu_head structures.
Fixes: 795bc112cd ("IB/srpt: Make it safe to use RCU for srpt_device.rch_list")
Fixes: a11253142e ("IB/srpt: Rework multi-channel support")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Although I'm not sure this parameter is useful for regular SRP users,
setting this parameter to 1 has shown to be invaluable for testing the
block layer core, SCSI core and device mapper queue running mechanisms.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Since the SRP_LOGIN_REQ defined in the SRP standard is larger than
what fits in the RDMA/CM login request private data, introduce a new
login request format for the RDMA/CM.
Note: since srp_daemon and ibsrpdm rely on the subnet manager and
since there is no equivalent of the IB subnet manager in non-IB
networks, login has to be performed manually for non-IB networks.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch does not change any functionality but makes srpt_cm_req_recv()
independent of the IB/CM and hence simplifies the patch that introduces
RDMA/CM support.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Introduce a union in struct srpt_rdma_ch for member variables that
depend on the type of connection manager. Avoid that error messages
report the IB/CM ID.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
If a receive I/O context is removed from the wait list and
srpt_handle_new_iu() fails to allocate a send I/O context then
re-adding the receive I/O context to the wait list can cause
reordering. Avoid this by only removing a receive I/O context
from the wait list after allocating a send I/O context succeeded.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Wait list processing only occurs if the channel state >= CH_LIVE. Hence
set the channel state to CH_LIVE before triggering wait list processing
asynchronously.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Make sure that sport->mutex is not released between the duplicate
channel check, adding a channel to the channel list and performing
the sport enabled check. Avoid that srpt_disconnect_ch() can be
invoked concurrently with the ib_send_cm_rep() call by
srpt_cm_req_recv().
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The new pr_debug() statements are useful when debugging the ib_srpt
driver.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Move a mutex lock and unlock statement from srpt_close_session()
into srpt_disconnect_ch_sync(). Since the previous patch removed
the last user of the return value of that function, change the
return value of srpt_disconnect_ch_sync() into void.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Store initiator and target port ID's once per nexus instead of in each
channel data structure. This change simplifies the duplicate connection
check in srpt_cm_req_recv().
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Use the source GID as session name instead of the initiator port ID
from the SRP login request. The only functional change in this patch
is that it changes the session name shown in debug messages.
Note: the fifth argument that is passed to target_alloc_session() is
what the SCSI target core uses as key for lookups in the ACL (access
control list) information.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In multipathing setups where a target system is equipped with
dual-port HCAs it is useful to have one connection per target port
instead of one connection per target HCA. Hence move the connection
list (rch_list) from struct srpt_device into struct srpt_port.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Process connection requests that use another P_Key than the default
correctly.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch fixes a use-after-free issue for ch->release_done when
running the SRP protocol on top of the rdma_rxe driver.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The next patch will iterate over rch_list from a context from which
it is not allowed to block. Hence make rch_list RCU-safe.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch does not change any functionality but prepares for the patch
that adds RDMA_CM support by making the RDMA_CM patch much easier to
read.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Show all path record query parameters if a path record query fails.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Use kstrtoull() since simple_strtoull() is deprecated. This patch
improves error checking but otherwise does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Use correct parameter name and description in kernel-doc notation to
eliminate a kernel-doc warning.
../drivers/infiniband/ulp/opa_vnic/opa_vnic_vema.c:730: warning: Excess function parameter 'cport' description in 'opa_vnic_vema_send_trap'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: linux-doc@vger.kernel.org
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In case we fail to establish the connection we must drain our pre-posted
login recieve work request before continuing safely with connection
teardown.
Fixes: a060b5629a ("IB/core: generic RDMA READ/WRITE API")
Cc: <stable@vger.kernel.org> # 4.7+
Reported-by: Amrani, Ram <Ram.Amrani@cavium.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Merging in 12 patch series from Bart that required changes in the
current for-rc branch in order to apply cleanly.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Since all I/O context state changes are already serialized, it is
not necessary to protect I/O context state changes with the I/O
context spinlock. Hence remove that spinlock.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
It is not necessary to obtain ioctx->spinlock when reading the ioctx
state. Since after removal of this locking only a single line remains,
inline the srpt_get_cmd_state() function.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Introduce a function for converting a GUID into an ASCII string. This
patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Disabling an SRP target port causes the state of all QPs associated
with a port to be changed into IB_QPS_ERR. Avoid that this causes
one error message per I/O context to be reported.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
At least when running the ib_srpt driver on top of the rdma_rxe
driver it is easy to trigger a zero-length write completion in
the CH_DISCONNECTED state. Hence make the message that reports
this less noisy.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Make the ib_srpt driver use the IPv6 format for GIDs in log messages
to improve consistency of this driver with other RDMA kernel drivers.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Verify whether port numbers are in the expected range before using
these as an array index. Complain if a port number is out of range.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Since the SRQ event message is only useful for debugging purposes,
reduce its severity from "informational" to "debug".
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Rename rsp_size into max_rsp_size and SRPT_RQ_SIZE into MAX_SRPT_RQ_SIZE.
The new names better reflect the role of this member variable and constant.
Since the prefix "srp_" is superfluous in the context of the function
that creates an RDMA channel, rename srp_sq_size into sq_size.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch avoids that the following command reports any warnings:
scripts/kernel-doc -none drivers/infiniband/ulp/srpt/ib_srpt.h
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Avoid that warnings about missing parameter descriptions are reported
when building with W=1.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Make sure that the initiator port GUID is stored in ch->ini_guid.
Note: when initiating a connection sgid and dgid members in struct
sa_path_rec represent the source and destination GIDs. When accepting
a connection however sgid represents the destination GID and dgid the
source GID.
Fixes: commit 2bce1a6d22 ("IB/srpt: Accept GUIDs as port names")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
With the SRP protocol all RDMA operations are initiated by the target.
Since no RDMA operations are initiated by the initiator, do not grant
the initiator permission to submit RDMA reads or writes to the target.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
If IB_CQ_REPORT_MISSED_EVENTS flag is passed in ib_req_notify_cq()
it may return positive value indicating non-empty CQ.
If return code not verified the log might be flooded with false
warning messages "request notify on send CQ failed".
Fixes: 8966e28d2e ("IB/ipoib: Use NAPI in UD/TX flows")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When using enhanced mode for IPoIB, two threads may execute xmit in
parallel to two different TX queues while the target is the same.
In this case, both of them will add the same neighbor to the path's
neigh link list and we might see the following message:
list_add double add: new=ffff88024767a348, prev=ffff88024767a348...
WARNING: lib/list_debug.c:31__list_add_valid+0x4e/0x70
ipoib_start_xmit+0x477/0x680 [ib_ipoib]
dev_hard_start_xmit+0xb9/0x3e0
sch_direct_xmit+0xf9/0x250
__qdisc_run+0x176/0x5d0
__dev_queue_xmit+0x1f5/0xb10
__dev_queue_xmit+0x55/0xb10
Analysis:
Two SKB are scheduled to be transmitted from two cores.
In ipoib_start_xmit, both gets NULL when calling ipoib_neigh_get.
Two calls to neigh_add_path are made. One thread takes the spin-lock
and calls ipoib_neigh_alloc which creates the neigh structure,
then (after the __path_find) the neigh is added to the path's neigh
link list. When the second thread enters the critical section it also
calls ipoib_neigh_alloc but in this case it gets the already allocated
ipoib_neigh structure, which is already linked to the path's neigh
link list and adds it again to the list. Which beside of triggering
the list, it creates a loop in the linked list. This loop leads to
endless loop inside path_rec_completion.
Solution:
Check list_empty(&neigh->list) before adding to the list.
Add a similar fix in "ipoib_multicast.c::ipoib_mcast_send"
Fixes: b63b70d877 ('IPoIB: Use a private hash table for path lookup in xmit path')
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Patches for 4.16 that are dependent on patches sent to 4.15-rc.
These are small clean ups for the vmw_pvrdma and i40iw drivers.
* 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git:
RDMA/vmw_pvrdma: Remove usage of BIT() from UAPI header
RDMA/vmw_pvrdma: Use refcount_t instead of atomic_t
RDMA/vmw_pvrdma: Use more specific sizeof in kcalloc
RDMA/vmw_pvrdma: Clarify QP and CQ is_kernel logic
RDMA/vmw_pvrdma: Add UAR SRQ macros in ABI header file
i40iw: Change accelerated flag to bool
The locking order of vlan_rwsem (LOCK A) and then rtnl (LOCK B),
contradicts other flows such as ipoib_open possibly causing a deadlock.
To prevent this deadlock heavy flush is called with RTNL locked and
only then tries to acquire vlan_rwsem.
This deadlock is possible only when there are child interfaces.
[ 140.941758] ======================================================
[ 140.946276] WARNING: possible circular locking dependency detected
[ 140.950950] 4.15.0-rc1+ #9 Tainted: G O
[ 140.954797] ------------------------------------------------------
[ 140.959424] kworker/u32:1/146 is trying to acquire lock:
[ 140.963450] (rtnl_mutex){+.+.}, at: [<ffffffffc083516a>] __ipoib_ib_dev_flush+0x2da/0x4e0 [ib_ipoib]
[ 140.970006]
but task is already holding lock:
[ 140.975141] (&priv->vlan_rwsem){++++}, at: [<ffffffffc0834ee1>] __ipoib_ib_dev_flush+0x51/0x4e0 [ib_ipoib]
[ 140.982105]
which lock already depends on the new lock.
[ 140.990023]
the existing dependency chain (in reverse order) is:
[ 140.998650]
-> #1 (&priv->vlan_rwsem){++++}:
[ 141.005276] down_read+0x4d/0xb0
[ 141.009560] ipoib_open+0xad/0x120 [ib_ipoib]
[ 141.014400] __dev_open+0xcb/0x140
[ 141.017919] __dev_change_flags+0x1a4/0x1e0
[ 141.022133] dev_change_flags+0x23/0x60
[ 141.025695] devinet_ioctl+0x704/0x7d0
[ 141.029156] sock_do_ioctl+0x20/0x50
[ 141.032526] sock_ioctl+0x221/0x300
[ 141.036079] do_vfs_ioctl+0xa6/0x6d0
[ 141.039656] SyS_ioctl+0x74/0x80
[ 141.042811] entry_SYSCALL_64_fastpath+0x1f/0x96
[ 141.046891]
-> #0 (rtnl_mutex){+.+.}:
[ 141.051701] lock_acquire+0xd4/0x220
[ 141.055212] __mutex_lock+0x88/0x970
[ 141.058631] __ipoib_ib_dev_flush+0x2da/0x4e0 [ib_ipoib]
[ 141.063160] __ipoib_ib_dev_flush+0x71/0x4e0 [ib_ipoib]
[ 141.067648] process_one_work+0x1f5/0x610
[ 141.071429] worker_thread+0x4a/0x3f0
[ 141.074890] kthread+0x141/0x180
[ 141.078085] ret_from_fork+0x24/0x30
[ 141.081559]
other info that might help us debug this:
[ 141.088967] Possible unsafe locking scenario:
[ 141.094280] CPU0 CPU1
[ 141.097953] ---- ----
[ 141.101640] lock(&priv->vlan_rwsem);
[ 141.104771] lock(rtnl_mutex);
[ 141.109207] lock(&priv->vlan_rwsem);
[ 141.114032] lock(rtnl_mutex);
[ 141.116800]
*** DEADLOCK ***
Fixes: b4b678b06f ("IB/ipoib: Grab rtnl lock on heavy flush when calling ndo_open/stop")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Since ib_init_ah_from_path initializes the address handle attribute, it is
renamed to reflect so.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Currently there are no users of ib_find_gid for RoCE transport. It is
only used by IPoIB.
Therefore its simplified to ignore RoCE ports and GID type check which
was previously done for every port.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In case that the PathRecord is not valid (SM changed its network prefix)
ipoib will continue issue PathQuery requests with the same parameters
that are in its database, which are no longer valid anymore.
Now the driver in that case will re-initialize the record from a valid
place (the priv structure keeps the updated values), and a valid request
will be issued.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The ipoib path database is organized around DGIDs from the LLADDR, but the
SA is free to return a different GID when asked for path. This causes a
bug because the SA's modified DGID is copied into the database key, even
though it is no longer the correct lookup key, causing a memory leak and
other malfunctions.
Ensure the database key does not change after the SA query completes.
Demonstration of the bug is as follows
ipoib wants to send to GID fe80:0000:0000:0000:0002:c903:00ef:5ee2, it
creates new record in the DB with that gid as a key, and issues a new
request to the SM.
Now, the SM from some reason returns path-record with other SGID (for
example, 2001:0000:0000:0000:0002:c903:00ef:5ee2 that contains the local
subnet prefix) now ipoib will overwrite the current entry with the new
one, and if new request to the original GID arrives ipoib will not find
it in the DB (was overwritten) and will create new record that in its
turn will also be overwritten by the response from the SM, and so on
till the driver eats all the device memory.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
If one port fails to initialize an error message should indicate the
reason and driver should continue serving the working port(s) and other
HCA(s).
Fixes: e4b2d06892 ("IB/ipoib: Remove device when one port fails to init").
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
memalloc_noio_save modifies the behavior of MM, we must restore it after
we are done.
Fixes: d83187dda9 ("IB/IPoIB: Convert IPoIB to memalloc_noio_* calls")
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is no need to have a duplication of the generic library, i.e.
hex2bin().
Replace the open coded variant.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
pr_* is the preferred way to print messages, replace all
printk(KERN_WARN, ...) with pr_warn.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
- Add iWARP support to qedr driver
- Lots of misc fixes across subsystem
- Multiple update series to hns roce driver
- Multiple update series to hfi1 driver
- Updates to vnic driver
- Add kref to wait struct in cxgb4 driver
- Updates to i40iw driver
- Mellanox shared pull request
- timer_setup changes
- massive cleanup series from Bart Van Assche
- Two series of SRP/SRPT changes from Bart Van Assche
- Core updates from Mellanox
- i40iw updates
- IPoIB updates
- mlx5 updates
- mlx4 updates
- hns updates
- bnxt_re fixes
- PCI write padding support
- Sparse/Smatch/warning cleanups/fixes
- CQ moderation support
- SRQ support in vmw_pvrdma
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaDF9JAAoJELgmozMOVy/dDXUP/i92g+G4OJ+4hHMh4KCjQMHT
eMr/w9l1C033HrtsU1afPhqHOsKSxwCuJSiTgN4uXIm67/2kPK5Vlx+ir7mbOLwB
3ukVK6Q/aFdigWCUhIaJSlDpjbd2sEj7JwKtM3rucvMWJlBJ4mAbcVQVfU96CCsv
V9mO7dpR3QtYWDId9DukfnAfPUPFa3SMZnD7tdl6mKNRg/MjWGYLAL4nJoBfex5f
b4o+MTrbuFWXYsfDru1m9BpHgyul20ldfcnbe8C/sVOQmOgkX7ngD5Sdi1FLeRJP
GF/DnAqInC9N7cAxZHx4kH9x6mLMmEdfnwQ9VTVqGUHBsj3H4hQTVIAFfHUhWUbG
TP5ZHgZG2CewZ0rf092cWlDZwp6n0BalnbQJr+QN4MzPmYbofs3AccSKUwrle+e+
E6yYf4XxJdt7wRr4F1QKygtUEXSnNkNYUDQ4ZFbpJS/D4Sq80R1ZV/WZ7PJxm1D/
EIKoi7NU9cbPMIlbCzn8kzgfjS7Pe4p0WW/Xxc/IYmACzpwNPkZuFGSND79ksIpF
jhHqwZsOWFuXISjvcR4loc8wW6a5w5vjOiX0lLVz0NSdXSzVqav/2at7ZLDx/PT+
Lh9YVL51akA3hiD+3X6iOhfOUu6kskjT9HijE5T8rJnf0V+C6AtIRpwrQ7ONmjJm
3JMrjjLxtCIvpUyzCvDW
=A1oL
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma updates from Doug Ledford:
"This is a fairly plain pull request. Lots of driver updates across the
stack, a huge number of static analysis cleanups including a close to
50 patch series from Bart Van Assche, and a number of new features
inside the stack such as general CQ moderation support.
Nothing really stands out, but there might be a few conflicts as you
take things in. In particular, the cleanups touched some of the same
lines as the new timer_setup changes.
Everything in this pull request has been through 0day and at least two
days of linux-next (since Stephen doesn't necessarily flag new
errors/warnings until day2). A few more items (about 30 patches) from
Intel and Mellanox showed up on the list on Tuesday. I've excluded
those from this pull request, and I'm sure some of them qualify as
fixes suitable to send any time, but I still have to review them
fully. If they contain mostly fixes and little or no new development,
then I will probably send them through by the end of the week just to
get them out of the way.
There was a break in my acceptance of patches which coincides with the
computer problems I had, and then when I got things mostly back under
control I had a backlog of patches to process, which I did mostly last
Friday and Monday. So there is a larger number of patches processed in
that timeframe than I was striving for.
Summary:
- Add iWARP support to qedr driver
- Lots of misc fixes across subsystem
- Multiple update series to hns roce driver
- Multiple update series to hfi1 driver
- Updates to vnic driver
- Add kref to wait struct in cxgb4 driver
- Updates to i40iw driver
- Mellanox shared pull request
- timer_setup changes
- massive cleanup series from Bart Van Assche
- Two series of SRP/SRPT changes from Bart Van Assche
- Core updates from Mellanox
- i40iw updates
- IPoIB updates
- mlx5 updates
- mlx4 updates
- hns updates
- bnxt_re fixes
- PCI write padding support
- Sparse/Smatch/warning cleanups/fixes
- CQ moderation support
- SRQ support in vmw_pvrdma"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (296 commits)
RDMA/core: Rename kernel modify_cq to better describe its usage
IB/mlx5: Add CQ moderation capability to query_device
IB/mlx4: Add CQ moderation capability to query_device
IB/uverbs: Add CQ moderation capability to query_device
IB/mlx5: Exposing modify CQ callback to uverbs layer
IB/mlx4: Exposing modify CQ callback to uverbs layer
IB/uverbs: Allow CQ moderation with modify CQ
iw_cxgb4: atomically flush the qp
iw_cxgb4: only call the cq comp_handler when the cq is armed
iw_cxgb4: Fix possible circular dependency locking warning
RDMA/bnxt_re: report vlan_id and sl in qp1 recv completion
IB/core: Only maintain real QPs in the security lists
IB/ocrdma_hw: remove unnecessary code in ocrdma_mbx_dealloc_lkey
RDMA/core: Make function rdma_copy_addr return void
RDMA/vmw_pvrdma: Add shared receive queue support
RDMA/core: avoid uninitialized variable warning in create_udata
RDMA/bnxt_re: synchronize poll_cq and req_notify_cq verbs
RDMA/bnxt_re: Flush CQ notification Work Queue before destroying QP
RDMA/bnxt_re: Set QP state in case of response completion errors
RDMA/bnxt_re: Add memory barriers when processing CQ/EQ entries
...
Summary of modules changes for the 4.15 merge window:
- Treewide module_param_call() cleanup, fix up set/get function
prototype mismatches, from Kees Cook
- Minor code cleanups
Signed-off-by: Jessica Yu <jeyu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJaDCyzAAoJEMBFfjjOO8FyaYQP/AwHBy6XmwwVlWDP4BqIF6hL
Vhy3ccVLYEORvePv68tWSRPUz5n6+1Ebqanmwtkw6i8l+KwxY2SfkZql09cARc33
2iBE4bHF98iWQmnJbF6me80fedY9n5bZJNMQKEF9VozJWwTMOTQFTCfmyJRDBmk9
iidQj6M3idbSUOYIJjvc40VGx5NyQWSr+FFfqsz1rU5iLGRGEvA3I2/CDT0oTuV6
D4MmFxzE2Tv/vIMa2GzKJ1LGScuUfSjf93Lq9Kk0cG36qWao8l930CaXyVdE9WJv
bkUzpf3QYv/rDX6QbAGA0cada13zd+dfBr8YhchclEAfJ+GDLjMEDu04NEmI6KUT
5lP0Xw0xYNZQI7bkdxDMhsj5jaz/HJpXCjPCtZBnSEKiL4OPXVMe+pBHoCJ2/yFN
6M716XpWYgUviUOdiE+chczB5p3z4FA6u2ykaM4Tlk0btZuHGxjcSWwvcIdlPmjm
kY4AfDV6K0bfEBVguWPJicvrkx44atqT5nWbbPhDwTSavtsuRJLb3GCsHedx7K8h
ZO47lCQFAWCtrycK1HYw+oupNC3hYWQ0SR42XRdGhL1bq26C+1sei1QhfqSgA9PQ
7CwWH4UTOL9fhtrzSqZngYOh9sjQNFNefqQHcecNzcEjK2vjrgQZvRNWZKHSwaFs
fbGX8juZWP4ypbK+irTB
=c8vb
-----END PGP SIGNATURE-----
Merge tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull module updates from Jessica Yu:
"Summary of modules changes for the 4.15 merge window:
- treewide module_param_call() cleanup, fix up set/get function
prototype mismatches, from Kees Cook
- minor code cleanups"
* tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: Do not paper over type mismatches in module_param_call()
treewide: Fix function prototypes for module_param_call()
module: Prepare to convert all module_param_call() prototypes
kernel/module: Delete an error message for a failed memory allocation in add_module_usage()
Current ib_modify_cq() is used to set CQ moderation parameters.
This patch renames ib_modify_cq() to be rdma_set_cq_moderation(),
because the kernel version of RDMA API doesn't need to follow already
exposed to user's API pattern (create_XXX/modify_XXX/query_XXX/destroy_XXX)
and better to have more accurate name which describes the actual usage.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The use_srq configfs attribute is created after it is read. Hence
modify srpt_tpg_attrib_use_srq_store() such that this function
switches dynamically between non-SRQ and SRQ mode.
Reported-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Introduce the helper function srpt_set_enabled(). Protect
sport->enabled changes with sdev->mutex. Makes configfs writes
into 'enabled' wait until all channel resources have been freed.
Wait until channel release has finished during kernel module
unload.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Except for changing a BUG_ON() call into a WARN_ON_ONCE() call, this
patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The only functional change in this patch is in the srpt_add_one()
error path: if allocating the ring buffer for the SRQ fails, fall
back to non-SRQ mode instead of disabling SRP target functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
IB and iWARP specs both spell out that posting a receive work request
to a queue pair in the RESET state is an invalid operation and required
to fail. Postpone posting receive work requests until after the
transition to the INIT state.
Fixes: commit dea262094c ("IB/srpt: Change default behavior from using SRQ to using RC")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Files removed in 'net-next' had their license header updated
in 'net'. We take the remove from 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Several function prototypes for the set/get functions defined by
module_param_call() have a slightly wrong argument types. This fixes
those in an effort to clean up the calls when running under type-enforced
compiler instrumentation for CFI. This is the result of running the
following semantic patch:
@match_module_param_call_function@
declarer name module_param_call;
identifier _name, _set_func, _get_func;
expression _arg, _mode;
@@
module_param_call(_name, _set_func, _get_func, _arg, _mode);
@fix_set_prototype
depends on match_module_param_call_function@
identifier match_module_param_call_function._set_func;
identifier _val, _param;
type _val_type, _param_type;
@@
int _set_func(
-_val_type _val
+const char * _val
,
-_param_type _param
+const struct kernel_param * _param
) { ... }
@fix_get_prototype
depends on match_module_param_call_function@
identifier match_module_param_call_function._get_func;
identifier _val, _param;
type _val_type, _param_type;
@@
int _get_func(
-_val_type _val
+char * _val
,
-_param_type _param
+const struct kernel_param * _param
) { ... }
Two additional by-hand changes are included for places where the above
Coccinelle script didn't notice them:
drivers/platform/x86/thinkpad_acpi.c
fs/lockd/svc.c
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
NAPI budget is 64 packets, while maximum polling size for
the send CQ is 16. Let's bring them in sync, so the NAPI
budget will be reused completely.
Cc: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Instead of explicit call to poll_cq of the tx ring, use the NAPI mechanism
to handle the completions of each packet that has been sent to the HW.
The next major changes were taken:
* The driver init completion function in the creation of the send CQ,
that function triggers the napi scheduling.
* The driver uses CQ for RX for both modes UD and CM, and CQ for TX
for CM and UD.
Cc: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The first step toward using NAPI in the UD/TX flow is to separate
between two flows, the NAPI and the xmit, meaning no use of shared
variables between both flows.
This patch takes out the tx_outstanding variable that was used in both
flows and instead the driver uses the 2 cyclic ring variables: tx_head
and tx_tail, tx_head used in the xmit flow and tx_tail in the NAPI flow.
Cc: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Conflicts:
drivers/infiniband/hw/cxgb4/cm.c
drivers/infiniband/hw/qib/qib_driver.c
drivers/infiniband/hw/qib/qib_mad.c
There were minor fixups needed in these files. Just minor context diffs
due to patches from independent sources touching the same basic area.
Signed-off-by: Doug Ledford <dledford@redhat.com>
The early for-next branch was based on v4.14-rc2, while the shared pull
request I got from Mellanox used a v4.14-rc4 base. I'm making the
branch that was the shared Mellanox pull request the new for-next branch
and merging the early for-next branch into it.
Signed-off-by: Doug Ledford <dledford@redhat.com>
For small networks it is safe to reduce the subnet timeout from
its default value (18 for opensm) to 16. Make the SRP CM timeout
dependent on the subnet timeout such that decreasing the subnet
timeout also causes SRP failover and failback to occur faster.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This is a micro-optimization for the hot path.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Although in the RC mode more resources are needed that mode has three
advantages over SRQ:
- It works with all RDMA adapters, even those that do not support
SRQ.
- Posting WRs and polling WCs does not trigger lock contention
because only one thread at a time accesses a WR or WC queue in
non-SRQ mode.
- The end-to-end flow control mechanism is used.
>From the IB spec:
C9-150.2.1: For QPs that are not associated with an SRQ, each HCA
receive queue shall generate end-to-end flow control credits. If
a QP is associated with an SRQ, the HCA receive queue shall not
generate end-to-end flow control credits.
Add new configfs attributes that allow to configure which mode to use
(/sys/kernel/config/target/srpt/$GUID/$GUID/attrib/use_srq). Note:
only the attribute for port 1 is relevant on multi-port adapters.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch is a micro-optimization for the hot path.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Make srpt_parse_i_port_id() return a negative value if hex2bin()
fails.
Fixes: commit a42d985bd5 ("ib_srpt: Initial SRP Target merge for v3.3-rc1")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Avoid that gcc 7 reports the following warning when building with W=1:
warning: this statement may fall through [-Wimplicit-fallthrough=]
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
To support passing child interfaces to the lower device a new
rdma_netdev function was used, set_id. This will allow us to
attach the PKEY index lower device resources such as TIS/QP.
For devices that do not support offloads in IPoIB same logic
will be used, setting the PKEY index to priv struct.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
When ndo_open and ndo_stop are called RTNL lock should be held.
In this specific case ipoib_ib_dev_open calls the offloaded ndo_open
which re-sets the number of TX queue assuming RTNL lock is held.
Since RTNL lock is not held, RTNL assert will fail.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Instead of making every caller convert the second argument of
sa_path_set_slid() and sa_path_set_dlid() to big endian format,
make these two functions accept LIDs in CPU endian format.
This patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Cc: Don Hiatt <don.hiatt@intel.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Alex Vesker <valex@mellanox.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: Zhu Yanjun <yanjun.zhu@oracle.com>
Cc: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Yuval Shaia <yuval.shaia@oracle.com>
Cc: linux-rdma@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add protocol specific routing control information in the encapsulation
header as per the configuration.
Reviewed-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Scott Franco <safranco@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Update eth_link_status and operating status information to
represent the overall status of the virtual Ethernet switch port.
Reviewed-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Clear the MAC table digest when the MAC table is freed.
Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Scott Franco <safranco@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Do not include EM specified MAC address in total MACs of the
UC MAC list.
Reviewed-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Set power on default value of 1500 for eth_mtu.
Reviewed-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Per pcp mtu fields are not used, mark them as reserved.
Reviewed-by: Sudeep Dutt <sudeep.dutt@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
commit 9a9b811269 will cause core to fail UD QP from being destroyed
on ipoib unload, therefore cause resources leakage.
On pkey change event above patch modifies mgid before calling underlying
driver to detach it from QP. Drivers' detach_mcast() will fail to find
modified mgid it was never given to attach in a first place.
Core qp->usecnt will never go down, so ib_destroy_qp() will fail.
IPoIB driver actually does take care of new broadcast mgid based on new
pkey by destroying an old mcast object in ipoib_mcast_dev_flush())
....
if (priv->broadcast) {
rb_erase(&priv->broadcast->rb_node, &priv->multicast_tree);
list_add_tail(&priv->broadcast->list, &remove_list);
priv->broadcast = NULL;
}
...
then in restarted ipoib_macst_join_task() creating a new broadcast mcast
object, sending join request and on completion tells the driver to attach
to reinitialized QP:
...
if (!priv->broadcast) {
...
broadcast = ipoib_mcast_alloc(dev, 0);
...
memcpy(broadcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
sizeof (union ib_gid));
priv->broadcast = broadcast;
...
Fixes: 9a9b811269 ("IB/ipoib: Update broadcast object if PKey value was changed in index 0")
Cc: stable@vger.kernel.org
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Call ipoib_remove_one when one of the IPoIB ports fails to initialize in
order not to leave the module in unstable state.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
No reason to have dependency on PCI for the entire infiniband stack so
move it to KConfig of only the drivers that actually using PCI.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Call free_rdma_netdev instead of free_netdev each time we want to
release a netdevice. This call is also relevant for future freeing
of offloaded child interfaces.
This patch also adds a missing call for free netdevice when releasing
a parent interface that has child interfaces using ipoib_remove_one.
Fixes: cd565b4b51 ('IB/IPoIB: Support acceleration options callbacks')
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
A possible ABBA lock can happen with RTNL and vlan_rwsem.
For example:
Flow A: Device Flush
__ipoib_ib_dev_flush
down_read(vlan_rwsem) // Lock A
ipoib_flush_ah
flush_workqueue(priv->wq) // Wait for completion
A work on shared WQ (Mcast carrier)
ipoib_mcast_carrier_on_task
while (!rtnl_trylock()) // Wait for lock B
Flow B: Sysfs PKEY delete
ipoib_vlan_delete
lock(RTNL) // Lock B
down_write(vlan_rwsem) // Wait for lock A
This can happen with PKEY creates as well. The solution is to release
the RTNL lock in sysfs functions in case it is not possible to lock
VLAN RW semaphore and reset the SYS call.
Fixes: 69956d8326 ("IB/ipoib: Sync between remove_one to sysfs calls that use rtnl_lock")
Signed-off-by: Shalom Lagziel <shaloml@mellanox.com>
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The ib_mr->length represents the length of the MR in bytes as per
the IBTA spec 1.3 section 11.2.10.3 (REGISTER PHYSICAL MEMORY REGION).
Currently ib_mr->length field is defined as only 32-bits field.
This might result into truncation and failed WRs of consumers who
registers more than 4GB bytes memory regions and whose WRs accessing
such MRs.
This patch makes the length 64-bit to avoid such truncation.
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Faisal Latif <faisal.latif@intel.com>
Fixes: 4c67e2bfc8 ("IB/core: Introduce new fast registration API")
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
IPoIB doesn't support transport/rnr retry schemes as per
RFC so those errors are expected. No need to flood the
log files with them.
Tested-by: Michael Nowak <michael.nowak@oracle.com>
Tested-by: Rafael Alejandro Peralez <rafael.peralez@oracle.com>
Tested-by: Liwen Huang <liwen.huang@oracle.com>
Tested-by: Hong Liu <hong.x.liu@oracle.com>
Reviewed-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Reported-by: Rajiv Raja <rajiv.raja@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Adds support for ioctl callback in the RDMA netdevs to allow
supporting functions not handled by the generic interface code.
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Eitan Rabin <rabin@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In order to avoid deadlock between sysfs functions (like create/delete
child) and remove_one (both of them are using the sysfs lock and
rtnl_lock) the driver will use a state mutex for sync.
That will fix traces as the following:
schedule+0x3e/0x90
kernfs_drain+0x75/0xf0
? wait_woken+0x90/0x90
__kernfs_remove+0x12e/0x1c0
kernfs_remove+0x25/0x40
sysfs_remove_dir+0x57/0x90
kobject_del+0x22/0x60
device_del+0x195/0x230
pm_runtime_set_memalloc_noio+0xac/0xf0
netdev_unregister_kobject+0x71/0x80
rollback_registered_many+0x205/0x2f0
rollback_registered+0x31/0x40
unregister_netdevice_queue+0x58/0xb0
unregister_netdev+0x20/0x30
ipoib_remove_one+0xb7/0x240 [ib_ipoib]
ib_unregister_device+0xbc/0x1b0 [ib_core]
ib_unregister_mad_agent+0x29/0x30 [ib_core]
mlx4_ib_remove+0x67/0x280 [mlx4_ib]
INFO: task echo:24082 blocked for more than 120 seconds.
Tainted: G OE 4.1.12-37.5.1.el6uek.x86_64 #2
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
Call Trace:
schedule+0x3e/0x90
schedule_preempt_disabled+0xe/0x10
__mutex_lock_slowpath+0x95/0x110
? _rcu_barrier+0x177/0x220
mutex_lock+0x23/0x40
rtnl_lock+0x15/0x20
netdev_run_todo+0x81/0x1f0
rtnl_unlock+0xe/0x10
ipoib_vlan_delete+0x12f/0x1c0 [ib_ipoib]
delete_child+0x69/0x80 [ib_ipoib]
dev_attr_store+0x20/0x30
sysfs_kf_write+0x41/0x50
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The functions ib_register_event_handler() and
ib_unregister_event_handler() always returned success and they can't fail.
Let's convert those functions to be void, remove redundant checks and
cleanup tons of goto statements.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Merging our (hopefully) final -rc pull branch into our for-next branch
because some of our pending patches won't apply cleanly without having
the -rc patches in our tree.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Conflicts:
include/rdma/ib_verbs.h - Modified a function signature adjacent
to a newly added function signature from a previous merge
Signed-off-by: Doug Ledford <dledford@redhat.com>
Conflicts:
drivers/infiniband/hw/mlx5/main.c - Both add new code
include/rdma/ib_verbs.h - Both add new code
Signed-off-by: Doug Ledford <dledford@redhat.com>
There is a need to forward FW version to user space
application through RDMA netlink. In order to make it safe, there
is need to declare nla_policy and limit the size of FW string.
The new define IB_FW_VERSION_NAME_MAX will limit the size of
FW version string. That define was chosen to be equal to
ETHTOOL_FWVERS_LEN, because many drivers anyway are limited
by that value indirectly.
The introduction of this define allows us to remove the string size
from get_fw_str function signature.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
IPoIB contains local_lid field which is 16 bits in
length, increase it to 32 bits.
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
srpt contains lid and sm_lid fields which are 16 bits in
length, increase them to 32 bits.
Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
All modules in drivers/infiniband defined and used MODULE_VERSION, which
was pointless because the kernel version describes their state more accurate
then those arbitrary numbers.
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Acked-by: Sagi Grimbrg <sagi@grimberg.me>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Modify QP can fail and it can be acceptable, like when moving from RST to
ERR state, all the rest are not acceptable and a message to the log
should be printed.
The current code prints on all failures and many messages like:
"Failed to modify QP to ERROR state" appear, even when supported by the
state machine of the QP object.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
There is no need to assign "p" pointer twice.
This patch fixes the following smatch warning:
drivers/infiniband/ulp/ipoib/ipoib_cm.c:517 ipoib_cm_rx_handler() warn:
missing break? reassigning 'p->id'
Fixes: 839fcaba35 ("IPoIB: Connected mode experimental support")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Refactor error paths in ipoib_add_port() function. The code flow
ensures that the function terminates on every error flow and it makes
redundant all "else" cases.
The functions are called during the flow are returning "result < 0", in
case of error, so there is no need to check it explicitly.
Fixes: 58e9cc90cd ("IB/IPoIB: Fix bad error flow in ipoib_add_port()")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Update the multicast counter when multicast packets are received and
provide this information through ethtool support.
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Set IPOIB_NEIGH_TBL_FLUSH bit after initializing the neighbor
flushed completion, otherwise the garbage collector may signal
a completion while it is not initialized yet.
Fixes: b63b70d877 ("IPoIB: Use a private hash table for path lookup in xmit path")
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Don't allow negative values to max_nonsrq_conn_qp. There is no functional
impact on a negative value but it is logicically incorrect.
Fixes: 68e995a295 ("IPoIB/cm: Add connected mode support for devices without SRQs")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
While cleaning neighs and there is a send-only mcast neigh, the driver
should wait to finish its join process before trying to remove it.
Without this patch, we will see messages like: "ipoib_mcast_leave on an
in-flight join" and unexpected results in the join_complete.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
The work mcast_task can re-queue itself, so instead of doing
cancel && flush_workqueue, that still can leave a queued task
on the air, use cancel_delayed_work_sync.
Also, no need to use lock over the cancel, the original lock was
due to bit assignment setting (IPOIB_MCAST_RUN) that is not in use
anymore.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
A potential race between light_event and interface restart
may attach multicast group to an already attached QP.
Scenario:
light_event flow goes through ipoib_mcast_dev_flush function,
if a context switch occurs before calling ipoib_mcast_remove_list,
then we may face a situation where the broadcast of the priv is null
and the corresponding QP is not detached yet.
If an "interface restart" runs during the previous context switch,
the following scenario occurs:
When the device goes up, ipoib_ib_dev_up function will be called,
it will send a new registration request to the broadcast group and then
attach the group to the QP that was not detached before.
IPOIB_FLUSH_LIGHT INTERFACE RESTART
__ipoib_ib_dev_flush |
| |
| |
| |
ipoib_mcast_dev_flush |
Move mcast list and broadcast to remove_list |
| |
| |
Context Switch--> |
| ipoib_ib_dev_down
| |
| |
| ipoib_ib_dev_up
| |
| |
| ipoib_mcast_join_task
| allocate new broadcast
| |
| |
| Attach QP to multicast group
| |
| |
| <--Context Switch
ipoib_mcast_leave
Detach QP from multicast group
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
We might get some bogus error completions in case the target will
remotely invalidate the rkey and the HCA will need to retransmit
from this buffer.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
We accidentally don't see the error code on some of these error paths.
It means we return ERR_PTR(0) which is NULL and it results in a NULL
dereference in the caller.
This bug dates to pre-git days.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The driver checks if the lower level driver supports get_stats, and if
so calls it to get the updated statistics, otherwise takes from the
current netdevice stats object.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Commit 21caf2fc19 ("mm: teach mm by current context info to not do I/O
during memory allocation") added the memalloc_noio_(save|restore) functions
to enable people to modify the MM behavior by disabling I/O during memory
allocation. This was further extended in Fixes: 934f3072c1 ("mm: clear
__GFP_FS when PF_MEMALLOC_NOIO is set"). memalloc_noio_* functions prevent
allocation paths recursing back into the filesystem without explicitly
changing the flags for every allocation site.
However the IPoIB hasn't been keeping up with the changes and missed
completely these memalloc_noio_* calls. This led to update of
allocation site with special QP creation flag, see commit 09b93088d7
("IB: Add a QP creation flag to use GFP_NOIO allocations"), while this
flag is supported by small number of drivers in IB stack.
Let's change it by updating to memalloc_noio_* calls and allow
for every driver underneath enjoy NOIO allocations.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch checks if there is a driver below that
needs to be updated on the new MTU and calls it
accordingly.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The msleep(1) may do not sleep 1 ms as expected
and will sleep longer. The simple conversion from
msleep to usleep_range between 1ms and 2ms can solve an
issue.
The full and comprehensive explanation can be found at [1] and [2].
[1] https://lkml.org/lkml/2007/8/3/250
[2] Documentation/timers/timers-howto.txt
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Under heavy iser target(scst) start/stop stress during login/logout
on iser intitiator side happened trace call provided below.
The function iscsi_iser_slave_alloc iser_conn pointer could be NULL,
due to the fact that function iscsi_iser_conn_stop can be called before
and free iser connection. Let's protect that flow by introducing global mutex.
BUG: unable to handle kernel paging request at 0000000000001018
IP: [<ffffffffc0426f7e>] iscsi_iser_slave_alloc+0x1e/0x50 [ib_iser]
Call Trace:
? scsi_alloc_sdev+0x242/0x300
scsi_probe_and_add_lun+0x9e1/0xea0
? kfree_const+0x21/0x30
? kobject_set_name_vargs+0x76/0x90
? __pm_runtime_resume+0x5b/0x70
__scsi_scan_target+0xf6/0x250
scsi_scan_target+0xea/0x100
iscsi_user_scan_session.part.13+0x101/0x130 [scsi_transport_iscsi]
? iscsi_user_scan_session.part.13+0x130/0x130 [scsi_transport_iscsi]
iscsi_user_scan_session+0x1e/0x30 [scsi_transport_iscsi]
device_for_each_child+0x50/0x90
iscsi_user_scan+0x44/0x60 [scsi_transport_iscsi]
store_scan+0xa8/0x100
? common_file_perm+0x5d/0x1c0
dev_attr_store+0x18/0x30
sysfs_kf_write+0x37/0x40
kernfs_fop_write+0x12c/0x1c0
__vfs_write+0x18/0x40
vfs_write+0xb5/0x1a0
SyS_write+0x55/0xc0
Fixes: 318d311e8f ("iser: Accept arbitrary sg lists mapping if the device supports it")
Cc: <stable@vger.kernel.org> # v4.5+
Signed-off-by: Vladimir Neyelov <vladimirn@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pull SCSI target updates from Nicholas Bellinger:
"It's been usually busy for summer, with most of the efforts centered
around TCMU developments and various target-core + fabric driver bug
fixing activities. Not particularly large in terms of LoC, but lots of
smaller patches from many different folks.
The highlights include:
- ibmvscsis logical partition manager support (Michael Cyr + Bryant
Ly)
- Convert target/iblock WRITE_SAME to blkdev_issue_zeroout (hch +
nab)
- Add support for TMR percpu LUN reference counting (nab)
- Fix a potential deadlock between EXTENDED_COPY and iscsi shutdown
(Bart)
- Fix COMPARE_AND_WRITE caw_sem leak during se_cmd quiesce (Jiang Yi)
- Fix TMCU module removal (Xiubo Li)
- Fix iser-target OOPs during login failure (Andrea Righi + Sagi)
- Breakup target-core free_device backend driver callback (mnc)
- Perform TCMU add/delete/reconfig synchronously (mnc)
- Fix TCMU multiple UIO open/close sequences (mnc)
- Fix TCMU CHECK_CONDITION sense handling (mnc)
- Fix target-core SAM_STAT_BUSY + TASK_SET_FULL handling (mnc + nab)
- Introduce TYPE_ZBC support in PSCSI (Damien Le Moal)
- Fix possible TCMU memory leak + OOPs when recalculating cmd base
size (Xiubo Li + Bryant Ly + Damien Le Moal + mnc)
- Add login_keys_workaround attribute for non RFC initiators (Robert
LeBlanc + Arun Easi + nab)"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (68 commits)
iscsi-target: Add login_keys_workaround attribute for non RFC initiators
Revert "qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR ABORT"
tcmu: clean up the code and with one small fix
tcmu: Fix possbile memory leak / OOPs when recalculating cmd base size
target: export lio pgr/alua support as device attr
target: Fix return sense reason in target_scsi3_emulate_pr_out
target: Fix cmd size for PR-OUT in passthrough_parse_cdb
tcmu: Fix dev_config_store
target: pscsi: Introduce TYPE_ZBC support
target: Use macro for WRITE_VERIFY_32 operation codes
target: fix SAM_STAT_BUSY/TASK_SET_FULL handling
target: remove transport_complete
pscsi: finish cmd processing from pscsi_req_done
tcmu: fix sense handling during completion
target: add helper to copy sense to se_cmd buffer
target: do not require a transport_complete for SCF_TRANSPORT_TASK_SENSE
target: make device_mutex and device_list static
tcmu: Fix flushing cmd entry dcache page
tcmu: fix multiple uio open/close sequences
tcmu: drop configured check in destroy
...
max_fast_reg_page_list_len is only valid when the
memory management extentions are signaled by the underlying
driver.
Fix by adjusting iser_calc_scsi_params() to use
ISCSI_ISER_MAX_SG_TABLESIZE when the extentions are not indicated.
Reported-by: Thomas Rosenstein <thomas.rosenstein@creamfinance.com>
Fixes: Commit df749cdc45 ("IB/iser: Support up to 8MB data transfer in a single command")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Thomas Rosenstein <thomas.rosenstein@creamfinance.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch fixes a NULL pointer dereference in isert_login_recv_done()
of isert_conn->cm_id due to isert_cma_handler() -> isert_connect_error()
resetting isert_conn->cm_id = NULL during a failed login attempt.
As per Sagi, we will always see the completion of all recv wrs posted
on the qp (given that we assigned a ->done handler), this is a FLUSH
error completion, we just don't get to verify that because we deref
NULL before.
The issue here, was the assumption that dereferencing the connection
cm_id is always safe, which is not true since:
commit 4a579da258
Author: Sagi Grimberg <sagig@mellanox.com>
Date: Sun Mar 29 15:52:04 2015 +0300
iser-target: Fix possible deadlock in RDMA_CM connection error
As I see it, we have a direct reference to the isert_device from
isert_conn which is the one-liner fix that we actually need like
we do in isert_rdma_read_done() and isert_rdma_write_done().
Reported-by: Andrea Righi <righi.andrea@gmail.com>
Tested-by: Andrea Righi <righi.andrea@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Do not only report the state of the I/O context before srpt_abort_cmd()
was called but also the new state assigned by srpt_abort_cmd()
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: David Disseldorp <ddiss@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
- 2 Fixes for OPA found by debug kernel
- 1 Fix for user supplied input causing kernel problems
- 1 Fix for the IPoIB fixes submitted around -rc4
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZXVfPAAoJELgmozMOVy/dRRcP/AiN4wyEQ897se1fKXAktL1g
a17tiSkK2MukAVHbM++9Ea/YXK66e2s7Ls8Pd230E85N3V48rSUhWZUIUQLOm+gS
b98z53uNs6KkdBCezXABsHIi4PB6u1CfzaFaUfN5WI3ymAgsYqpQWMtNyO6GNe/R
Dur3vDieXPNJ2x+F1jiNxHFBXLKofCG0y1FX88zqsQI5vVVq7ASKgaaSX3T1emQY
18l4Dd7pesrWj4QD9jaqQiYkruF5VC1NE8/he8Zzy6XjSgnUZZfjbjuMptbW4y3y
Tvvd5bjMAkJhCbK1mhe1dZHPlYJhAguUBZfThjVSKtiMGwRhGA4SYkRtek3nZOga
/OLhERgj0VomHx7o+Pwp74DWnsSv08EMoc4hXKHZPPyxok83r9czejqm7mC2VbGd
Sa8LmVeLQp79e9MbGAj+PbNRHf9CE9dnLeFUmbj+qptXUVGvT8j9U1a9iTjTz0+2
NX/O4iWjtnt/CIkH9dhN9aWolswbmO2jSmmzb/x2EuCLv94GNtTyZLSifvxSYMnN
IWO86aGQmuUkWJ3RI/5tzq+gVzI6bdKB9hG5DOPWN/uJVF9nWkq3c69Bv9djvUoM
xi/rI0grxTqYHelRx3ja4ZqaI43R6YwL928XdtZJKQ/uNanq65Lyd6KKz3W7hT0l
emCoqb2MjuzsNWIPkSgg
=JEor
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma update from Doug Ledford:
"This includes two bugs against the newly added opa vnic that were
found by turning on the debug kernel options:
- sleeping while holding a lock, so a one line fix where they
switched it from GFP_KERNEL allocation to a GFP_ATOMIC allocation
- a case where they had an isolated caller of their code that could
call them in an atomic context so they had to switch their use of a
mutex to a spinlock to be safe, so this was considerably more lines
of diff because all uses of that lock had to be switched
In addition, the bug that was discussed with you already about an out
of bounds array access in ib_uverbs_modify_qp and ib_uverbs_create_ah
and is only seven lines of diff.
And finally, one fix to an earlier fix in the -rc cycle that broke
hfi1 and qib in regards to IPoIB (this one is, unfortunately, larger
than I would like for a -rc7 submission, but fixing the problem
required that we not treat all devices as though they had allocated a
netdev universally because it isn't true, and it took 70 lines of diff
to resolve the issue, but the final patch has been vetted by Intel and
Mellanox and they've both given their approval to the fix).
Summary:
- Two fixes for OPA found by debug kernel
- Fix for user supplied input causing kernel problems
- Fix for the IPoIB fixes submitted around -rc4"
[ Doug sent this having not noticed the 4.12 release, so I guess I'll be
getting another rdma pull request with the actuakl merge window
updates and not just fixes.
Oh well - it would have been nice if this small update had been the
merge window one. - Linus ]
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
IB/core, opa_vnic, hfi1, mlx5: Properly free rdma_netdev
RDMA/uverbs: Check port number supplied by user verbs cmds
IB/opa_vnic: Use spinlock instead of mutex for stats_lock
IB/opa_vnic: Use GFP_ATOMIC while sending trap