WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Tony Jones	f4e91eb4a8	IB: convert struct class_device to struct device This converts the main ib_device to use struct device instead of struct class_device as class_device is going away. Signed-off-by: Tony Jones <tonyj@suse.de> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Cc: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2008-04-19 19:10:30 -07:00
Matthew Wilcox	6188e10d38	Convert asm/semaphore.h users to linux/semaphore.h Signed-off-by: Matthew Wilcox <willy@linux.intel.com>	2008-04-18 22:22:54 -04:00
Jack Morgenstein	940801b27e	IB/mthca: Update module version and release date The ib_mthca driver has been stable for a while, so bump the version number to 1.0 to indicate this. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-04-16 21:09:34 -07:00
Dotan Barak	5121df3ae4	IB/mthca: Update QP state if query QP succeeds If the QP was moved to another state (such as SQE) by the hardware, then after this change the user won't have to set the IBV_QP_CUR_STATE mask in order to execute modify QP in order to recover from this state. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-04-16 21:09:34 -07:00
Roland Dreier	0f39cf3d54	IB/core: Add support for "send with invalidate" work requests Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a "send with invalidate" work request as defined in the iWARP verbs and the InfiniBand base memory management extensions. Also put "imm_data" and a new "invalidate_rkey" member in a new "ex" union in struct ib_send_wr. The invalidate_rkey member can be used to pass in an R_Key/STag to be invalidated. Add this new union to struct ib_uverbs_send_wr. Add code to copy the invalidate_rkey field in ib_uverbs_post_send(). Fix up low-level drivers to deal with the change to struct ib_send_wr, and just remove the imm_data initialization from net/sunrpc/xprtrdma/, since that code never does any send with immediate operations. Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since the iWARP drivers currently in the tree set the bit. The amso1100 driver at least will silently fail to honor the IB_SEND_INVALIDATE bit if passed in as part of userspace send requests (since it does not implement kernel bypass work request queueing). Remove the flag from all existing drivers that set it until we know which ones are OK. The values chosen for the new flag is not consecutive to avoid clashing with flags defined in the XRC patches, which are not merged yet but which are already in use and are likely to be merged soon. This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-04-16 21:09:32 -07:00
Eli Cohen	b846f25aa2	IB/core: Add creation flags to struct ib_qp_init_attr Add a create_flags member to struct ib_qp_init_attr that will allow a kernel verbs consumer to create a pass special flags when creating a QP. Add a flag value for telling low-level drivers that a QP will be used for IPoIB UD LSO. The create_flags member will also be useful for XRC and ehca low-latency QP support. Since no create_flags handling is implemented yet, add code to all low-level drivers to return -EINVAL if create_flags is non-zero. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-04-16 21:09:27 -07:00
Roland Dreier	c263ff65d5	IB/mthca: Avoid integer overflow when allocating huge ICM table In mthca_alloc_icm_table(), the number of entries to allocate for the table->icm array is computed by calculating obj_size * nobj and then dividing by MTHCA_TABLE_CHUNK_SIZE. If nobj is really large, then obj_size * nobj may overflow and the division may get the wrong value (even a negative value). Fix this by calculating the number of objects per chunk and then dividing nobj by this value instead. This patch allows crazy configurations such as loading ib_mthca with the module parameter num_mtt=33554432 to work properly. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-04-16 21:01:13 -07:00
Roland Dreier	19773539d6	IB/mthca: Avoid integer overflow when dealing with profile size mthca_make_profile() returns the size in bytes of the HCA context layout it creates, or a negative value if an error occurs. However, the return value is declared as u64 and the memfree initialization path casts this value to int to test if it is negative. This makes it think incorrectly than an error has occurred if the context size happens to be bigger than 2GB, since this turns into a negative int. Fix this by having mthca_make_profile() return an s64 and testing for an error by checking whether this 64-bit value itself is negative. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-04-16 21:01:13 -07:00
Eli Cohen	680b575f6d	IB/mthca: Add IPoIB checksum offload support Arbel and Sinai devices support checksum generation and verification of TCP and UDP packets for UD IPoIB messages. This patch checks if the HCA supports this and sets the IB_DEVICE_UD_IP_CSUM capability flag if it does. It implements support for handling the IB_SEND_IP_CSUM send flag and setting the csum_ok field in receive work completions. Signed-off-by: Eli Cohen <eli@mellnaox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-04-16 21:01:11 -07:00
Roland Dreier	b39993936d	IB/mthca: Formatting cleanups Fix a few whitespace and other coding style problems. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-04-16 21:01:03 -07:00
Roland Dreier	b7f9c112a5	IB/mthca: Free correct MPT on error exit from mthca_fmr_alloc() When mthca_fmr_alloc() returns an error, it should free the MPT at the index key, not mr->ibmr.lkey, since the lkey has been mangled by hw_index_to_key() and no longer is the real index. This bug causes corruption of the MPT table free bitmap when mthca_fmr_alloc() fails. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-02-19 10:42:50 -08:00
Marcin Slusarz	5163dc1a64	IB/mthca: Convert to use be16_add_cpu() replace: big_endian_variable = cpu_to_beX(beX_to_cpu(big_endian_variable) + expression_in_cpu_byteorder); with: beX_add_cpu(&big_endian_variable, expression_in_cpu_byteorder); Generated with a semantic patch. Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-02-13 07:47:47 -08:00
Roland Dreier	fe174357eb	IB/mthca: Add missing sg_init_table() in mthca_map_user_db() Usually harmless, since the scatterlist is always hard-coded to a length of 1, but it triggers a BUG() if CONFIG_DEBUG_SG=y, so we better fix it. This fixes <http://bugzilla.kernel.org/show_bug.cgi?id=9934>. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-02-12 14:38:22 -08:00
Olaf Kirch	2c78853472	IB/mthca: Return proper error codes from mthca_fmr_alloc() If the allocation of the MTT or the mailbox failed, mthca_fmr_alloc() would return 0 (success) no matter what. This leads to crashes a little down the road, when we try to dereference eg mr->mtt, which was really ERR_PTR(-Ewhatever). Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-02-04 20:20:44 -08:00
Roland Dreier	f33afc26dc	IB: Avoid marking __devinitdata as const Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-02-04 20:20:44 -08:00
Eli Cohen	1d368c5465	IB/ib_mthca: Pre-link receive WQEs in Tavor mode We have recently discovered that Tavor mode requires each WQE in a posted list of receive WQEs to have a valid NDA field at all times. This requirement holds true for regular QPs as well as for SRQs. This patch prelinks the receive queue in a regular QP and keeps the free list in SRQ always properly linked. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Reviewed-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-02-04 20:20:44 -08:00
Eli Cohen	1203c42e7b	IB/mthca: Remove checks for srq->first_free < 0 The SRQ receive posting functions make sure that srq->first_free never becomes negative, so we can remove tests of whether it is negative. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-02-04 20:20:44 -08:00
Jack Morgenstein	6ccef1de2c	IB/mthca: Don't read reserved fields in mthca_QUERY_ADAPTER() For memfree devices, the firmware QUERY_ADAPTER command does not return vendor_id, device_id, and revision_id; do not return these fields in the QUERY_ADAPTER function for memfree devices. Instead, for memfree devices, initialize the rev_id field of the mthca device via init_node_data (MAD IFC query), as is done in the query_device verb implementation. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-02-04 20:20:43 -08:00
Roland Dreier	0d89fe2c0c	IB/mthca: Fix and simplify page size calculation in mthca_reg_phys_mr() In mthca_reg_phys_mr(), we calculate the page size for the HCA hardware to use to map the buffer list passed in by the consumer. For example, if the consumer passes in [0] addr 0x1000, size 0x1000 [1] addr 0x2000, size 0x1000 then the algorithm would come up with a page size of 0x2000 and a list of two pages, at 0x0000 and 0x2000. Usually, this would work fine since the memory region would start at an offset of 0x1000 and have a length of 0x2000. However, the old code did not take into account the alignment of the IO virtual address passed in. For example, if the consumer passed in a virtual address of 0x6000 for the above, then the offset of 0x1000 would not be used correctly because the page mask of 0x1fff would result in an offset of 0. We can fix this quite neatly by making sure that the page shift we use is no bigger than the first bit where the start of the first buffer and the IO virtual address differ. Also, we can further simplify the code by removing the special case for a single buffer by noticing that it doesn't matter if we use a page size that is too big. This allows the loop to compute the page shift to be replaced with __ffs(). Thanks to Bryan S Rosenburg <rosnbrg@us.ibm.com> for pointing out the original bug and suggesting several ways to improve this patch. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-02-04 20:20:42 -08:00
Roland Dreier	950529e5c6	IB/mthca: Update latest "native Arbel" firmware revision Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:17:44 -08:00
Adrian Bunk	e57895d389	IB/mthca: Remove MSI support as scheduled Remove MSI support from the mthca driver, as scheduled. There is no reason to use MSI instead of MSI-X, since MSI-X performs better. No one has spoken up since MSI support was deprecated in commit `f6be6fbe` ("IB/mthca: Schedule MSI support for removal"), so apparently the MSI support is unused. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2008-01-25 14:15:33 -08:00
Jens Axboe	642f149031	SG: Change sg_set_page() to take length and offset argument Most drivers need to set length and offset as well, so may as well fold those three lines into one. Add sg_assign_page() for those two locations that only needed to set the page, where the offset/length is set outside of the function context. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-24 11:20:47 +02:00
Linus Torvalds	0b776eb542	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: mlx4_core: Increase command timeout for INIT_HCA to 10 seconds IPoIB/cm: Use common CQ for CM send completions IB/uverbs: Fix checking of userspace object ownership IB/mlx4: Sanity check userspace send queue sizes IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))" IB/ehca: Enable large page MRs by default IB/ehca: Change meaning of hca_cap_mr_pgsize IB/ehca: Fix ehca_encode_hwpage_size() and alloc_fmr() IB/ehca: Fix masking error in {,re}reg_phys_mr() IB/ehca: Supply QP token for SRQ base QPs IPoIB: Use round_jiffies() for ah_reap_task RDMA/cma: Fix deadlock destroying listen requests RDMA/cma: Add locking around QP accesses IB/mthca: Avoid alignment traps when writing doorbells mlx4_core: Kill mlx4_write64_raw()	2007-10-23 09:56:11 -07:00
Jens Axboe	45711f1af6	[SG] Update drivers to use sg helpers Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-22 21:19:53 +02:00
Roland Dreier	ab8403c424	IB/mthca: Avoid alignment traps when writing doorbells Architectures such as ia64 see alignment traps when doing a 64-bit read from __be32 doorbell[2] arrays to do doorbell writes in mthca_write64(). Fix this by just passing the two halves of the doorbell value into mthca_write64(). This actually improves the generated code by allowing the compiler to see what's going on better. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-10-15 20:17:27 -07:00
Eli Cohen	55a98e955c	IB/mthca: Mark error paths as unlikely() in post_srq_recv functions Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-10-10 11:24:33 -07:00
Roland Dreier	76d7cc0345	IB/mthca: Use mmiowb() to avoid firmware commands getting jumbled up Firmware commands are sent to the HCA by writing multiple words to a command register block. Access to this block of registers is serialized with a mutex. However, on large SGI systems, problems were seen with multiple CPUs issuing FW commands at the same time, because the writes to the register block may be reordered within the system interconnect and reach the HCA in a different order than they were issued (even with the mutex). Fix this by adding an mmiowb() before dropping the mutex. Tested-by: Arthur Kepner <akepner@sgi.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-10-09 19:59:17 -07:00
Roland Dreier	1a1eb6a646	IB/mthca: Increase max number of QPs per multicast group to 56 Increase the number of QPs allowed per multicast group from 8 to 56. This allows for one QP per core on 16-core systems, which are now quite common, and allows some space for future growth. This is basically the same patch that Jack Morgenstein <jackm@dev.mellanox.co.il> just supplied for mlx4. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-10-09 19:59:17 -07:00
Peter Oruba	a855b1a742	IB/mthca: Use PCI-X/PCI-Express read control interfaces These driver changes incorporate the proposed PCI-X / PCI-Express read byte count interface. Reading and setting those values doesn't take place "manually", instead wrapping functions are called to allow quirks for some PCI bridges. Signed-off by: Peter Oruba <peter.oruba@amd.com> Based on work by Stephen Hemminger <shemminger@linux-foundation.org> Cc: Roland Dreier <rolandd@cisco.com> Cc: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-10-09 19:59:07 -07:00
Michael S. Tsirkin	017aadc4b5	IB/mthca: Enable MSI-X by default Recover from MSI-X errors by automatically falling back on regular interrupt, instead of asking the user to do this manually. This makes it possible to enable MSI-X by default, and will make it possible to get rid of the msi_x module option in the future. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-10-09 19:59:06 -07:00
Michael S. Tsirkin	c1f74958db	IB/mthca: Change command token on timeout The FW command token is currently only updated on a command completion event. This means that on command timeout, the same token will be reused for new command, which results in a mess if the timed out command does eventually complete. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-07-20 21:19:43 -07:00
Roland Dreier	43509d1fec	IB/mthca: Simplify use of size0 in work request posting Current code sets size0 to 0 at the start of work request posting functions and then handles size0 == 0 specially within the loop over work requests. Change this so size0 is set along with f0 the first time through the loop (when nreq == 0). This makes the code easier to understand by making it clearer that f0 and size0 are always initialized if nreq != 0 without having to know that size0 == 0 implies nreq == 0. Also annotate size0 with uninitialized_var() so that this doesn't introduce a new compiler warning. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-07-18 13:28:29 -07:00
Roland Dreier	e535c699bf	IB/mthca: Factor out setting WQE UD segment entries Factor code to set UD entries out of the work request posting functions into inline functions set_tavor_ud_seg() and set_arbel_ud_seg(). This doesn't change the generated code in any significant way, and makes the source easier on the eyes. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-07-18 13:21:14 -07:00
Roland Dreier	400ddc11eb	IB/mthca: Factor out setting WQE remote address and atomic segment entries Factor code to set remote address and atomic segment entries out of the work request posting functions into inline functions set_raddr_seg() and set_atomic_seg(). This doesn't change the generated code in any significant way, and makes the source easier on the eyes. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-07-18 12:55:42 -07:00
Roland Dreier	80885456e8	IB/mthca: Factor out setting WQE data segment entries Factor code to set data segment entries out of the work request posting functions into inline functions mthca_set_data_seg() and mthca_set_data_seg_inval(). This makes the code more readable and also allows the compiler to do a better job -- on x86_64: add/remove: 0/0 grow/shrink: 0/6 up/down: 0/-69 (-69) function old new delta mthca_arbel_post_srq_recv 373 369 -4 mthca_arbel_post_receive 570 562 -8 mthca_tavor_post_srq_recv 520 508 -12 mthca_tavor_post_send 1344 1330 -14 mthca_arbel_post_send 1481 1467 -14 mthca_tavor_post_receive 792 775 -17 Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-07-18 11:30:34 -07:00
Roland Dreier	6d7d080e9f	IB/mthca: Use uninitialized_var() for f0 Commit `9db48926` ("drivers/infiniband/hw/mthca/mthca_qp: kill uninit'd var warning") added "= 0" to the declarations of f0 to shut up gcc warnings. However, there's no point in making the code bigger by initializing f0 to a random value just to get rid of a warning; setting f0 to 0 is no safer than just using uninitialized_var(), which documents the situation better and gives smaller code too. For example, on x86_64: add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-16 (-16) function old new delta mthca_tavor_post_send 1352 1344 -8 mthca_arbel_post_send 1489 1481 -8 Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-07-17 19:30:51 -07:00
Roland Dreier	e4daf73868	IB/mthca: Fix printk format used for firmware version in warning When warning about out-of-date firmware, current mthca code messes up the formatting of the version if the subminor doesn't have three digits. It doesn't fill the field with 0s so we end up with: ib_mthca 0000:0b:00.0: HCA FW version 1.1. 0 is old (1.2. 0 is current). Change the format from "%3d" to "%03d" to get the right thing printed. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-07-17 18:37:42 -07:00
Roland Dreier	f6be6fbe26	IB/mthca: Schedule MSI support for removal The mthca driver supports both MSI and MSI-X. However, MSI-X works with all hardware that the driver handles, and provides a superset of what MSI does, so there's no point in having code for both. Schedule MSI support for removal in 2008 to give anyone who actually needs MSI and who can't use MSI time to speak up. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-07-17 18:37:41 -07:00
Jeff Garzik	9db4892620	drivers/infiniband/hw/mthca/mthca_qp: kill uninit'd var warning drivers/infiniband/hw/mthca/mthca_qp.c: In function ‘mthca_tavor_post_send’: drivers/infiniband/hw/mthca/mthca_qp.c:1594: warning: ‘f0’ may be used uninitialized in this function drivers/infiniband/hw/mthca/mthca_qp.c: In function ‘mthca_arbel_post_send’: drivers/infiniband/hw/mthca/mthca_qp.c:1949: warning: ‘f0’ may be used uninitialized in this function Initializing 'f0' is not strictly necessary in either case, AFAICS. I was considering use of uninitialized_var(), but looking at the complex flow of control in each function, I feel it is wiser and safer to simply zero the var and be certain of ourselves. Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-07-17 16:18:00 -04:00
Shani Moideen	8909c571fa	IB/mthca: Replace memset(<addr>, 0, PAGE_SIZE) with clear_page(<addr>) Signed-off-by: Shani Moideen <shani.moideen@wipro.com> Signed-off-by: Roland Dreier <rolandd@cisco.com> ----	2007-07-10 12:28:05 -07:00
Jan Engelhardt	06cc85086e	IB: Use menuconfig for InfiniBand menu Change Kconfig objects from "menu, config" into "menuconfig" so that the user can disable the whole feature without having to enter the menu first. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-07-09 20:12:26 -07:00
Roland Dreier	3e1db334dc	IB/mthca, mlx4_core: Fix typo in comment s/signifant/significant/ Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-06-07 11:51:59 -07:00
Michael S. Tsirkin	8b7e15772a	IB/mthca: Fix handling of send CQE with error for QPs connected to SRQ mthca_free_err_wqe() currently treats both send and receive CQEs identically if a QP is using an SRQ. But for Tavor hardware, send CQEs with error can be chained together even if the RQ is part of SRQ, so we may miss some CQEs. Fix by following the WQE chain for all send CQEs even for non-SRQ QPs. This fixes crashes in IPoIB CM: <https://bugs.openfabrics.org//show_bug.cgi?id=604> Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-05-29 16:07:09 -07:00
Linus Torvalds	8aee74c8ee	Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband: IB/cm: Improve local id allocation IPoIB/cm: Fix SRQ WR leak IB/ipoib: Fix typos in error messages IB/mlx4: Check if SRQ is full when posting receive IB/mlx4: Pass send queue sizes from userspace to kernel IB/mlx4: Fix check of opcode in mlx4_ib_post_send() mlx4_core: Fix array overrun in dump_dev_cap_flags() IB/mlx4: Fix RESET to RESET and RESET to ERROR transitions IB/mthca: Fix RESET to ERROR transition IB/mlx4: Set GRH:HopLimit when sending globally routed MADs IB/mthca: Set GRH:HopLimit when building MLX headers IB/mlx4: Fix check of max_qp_dest_rdma in modify QP IB/mthca: Fix use-after-free on device restart IB/ehca: Return proper error code if register_mr fails IPoIB: Handle P_Key table reordering IB/core: Use start_port() and end_port() IB/core: Add helpers for uncached GID and P_Key searches IB/ipath: Fix potential deadlock with multicast spinlocks IB/core: Free umem when mm is already gone	2007-05-21 16:19:32 -07:00
Alexey Dobriyan	e8edc6e03a	Detach sched.h from mm.h First thing mm.h does is including sched.h solely for can_do_mlock() inline function which has "current" dereference inside. By dealing with can_do_mlock() mm.h can be detached from sched.h which is good. See below, why. This patch a) removes unconditional inclusion of sched.h from mm.h b) makes can_do_mlock() normal function in mm/mlock.c c) exports can_do_mlock() to not break compilation d) adds sched.h inclusions back to files that were getting it indirectly. e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were getting them indirectly Net result is: a) mm.h users would get less code to open, read, preprocess, parse, ... if they don't need sched.h b) sched.h stops being dependency for significant number of files: on x86_64 allmodconfig touching sched.h results in recompile of 4083 files, after patch it's only 3744 (-8.3%). Cross-compile tested on all arm defconfigs, all mips defconfigs, all powerpc defconfigs, alpha alpha-up arm i386 i386-up i386-defconfig i386-allnoconfig ia64 ia64-up m68k mips parisc parisc-up powerpc powerpc-up s390 s390-up sparc sparc-up sparc64 sparc64-up um-x86_64 x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig as well as my two usual configs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-05-21 09:18:19 -07:00
Michael S. Tsirkin	b18aad7150	IB/mthca: Fix RESET to ERROR transition According to the IB spec, a QP can be moved from RESET to the ERROR state, but mthca firmware does not support this and returns an error if we try. Work around this FW limitation by moving the QP from RESET to INIT with dummy parameters and then transitioning from INIT to ERROR. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-05-19 08:51:57 -07:00
Rolf Manderscheid	3f37cae694	IB/mthca: Set GRH:HopLimit when building MLX headers Global CM packets used by rmda_cm were being sent with a GRH:hopLimit of zero, causing them to be dropped by the router. The problem is a missing initialization of the hop_limit field in mthca_read_ah(), which was called by build_mlx_header() when sending a MAD on QP1. Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-05-19 08:51:56 -07:00
Ali Ayoub	de57c9f102	IB/mthca: Fix use-after-free on device restart Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-05-19 08:51:56 -07:00
Michael S. Tsirkin	bd18c11277	IB/mthca: Set cleaned CQEs back to HW ownership when cleaning CQ mthca_cq_clean() updates the CQ consumer index without moving CQEs back to HW ownership. As a result, the same WRID might get reported twice, resulting in a use-after-free. This was observed in IPoIB CM. Fix by moving all freed CQEs to HW ownership. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=617> Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-05-14 14:10:34 -07:00
Michael S. Tsirkin	3e28c56b9b	IB/mthca: Fix posting >255 recv WRs for Tavor Fix posting lists of > 255 receive WRs for Tavor: rq.next_ind must be updated each doorbell, otherwise the next doorbell will use an incorrect index. Found by Ronni Zimmermann at Mellanox. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-05-14 14:10:34 -07:00
Roland Dreier	f7c6a7b5d5	IB/uverbs: Export ib_umem_get()/ib_umem_release() to modules Export ib_umem_get()/ib_umem_release() and put low-level drivers in control of when to call ib_umem_get() to pin and DMA map userspace, rather than always calling it in ib_uverbs_reg_mr() before calling the low-level driver's reg_user_mr method. Also move these functions to be in the ib_core module instead of ib_uverbs, so that driver modules using them do not depend on ib_uverbs. This has a number of advantages: - It is better design from the standpoint of making generic code a library that can be used or overridden by device-specific code as the details of specific devices dictate. - Drivers that do not need to pin userspace memory regions do not need to take the performance hit of calling ib_mem_get(). For example, although I have not tried to implement it in this patch, the ipath driver should be able to avoid pinning memory and just use copy_{to,from}_user() to access userspace memory regions. - Buffers that need special mapping treatment can be identified by the low-level driver. For example, it may be possible to solve some Altix-specific memory ordering issues with mthca CQs in userspace by mapping CQ buffers with extra flags. - Drivers that need to pin and DMA map userspace memory for things other than memory regions can use ib_umem_get() directly, instead of hacks using extra parameters to their reg_phys_mr method. For example, the mlx4 driver that is pending being merged needs to pin and DMA map QP and CQ buffers, but it does not need to create a memory key for these buffers. So the cleanest solution is for mlx4 to call ib_umem_get() in the create_qp and create_cq methods. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-05-08 18:00:37 -07:00
Linus Torvalds	972d45fb43	Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband: IPoIB: Convert to NAPI IB: Return "maybe missed event" hint from ib_req_notify_cq() IB: Add CQ comp_vector support IB/ipath: Fix a race condition when generating ACKs IB/ipath: Fix two more spin lock problems IB/fmr_pool: Add prefix to all printks IB/srp: Set proc_name IB/srp: Add orig_dgid sysfs attribute to scsi_host IPoIB/cm: Don't crash if remote side uses one QP for both directions RDMA/cxgb3: Support for new abort logic RDMA/cxgb3: Initialize cpu_idx field in cpl_close_listserv_req message RDMA/cxgb3: Fail qp creation if the requested max_inline is too large RDMA/cxgb3: Fix TERM codes IPoIB/cm: Fix error handling in ipoib_cm_dev_open() IB/ipath: Don't corrupt pending mmap list when unmapped objects are freed IB/mthca: Work around kernel QP starvation IB/ipath: Don't put QP in timeout queue if waiting to send IB/ipath: Don't call spin_lock_irq() from interrupt context	2007-05-07 12:18:21 -07:00
Roland Dreier	ed23a72778	IB: Return "maybe missed event" hint from ib_req_notify_cq() The semantics defined by the InfiniBand specification say that completion events are only generated when a completions is added to a completion queue (CQ) after completion notification is requested. In other words, this means that the following race is possible: while (CQ is not empty) ib_poll_cq(CQ); // new completion is added after while loop is exited ib_req_notify_cq(CQ); // no event is generated for the existing completion To close this race, the IB spec recommends doing another poll of the CQ after requesting notification. However, it is not always possible to arrange code this way (for example, we have found that NAPI for IPoIB cannot poll after requesting notification). Also, some hardware (eg Mellanox HCAs) actually will generate an event for completions added before the call to ib_req_notify_cq() -- which is allowed by the spec, since there's no way for any upper-layer consumer to know exactly when a completion was really added -- so the extra poll of the CQ is just a waste. Motivated by this, we add a new flag "IB_CQ_REPORT_MISSED_EVENTS" for ib_req_notify_cq() so that it can return a hint about whether the a completion may have been added before the request for notification. The return value of ib_req_notify_cq() is extended so: < 0 means an error occurred while requesting notification == 0 means notification was requested successfully, and if IB_CQ_REPORT_MISSED_EVENTS was passed in, then no events were missed and it is safe to wait for another event. > 0 is only returned if IB_CQ_REPORT_MISSED_EVENTS was passed in. It means that the consumer must poll the CQ again to make sure it is empty to avoid the race described above. We add a flag to enable this behavior rather than turning it on unconditionally, because checking for missed events may incur significant overhead for some low-level drivers, and consumers that don't care about the results of this test shouldn't be forced to pay for the test. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-05-06 21:18:11 -07:00
Michael S. Tsirkin	f4fd0b224d	IB: Add CQ comp_vector support Add a num_comp_vectors member to struct ib_device and extend ib_create_cq() to pass in a comp_vector parameter -- this parallels the userspace libibverbs API. Update all hardware drivers to set num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector value. Pass the value of num_comp_vectors to userspace rather than hard-coding a value of 1. We want multiple CQ event vector support (via MSI-X or similar for adapters that can generate multiple interrupts), but it's not clear how many vectors we want, or how we want to deal with policy issues such as how to decide which vector to use or how to set up interrupt affinity. This patch is useful for experimenting, since no core changes will be necessary when updating a driver to support multiple vectors, and we know that we want to make at least these changes anyway. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-05-06 21:18:11 -07:00
Jean Delvare	6473d160b4	PCI: Cleanup the includes of <linux/pci.h> I noticed that many source files include <linux/pci.h> while they do not appear to need it. Here is an attempt to clean it all up. In order to find all possibly affected files, I searched for all files including <linux/pci.h> but without any other occurence of "pci" or "PCI". I removed the include statement from all of these, then I compiled an allmodconfig kernel on both i386 and x86_64 and fixed the false positives manually. My tests covered 66% of the affected files, so there could be false positives remaining. Untested files are: arch/alpha/kernel/err_common.c arch/alpha/kernel/err_ev6.c arch/alpha/kernel/err_ev7.c arch/ia64/sn/kernel/huberror.c arch/ia64/sn/kernel/xpnet.c arch/m68knommu/kernel/dma.c arch/mips/lib/iomap.c arch/powerpc/platforms/pseries/ras.c arch/ppc/8260_io/enet.c arch/ppc/8260_io/fcc_enet.c arch/ppc/8xx_io/enet.c arch/ppc/syslib/ppc4xx_sgdma.c arch/sh64/mach-cayman/iomap.c arch/xtensa/kernel/xtensa_ksyms.c arch/xtensa/platform-iss/setup.c drivers/i2c/busses/i2c-at91.c drivers/i2c/busses/i2c-mpc.c drivers/media/video/saa711x.c drivers/misc/hdpuftrs/hdpu_cpustate.c drivers/misc/hdpuftrs/hdpu_nexus.c drivers/net/au1000_eth.c drivers/net/fec_8xx/fec_main.c drivers/net/fec_8xx/fec_mii.c drivers/net/fs_enet/fs_enet-main.c drivers/net/fs_enet/mac-fcc.c drivers/net/fs_enet/mac-fec.c drivers/net/fs_enet/mac-scc.c drivers/net/fs_enet/mii-bitbang.c drivers/net/fs_enet/mii-fec.c drivers/net/ibm_emac/ibm_emac_core.c drivers/net/lasi_82596.c drivers/parisc/hppb.c drivers/sbus/sbus.c drivers/video/g364fb.c drivers/video/platinumfb.c drivers/video/stifb.c drivers/video/valkyriefb.c include/asm-arm/arch-ixp4xx/dma.h sound/oss/au1550_ac97.c I would welcome test reports for these files. I am fine with removing the untested files from the patch if the general opinion is that these changes aren't safe. The tested part would still be nice to have. Note that this patch depends on another header fixup patch I submitted to LKML yesterday: [PATCH] scatterlist.h needs types.h http://lkml.org/lkml/2007/3/01/141 Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2007-05-02 19:02:35 -07:00
Michael S. Tsirkin	9ba6d5529d	IB/mthca: Work around kernel QP starvation With mthca, RC QPs can starve each other and even UD QPs on the same hardware schedule queue. As a result, userspace MPI can starve e.g. IPoIB traffic, with netdev watchdog warnings getting printed out, and TCP connections getting stuck or failing. Reduce the chance of this happening by using three separate hardware schedule queues: one for userspace RC QPs, one for kernel RC QPs, and one for all other QPs. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-04-30 17:30:28 -07:00
Joachim Fenkes	1912ffbb88	IB: Set class_dev->dev in core for nice device symlink All RDMA drivers except ehca set class_dev->dev to their dma_device value (ehca leaves this unset). dma_device is the only value that makes any sense, so move this assignment to core/sysfs.c. This reduce the duplicated code in the rest of the drivers and gives ehca a nice /sys/class/infiniband/ehcaX/device symlink. Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-04-24 21:30:38 -07:00
Roland Dreier	30c00986f3	IB/mthca: Simplify CQ cleaning in mthca_free_qp() mthca_free_qp() already has local variables to hold the QP's send_cq and recv_cq, so we can slightly clean up the calls to mthca_cq_clean() by using those local variables instead of expressions like to_mcq(qp->ibqp.send_cq). Also, by cleaning the recv_cq first, we can avoid worrying about whether the QP is attached to an SRQ for the second call, because we would only clean send_cq if send_cq is not equal to recv_cq, and that means send_cq cannot have any receive completions from the QP being destroyed. All this work even improves the generated code a bit: add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-5 (-5) function old new delta mthca_free_qp 510 505 -5 Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-04-24 16:31:11 -07:00
Roland Dreier	532c3b5817	IB/mthca: Fix mthca_write_mtt() on HCAs with hidden memory Commit `b2875d4c` ("IB/mthca: Always fill MTTs from CPU") causes a crash in mthca_write_mtt() with non-memfree HCAs that have their memory hidden (that is, have only two PCI BARs instead of having a third BAR that allows access to the RAM attached to the HCA) on 64-bit architectures. This is because the commit just before, `c20e20ab` ("IB/mthca: Merge MR and FMR space on 64-bit systems") makes dev->mr_table.fmr_mtt_buddy equal to &dev->mr_table.mtt_buddy and hence mthca_write_mtt() tries to write directly into the HCA's MTT table. However, since that table is in the HCA's memory, this is impossible without the PCI BAR that gives access to that memory. This causes a crash because mthca_tavor_write_mtt_seg() basically tries to dereference some offset of a NULL pointer. Fix this by adding a test of MTHCA_FLAG_FMR in mthca_write_mtt() so that we always use the WRITE_MTT firmware command rather than writing directly if FMRs are not enabled. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-04-24 16:31:04 -07:00
Roland Dreier	3f114853d4	IB/mthca: Update HCA firmware revisions Update the driver's list of current firmware versions with Mellanox's latest releases. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-04-18 20:21:02 -07:00
Michael S. Tsirkin	608d8268be	IB/mthca: Fix data corruption after FMR unmap on Sinai In mthca_arbel_fmr_unmap(), the high bits of the key are masked off. This gets rid of the effect of adjust_key(), which makes sure that bits 3 and 23 of the key are equal when the Sinai throughput optimization is enabled, and so it may happen that an FMR will end up with bits 3 and 23 in the key being different. This causes data corruption, because when enabling the throughput optimization, the driver promises the HCA firmware that bits 3 and 23 of all memory keys will always be equal. Fix by re-applying adjust_key() after masking the key. Thanks to Or Gerlitz for reproducing the problem, and Ariel Shahar for help in debug. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-04-16 14:10:55 -07:00
Michael S. Tsirkin	0264d88531	IB/mthca: Fix thinko in init_mr_table() Commit `c20e20ab` ("IB/mthca: Merge MR and FMR space on 64-bit systems") swapped the number of MTTs and MPTs when initializing the MR table. As a result, we get a kernel oops when the number of MTT segments allocated exceeds 0x20000. Noted by Troy Benjegerdes <troy@scl.ameslab.gov>, and reproduced by Dotan Barak <dotanb@mellanox.co.il>. This fixes https://bugs.openfabrics.org/show_bug.cgi?id=490 Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-03-26 15:59:32 -07:00
Roland Dreier	88171cfed5	IB/mthca: Fix error path in mthca_alloc_memfree() The garbled logic in mthca_alloc_memfree() causes it to return 0, even if it fails to allocate all doorbell records. Fix it to return -ENOMEM when it fails. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-03-01 13:17:14 -08:00
Adrian Bunk	c9add6ec56	IB/mthca: Make 2 functions static This patch makes the needlessly global functions mthca_tavor_write_mtt_seg() and mthca_arbel_write_mtt_seg() static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-02-20 20:12:02 -08:00
Roland Dreier	11282b32a4	IB/mthca: Fix allocation of ICM chunks in coherent memory The change to allow allocating ICM chunks from coherent memory did not increment the count of sg entries properly, so a chunk that required more than allocation would not be mapped properly by the HCA. Fix this by adding the missing increment of chunk->nsg. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-02-16 13:57:33 -08:00
Dotan Barak	fc89afce34	IB/mthca: Allow the QP state transition RESET->RESET RESET->RESET is an allowed QP state transition, so mthca should handle it correctly, by just returning success without involving the firmware. Signed-off-by: Dotan Barak <dotanb@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-02-16 13:57:32 -08:00
Michael S. Tsirkin	b2875d4c39	IB/mthca: Always fill MTTs from CPU Speed up memory registration by filling in MTTs directly when the CPU can write directly to the whole table (all mem-free cards, and to Tavor mode on 64-bit systems with the patch I posted earlier). This reduces the number of FW commands needed to register an MR by at least a factor of 2 and speeds up memory registration significantly. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-02-12 16:16:29 -08:00
Michael S. Tsirkin	c20e20ab0f	IB/mthca: Merge MR and FMR space on 64-bit systems For Tavor, we currently reserve separate MPT and MTT space for FMRs to avoid abusing the vmalloc space on 32 bit kernels. No such problem exists on 64 bit kernels so let's not do it there. This way we have a shared pool for MR and FMR resources, used on demand. This will also make it possible to write MTTs for regular regions directly from driver. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-02-12 16:16:29 -08:00
Michael S. Tsirkin	391e4dea71	IB/mthca: Fix access to MTT and MPT tables on non-cache-coherent CPUs We allocate the MTT table with alloc_pages() and then do pci_map_sg(), so we must call pci_dma_sync_sg() after the CPU writes to the MTT table. This works since the device will never write MTTs on mem-free HCAs, once we get rid of the use of the WRITE_MTT firmware command. This change is needed to make that work, and is an improvement for now, since it gives FMRs a chance at working. For MPTs, both the device and CPU might write there, so we must allocate DMA coherent memory for these. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-02-12 16:16:29 -08:00
Michael S. Tsirkin	1d1f19cfce	IB/mthca: Give reserved MTTs a separate cache line MTTs are allocated in non-cache-coherent memory, so we must give reserved MTTs their own cache line, to prevent both device and CPU from writing into the same cache line at the same time. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-02-12 16:16:29 -08:00
Michael S. Tsirkin	c7d204e8fd	IB/mthca: Fix reserved MTTs calculation on mem-free HCAs The reserved_mtts field has different meaning in Tavor and Arbel, so we are wasting mtt entries on memfree. Fix the Arbel case to match Tavor semantics. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-02-12 16:16:29 -08:00
David Howells	6bdd61d876	IB/mthca: Work around gcc bug on sparc64 For some reason gcc-3.4.5 on sparc64 does: WARNING: "____ilog2_NaN" [drivers/infiniband/hw/mthca/ib_mthca.ko] undefined! Points to note: (1) The asm volatile flush/flushw are just markers for viewing what comes out in the assembly; removing them has no effect on the result. (2) Changing almost anything else in dwh__mthca_arbel_init_srq_context() or dwh__mthca_alloc_srq() causes the problem to go away. The compiler command line issued by the kernel build is: /opt/crosstool/gcc-3.4.5-glibc-2.3.6/sparc64-unknown-linux-gnu/bin/sparc64-unknown-linux-gnu-gcc -fno-strict-aliasing -fno-common -Os -m64 -mno-fpu -mcpu=ultrasparc -mcmodel=medlow -ffixed-g4 -ffixed-g5 -fcall-used-g7 -Wa,--undeclared-regs -pg -fno-omit-frame-pointer -fno-optimize-sibling-calls -fasynchronous-unwind-tables -g -c -o drivers/infiniband/hw/mthca/.tmp_mthca_srq.o drivers/infiniband/hw/mthca/mthca_srq.c This can be reduced to this whilst still retaining the problem: /opt/crosstool/gcc-3.4.5-glibc-2.3.6/sparc64-unknown-linux-gnu/bin/sparc64-unknown-linux-gnu-gcc -m64 -c -o drivers/infiniband/hw/mthca/mthca_srq.o drivers/infiniband/hw/mthca/mthca_srq.c -Os Removing -Os or changing it to -O or -O0 thru -O6 gets rid of the problem. This patch to the kernel code fixes the problem: Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-02-10 08:00:49 -08:00
Roland Dreier	99d4f22e91	IB/mthca: Use correct structure size in call to memset() When clearing the ib_ah_attr parameter in to_ib_ah_attr(), use sizeof ib_ah_attr instead of sizeof path. Pointed out by Jack Morgenstein <jackm@mellanox.co.il>. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-02-10 08:00:47 -08:00
Michael S. Tsirkin	062dbb69f3	IB: Return qp pointer as part of ib_wc struct ib_wc currently only includes the local QP number: this matches the IB spec, but seems mostly useless. The following patch replaces this with the pointer to qp itself, and updates all low level drivers and all users. This has the following advantages: - Ability to get a per-qp context through wc->qp->qp_context - Existing drivers already have the qp pointer ready in poll cq, so this change actually saves a tiny bit (extra memory read) on data path (for ehca it would actually be expensive to find the QP pointer when polling a CQ, but ehca does not support SRQ so we can leave wc->qp as NULL for ehca) - Users that need the QP number can still get it through wc->qp->qp_num Use case: In IPoIB connected mode code, I have a common CQ shared by multiple QPs. To track connection usage, I need a way to get at some per-QP context upon the completion, and I would like to avoid allocating context object per work request just to stick a QP pointer into it. With this code, I can just use wc->qp->qp_context. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-02-04 14:11:55 -08:00
Dotan Barak	f5e10529a9	IB/mthca: Don't execute QUERY_QP firmware command for QP in RESET state If a QP being queried is in the RESET state, don't execute the QUERY_QP firmware command (because it will fail). Signed-off-by: Dotan Barak <dotanb@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-01-09 14:14:28 -08:00
Jack Morgenstein	98714cb161	IB/mthca: Fix PRM compliance problem in atomic-send completions According to the Tavor and Arbel programmer's reference manuals, the number of bytes transferred is not provided in the byte_cnt field of the CQ entry for atomic operation completions. For atomic operations, the number of bytes transferred is always 8 (when the status is "success"), and this constant value should always be used by the driver in the ib_wc entry returned, rather than using the CQE. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-01-07 20:25:24 -08:00
Michael S. Tsirkin	46707e96b7	IB/mthca: Fix off-by-one in FMR handling on memfree mthca_table_find() will return the wrong address when the table entry being searched for is exactly at the beginning of a sglist entry (other than the first), because it uses >= when it should use >. Example: assume we have 2 entries in scatterlist, 4K each, offset is 4K. The current code will return first entry + 4K when we really want the second entry. In particular this means mapping an FMR on a memfree HCA may end up writing the page table into the wrong place, leading to memory corruption and also causing the HCA to use an incorrect address translation table. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2007-01-04 19:46:32 -08:00
Michael S. Tsirkin	9d79f1b467	[PATCH] IB/mthca: Fix FMR breakage caused by kmemdup() conversion Commit `bed8bdfd` ("IB: kmemdup() cleanup") introduced one bad conversion to kmemdup() in mthca_alloc_fmr(), where the structure allocated and the structure copied are not the same size. Revert this back to the original kmalloc()/memcpy() code. Reported-by: Dotan Barak <dotanb@mellanox.co.il>. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <roland@digitalvampire.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-30 10:55:55 -08:00
Roland Dreier	0b0df6f207	IB/mthca: Use DEFINE_MUTEX() instead of mutex_init() mthca_device_mutex() can be initialized automatically with DEFINE_MUTEX() rather than explicitly calling mutex_init(). This saves a bit of text and shrinks the source by a line, so we may as well do it.... Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-12-15 20:55:28 -08:00
Leonid Arsh	82da703ee6	IB/mthca: Add HCA profile module parameters Add module parameters that enable settting some of the HCA profile values, such as the number of QPs, CQs, etc. Signed-off-by: Leonid Arsh <leonida@voltaire.com> Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-12-15 20:46:31 -08:00
David Howells	f0d1b0b30d	[PATCH] LOG2: Implement a general integer log2 facility in the kernel This facility provides three entry points: ilog2() Log base 2 of unsigned long ilog2_u32() Log base 2 of u32 ilog2_u64() Log base 2 of u64 These facilities can either be used inside functions on dynamic data: int do_something(long q) { ...; y = ilog2(x) ...; } Or can be used to statically initialise global variables with constant values: unsigned n = ilog2(27); When performing static initialisation, the compiler will report "error: initializer element is not constant" if asked to take a log of zero or of something not reducible to a constant. They treat negative numbers as unsigned. When not dealing with a constant, they fall back to using fls() which permits them to use arch-specific log calculation instructions - such as BSR on x86/x86_64 or SCAN on FRV - if available. [akpm@osdl.org: MMC fix] Signed-off-by: David Howells <dhowells@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: David Howells <dhowells@redhat.com> Cc: Wojtek Kaniewski <wojtekka@toxygen.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-08 08:28:51 -08:00
Christoph Lameter	54e6ecb239	[PATCH] slab: remove SLAB_ATOMIC SLAB_ATOMIC is an alias of GFP_ATOMIC Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:24 -08:00
David Howells	4c1ac1b491	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/infiniband/core/iwcm.c drivers/net/chelsio/cxgb2.c drivers/net/wireless/bcm43xx/bcm43xx_main.c drivers/net/wireless/prism54/islpci_eth.c drivers/usb/core/hub.h drivers/usb/input/hid-core.c net/core/netpoll.c Fix up merge failures with Linus's head and fix new compilation failures. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-12-05 14:37:56 +00:00
Jack Morgenstein	7013696a5f	IB/mthca: Fix initial SRQ logsize for mem-free HCAs When initializing an mthca SRQ, the log_srq_size field should be the log of the number of SRQ WQEs, not the log of the number of bytes in the SRQ. This affects only mthca drivers for memfree HCAs which set the initial srq wqe counter (in the SW2HW transition) to a non-zero value. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-11-29 15:33:08 -08:00
Roland Dreier	f4f3d0f0ec	IB/mthca: Fix section mismatches Commit `b3b30f5e` ("IB/mthca: Recover from catastrophic errors") introduced some section mismatch breakage, because the error recovery code tears down and reinitializes the device, which calls into lots of code originally marked __devinit and __devexit from regular .text. Fix this by getting rid of these now-incorrect section markers. Reported by Randy Dunlap <randy.dunlap@oracle.com>. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-11-29 15:33:06 -08:00
Eric Sesterhenn	bed8bdfddd	IB: kmemdup() cleanup Replace open coded kmemdup() to save some screen space, and allow inlining/not inlining to be triggered by gcc. Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-11-29 15:33:06 -08:00
David Howells	c4028958b6	WorkStruct: make allyesconfig Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:57:56 +00:00
Michael S. Tsirkin	68586b67ab	IB/mthca: Fix MAD extended header format for MAD_IFC firmware command Several fields in an incoming MAD extended info header were passed into the MAD_IFC firmware command at incorrect offsets (mostly off by 4 bytes). As the result, the HCA will fail to generate traps in which this info is needed (e.g. traps which include the GRH of the incoming packet), in violation of the IB spec. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-10-31 10:51:26 -08:00
Arthur Kepner	1f5c23e2c1	IB/mthca: Use mmiowb after doorbell ring We discovered a problem when running IPoIB applications on multiple CPUs on an Altix system. Many messages such as: ib_mthca 0002:01:00.0: SQ 000014 full (19941644 head, 19941707 tail, 64 max, 0 nreq) appear in syslog, and the driver wedges up. Apparently this is because writes to the doorbells from different CPUs reach the device out of order. The following patch adds mmiowb() calls after doorbell rings to ensure the doorbell writes are ordered. Signed-off-by: Arthur Kepner <akepner@sgi.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-10-16 20:22:35 -07:00
Michael S. Tsirkin	2cbe19d48a	IB/mthca: Fix off-by-one in mthca SRQ creation All HCAs (not just mem-free) need a spare SRQ entry, so bump srq->max by 1 in all cases. Noted by Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-10-10 12:50:38 -07:00
Jack Morgenstein	a8bf4e7717	IB/mthca: Query port fix Fill in "max_vl_num" (encoded according to VLCap field in the PortInfo MAD) and "init_type_reply" values in the ib_query_port() verb. Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-10-10 12:50:36 -07:00
David Howells	7d12e780e0	IRQ: Maintain regs pointer globally rather than passing to IRQ handlers Maintain a per-CPU global "struct pt_regs " variable which can be used instead of passing regs around manually through all ~1800 interrupt handlers in the Linux kernel. The regs pointer is used in few places, but it potentially costs both stack space and code to pass it around. On the FRV arch, removing the regs parameter from all the genirq function results in a 20% speed up of the IRQ exit path (ie: from leaving timer_interrupt() to leaving do_IRQ()). Where appropriate, an arch may override the generic storage facility and do something different with the variable. On FRV, for instance, the address is maintained in GR28 at all times inside the kernel as part of general exception handling. Having looked over the code, it appears that the parameter may be handed down through up to twenty or so layers of functions. Consider a USB character device attached to a USB hub, attached to a USB controller that posts its interrupts through a cascaded auxiliary interrupt controller. A character device driver may want to pass regs to the sysrq handler through the input layer which adds another few layers of parameter passing. I've build this code with allyesconfig for x86_64 and i386. I've runtested the main part of the code on FRV and i386, though I can't test most of the drivers. I've also done partial conversion for powerpc and MIPS - these at least compile with minimal configurations. This will affect all archs. Mostly the changes should be relatively easy. Take do_IRQ(), store the regs pointer at the beginning, saving the old one: struct pt_regs old_regs = set_irq_regs(regs); And put the old one back at the end: set_irq_regs(old_regs); Don't pass regs through to generic_handle_irq() or __do_IRQ(). In timer_interrupt(), this sort of change will be necessary: - update_process_times(user_mode(regs)); - profile_tick(CPU_PROFILING, regs); + update_process_times(user_mode(get_irq_regs())); + profile_tick(CPU_PROFILING); I'd like to move update_process_times()'s use of get_irq_regs() into itself, except that i386, alone of the archs, uses something other than user_mode(). Some notes on the interrupt handling in the drivers: () input_dev() is now gone entirely. The regs pointer is no longer stored in the input_dev struct. () finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does something different depending on whether it's been supplied with a regs pointer or not. (*) Various IRQ handler function pointers have been moved to type irq_handler_t. Signed-Off-By: David Howells <dhowells@redhat.com> (cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)	2006-10-05 15:10:12 +01:00
Roland Dreier	d35cc330a2	IB/mthca: Simplify calls to mthca_cq_clean() If a QP has separate send and receive CQs, then the send CQ will never have receive completions from that QP in it. So when cleaning the send CQ, there's no need to pass in an SRQ pointer, even if the QP is attached to an SRQ. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-09-22 15:22:55 -07:00
Jack Morgenstein	b3b30f5e8a	IB/mthca: Recover from catastrophic errors Trigger device remove and then add when a catastrophic error is detected in hardware. This, in turn, will cause a device reset, which we hope will recover from the catastrophic condition. Since this might interefere with debugging the root cause, add a module option to suppress this behaviour. Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-09-22 15:22:54 -07:00
Tom Tucker	07ebafbaaa	RDMA: iWARP Core Changes. Modifications to the existing rdma header files, core files, drivers, and ulp files to support iWARP, including: - Hook iWARP CM into the build system and use it in rdma_cm. - Convert enum ib_node_type to enum rdma_node_type, which includes the possibility of RDMA_NODE_RNIC, and update everything for this. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-09-22 15:22:47 -07:00
Roland Dreier	3cd965646b	IB: Whitespace fixes Remove some trailing whitespace that has snuck in despite the best efforts of whitespace=error-all. Also fix a few other whitespace bogosities. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-09-22 15:22:46 -07:00
Jack Morgenstein	9e583b85c2	IB/mthca: Return correct number of bits for static rate in query_qp Incorrect number of bits was taken for static_rate field. Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-09-22 15:22:42 -07:00
Jack Morgenstein	f6f76725b5	IB/mthca: Return port number for unconnected QPs in query_qp port_num was not being returned for unconnected QPs. Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-09-22 15:22:41 -07:00
Jack Morgenstein	b046a04e16	IB/mthca: Fix default static rate returned for Tavor in AV When default static rate is returned for Tavor, need to translate it to an ib rate value. Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-09-22 15:22:40 -07:00
Ralph Campbell	9bc57e2d19	IB/uverbs: Pass userspace data to modify_srq and modify_qp methods Pass a struct ib_udata to the low-level driver's ->modify_srq() and ->modify_qp() methods, so that it can get to the device-specific data passed in by the userspace driver. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-09-22 15:22:25 -07:00
James Lentini	2a214182d2	IB/mthca: Include the header we really want Signed-off-by: James Lentini <jlentini@netapp.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-09-22 15:17:20 -07:00
Michael S. Tsirkin	9fd558f454	IB/mthca: Don't use privileged UAR for kernel access Make kernel use UAR2 instead of UAR1 for hardware access: this adds sanity checking from the hardware side, without any performance cost. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-09-22 15:17:18 -07:00
Jack Morgenstein	b27075735e	IB/mthca: Fix lid used for sending traps The SM LID used to send traps to is incorrectly set to port LID. This is a regression from 2.6.17 -- after a PortInfo MAD is received, no traps are sent to the SM LID. The traps go to the loopback interface instead, and are dropped there. The SM LID should be taken from the sm_lid of the PortInfo response. The bug was introduced by commit 12bbb2b7be7f5564952ebe0196623e97464b8ac5: IB/mthca: Add client reregister event generation Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-09-22 15:17:17 -07:00
Roland Dreier	5a4e6dccbc	IB/mthca: Use IRQ safe locks to protect allocation bitmaps It is supposed to be OK to call mthca_create_ah() and mthca_destroy_ah() from any context. However, for mem-full HCAs, these functions use the mthca_alloc() and mthca_free() bitmap helpers, and those helpers use non-IRQ-safe spin_lock() internally. Lockdep correctly warns that this could lead to a deadlock. Fix this by changing mthca_alloc() and mthca_free() to use spin_lock_irqsave(). Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-08-31 17:25:56 -07:00
Michael S. Tsirkin	834ac73d4b	IB/mthca: Update HCA firmware revisions Update the driver's list of HCA firmware revisions to make sure people running Sinai firmware older than 1.1.0 get a message suggesting a firmware upgrade. Update the Arbel versions as well while we are at it. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-08-23 13:29:08 -07:00
Roland Dreier	5beba53230	IB/mthca: No userspace SRQs if HCA doesn't have SRQ support Leave all SRQ methods out of the device's uverbs_cmd_mask if the device doesn't have SRQ support (because of ancient firmware) so that we don't allow userspace to call the driver's create_srq method. This fixes a userspace-triggerable oops caused by ib_uverbs_create_srq() following the device's ->create_srq function pointer, which will be NULL if the device doesn't support SRQs. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-08-18 10:41:46 -07:00
Roland Dreier	a19aa5c5fd	IB/mthca: Fix potential AB-BA deadlock with CQ locks When destroying a QP, mthca locks both the QP's send CQ and receive CQ. However, the following scenario is perfectly valid: QP_a: send_cq == CQ_x, recv_cq == CQ_y QP_b: send_cq == CQ_y, recv_cq == CQ_x The old mthca code simply locked send_cq and then recv_cq, which in this case could lead to an AB-BA deadlock if QP_a and QP_b were destroyed simultaneously. We can fix this by changing the locking code to lock the CQ with the lower CQ number first, which will create a consistent lock ordering. Also, the second CQ is locked with spin_lock_nested() to tell lockdep that we know what we're doing with the lock nesting. This bug was found by lockdep. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-08-11 08:56:57 -07:00
Michael S. Tsirkin	e54b82d739	IB/mthca: Make fence flag work for send work requests The fence bit needs to be set in the doorbell too, not just the WQE. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-08-10 10:50:52 -07:00
Roland Dreier	69e9fbb460	IB/mthca: Clean up mthca array index mask Define a constant MTHCA_ARRAY_MASK to replace repeated uses of (PAGE_SIZE / sizeof (void *) - 1) in mthca array code. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-08-03 09:44:22 -07:00
Michael S. Tsirkin	bf74c7479e	IB/mthca: Fix mthca_array_clear() thinko mthca_array_clear() does not clear the slot if the used count is positive. This leads to crashes in mthca_qp_event() since that uses mthca_array_get() to check that the qp is valid. Discovered by Ali Ayoub. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-08-03 09:44:21 -07:00
Roland Dreier	8fdf679fdb	IB/mthca: Initialize max_cmds before debug code prints it Read the max_cmds value from the response to the QUERY_FW command before printing out the value, so that the real value goes into the debug output. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-07-24 09:36:50 -07:00
Dotan Barak	1252c517cf	IB/mthca: Fix SRQ limit event range check Mem-free HCAs always keep one spare SRQ WQE, so the SRQ limit cannot be set beyond srq->max - 1. Signed-off-by: Dotan Barak <dotanb@mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-07-24 07:20:32 -07:00
Michael S. Tsirkin	0964d91618	[PATCH] IB/mthca: comment fix After recent changes, mthca_wq_init does not actually initialize the WQ as it used to - it simply resets all index fields to their initial values. So, let's rename it to mthca_wq_reset. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Cc: Roland Dreier <rolandd@cisco.com> Acked-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-14 21:53:50 -07:00
Jack Morgenstein	2290d2c9f5	[PATCH] IB/mthca: fix static rate returned by mthca_ah_query mthca_ah_query returs the static rate of the address handle in internal mthc format. fix it to use rate encoding from enum ib_rate, which is what users expect. Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Cc: Roland Dreier <rolandd@cisco.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-14 21:53:50 -07:00
Zach Brown	a46f9484f8	[PATCH] mthca: initialize send and receive queue locks separately mthca: initialize send and receive queue locks separately lockdep identifies a lock by the call site of its initialization. By initializing the send and receive queue locks in mthca_wq_init() we confuse lockdep. It warns that that the ordered acquiry of both locks in mthca_modify_qp() is recursive acquiry of one lock: ============================================= [ INFO: possible recursive locking detected ] --------------------------------------------- modprobe/1192 is trying to acquire lock: (&wq->lock){....}, at: [<f892b4db>] mthca_modify_qp+0x60/0xa7b [ib_mthca] but task is already holding lock: (&wq->lock){....}, at: [<f892b4ce>] mthca_modify_qp+0x53/0xa7b [ib_mthca] Initializing the locks separately in mthca_alloc_qp_common() stops the warning and will let lockdep enforce proper ordering on paths that acquire both locks. Signed-off-by: Zach Brown <zach.brown@oracle.com> Cc: Roland Dreier <rolandd@cisco.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-04 10:24:57 -07:00
Thomas Gleixner	dace145374	[PATCH] irq-flags: misc drivers: Use the new IRQF_ constants Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-07-02 13:58:50 -07:00
Jörn Engel	6ab3d5624e	Remove obsolete #include <linux/config.h> Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-06-30 19:25:36 +02:00
Greg Kroah-Hartman	e29419fffc	[PATCH] 64bit resource: fix up printks for resources in misc drivers This is needed if we wish to change the size of the resource structures. Based on an original patch from Vivek Goyal <vgoyal@in.ibm.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2006-06-27 09:23:59 -07:00
Roland Dreier	c93b6fbaa9	IB/mthca: Make all device methods truly reentrant Documentation/infiniband/core_locking.txt says: All of the methods in struct ib_device exported by a low-level driver must be fully reentrant. The low-level driver is required to perform all synchronization necessary to maintain consistency, even if multiple function calls using the same object are run simultaneously. However, mthca's modify_qp, modify_srq and resize_cq methods are currently not reentrant. Add a mutex to the QP, SRQ and CQ structures so that these calls can be properly serialized. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-06-17 20:37:41 -07:00
Roland Dreier	c9c5d9feef	IB/mthca: Fix memory leak on modify_qp error paths Some error paths after the mthca_alloc_mailbox() call in mthca_modify_qp() just do a "return -EINVAL" without freeing the mailbox. Convert these returns to "goto out" to avoid leaking the mailbox storage. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-06-17 20:37:41 -07:00
Or Gerlitz	d4cb0784fd	IB/mthca: Fill in max_map_per_fmr device attribute Report the true max_map_per_fmr value from mthca_query_device(), taking into account the change in FMR remapping introduced by the Sinai performance optimization. Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-06-17 20:37:37 -07:00
Leonid Arsh	12bbb2b7be	IB/mthca: Add client reregister event generation Change the mthca snoop of MADs that set PortInfo to check if the SM has set the client reregister bit, and if it has, generate a client reregister event. If the bit is not set, just generate a LID change event as usual. Signed-off-by: Leonid Arsh <leonida@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-06-17 20:37:36 -07:00
Roland Dreier	e9cd59418f	IB/mthca: Convert FW commands to use wait_for_completion_timeout() The kernel has had wait_for_completion_timeout() for a long time now. mthca should use it to handle FW commands timing out, instead of implementing the same thing in a much more complicated way by using wait_for_completion() along with a timer that does complete(). Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-06-17 20:37:30 -07:00
Michael S. Tsirkin	a26026c122	IB/mthca: Remove dead code Kill some dead code in mthca_eq.c Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-06-17 20:37:29 -07:00
Michael S. Tsirkin	4e56ea794e	IB/mthca: memfree completion with error FW bug workaround Memfree firmware is in rare cases reporting WQE index == base - 1 in receive completion with error, instead of (rq size - 1); base is 0 in mthca. Here is a patch to avoid kernel crash and report a correct WR id in this case. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-06-17 20:37:20 -07:00
Michael S. Tsirkin	13aa6ecb47	IB/mthca: restore missing PCI registers after reset mthca does not restore the following PCI-X/PCI Express registers after reset: PCI-X device: PCI-X command register PCI-X bridge: upstream and downstream split transaction registers PCI Express : PCI Express device control and link control registers This causes instability and/or bad performance on systems where one of these registers is set to a non-default value by BIOS. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-06-17 20:37:00 -07:00
Michael S. Tsirkin	ab28b171ea	IB/mthca: Fix posting lists of 256 receive requests to SRQ for Tavor If we post a list of length exactly a multiple of 256, nreq in doorbell gets set to 256 which is wrong: it should be encoded by 0. This is because we only zero it out on the next WR, which may not be there. The solution is to ring the doorbell after posting a WQE, not before posting the next one. This is the same bug that we just fixed for QPs with non-shared RQ. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-05-24 13:43:37 -07:00
Michael S. Tsirkin	23f3bc0f2c	IB/mthca: Fix posting lists of 256 receive requests for Tavor If we post a list of length 256 exactly, nreq in doorbell gets set to 256 which is wrong: it should be encoded by 0. This is because we only zero it out on the next WR, which may not be there. The solution is to ring the doorbell after posting a WQE, not before posting the next one. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-05-18 11:37:03 -07:00
Roland Dreier	1db76c14d2	IB/mthca: Make fw_cmd_doorbell default to 0 Setting fw_cmd_doorbell allows FW command to be queued using posted writes instead of requiring polling on a "go" bit, so it should be a performance boost. However, the option causes problems with at least some device/firmware combinations, so set the default to 0 until we understand what's going on better. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-05-17 07:48:07 -07:00
Michael S. Tsirkin	ce477ae4f8	IB/mthca: FMR ioremap fix Addresses for ioremap must be calculated off of pci_resource_start; we can't directly use the bus address as seen by the HCA. Fix the code that remaps device memory for FMR access. Based on patch by Klaus Smolin. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-05-10 15:16:57 -07:00
Roland Dreier	a3285aa4ee	IB/mthca: Fix race in reference counting Fix races in in destroying various objects. If a destroy routine waits for an object to become free by doing wait_event(&obj->wait, !atomic_read(&obj->refcount)); /* now clean up and destroy the object */ and another place drops a reference to the object by doing if (atomic_dec_and_test(&obj->refcount)) wake_up(&obj->wait); then this is susceptible to a race where the wait_event() and final freeing of the object occur between the atomic_dec_and_test() and the wake_up(). And this is a use-after-free, since wake_up() will be called on part of the already-freed object. Fix this in mthca by replacing the atomic_t refcounts with plain old integers protected by a spinlock. This makes it possible to do the decrement of the reference count and the wake_up() so that it appears as a single atomic operation to the code waiting on the wait queue. While touching this code, also simplify mthca_cq_clean(): the CQ being cleaned cannot go away, because it still has a QP attached to it. So there's no reason to be paranoid and look up the CQ by number; it's perfectly safe to use the pointer that the callers already have. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-05-09 10:50:29 -07:00
Roland Dreier	254abfd33a	IB/mthca: Fix offset in query_gid method GuidInfo records have 8 byte GUIDs in them, so an index should be multiplied by 8 to get an offset. mthca_query_gid() was incorrectly multiplying by 16. Noticed by Leonid Keller <leonid@mellanox.co.il>. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-05-01 10:40:23 -07:00
Adrian Bunk	415dcd95b2	IB/mthca: make a function static This patch makes the needlessly global mthca_update_rate() static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-04-19 11:40:12 -07:00
Jack Morgenstein	59fef3b1e9	IB/mthca: Fix max_srq_sge returned by ib_query_device for Tavor devices The driver allocates SRQ WQEs size with a power of 2 size both for Tavor and for memfree. For Tavor, however, the hardware only requires the WQE size to be a multiple of 16, not a power of 2, and the max number of scatter-gather allowed is reported accordingly by the firmware (and this is the value currently returned by ib_query_device() and ibv_query_device()). If the max number of scatter/gather entries reported by the FW is used when creating an SRQ, the creation will fail for Tavor, since the required WQE size will be increased to the next power of 2, which turns out to be larger than the device permitted max WQE size (which is not a power of 2). This patch reduces the reported SRQ max wqe size so that it can be used successfully in creating an SRQ on Tavor HCAs. Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-04-12 11:42:30 -07:00
Michael S. Tsirkin	abf45dbb5b	IB/mthca: Disable tuning PCI read burst size The PCI spec recommends against drivers playing with a device's PCI read burst size, and says that systems software should configure it. And we actually have users that report that changing it from the default set by BIOS hurts performance and/or stability for them. On the other hand, the Mellanox Programmer's Reference Manual recommends turning it up all the way to the maximum value. Some tests conducted here in the lab do not show performance improvement from this tuning, but this might be just me. As a work-around, make this tuning an option, off by default (safe value), with an eye towards removing it completely one day if no one complains. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-04-10 09:43:58 -07:00
Jack Morgenstein	bf6a9e31cf	IB: simplify static rate encoding Push translation of static rate to HCA format into low-level drivers, where it belongs. For static rate encoding, use encoding of rate field from IB standard PathRecord, with addition of value 0, for backwards compatibility with current usage. The changes are: - Add enum ib_rate to midlayer includes. - Get rid of static rate translation in IPoIB; just use static rate directly from Path and MulticastGroup records. - Update mthca driver to translate absolute static rate into the format used by hardware. This also fixes mthca's static rate handling for HCAs that are capable of 4X DDR. Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-04-10 09:43:47 -07:00
Roland Dreier	227c939b00	IB/mthca: Always build debugging code unless CONFIG_EMBEDDED=y Change the mthca debugging trace output code so that it can enabled and disabled at runtime with the debug_level module parameter in sysfs. Also, don't allow CONFIG_INFINIBAND_MTHCA_DEBUG to be disabled unless CONFIG_EMBEDDED is selected. We want users (and especially distros) to have this turned on unless they really need to save space, because by the time we want debugging output, it's usually too late to rebuild a kernel. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-04-02 14:39:20 -07:00
Roland Dreier	e1f7868c80	IB/mthca: Fix section mismatch problems Quite a few cleanup functions in mthca were marked as __devexit. However, they could also be called from error paths during initialization, so they cannot be marked that way. Just delete all of the incorrect annotations. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-03-29 09:36:46 -08:00
Jack Morgenstein	a07bacca7b	IB/mthca: Fix check of size in SRQ creation The previous patch for Tavor broke MemFree logic. The driver should perform limit check only for Tavor. For MemFree, the check is incorrect, since ds (WQE stride) is always a power-of-2 (although the max_desc_size may not be). In Tavor, however, WQE stride and desc_size are the same, and are not necessarily power-of-2. The check was really for the WQE stride (and it Tavor, we use max_desc_size for the stride). Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-03-29 09:36:46 -08:00
Roland Dreier	192daa18dd	IB/mthca: Fix modify QP error path If the call to mthca_MODIFY_QP() failed, then mthca_modify_qp() would still do some things it shouldn't, such as store away attributes for special QPs. Fix this, and simplify the code, by simply jumping to the exit path if mthca_MODIFY_QP() fails. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-03-24 15:47:30 -08:00
Roland Dreier	b0b3a8e193	IB/mthca: Fix indentation Fix some whitespace damage (indenting with spaces) that snuck in. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-03-24 15:47:29 -08:00
Jack Morgenstein	b3f64967fa	IB/mthca: Fix uninitialized variable in mthca_alloc_qp() mthca_alloc_sqp() by mthca_set_qp_size() need to set qp->transport before calling mthca_set_qp_size(), since the value is used there. Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-03-24 15:47:29 -08:00
Jack Morgenstein	d4301e2c66	IB/mthca: Check SRQ limit in modify SRQ operation When setting the shared receive queue (SRQ) watermark in a modify SRQ operation, make sure that the supplied value is not larger than the full size of the SRQ. Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-03-24 15:47:28 -08:00
Jack Morgenstein	ded9ad721d	IB/mthca: Check that SRQ WQE size does not exceed device's max value Guarantee the calculated work queue entry size does not exceed the max allowable WQE size when creating an SRQ. This is a problem with Arbel in Tavor-compatibility mode because the current WQE size computation method rounds up to next power of 2. Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-03-24 15:47:27 -08:00
Dotan Barak	0ef61db837	IB/mthca: Check that sgid_index and path_mtu are valid in modify_qp Add a check that the modify QP parameters sgid_index and path_mtu are valid, since they might come from userspace. Signed-off-by: Dotan Barak <dotanb@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-03-24 15:47:27 -08:00
Eli Cohen	fd02e8038e	IB/mthca: Query SRQ srq_limit fixes Fix endianness handling of srq_limit: it is big-endian in the context structure, so we need to swab it before returning it. Also add support for srq_limit query for Tavor (non-MemFree) HCAs. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-03-20 10:08:26 -08:00
Dotan Barak	e10e271bfd	IB/mthca: Correct reported SRQ size in MemFree case. MemFree devices need to reserve one shared receive queue (SRQ) work request for internal use, so the capacity returned from the create_srq and query_srq methods should be srq->max - 1. Signed-off-by: Dotan Barak <dotanb@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-03-20 10:08:26 -08:00
Roland Dreier	6b63e3015a	IB/mthca: Coverity fix to mthca_init_eq_table() Fix bug found by coverity: the loop body never executed, because it was doing for (i = 0; i < MTHCA_EQ_CMD; ++i), but MTHCA_EQ_CMD is 0. The correct loop bound is MTHCA_NUM_EQ, to loop over all EQs. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-03-20 10:08:25 -08:00
Roland Dreier	6226bb5701	IB/mthca: Update firmware versions Update known firmware versions in driver's table to the latest releases. Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-03-20 10:08:22 -08:00
Eli Cohen	651eaac928	IB/mthca: Optimize large messages on Sinai HCAs Sinai (one-port PCI Express) HCAs get improved throughput for messages bigger than 80 KB in DDR mode if memory keys are formatted in a specific way. The enhancement only works if the memory key table is smaller than 2^24 entries. For larger tables, the enhancement is off and a warning is printed (to avoid silent performance loss). Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Michael Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>	2006-03-20 10:08:22 -08:00

1 2 3 4 5 ...

427 Коммитов