WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Yihang Li	89954f024c	scsi: hisi_sas: Ensure all enabled PHYs up during controller reset For the controller reset operation, hisi_sas_phy_enable() is executed for each enabled local PHY, and refresh the port id of each device based on the latest hisi_sas_phy->port_id after 1 second sleep, hisi_sas_phy->port_id is configured in the interrupt processing function phy_up_v3_hw(). However, in directly attached scenario, for some SATA disks the amount of time for phyup more than 1s sometimes. In this case, incorrect port id may be configured in hisi_sas_refresh_port_id(). As a result, all the internal IOs fail and disk lost, such as follows: [10717.666565] hisi_sas_v3_hw 0000:74:02.0: phyup: phy1 link_rate=10(sata) [10718.826813] hisi_sas_v3_hw 0000:74:02.0: erroneous completion iptt=63 task=00000000c1ab1c2b dev id=200 addr=5000000000000501 CQ hdr: 0x8000007 0xc8003f 0x0 0x0 Error info: 0x0 0x0 0x0 0x0 [10718.843428] sas: TMF task open reject failed 5000000000000501 [10718.849242] hisi_sas_v3_hw 0000:74:02.0: erroneous completion iptt=64 task=00000000c1ab1c2b dev id=200 addr=5000000000000501 CQ hdr: 0x8000007 0xc80040 0x0 0x0 Error info: 0x0 0x0 0x0 0x0 [10718.865856] sas: TMF task open reject failed 5000000000000501 [10718.871670] hisi_sas_v3_hw 0000:74:02.0: erroneous completion iptt=65 task=00000000c1ab1c2b dev id=200 addr=5000000000000501 CQ hdr: 0x8000007 0xc80041 0x0 0x0 Error info: 0x0 0x0 0x0 0x0 [10718.888284] sas: TMF task open reject failed 5000000000000501 [10718.894093] sas: executing TMF for 5000000000000501 failed after 3 attempts! [10718.901114] hisi_sas_v3_hw 0000:74:02.0: ata disk 5000000000000501 reset failed [10718.908410] hisi_sas_v3_hw 0000:74:02.0: controller reset complete ..... [10773.298633] ata216.00: revalidation failed (errno=-19) [10773.303753] ata216.00: disable device So the time of waitting for PHYs up is 1s which may be not enough. To solve the issue, running hisi_sas_phy_enable() in parallel through async operations and use wait_for_completion_timeout() to wait for PHYs come up instead of directly sleep for 1 second. Signed-off-by: Yihang Li <liyihang9@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1679283265-115066-4-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-04-02 21:57:35 -04:00
Xingui Yang	71fb36b5ff	scsi: hisi_sas: Grab sas_dev lock when traversing the members of sas_dev.list When freeing slots in function slot_complete_v3_hw(), it is possible that sas_dev.list is being traversed elsewhere, and it may trigger a NULL pointer exception, such as follows: ==>cq thread ==>scsi_eh_6 ==>scsi_error_handler() ==>sas_eh_handle_sas_errors() ==>sas_scsi_find_task() ==>lldd_abort_task() ==>slot_complete_v3_hw() ==>hisi_sas_abort_task() ==>hisi_sas_slot_task_free() ==>dereg_device_v3_hw() ==>list_del_init() ==>list_for_each_entry_safe() [ 7165.434918] sas: Enter sas_scsi_recover_host busy: 32 failed: 32 [ 7165.434926] sas: trying to find task 0x00000000769b5ba5 [ 7165.434927] sas: sas_scsi_find_task: aborting task 0x00000000769b5ba5 [ 7165.434940] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(00000000769b5ba5) aborted [ 7165.434964] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(00000000c9f7aa07) ignored [ 7165.434965] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(00000000e2a1cf01) ignored [ 7165.434968] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 7165.434972] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(0000000022d52d93) ignored [ 7165.434975] hisi_sas_v3_hw 0000:b4:02.0: slot complete: task(0000000066a7516c) ignored [ 7165.434976] Mem abort info: [ 7165.434982] ESR = 0x96000004 [ 7165.434991] Exception class = DABT (current EL), IL = 32 bits [ 7165.434992] SET = 0, FnV = 0 [ 7165.434993] EA = 0, S1PTW = 0 [ 7165.434994] Data abort info: [ 7165.434994] ISV = 0, ISS = 0x00000004 [ 7165.434995] CM = 0, WnR = 0 [ 7165.434997] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000f29543f2 [ 7165.434998] [0000000000000000] pgd=0000000000000000 [ 7165.435003] Internal error: Oops: 96000004 [#1] SMP [ 7165.439863] Process scsi_eh_6 (pid: 4109, stack limit = 0x00000000c43818d5) [ 7165.468862] pstate: 00c00009 (nzcv daif +PAN +UAO) [ 7165.473637] pc : dereg_device_v3_hw+0x68/0xa8 [hisi_sas_v3_hw] [ 7165.479443] lr : dereg_device_v3_hw+0x2c/0xa8 [hisi_sas_v3_hw] [ 7165.485247] sp : ffff00001d623bc0 [ 7165.488546] x29: ffff00001d623bc0 x28: ffffa027d03b9508 [ 7165.493835] x27: ffff80278ed50af0 x26: ffffa027dd31e0a8 [ 7165.499123] x25: ffffa027d9b27f88 x24: ffffa027d9b209f8 [ 7165.504411] x23: ffffa027c45b0d60 x22: ffff80278ec07c00 [ 7165.509700] x21: 0000000000000008 x20: ffffa027d9b209f8 [ 7165.514988] x19: ffffa027d9b27f88 x18: ffffffffffffffff [ 7165.520276] x17: 0000000000000000 x16: 0000000000000000 [ 7165.525564] x15: ffff0000091d9708 x14: ffff0000093b7dc8 [ 7165.530852] x13: ffff0000093b7a23 x12: 6e7265746e692067 [ 7165.536140] x11: 0000000000000000 x10: 0000000000000bb0 [ 7165.541429] x9 : ffff00001d6238f0 x8 : ffffa027d877af00 [ 7165.546718] x7 : ffffa027d6329600 x6 : ffff7e809f58ca00 [ 7165.552006] x5 : 0000000000001f8a x4 : 000000000000088e [ 7165.557295] x3 : ffffa027d9b27fa8 x2 : 0000000000000000 [ 7165.562583] x1 : 0000000000000000 x0 : 000000003000188e [ 7165.567872] Call trace: [ 7165.570309] dereg_device_v3_hw+0x68/0xa8 [hisi_sas_v3_hw] [ 7165.575775] hisi_sas_abort_task+0x248/0x358 [hisi_sas_main] [ 7165.581415] sas_eh_handle_sas_errors+0x258/0x8e0 [libsas] [ 7165.586876] sas_scsi_recover_host+0x134/0x458 [libsas] [ 7165.592082] scsi_error_handler+0xb4/0x488 [ 7165.596163] kthread+0x134/0x138 [ 7165.599380] ret_from_fork+0x10/0x18 [ 7165.602940] Code: d5033e9f b9000040 aa0103e2 eb03003f (f9400021) [ 7165.609004] kernel fault(0x1) notification starting on CPU 75 [ 7165.700728] ---[ end trace fc042cbbea224efc ]--- [ 7165.705326] Kernel panic - not syncing: Fatal exception To fix the issue, grab sas_dev lock when traversing the members of sas_dev.list in dereg_device_v3_hw() and hisi_sas_release_tasks() to avoid concurrency of adding and deleting member. When function hisi_sas_release_tasks() calls hisi_sas_do_release_task() to free slot, the lock cannot be grabbed again in hisi_sas_slot_task_free(), then a bool parameter need_lock is added. Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1679283265-115066-2-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-04-02 21:57:35 -04:00
Xiang Chen	b711ef5e17	scsi: hisi_sas: Sync complete queue for poll queue Currently we sync irq to avoid freeing task before using task in I/O completion. After adding io_uring support, we need to do something similar for poll queues. As the process of CQ entries on poll queue are protected by spinlock cq->lock, we can use spin_lock() + spin_unlock() on cq->lock to make sure that CQ entries are processed to completion and then the complete queue is synced. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1678169355-76215-4-git-send-email-chenxiang66@hisilicon.com Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-03-09 21:50:02 -05:00
Xiang Chen	0e47effa77	scsi: hisi_sas: Add poll support for v3 hw Add a module parameter to set how many queues are used for iopoll. Also fill the interface mq_poll. For internal I/Os from libsas and libata we use non-iopoll queue (queue 0) to deliver and complete them. But for internal abort I/Os, just don't send them for poll queues. There is still a risk associated as this sends internal abort commands to non-iopoll queues which actually requires sending an internal abort command to every queue. As a result, make the module parameter as "experimental" for now. Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1678169355-76215-3-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-03-09 21:50:01 -05:00
Yihang Li	f58c897006	scsi: hisi_sas: Set a port invalid only if there are no devices attached when refreshing port id Currently the driver sets the port invalid if one phy in the port is not enabled, which may cause issues in expander situation. In directly attached situation, if phy up doesn't occur in time when refreshing port id, the port is incorrectly set to invalid which will also cause disk lost. Therefore set a port invalid only if there are no devices attached to the port. Signed-off-by: Yihang Li <liyihang9@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1672805000-141102-3-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-01-12 00:08:03 -05:00
Xingui Yang	037b48057e	scsi: hisi_sas: Use abort task set to reset SAS disks when discovered Currently clear task set is used to abort all commands remaining in the disk when the SAS disk is discovered, and if the disk is discovered by two initiators, other I_T nexuses are also affected. So use abort task set instead and take effect only on the specified I_T nexus. Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Link: https://lore.kernel.org/r/1672805000-141102-2-git-send-email-chenxiang66@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-01-12 00:08:03 -05:00
Jason Yan	ea44242bbf	scsi: hisi_sas: Fix tag freeing for reserved tags The reserved tags were put in the lower region of the tagset in commit `f7d190a94e` ("scsi: hisi_sas: Put reserved tags in lower region of tagset"). However, only the allocate function was changed, freeing was not handled. This resulted in a failure to boot: [ 33.467345] hisi_sas_v3_hw 0000:b4:02.0: task exec: failed[-132]! [ 33.473413] sas: Executing internal abort failed 5000000000000603 (-132) [ 33.480088] hisi_sas_v3_hw 0000:b4:02.0: I_T nexus reset: internal abort (-132) [ 33.657336] hisi_sas_v3_hw 0000:b4:02.0: task exec: failed[-132]! [ 33.663403] ata7.00: failed to IDENTIFY (I/O error, err_mask=0x40) [ 35.787344] hisi_sas_v3_hw 0000:b4:04.0: task exec: failed[-132]! [ 35.793411] sas: Executing internal abort failed 5000000000000703 (-132) [ 35.800084] hisi_sas_v3_hw 0000:b4:04.0: I_T nexus reset: internal abort (-132) [ 35.977335] hisi_sas_v3_hw 0000:b4:04.0: task exec: failed[-132]! [ 35.983403] ata10.00: failed to IDENTIFY (I/O error, err_mask=0x40) [ 35.989643] ata10.00: revalidation failed (errno=-5) Fixes: `f7d190a94e` ("scsi: hisi_sas: Put reserved tags in lower region of tagset") Cc: John Garry <john.g.garry@oracle.com> Cc: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Reviewed-by: John Garry <john.g.garry@oracle.com> Acked-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2023-01-07 06:25:08 -05:00
Jie Zhan	3c2673a09c	scsi: hisi_sas: Fix SATA devices missing issue during I_T nexus reset SATA devices on an expander may be removed and not be found again when I_T nexus reset and revalidation are processed simultaneously. The issue comes from: - Revalidation can remove SATA devices in link reset, e.g. in hisi_sas_clear_nexus_ha(). - However, hisi_sas_debug_I_T_nexus_reset() polls the state of a SATA device on an expander after sending link_reset, where it calls: hisi_sas_debug_I_T_nexus_reset sas_ata_wait_after_reset ata_wait_after_reset ata_wait_ready smp_ata_check_ready sas_ex_phy_discover sas_ex_phy_discover_helper sas_set_ex_phy The ex_phy's change count is updated in sas_set_ex_phy(), so SATA devices after a link reset may not be found later through revalidation. A similar issue was reported in: commit `0f3fce5cc7` ("[SCSI] libsas: fix ata_eh clobbering ex_phys via smp_ata_check_ready") commit `87c8331fcf` ("[SCSI] libsas: prevent domain rediscovery competing with ata error handling"). To address this issue, in hisi_sas_debug_I_T_nexus_reset(), we now call smp_ata_check_ready_type() that only polls the device type while not updating the ex_phy's data of libsas. Fixes: `71453bd9d1` ("scsi: hisi_sas: Use sas_ata_wait_after_reset() in IT nexus reset") Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com> Link: https://lore.kernel.org/r/20221118083714.4034612-5-zhanjie9@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-11-26 02:26:02 +00:00
Jie Zhan	94a3555d1f	scsi: Revert "scsi: hisi_sas: Don't send bcast events from HW during nexus HA reset" This reverts commit `f5f2a27160`. This is now unnecessary to solve the SATA devices missing issue in hisi_sas_clear_nexus_ha(). Hence, we should not ignore bcast events during sas_eh_handle_sas_errors() in case of missing bcast events, unless a justified need is found and a mechanism to defer (but not ignore) bcast events in sas_eh_handle_sas_errors() is provided. Also, in hisi_sas_clear_nexus_ha(), there is nothing further to handle in "out: " other than return, so that part can be reverted. Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com> Link: https://lore.kernel.org/r/20221118083714.4034612-3-zhanjie9@hisilicon.com Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-11-26 02:26:02 +00:00
Jie Zhan	7e613be7c6	scsi: Revert "scsi: hisi_sas: Drain bcast events in hisi_sas_rescan_topology()" This reverts commit `11ff0c98fc`. Draining or flushing events in hisi_sas_rescan_topology() can hang the driver, typically with phy up or phy down events being processed, i.e. sas_porte_bytes_dmaed() or sas_phye_loss_of_signal(). Signed-off-by: Jie Zhan <zhanjie9@hisilicon.com> Link: https://lore.kernel.org/r/20221118083714.4034612-2-zhanjie9@hisilicon.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-11-26 02:26:02 +00:00
John Garry	f7d190a94e	scsi: hisi_sas: Put reserved tags in lower region of tagset To be consistent with blk-mq, put the reserved tags in the lower region of the tagset. Eventually we hope to get rid of all this reserved tag management. Signed-off-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1666091763-11023-4-git-send-email-john.garry@huawei.com Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-10-22 03:02:51 +00:00
John Garry	295fd2330a	scsi: hisi_sas: Use sas_task_find_rq() Use sas_task_find_rq() to lookup the request per task for its driver tag. Signed-off-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1666091763-11023-3-git-send-email-john.garry@huawei.com Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-10-22 03:02:51 +00:00
Jason Yan	f0ed7bd5d9	scsi: hisi_sas: Use sas_find_attathed_phy_id() instead of open coding it The attached phy finding is open coded. Replace it with sas_find_attached_phy_id(). To keep things consistent, the return value of hisi_sas_dev_found() is also changed to -ENODEV after calling sas_find_attathed_phy_id() failed. Signed-off-by: Jason Yan <yanaijie@huawei.com> Link: https://lore.kernel.org/r/20220928070130.3657183-6-yanaijie@huawei.com Reviewed-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-10-18 03:28:09 +00:00
Xingui Yang	930d97dabd	scsi: hisi_sas: Add SATA_DISK_ERR bit handling for v3 hw When CQ header dw3 SATA_DISK_ERR is set it means this SATA disk is in error state and the current IPTT is invalid. An invalid IPTT does not correspond to any slot. In this scenario, new I/Os that delivered to disk will be rejected by the controller and all I/Os remaining in the disk should be aborted, which we add here with the sas_ata_device_link_abort() call. In hisi_sas_abort_task() we don't want to issue a soft reset as it may cause info to be lost in the target disk for the ATA EH autopsy. In this case, just release resources - the disk won't return other I/Os normally after NCQ Error, so this is safe. Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1665998435-199946-4-git-send-email-john.garry@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-10-18 02:37:45 +00:00
Xingui Yang	4b329abc91	scsi: hisi_sas: Move slot variable definition in hisi_sas_abort_task() Each branch currently defines a slot variable independently, and it is neater to move it to the function head. Signed-off-by: Xingui Yang <yangxingui@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1665998435-199946-3-git-send-email-john.garry@huawei.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-10-18 02:37:45 +00:00
John Garry	f5f2a27160	scsi: hisi_sas: Don't send bcast events from HW during nexus HA reset Remote devices may go missing from the per-device nexus reset part of the HA nexus, i.e after the controller reset. This is because libsas may find the devices to be gone as the phy may be temporarily down when processing the bcast event generated from the nexus reset. Filter out bcast events during this time to stop the devices being lost. Link: https://lore.kernel.org/r/1662378529-101489-6-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-09-06 22:28:11 -04:00
John Garry	e9b6bada98	scsi: hisi_sas: Add helper to process bcast events Add a helper for bcast processing to reduce duplication. Link: https://lore.kernel.org/r/1662378529-101489-5-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-09-06 22:28:11 -04:00
John Garry	11ff0c98fc	scsi: hisi_sas: Drain bcast events in hisi_sas_rescan_topology() In resetting the controller, SATA devices may be lost. The issue is that when we insert the bcast events to rescan the topology in hisi_sas_rescan_topology(), when we subsequently nexus reset the SATA devices in hisi_sas_async_I_T_nexus_reset(), there is a small timing window in which the remote phy is down and we process the bcast event (meaning that libsas judges that the disk is lost). Ensure that all bcast events are processed prior to the nexus reset to close this window. Link: https://lore.kernel.org/r/1662378529-101489-4-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-09-06 22:28:11 -04:00
John Garry	bc5551157a	scsi: hisi_sas: Clear HISI_SAS_HW_FAULT_BIT earlier Once the controller HW has been reset then we can unset flag HISI_SAS_HW_FAULT_BIT. In clearing this flag earlier we can now successfully execute commands in hisi_sas_controller_reset_done(), like bcast processing. Link: https://lore.kernel.org/r/1662378529-101489-3-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-09-06 22:28:10 -04:00
Xiang Chen	f0902095a7	scsi: hisi_sas: Relocate DMA unmap of SMP task Currently SMP tasks are DMA unmapped only when cq of SMP I/O is returned normally. If the cq of SMP I/O is returned with exception actually SMP TAS is never unmapped. Relocate DMA unmap of SMP task to fix the issue. Link: https://lore.kernel.org/r/1657823002-139010-4-git-send-email-john.garry@huawei.com Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-07-18 23:04:12 -04:00
Xiang Chen	bc22f9c06c	scsi: hisi_sas: Remove unnecessary variable to hold DMA map elements Use slot->n_elem to store the return value of dma_map_sg() for SSP and SMP IOs, and remove unnecessary variable n_elem_req. Link: https://lore.kernel.org/r/1657823002-139010-3-git-send-email-john.garry@huawei.com Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-07-18 23:04:11 -04:00
John Garry	6c6ac8b777	scsi: hisi_sas: Fix memory ordering in hisi_sas_task_deliver() The memories for the slot should be observed to be written prior to observing the slot as ready. Prior to commit `26fc0ea74f` ("scsi: libsas: Drop SAS_TASK_AT_INITIATOR"), we had a spin_lock() + spin_unlock() immediately before marking the slot as ready. The spin_unlock() - with release semantics - caused the slot memory to be observed to be written. Now that the spin_lock() + spin_unlock() is gone, use a smp_wmb(). Link: https://lore.kernel.org/r/1652774661-12935-1-git-send-email-john.garry@huawei.com Fixes: `26fc0ea74f` ("scsi: libsas: Drop SAS_TASK_AT_INITIATOR") Reported-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Yihang Li <liyihang6@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-05-19 20:16:26 -04:00
John Garry	e9dedc13bb	scsi: hisi_sas: Fix rescan after deleting a disk Removing an ATA device via sysfs means that the device may not be found through re-scanning: root@ubuntu:/home/john# lsscsi [0:0:0:0] disk SanDisk LT0200MO P404 /dev/sda [0:0:1:0] disk ATA HGST HUS724040AL A8B0 /dev/sdb [0:0:8:0] enclosu 12G SAS Expander RevB - root@ubuntu:/home/john# echo 1 > /sys/block/sdb/device/delete root@ubuntu:/home/john# echo "- - -" > /sys/class/scsi_host/host0/scan root@ubuntu:/home/john# lsscsi [0:0:0:0] disk SanDisk LT0200MO P404 /dev/sda [0:0:8:0] enclosu 12G SAS Expander RevB - root@ubuntu:/home/john# The problem is that the rescan of the device may conflict with the device in being re-initialized, as follows: - In the rescan we call hisi_sas_slave_alloc() in store_scan() -> sas_user_scan() -> [__]scsi_scan_target() -> scsi_probe_and_add_lunc() -> scsi_alloc_sdev() -> hisi_sas_slave_alloc() -> hisi_sas_init_device() In hisi_sas_init_device() we issue an IT nexus reset for ATA devices - That IT nexus causes the remote PHY to go down and this triggers a bcast event - In parallel libsas processes the bcast event, finds that the phy is down and marks the device as gone The hard reset issued in hisi_sas_init_device() is unncessary - as described in the code comment - so remove it. Also set dev status as HISI_SAS_DEV_NORMAL as the hisi_sas_init_device() call. Link: https://lore.kernel.org/r/1652354134-171343-4-git-send-email-john.garry@huawei.com Fixes: `36c6b7613e` ("scsi: hisi_sas: Initialise devices in .slave_alloc callback") Tested-by: Yihang Li <liyihang6@hisilicon.com> Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-05-19 20:16:26 -04:00
John Garry	71453bd9d1	scsi: hisi_sas: Use sas_ata_wait_after_reset() in IT nexus reset We have seen errors like this when a SATA device is probed: [524.566298] hisi_sas_v3_hw 0000L74:02.0: erroneous completion iptt=4096 ... [524.582827] sas: TMF task open reject failed 500e004aaaaaaaa00 Since commit `21c7e97247` ("scsi: hisi_sas: Disable SATA disk phy for severe I_T nexus reset failure"), we issue an ATA softreset to disks after a phy reset to ensure that they are in sound working order. If the softreset is issued before the remote phy has come back up then the softreset will fail (errors as above). Remedy this by waiting for the phy to come back up after the reset. Link: https://lore.kernel.org/r/1652354134-171343-3-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-05-19 20:16:26 -04:00
Dan Carpenter	066f4c3194	scsi: hisi_sas: Remove stray fallthrough annotation This case statement doesn't fall through any more so remove the fallthrough annotation. Link: https://lore.kernel.org/r/20220317075214.GC25237@kili Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-03-29 23:53:03 -04:00
John Garry	095478a6e5	scsi: hisi_sas: Use libsas internal abort support Use the common libsas internal abort functionality. In addition, this driver has special handling for internal abort timeouts - specifically whether to reset the controller in that instance, so extend the API for that. Timeout is now increased to 20 * Hz from 6 * Hz. We also retry for failure now, but this should not make a difference. Link: https://lore.kernel.org/r/1647001432-239276-5-git-send-email-john.garry@huawei.com Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-03-14 23:33:24 -04:00
Xiang Chen	512623de52	scsi: hisi_sas: Change hisi_sas_control_phy() phyup timeout The time of phyup not only depends on the controller but also the type of disk connected. As an example, from experience, for some SATA disks the amount of time from reset/power-on to receive the D2H FIS for phyup can take upto and more than 10s sometimes. According to the specification of some SATA disks such as ST14000NM0018, the max time from power-on to ready is 30s. Based on this the current timeout of phyup at 2s which is not enough. So set the value as HISI_SAS_WAIT_PHYUP_TIMEOUT (30s) in hisi_sas_control_phy(). For v3 hw there is a pre-existing workaround for a HW bug, being that we issue a link reset when the OOB occurs but the phyup does not. The current phyup timeout is HISI_SAS_WAIT_PHYUP_TIMEOUT. So if this does occur from when issuing a phy enable or similar via hisi_sas_control_phy(), the subsequent HW workaround linkreset processing calls hisi_sas_control_phy(), but this will pend the original phy reset timing out, so it is safe. Link: https://lore.kernel.org/r/1645703489-87194-3-git-send-email-john.garry@huawei.com Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-02-27 21:46:40 -05:00
John Garry	3f2e252ef7	scsi: libsas: Add sas_execute_ata_cmd() Add a function to execute an ATA command using the TMF code, and use in the hisi_sas driver. That driver needs to be able to issue the command on a specific phy, so add an interface for that. With that, hisi_sas_exec_internal_tmf_task() may be deleted. Link: https://lore.kernel.org/r/1645534259-27068-19-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-02-22 21:11:02 -05:00
John Garry	4fea759edf	scsi: libsas: Add sas_abort_task() Add a generic implementation of abort task TMF handler, and use in LLDDs. With that, some LLDDs custom TMF functions can now be deleted. Link: https://lore.kernel.org/r/1645112566-115804-18-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-02-19 15:59:36 -05:00
John Garry	72f8810e1f	scsi: libsas: Add sas_query_task() Add a generic implementation of query task TMF handler, and use in LLDDs. Link: https://lore.kernel.org/r/1645112566-115804-17-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-02-19 15:59:36 -05:00
John Garry	29d7769055	scsi: libsas: Add sas_lu_reset() Add a generic implementation of LU reset TMF handler, and use in LLDDs. Link: https://lore.kernel.org/r/1645112566-115804-16-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-02-19 15:59:36 -05:00
John Garry	e858545295	scsi: libsas: Add sas_clear_task_set() Add a generic implementation of clear task set TMF handler, and use in LLDDs. Link: https://lore.kernel.org/r/1645112566-115804-15-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-02-19 15:59:36 -05:00
John Garry	69b80a0ed0	scsi: libsas: Add sas_abort_task_set() Add a generic implementation of abort task set TMF handler, and use in LLDDs. Link: https://lore.kernel.org/r/1645112566-115804-14-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-02-19 15:59:36 -05:00
John Garry	693e66a0a6	scsi: libsas: Add TMF handler aborted callback The hisi_sas and pm8001 TMF handlers have some special processing for when the TMF is aborted, so add a callback and fill it in for those drivers. Link: https://lore.kernel.org/r/1645112566-115804-13-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-02-19 15:59:35 -05:00
John Garry	96e54376a8	scsi: libsas: Add sas_task.tmf Add a pointer to a sas_tmf_task to the sas_task struct, as this will be used when the common LLDD TMF code is factored out. Also set it for the LLDDs to store per-sas_task TMF info. Link: https://lore.kernel.org/r/1645112566-115804-9-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-02-19 15:59:35 -05:00
John Garry	bbfe82cdba	scsi: libsas: Add struct sas_tmf_task Some of the LLDDs which use libsas have their own definition of a struct to hold TMF info, so add a common struct for libsas. Also add an interim force phy id field for hisi_sas driver, which will be removed once the STP "TMF" code is factored out. Even though some LLDDs (pm8001) use a u32 for the tag, u16 will be adequate, as that named driver only uses tags in range [0, 1024). Link: https://lore.kernel.org/r/1645112566-115804-8-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-02-19 15:59:35 -05:00
John Garry	da19eaba6e	scsi: hisi_sas: Delete unused I_T_NEXUS_RESET_PHYUP_TIMEOUT There is no user, so delete it. Link: https://lore.kernel.org/r/1645112566-115804-6-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-02-19 15:59:34 -05:00
John Garry	25882c82f8	scsi: libsas: Delete lldd_clear_aca callback This callback is never called, so remove support. Link: https://lore.kernel.org/r/1645112566-115804-4-git-send-email-john.garry@huawei.com Tested-by: Yihang Li <liyihang6@hisilicon.com> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-02-19 15:59:34 -05:00
Martin K. Petersen	ac2beb4e3b	Merge branch '5.17/scsi-fixes' into 5.18/scsi-staging Pull 5.17 fixes branch into 5.18 tree to resolve a few pm8001 driver merge conflicts. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-02-14 21:51:29 -05:00
John Garry	26fc0ea74f	scsi: libsas: Drop SAS_TASK_AT_INITIATOR This flag is now only ever set, so delete it. This also avoids a use-after-free in the pm8001 queue path, as reported in the following: https://lore.kernel.org/linux-scsi/c3cb7228-254e-9584-182b-007ac5e6fe0a@huawei.com/T/#m28c94c6d3ff582ec4a9fa54819180740e8bd4cfb https://lore.kernel.org/linux-scsi/0cc0c435-b4f2-9c76-258d-865ba50a29dd@huawei.com/ [mkp: checkpatch + two SAS_TASK_AT_INITIATOR references] Link: https://lore.kernel.org/r/1644489804-85730-3-git-send-email-john.garry@huawei.com Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-02-11 17:02:50 -05:00
John Garry	c763ec4c10	scsi: hisi_sas: Fix setting of hisi_sas_slot.is_internal The hisi_sas_slot.is_internal member is not set properly for ATA commands which the driver sends directly. A TMF struct pointer is normally used as a test to set this, but it is NULL for those commands. It's not ideal, but pass an empty TMF struct to set that member properly. Link: https://lore.kernel.org/r/1643627607-138785-1-git-send-email-john.garry@huawei.com Fixes: `dc313f6b12` ("scsi: hisi_sas: Factor out task prep and delivery code") Reported-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-01-31 16:50:59 -05:00
Christophe JAILLET	8001fa240f	scsi: hisi_sas: Remove useless DMA-32 fallback configuration As stated in [1], dma_set_mask() with a 64-bit mask never fails if dev->dma_mask is non-NULL. So, if it fails, the 32-bit case will also fail for the same reason. Simplify code and remove some dead code accordingly. [1]: https://lore.kernel.org/linux-kernel/YL3vSPK5DXTNvgdx@infradead.org/#t Link: https://lore.kernel.org/r/1bf2d3660178b0e6f172e5208bc0bd68d31d9268.1642237482.git.christophe.jaillet@wanadoo.fr Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-01-24 23:30:29 -05:00
Xiang Chen	5d9224fb07	scsi: hisi_sas: Remove unused variable and check in hisi_sas_send_ata_reset_each_phy() In commit `29e2bac874` ("scsi: hisi_sas: Fix some issues related to asd_sas_port->phy_list"), we use asd_sas_port->phy_mask instead of accessing asd_sas_port->phy_list, and it is enough to use asd_sas_port->phy_mask to check the state of phy, so remove the unused check and variable. Link: https://lore.kernel.org/r/1641300126-53574-1-git-send-email-chenxiang66@hisilicon.com Fixes: `29e2bac874` ("scsi: hisi_sas: Fix some issues related to asd_sas_port->phy_list") Reported-by: Nathan Chancellor <nathan@kernel.org> Reported-by: Colin King <colin.i.king@gmail.com> Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-01-07 09:09:39 -05:00
Xiang Chen	ae9b69e85e	scsi: hisi_sas: Keep controller active between ISR of phyup and the event being processed It is possible that controller may become suspended between processing a phyup interrupt and the event being processed by libsas. As such, we can't ensure the controller is active when processing the phyup event - this may cause the phyup event to be lost or other issues. To avoid any possible issues, add pm_runtime_get_noresume() in phyup interrupt handler and pm_runtime_put_sync() in the work handler exit to ensure that we stay always active. Since we only want to call pm_runtime_get_noresume() for v3 hw, signal this will a new event, HISI_PHYE_PHY_UP_PM. Link: https://lore.kernel.org/r/1639999298-244569-14-git-send-email-chenxiang66@hisilicon.com Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-22 23:38:31 -05:00
Xiang Chen	29e2bac874	scsi: hisi_sas: Fix some issues related to asd_sas_port->phy_list Most places that use asd_sas_port->phy_list are protected by spinlock asd_sas_port->phy_list_lock, however there are still some places which miss grabbing the lock. Add it in function hisi_sas_refresh_port_id() when accessing asd_sas_port->phy_list. This carries a risk that list mutates while at the same time dropping the lock in function hisi_sas_send_ata_reset_each_phy(). Read asd_sas_port->phy_mask instead of accessing asd_sas_port->phy_list to avoid this risk. Link: https://lore.kernel.org/r/1639999298-244569-6-git-send-email-chenxiang66@hisilicon.com Acked-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-22 23:38:29 -05:00
John Garry	6cc7390877	scsi: Revert "scsi: hisi_sas: Filter out new PHY up events during suspend" This reverts commit `b14a37e011`. In that commit, we had to filter out phy-up events during suspend, as it work cause a deadlock between processing the phyup event and the resume HA function try to drain the HA event workqueue to complete the resume process. Now that we no longer try to drain the HA event queue during the HA resume processor, the deadlock would not occur, so remove the special handling for it. Link: https://lore.kernel.org/r/1639999298-244569-3-git-send-email-chenxiang66@hisilicon.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-22 23:38:29 -05:00
Qi Liu	37310bad7f	scsi: hisi_sas: Fix phyup timeout on FPGA The OOB interrupt and phyup interrupt handlers may run out-of-order in high CPU usage scenarios. Since the hisi_sas_phy.timer is added in hisi_sas_phy_oob_ready() and disarmed in phy_up_v3_hw(), this out-of-order execution will cause hisi_sas_phy.timer timeout to trigger. To solve, protect hisi_sas_phy.timer and .attached with a lock, and ensure that the timer won't be added after phyup handler completes. Link: https://lore.kernel.org/r/1639579061-179473-8-git-send-email-john.garry@huawei.com Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-16 22:59:58 -05:00
Qi Liu	16775db613	scsi: hisi_sas: Prevent parallel FLR and controller reset If we issue a controller reset command during executing a FLR a hung task may be found: Call trace: __switch_to+0x158/0x1cc __schedule+0x2e8/0x85c schedule+0x7c/0x110 schedule_timeout+0x190/0x1cc __down+0x7c/0xd4 down+0x5c/0x7c hisi_sas_task_exec+0x510/0x680 [hisi_sas_main] hisi_sas_queue_command+0x24/0x30 [hisi_sas_main] smp_execute_task_sg+0xf4/0x23c [libsas] sas_smp_phy_control+0x110/0x1e0 [libsas] transport_sas_phy_reset+0xc8/0x190 [libsas] phy_reset_work+0x2c/0x40 [libsas] process_one_work+0x1dc/0x48c worker_thread+0x15c/0x464 kthread+0x160/0x170 ret_from_fork+0x10/0x18 This is a race condition which occurs when the FLR completes first. Here the host HISI_SAS_RESETTING_BIT flag out gets of sync as HISI_SAS_RESETTING_BIT is not always cleared with the hisi_hba.sem held, so now only set/unset HISI_SAS_RESETTING_BIT under hisi_hba.sem . Link: https://lore.kernel.org/r/1639579061-179473-7-git-send-email-john.garry@huawei.com Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-16 22:59:58 -05:00
Qi Liu	20c634932a	scsi: hisi_sas: Prevent parallel controller reset and control phy command A user may issue a control phy command from sysfs at any time, even if the controller is resetting. If a phy is disabled by hardreset/linkreset command before calling get_phys_state() in the reset path, the saved phy state may be incorrect. To avoid incorrectly recording the phy state, use hisi_hba.sem to ensure that the controller reset may not run at the same time as when the phy control function is running. Link: https://lore.kernel.org/r/1639579061-179473-6-git-send-email-john.garry@huawei.com Signed-off-by: Qi Liu <liuqi115@huawei.com> Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-16 22:59:57 -05:00
John Garry	dc313f6b12	scsi: hisi_sas: Factor out task prep and delivery code The task prep code is the same between the normal path (in hisi_sas_task_prep()) and the internal abort path, so factor is out into a common function. Link: https://lore.kernel.org/r/1639579061-179473-5-git-send-email-john.garry@huawei.com Signed-off-by: John Garry <john.garry@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-16 22:59:57 -05:00

1 2 3 4 5 ...

369 Коммитов