Remove the global driver work queue and replace it with a workqueue
local to the adapter. The usage of this workqueue makes this the
correct place for the structure. In addition multiple adapters won't
block each other due to the serialization of the queued work.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The flag ZFCP_REQ_AUTO_CLEANUP was useless as the
ZFCP_STATUS_FSFREQ_CLEANUP flag is there for exactly the same purpose.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Remove the special case for NO_QTCB requests and optimize the
mempool and cache processing for fsfreqs. Especially use seperate
mempools for the zfcp_fsf_req and zfcp_qtcb structs.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The combination wait_queue/wakeup in conjunction with the flag
ZFCP_STATUS_FSFREQ_COMPLETED to signal the completion of an fsfreq
was not race-safe and can be better solved by a completion.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
There is no need for the QDIO layer to have knowledge or do things
wich are done better by the FSF layer and vice versa. Straighten a
few things to improve vividness.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Using a bitwise OR to not set anything at all is pointless so remove
the useless statement.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The default trace level is to only trace failed SCSI commands. Thus it
is not necessary to collect trace data for most SCSI commands since it
will be thrown away later. Restructure the SCSI trace infrastructure
to first check the trace level in a inline function and only do the
expensive data collection for matching trace levels.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Under certain conditions it is possible that a WKA port ist not opened
within the expected timeframe of half a second. In this situation
the WKA port remains in the state OPENING preventing any succeding
request to open the port. This led to unrecoverable remote ports.
Fixing this by always setting an appropriate WKA port status before
leaving the function and removing the timeout value here since it's
not needed here because the general timeout processing would deal
with it if required.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Depending on interruptions on some storage systems, the complete
channel can stall which looks like an outbound queue stall to Linux.
When trying to acquire a free SBAL for a non-SCSI command, zfcp waits
for 5 seconds for a free slot to appear. This is the right place to
detect a queue stall: If the wait times out, we assume a stalled queue
and try to recover this.
The overall strategy should be to trigger the erp from specific
events, and not try an overall escalation from one failed port to a
full-blown queue recovery. If we manage to send a command, the status
codes for this command or a timeout will trigger the right follow-on
actions.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The ELS ADISC and the GID_PN requests sent from zfcp fit into
unchained FSF requests. Change the FSF allocation logic to use
unchained requests whenever possible where everything fits in one
SBAL. This avoids acquiring more SBALs than necessary, especially
during zfcp recovery when things might be stalled.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
When a fsf_req or a qtcb cannot be allocated return -ENOMEM instead of
-EIO.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
We should not modify the port status after triggering an ERP action
for the port. It is not guaranteed which status is finally active
when the ERP action is performed. This can lead to situations which
are unwanted and hard to debug in case of a failure.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Don't access the block layer request, get the payload length instead
from the FC job. Simplify access to the zfcp_port, only the d_id is
required, if the port is no longer accessed later. This is possible
when the els_handler does not access the port pointer from the ELS
request.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Keep the information about the device and model id in zfcp_ccw. This
requires an additional helper function to check for the privileged
cfdc subchannel, but it allows the removal of the redundant defines
from the zfcp_def header file.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
In rare cases, open port request might timeout, erp calls
zfcp_port_put, port gets dequeued. Now, the late returning (or
dismissed) fsf-port-open calls the fsf_port_open_handler that tries to
reference the port data structure leading to a kernel oops.
Signed-off-by: Martin Petermann <martin.petermann@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Add comments where there is a deliberate fall through in switch/case
statements. This makes some code checkers happy and makes it clear
that there is no missing break statement.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
enum dma_data_direction only has the 4 values DMA_BIDIRECTIONAL,
DMA_TO_DEVICE, DMA_FROM_DEVICE and DMA_NONE. No need to have the
default case. While changing this, setup sbtype in one place to make
sparse happy.
The default value of retval is already -EIO, so remove the
additional assignment for these two cases.
Reviewed-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The zfcp_port might have been removed, while the FC fast_io_fail timer
is still running and could trigger the terminate_rport_io callback.
Set the pointer to the zfcp_port to NULL and check accordingly
before using it.
Reviewed-by: Martin Petermann <martin@linux.vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The current sbal counting can be wrong if a fsf request is
waiting for free sbals and at the same time qdio request queue
is shutdown and re-opened. Revering a previous patch fixes this
issue.
Signed-off-by: Martin Petermann <martin.petermann@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Error codes specific to the control file requests are evaluated by the
actcli tool, so don't report -ENXIO for those. Generic problems are
still checked for outside the command specific handler.
Reviewed-by: Martin Petermann <martin@linux.vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Fix problem that zfcp_fsf_exchange_config_data_sync and
zfcp_fsf_exchange_config_data_sync could try to call zfcp_fsf_req_free
with a NULL pointer.
Reviewed-by: Martin Petermann <martin@linux.vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Avoid referencing a fsf request after sending it in fcp_fsf_req_send,
it might have already completed and deallocated.
Signed-off-by: Martin Petermann <martin@linux.vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Report the fc_host_port_type as FC_PORTTYPE_NPIV when the subchannel
is running in NPIV mode. This allows to see the correct type with
lsscsi -H -t --list
Acked-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Use the I/O blocking mechanism in the FC transport class to allow
faster failovers for multipathing:
- Call fc_remote_port_delete early to set the rport to BLOCKED.
- Check the rport status in queuecommand with fc_remote_portchkready
to no longer accept new I/O for this port and fail the I/O with the
appropriate scsi_cmnd result.
- Implement the terminate_rport_io handler to abort all pending I/O
requests
- Return SCSI commands with DID_TRANSPORT_DISRUPTED while erp is
running.
- When updating the remote port status, check for late changes and
update the remote ports status accordingly.
Acked-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
After an error condition resolved a remote storage port was never
re-opened. The incoming RSCN was not processed accordingly due
to a misinterpreted status flag / return value combination.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The current number based id ERP logging is replaced by a string
based tag version. The benefit is an easier location of the code in
question and the removal of the lengthy array referencing the
individual messages.
The string (7 bytes) based version does not use more space since those
bytes were "used" anyway due to the alignment of the structure.
The encoding of the 7 byte string is as follows
[0-1] = filename
[2-5] = task/function
[6] = section
Due to the character of this string (fixed length) a string
termination is not required here.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The status read response FSF_STATUS_READ_SUB_ERROR_PORT is not
defined in the specs and therefore not valid.
All occurrences are removed from the code.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Issue ELS ADISC requests from workqueue. This allows the link test
request to be sent when the request queue is full due to I/O load for
other remote ports. It also simplifies request queue locking,
zfcp_fsf_send_fcp_command_task is now the only function that has
interrupts disabled from the caller. This is also a prereq for the FC
passthrough support that issues ELS requests from userspace.
Acked-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
When the SCSI midlayer is running error recovery, the low-level error
recovery in zfcp could be running and preventing the SCSI midlayer to
issue error recovery requests. To avoid unnecessary error recovery
escalation, wait for the zfcp erp to finish and retry if necessary.
While reworking the SCSI eh handlers, alsa cleanup the code and
simplify the interface from zfcp_scsi to the fsf layer.
Acked-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
For calls from zfcp erp, scsi_eh and sysfs switch the calls issuing
FSF requests to zfcp_fsf_req_sbal_get to wait for free SBALs.
Acked-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Only increment the req_id for successfully issued requests. This
avoids some confusion when debugging issued fsf requests.
Acked-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The lock only needs to protect the softirq context called from qdio
against the userspace context called from sysfs. spin_lock and
spin_lock_bh is enough.
Acked-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
PORT_PHYS_CLOSING is only set and cleared, but not actually used
for status checking.
PORT_INVALID_WWPN is set when the GID_PN request does not return
a d_id for a remote port, e.g. when a remote port has been
unplugged. For this case, the d_id is zero. In the erp we can
check the d_id and use the normal escalation procedure that gives
up after three retries and remove the special case.
PORT_NO_WWPN is unused: Each port in the remote port list has a
valid wwpn. The WKA ports are now tracked outside the port
list. Remove the PORT_NO_WWPN flag, since this is no longer set
for any port.
Acked-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The SUGGEST_* flags in the SCSI command result have been out of fashion
for a while and we don't actually use them in the error handling.
Remove the remaining occurrences.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Remove a message that was emitted for a port that could not initially
be opened. This is a rare case when the port discovery hits an
initiator port and only confuses the user with an initator port logged
in the message. Remove the whole special case: The failed "open port"
request triggers required follow-up actions anyway.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Acked-by: Felix Beck <felix@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Add the support to send CT and ELS requests as unchained FSF requests. This is
required for older hardware and was somehow omitted during the cleanup of the
FSF layer. The req_count and resp_count attributes are unused, so remove them
instead of adding a special case for setting them. Also add debug data and a
warning, when the ct request hits a limit.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Acked-by: Martin Petermann <martin@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The port flag DID_DID indicates whether we know the current id of the
port. This is always set in parallel. Since the id 0 is invalid
(because the port id 0 is invalid) we can remove the DID_DID flag:
d_id of 0 indicates an invalid d_id != 0 is a valid one.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Acked-by: Felix Beck <felix@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
When waiting for a request claim the SBAL before waiting. This way,
locking before each check of the free counter is not required and
sparse does not emit warnings for the complicated locking scheme.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Acked-by: Felix Beck <felix@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Move the closing parenthesis before the line break.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Acked-by: Felix Beck <felix@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The check of having a valid pointer was performed before the
processing was secured by the lock. Between those two steps the
pointer can turn invalid. During further processing another value is
used (referenced by the pointer described above) as a function pointer
which is never verified to be valid either, resulting under some
circumstances in an invalid function call. This patch is fixing both
issues.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Aborting a SCSI cmnd might requrie to send a abort_fsf_cmnd. If the
creation of this fsf_req fails an ERR_PTR is returned where a NULL
value would be expected as an error indicator. This ERR_PTR is
dereferenced as valid fsf_req in succeeding processing leading to
an error.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Running two wka_port_get calls in parallel could issue two open_port
requests, overwriting the port handle. Don't issue an open_port
for the state PORT_OPENING, and only read the data from GOOD
responses.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Acked-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Fix the handling of the request list in the error path:
- Use irqsave for the lock as in the good path.
- Before removing the request, check if it is still in the list, a
call to dismiss_all might have changed the list in between.
- zfcp_qdio_send does not change the queue counters on failure,
trying revert something is wrong, so remove this.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
When allocating fsf requests without qtcb, store the pointer to the
mempool in the fsf requests for later call to mempool_free. This
codepath is only used by the status_read requests.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The per adapter req_list_lock must be held with interrupts disabled, otherwise
we might end up with nice deadlocks as lockdep tells us (see below).
zfcp 0.0.1804: QDIO problem occurred.
=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.27-rc8-00035-g4a77035-dirty #86
---------------------------------------------------------
swapper/0 just changed the state of lock:
(&adapter->erp_lock){++..}, at: [<00000000002c82ae>] zfcp_erp_adapter_reopen+0x4e/0x8c
but this lock took another, hard-irq-unsafe lock in the past:
(&adapter->req_list_lock){-+..}
and interrupts could create inverse lock ordering between them.
[tons of backtraces, but only the interesting part follows]
the second lock's dependencies:
-> (&adapter->req_list_lock){-+..} ops: 2280627634176 {
initial-use at:
[<0000000000071f10>] __lock_acquire+0x504/0x18bc
[<000000000007335c>] lock_acquire+0x94/0xbc
[<00000000003d7224>] _spin_lock_irqsave+0x6c/0xb0
[<00000000002cf684>] zfcp_fsf_req_dismiss_all+0x50/0x140
[<00000000002c87ee>] zfcp_erp_adapter_strategy_generic+0x66/0x3d0
[<00000000002c9498>] zfcp_erp_thread+0x88c/0x1318
[<000000000001b0d2>] kernel_thread_starter+0x6/0xc
[<000000000001b0cc>] kernel_thread_starter+0x0/0xc
in-softirq-W at:
[<0000000000072172>] __lock_acquire+0x766/0x18bc
[<000000000007335c>] lock_acquire+0x94/0xbc
[<00000000003d7224>] _spin_lock_irqsave+0x6c/0xb0
[<00000000002ca73e>] zfcp_qdio_int_resp+0xbe/0x2ac
[<000000000027a1d6>] qdio_kick_inbound_handler+0x82/0xa0
[<000000000027daba>] tiqdio_inbound_processing+0x62/0xf8
[<0000000000047ba4>] tasklet_action+0x100/0x1f4
[<0000000000048b5a>] __do_softirq+0xae/0x154
[<0000000000021e4a>] do_softirq+0xea/0xf0
[<00000000000485de>] irq_exit+0xde/0xe8
[<0000000000268c64>] do_IRQ+0x160/0x1fc
[<00000000000261a2>] io_return+0x0/0x8
[<000000000001b8f8>] cpu_idle+0x17c/0x224
hardirq-on-W at:
[<0000000000072190>] __lock_acquire+0x784/0x18bc
[<000000000007335c>] lock_acquire+0x94/0xbc
[<00000000003d702c>] _spin_lock+0x5c/0x9c
[<00000000002caff6>] zfcp_fsf_req_send+0x3e/0x158
[<00000000002ce7fe>] zfcp_fsf_exchange_config_data+0x106/0x124
[<00000000002c8948>] zfcp_erp_adapter_strategy_generic+0x1c0/0x3d0
[<00000000002c98ea>] zfcp_erp_thread+0xcde/0x1318
[<000000000001b0d2>] kernel_thread_starter+0x6/0xc
[<000000000001b0cc>] kernel_thread_starter+0x0/0xc
}
... key at: [<0000000000e356c8>] __key.26629+0x0/0x8
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmit@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This patch writes the channel and fabric latencies in nanoseconds per
request via blktrace for later analysis. The utilization of the inbound
and outbound adapter queue is also reported.
Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com>
Signed-off-by: Martin Peschke <mp3@de.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Trace ids 107 and 3 are used twice, fix this to have unique ids for
the erp triggers.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Each adapter reopen trigger automatically a scan_port task which
is waiting for the ERP to be finished before further processing.
Since the initial device setup enqueues adapter, port and LUN which
are individual ERP actions, this process would start after
everything is done. Unfortunately the port_reopen requires another
scheduled work to be finished which is queued after the automatic
scan_port -> deadlock !
This fix creates an own work queue for ERP based nameserver requests.
Signed-off-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>