Upper-layer drivers allocate their SBALs by calling qdio_alloc_buffers()
for each individual queue. But when later passing the SBAL addresses to
qdio_establish(), they need to be in a single array of pointers.
So if the driver uses multiple Input or Output queues, it needs to
allocate a temporary array just to present all its SBAL pointers in this
layout.
This patch slightly changes the format of the QDIO initialization data,
so that drivers can pass a per-queue array where each element points to
a queue's SBAL array.
zfcp doesn't use multiple queues, so the impact there is trivial.
For qeth this brings a nice reduction in complexity, and removes
a page-sized allocation.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Ever since commit 4a71df5004 ("qeth: new qeth device driver") introduced
this attribute, it can be read & written but has no actual effect.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a device is configured in prio-queue mode to pin all traffic onto
a specific HW queue, treat this as a distinct variant of prio-queueing
instead of QETH_NO_PRIO_QUEUEING.
This corrects an error message from qeth_osa_set_output_queues() for
devices configured in such a mode.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since IQD devices complete (most of) their transmissions synchronously,
they don't offer TX completion IRQs and have no HW coalescing controls.
But we can fake the easy parts in SW, and give the user some control over
how often the TX NAPI code should be triggered to process the TX
completions.
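A minimal sketch of the SW-side coalescing, assuming a per-queue frame threshold in the spirit of ethtool's tx-max-coalesced-frames. The struct and function names are hypothetical and not taken from qeth:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-queue SW coalescing state (not qeth's real layout). */
struct sw_tx_coalesce {
	unsigned int max_frames;	/* user-tunable threshold, 0 = kick every time */
	unsigned int pending;		/* frames sent since the last kick */
};

/*
 * Account for @frames newly transmitted frames and report whether the
 * TX completion (NAPI) processing should be triggered now.
 */
static bool sw_tx_coalesce_kick(struct sw_tx_coalesce *c, unsigned int frames)
{
	assert(frames > 0);
	c->pending += frames;
	if (c->pending < c->max_frames)
		return false;
	c->pending = 0;
	return true;
}
```

Because the state is per queue, the mcast queue can use a different threshold (e.g. 0, which triggers on every transmission) than the ucast queues.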
Having per-queue controls can in particular help the dedicated mcast
queue, as it likely benefits from different fine-tuning than what the
ucast queues need.
CC: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Count the number of TX doorbells we issue to the qdio layer.
Also count the number of actual frames in a TX buffer, and then
use this data along with the byte count during TX completion.
We'll make additional use of the frame count in a subsequent patch.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We're down to a single bit flag for MAC-address related status, reflect
that in the info struct.
Also set up the flag during initialization instead of clearing it during
shutdown - one more little step towards unifying the shutdown code.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The logic that deals with errors from qeth_l3_get_unique_id() is quite
complex: it sets card->unique_id to 0xfffe, additionally flags it as
UNIQUE_ID_NOT_BY_CARD and later takes this flag as cue to not propagate
card->unique_id to dev->dev_id. With dev->dev_id thus holding 0,
addrconf_ifid_eui48() applies its default behaviour.
Get rid of all the special bit masks, and just return the old uid in
case of an error. For the vast majority of cases this will be 0 (and so
we still get the desired default behaviour) - with the rare exception
where qeth_l3_get_unique_id() might have been called earlier but the
initialization then failed at a later point.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since RX buffers may contain multiple packets, qeth's NAPI poll code can
exhaust its budget in the middle of an RX buffer. Thus we keep track of
our current position within the active RX buffer, so we can resume
processing here in the next NAPI poll period.
Clean up that code by tracking the index of the active buffer element,
instead of a pointer to it.
Also simplify the code that advances to the next RX buffer when the
current buffer has been fully processed.
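A simplified model of index-based resume tracking. It assumes one packet per buffer element (real qeth packets may span elements), and all names are hypothetical:

```c
#include <assert.h>

#define NR_RX_BUFS	4	/* hypothetical ring size */

/* Hypothetical RX ring state. */
struct rx_ring {
	unsigned int elements[NR_RX_BUFS];	/* filled elements per buffer */
	unsigned int ready;			/* buffers ready to process */
	unsigned int buf;			/* index of the active RX buffer */
	unsigned int elem;			/* next element within that buffer */
};

/*
 * Process up to @budget packets. The position is kept in @r, so the next
 * poll resumes in the middle of the active buffer if the budget ran out.
 */
static unsigned int rx_poll(struct rx_ring *r, unsigned int budget)
{
	unsigned int done = 0;

	while (done < budget && r->ready > 0) {
		if (r->elem < r->elements[r->buf]) {
			r->elem++;	/* "deliver" one packet */
			done++;
			continue;
		}
		/* Buffer fully processed: advance to the next one. */
		r->elements[r->buf] = 0;
		r->buf = (r->buf + 1) % NR_RX_BUFS;
		r->elem = 0;
		r->ready--;
	}
	return done;
}
```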
v2: - remove QDIO_ELEMENT_NO() macro (davem)
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To check whether a netdevice has already been registered, look at
NETREG_REGISTERED to replace some hacks I added a while ago.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For ucast traffic, qeth_iqd_select_queue() falls back to
netdev_pick_tx(). This will potentially use skb_tx_hash() to distribute
the flow over all active TX queues - so txq 0 is a valid selection, and
qeth_iqd_select_queue() needs to check for this and put it on some other
queue. As a result, the distribution for ucast flows is unbalanced and
hits QETH_IQD_MIN_UCAST_TXQ heavier than the other queues.
Open-coding a custom variant of skb_tx_hash() isn't an option, since
netdev_pick_tx() also gives us eg. access to XPS. But we can pull a
little trick: add a single TC class that excludes the mcast txq, and
thus encourage skb_tx_hash() to not pick the mcast txq.
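The effect of the TC trick can be modelled in userspace. The sketch below mirrors how skb_tx_hash() scales a flow hash into a TC's queue range (qoffset + reciprocal_scale(hash, qcount)). It assumes txq 0 is the mcast queue and uses 1 as a stand-in for QETH_IQD_MIN_UCAST_TXQ; in-kernel, a single TC with such a range would presumably be set up via netdev_set_num_tc() and netdev_set_tc_queue():

```c
#include <assert.h>
#include <stdint.h>

#define MCAST_TXQ	0	/* assumption: txq 0 is the mcast queue */
#define MIN_UCAST_TXQ	1	/* stand-in for QETH_IQD_MIN_UCAST_TXQ */

/* Same scaling as the kernel's reciprocal_scale(): maps val into [0, ep_ro). */
static uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/* Model of skb_tx_hash() restricted to one TC's queue range. */
static uint16_t pick_txq(uint32_t hash, uint16_t qoffset, uint16_t qcount)
{
	assert(qcount > 0);
	return (uint16_t)(reciprocal_scale(hash, qcount) + qoffset);
}

/* Old approach: hash over all queues, then move hits on the mcast txq. */
static uint16_t pick_txq_old(uint32_t hash, uint16_t num_txqs)
{
	uint16_t txq = pick_txq(hash, 0, num_txqs);

	return txq == MCAST_TXQ ? MIN_UCAST_TXQ : txq;
}

/* New approach: a single TC covering only the ucast txqs [1, num_txqs). */
static uint16_t pick_txq_new(uint32_t hash, uint16_t num_txqs)
{
	return pick_txq(hash, MIN_UCAST_TXQ, (uint16_t)(num_txqs - MIN_UCAST_TXQ));
}
```

With 4 TX queues and evenly spread hashes, the old mapping sends twice as many flows to txq 1 as to txq 2 or 3, while the TC-restricted hash spreads them evenly over txqs 1-3.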
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
z/VM NICs don't offer HW QoS for TX rings. So just use netdev_pick_tx()
to distribute the connections equally over all enabled TX queues.
We start with just 1 enabled TX queue (this matches the typical
configuration without prio-queueing). A follow-on patch will allow users
to enable additional TX queues.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RX buffer pool is allocated in qeth_alloc_qdio_queues().
A subsequent pool resizing is then handled in a very simple way:
first free the current pool, then allocate a new pool of the requested
size.
There are two ways in which this can go wrong:
1. if the resize action happens _before_ the initial pool was allocated,
then a subsequent initialization will call qeth_alloc_qdio_queues()
and fill the pool with a second(!) set of pages. We consume twice the
planned amount of memory.
This is easy to fix - just skip the resizing if the queues haven't
been allocated yet.
2. if the initial pool was created by qeth_alloc_qdio_queues() but a
subsequent resizing fails, then the device has no(!) RX buffer pool.
The next initialization will _not_ call qeth_alloc_qdio_queues(), and
attempting to back the RX buffers with pages in
qeth_init_qdio_queues() will fail.
Not very difficult to fix either - instead of re-allocating the whole
pool, just allocate/free as many entries as needed to match the desired size.
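A sketch of the incremental resize covering both fixes, using a hypothetical linked-list pool (qeth's real pool structures differ). Resizing before the initial allocation only records the size, and a failed grow leaves the existing entries in place:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical RX buffer pool entry backed by one page. */
struct pool_entry {
	struct pool_entry *next;
	void *page;
};

struct rx_pool {
	struct pool_entry *head;
	unsigned int count;	/* entries currently allocated */
	unsigned int size;	/* configured pool size */
	bool allocated;		/* has the initial pool been set up? */
};

/* Grow or shrink the pool to @target entries. Returns 0 or -1 on ENOMEM. */
static int rx_pool_resize(struct rx_pool *pool, unsigned int target)
{
	pool->size = target;
	if (!pool->allocated)
		return 0;	/* the initial allocation will use pool->size */

	while (pool->count > target) {
		struct pool_entry *e = pool->head;

		pool->head = e->next;
		free(e->page);
		free(e);
		pool->count--;
	}

	while (pool->count < target) {
		struct pool_entry *e = malloc(sizeof(*e));

		if (!e)
			return -1;	/* existing entries stay usable */
		e->page = malloc(4096);
		if (!e->page) {
			free(e);
			return -1;
		}
		e->next = pool->head;
		pool->head = e;
		pool->count++;
	}
	return 0;
}

/* Initial allocation, playing the role of qeth_alloc_qdio_queues(). */
static int rx_pool_init(struct rx_pool *pool)
{
	pool->allocated = true;
	return rx_pool_resize(pool, pool->size);
}
```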
Fixes: 4a71df5004 ("qeth: new qeth device driver")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RX buffer elements are always backed with full pages, reflect this
in the pointer type.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement the ethtool hooks for the ETHTOOL_RX_COPYBREAK tunable.
The copybreak is stored into netdev_priv, so that we automatically go
back to the default value if the netdev is re-allocated.
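A minimal sketch of the copybreak decision; the default value, struct and helper names are assumptions for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEFAULT_RX_COPYBREAK	256	/* hypothetical default */

/* Hypothetical per-netdev private data holding the tunable. */
struct priv {
	unsigned int rx_copybreak;
};

/* Runs whenever the private data is (re-)allocated: resets the tunable. */
static void priv_init(struct priv *p)
{
	p->rx_copybreak = DEFAULT_RX_COPYBREAK;
}

/* Small packets get copied into a linear skb, bigger ones keep their page. */
static bool rx_should_copy(const struct priv *p, size_t pkt_len)
{
	return pkt_len <= p->rx_copybreak;
}
```

Keeping the value in the private data means a freshly allocated netdev starts out with the default again.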
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qeth_l?_stop_card() is _never_ called while in HARDSETUP state, and
there's no other usage of the card state that relies on the
DOWN -> HARDSETUP -> SOFTSETUP transition.
As related cleanup, remove the check in qeth_realloc_buffer_pool() as it
is already done by the callers.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When data is received on the READ channel, the matching logic for cmds
that are waiting for a reply is currently hard-coded into the channel's
main IO callback.
Move this into a per-cmd callback, so that we can apply custom matching
logic for each individual cmd.
This also allows us to remove the coarse-grained check for unexpected
non-IPA replies, since they will no longer match against _all_ pending
cmds.
Note that IDX cmds use _no_ matcher, since their reply is synchronously
received as part of the cmd's IO.
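A sketch of per-cmd matching with hypothetical types. A cmd without a matcher (like the IDX cmds) can never claim a reply from the READ channel, and a reply that no matcher accepts is simply left unclaimed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reply and cmd types, modelled loosely on the text above. */
struct reply {
	bool is_ipa;
	unsigned int seqno;
};

struct cmd {
	unsigned int seqno;
	/* Returns true if @r answers this cmd; NULL means "never match". */
	bool (*match)(const struct cmd *c, const struct reply *r);
	bool completed;
};

static bool ipa_match(const struct cmd *c, const struct reply *r)
{
	return r->is_ipa && r->seqno == c->seqno;
}

/* READ-channel callback: offer the reply to each pending cmd's matcher. */
static struct cmd *dispatch_reply(struct cmd **pending, size_t n,
				  const struct reply *r)
{
	for (size_t i = 0; i < n; i++) {
		struct cmd *c = pending[i];

		if (c->match && c->match(c, r)) {
			c->completed = true;
			return c;
		}
	}
	return NULL;	/* unexpected reply: nobody claimed it */
}
```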
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Large parts of the online/offline code are identical now, and cleaning
up the remaining stuff is easier with a shared core.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move some duplicated logic into a shared code path.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit f677fcb9ae ("s390/qeth: ensure linear access to packet headers"),
the CQ-specific skbs are allocated with a slightly bigger linear part
than necessary. Shrink it down to the maximum that's needed by
qeth_extract_skb().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To reduce the path length and levels of indirection, move the RX
processing from the sub-drivers into the core.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert the old code to use struct qeth_ipa_caps, and while at it remove
all unused helper macros.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
card->wait_q is shared by different users, for different wake-up
conditions. qeth_irq() can potentially trigger several of these
conditions:
1) A change to channel->irq_pending, which qeth_send_control_data() is
waiting for.
2) A change to channel->state, which qeth_clear_channel() and
qeth_halt_channel() are waiting for.
As qeth_irq() only does a single wake_up(), we might fail to wake up
a second eligible waiter. Luckily all waiters are guarded with a
timeout, so this situation should recover on its own eventually.
To make things work robustly, add an additional wake_up() for changes
to channel->state. And extract a helper that updates
channel->irq_pending along with the needed wake_up().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The cio layer's intparm logic does not align itself well with how qeth
manages cmd IOs. When an active IO gets terminated via halt/clear, the
corresponding IRQ's intparm does not reflect the cmd buffer but rather
the intparm that was passed to ccw_device_halt() / ccw_device_clear().
This behaviour was recently clarified in
commit b91d9e67e5 ("s390/cio: fix intparm documentation").
As a result, qeth_irq() currently doesn't cancel a cmd that was
terminated via halt/clear. This primarily causes us to leak
card->read_cmd after the qeth device is removed, since our IO path still
holds a refcount for this cmd.
For qeth this means that we need to keep track of which IO is pending on
a device ('active_cmd'), and use this as the intparm when calling
halt/clear. Otherwise qeth_irq() can't match the subsequent IRQ to its
cmd buffer.
Since we now keep track of the _expected_ intparm, we can also detect
any mismatch; this would constitute a bug somewhere in the lower layers.
In this case cancel the active cmd - we effectively "lost" the IRQ and
should not expect any further notification for this IO.
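A minimal model of the active_cmd bookkeeping. The types and helpers are hypothetical; only the idea of passing the pending cmd as intparm to halt/clear, and cancelling on a mismatch, comes from the text:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct cmd {
	bool cancelled;
	bool done;
};

/* Hypothetical channel state: which cmd's IO is currently pending. */
struct channel {
	struct cmd *active_cmd;
};

/* Halt/clear pass the active cmd as intparm, so the IRQ can be matched. */
static uintptr_t halt_intparm(const struct channel *ch)
{
	return (uintptr_t)ch->active_cmd;
}

/* IRQ handler: compare the intparm against the expected (active) cmd. */
static void handle_irq(struct channel *ch, uintptr_t intparm)
{
	struct cmd *active = ch->active_cmd;

	if (!active)
		return;
	if (intparm == (uintptr_t)active)
		active->done = true;
	else
		active->cancelled = true;	/* mismatch: this IO's IRQ is lost */
	ch->active_cmd = NULL;
}
```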
Fixes: 405548959c ("s390/qeth: add support for dynamically allocated cmds")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Depending on a packet's type, the RX path needs to access fields in the
packet headers and thus requires a minimum packet length.
Enforce this length when building the skb.
On the other hand a single runt packet is no reason to drop the whole
RX buffer. So just skip it, and continue processing on the next packet.
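A sketch of per-packet length validation that skips runts instead of discarding the whole buffer; the packet types and minimum lengths are assumptions for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet types with different header requirements. */
enum pkt_type { PKT_L2, PKT_L3 };

struct pkt {
	enum pkt_type type;
	size_t len;
};

/* Assumed minimum lengths: enough to read the headers the RX path needs. */
static size_t min_len(enum pkt_type type)
{
	return type == PKT_L2 ? 14 : 20;	/* e.g. Ethernet / IPv4 header */
}

/* Deliver all packets of one RX buffer, skipping (not aborting on) runts. */
static size_t process_rx_buffer(const struct pkt *pkts, size_t n, size_t *runts)
{
	size_t delivered = 0;

	*runts = 0;
	for (size_t i = 0; i < n; i++) {
		if (pkts[i].len < min_len(pkts[i].type)) {
			(*runts)++;	/* drop just this packet */
			continue;
		}
		delivered++;
	}
	return delivered;
}
```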
Fixes: 4a71df5004 ("qeth: new qeth device driver")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Minor conflict in drivers/s390/net/qeth_l2_main.c, kept the lock
from commit c8183f5489 ("s390/qeth: fix potential deadlock on
workqueue flush"), removed the code which was removed by commit
9897d583b0 ("s390/qeth: consolidate some duplicated HW cmd code").
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
The L2 bridgeport code uses the coarse 'conf_mutex' for guarding access
to its configuration state.
This can result in a deadlock when qeth_l2_stop_card() - called under the
conf_mutex - blocks on flush_workqueue() to wait for the completion of
pending bridgeport workers. Such workers would also need to acquire
the conf_mutex, stalling indefinitely.
Introduce a lock that specifically guards the bridgeport configuration,
so that the workers no longer need the conf_mutex.
Wrapping qeth_l2_promisc_to_bridge() in this fine-grained lock then also
fixes a theoretical race against a concurrent qeth_bridge_port_role_store()
operation.
Fixes: c0a2e4d10d ("s390/qeth: conclude all event processing before offlining a card")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use vlan_for_each() instead of tracking each registered VID internally.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each RX buffer may contain up to 64KB worth of data. In case the device
needs to discard a packet _after_ already having reserved space for it
in the buffer, the whole buffer gets set to ERROR state. As the buffer
might contain any number of good packets, this can result in collateral
packet loss.
qeth can provide relief by enabling per-frame invalidation. The RX
buffer is then presented as usual, we just need to spot & drop any
individual packet that was flagged as invalid.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Where available, use the fine-grained counters in rtnl_link_stats64 to
indicate different RX error causes. For drop reasons, use driver-private
ethtool counters.
In particular this patch allows us to keep track of driver-side drops due
to unknown/unsupported HW descriptor format.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For IQD devices with Multi-Write support, we can defer the queue-flush
further and transmit multiple IO buffers with a single TX doorbell.
The same-target restriction still applies.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IQD devices offer limited support for bulking: all frames in a TX buffer
need to have the same target. qeth_iqd_may_bulk() implements this
constraint, and allows us to defer the TX doorbell until
(a) the buffer is full (since each buffer needs its own doorbell), or
(b) the entire TX queue is full, or
(c) we reached the BQL limit.
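A sketch of the bulking and doorbell decisions with hypothetical TX state; it illustrates the listed conditions rather than reproducing qeth_iqd_may_bulk() itself:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical TX buffer state. */
struct tx_buf {
	unsigned int next_free;		/* next free element */
	unsigned int max_elems;
	unsigned long target;		/* destination of the frames in this buffer */
	bool empty;
};

/* All frames in an IQD TX buffer must share the same target, and must fit. */
static bool may_bulk(const struct tx_buf *buf, unsigned long target,
		     unsigned int elems)
{
	if (buf->empty)
		return true;
	return buf->target == target && buf->next_free + elems <= buf->max_elems;
}

/* Ring the TX doorbell once any of the three conditions holds. */
static bool must_flush(const struct tx_buf *buf, bool queue_full,
		       bool bql_limit_reached)
{
	return buf->next_free == buf->max_elems ||	/* buffer is full */
	       queue_full ||				/* TX queue is full */
	       bql_limit_reached;			/* BQL says flush now */
}
```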
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each TX buffer may contain multiple skbs. So just accumulate the sent
byte count in the buffer struct, and later use the same count when
completing the buffer.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to their large MTU and potentially low utilization of TX buffers,
IQD devices in particular require fast TX recycling. This makes them
a prime candidate for a TX NAPI path in qeth.
qeth_tx_poll() uses the recently introduced qdio_inspect_queue() helper
to poll the TX queue for completed buffers. To avoid hogging the CPU for
too long, we yield to the stack after completing an entire queue's worth
of buffers.
While IQD is expected to transfer its buffers synchronously (and thus
doesn't support TX interrupts), a timer covers for the odd case where a
TX buffer doesn't complete synchronously. Currently this timer should
only ever fire for
(1) the mcast queue,
(2) the occasional race, where the NAPI poll code observes an update to
queue->used_buffers while the TX doorbell hasn't been issued yet.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This consolidates the SW statistics code, and improves it to
(1) account for the header overhead of each segment on a TSO skb,
(2) count dangling packets as in-error (during eg. shutdown), and
(3) only count offloads when the skb was successfully transmitted.
We also count each segment of a TSO skb as one packet - except for
tx_dropped, to be consistent with dev->tx_dropped.
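The byte accounting for a TSO skb can be sketched as follows, assuming each segment after the first carries its own copy of a hdr_len-byte header (hypothetical names throughout):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical SW TX counters. */
struct tx_stats {
	unsigned long long packets, bytes, errors, dropped;
};

/*
 * Account one completed skb of @len bytes that went out as @segs segments,
 * each carrying a @hdr_len-byte header.
 */
static void stats_tx_done(struct tx_stats *s, unsigned int len,
			  unsigned int segs, unsigned int hdr_len, bool ok)
{
	assert(segs > 0);
	if (!ok) {
		s->errors += segs;	/* e.g. dangling during shutdown */
		return;
	}
	s->packets += segs;
	/* The skb holds the header once; every further segment repeats it. */
	s->bytes += len + (unsigned long long)(segs - 1) * hdr_len;
}

/* A dropped skb counts once, matching dev->tx_dropped semantics. */
static void stats_tx_drop(struct tx_stats *s)
{
	s->dropped++;
}
```

For example, a 3000-byte skb split into 3 segments with a 54-byte header is accounted as 3 packets and 3000 + 2 * 54 = 3108 bytes.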
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have logic to determine the desired promisc mode in _each_ code path.
Change things around so that there is a clean split between
(a) high-level code that selects the new mode, and (b) implementations
of the various mechanisms to program this mode.
This also keeps qeth_promisc_to_bridge() from polluting the debug logs
on each RX modeset.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Except for card->read_cmd, every cmd we issue now passes through
qeth_send_control_data() and allocates a qeth_reply struct. The way we
use this struct requires additional refcounting and pointer tracking.
Clean up things by moving most of qeth_reply's content into the main
cmd struct. This keeps things in one place, saves us the additional
refcounting and simplifies the overall code flow.
A nice little benefit is that we can now match incoming replies against
the pending requests themselves, without caching the requests' seqnos.
The qeth_reply struct stays around for a little bit longer in a shrunk
form, to avoid touching every single callback.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qeth_snmp_command_cb() is the only cmd callback that pulls the reply's
data length from a low-level transport header field. This requires
additional complexity (ie. reply->offset) to make the header accessible
to what is supposed to be a pure IPA cmd callback.
Adapter cmds have a length field in their sub-cmd header, get the data
length from there instead.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a cmd IO completes in qeth_irq(), calculate how much data was
processed by the device and pass this value to the cmd's callback.
This allows cmds that retrieve data from the device to check whether
sufficient data was received, so we do that in qeth_read_conf_data_cb().
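A sketch of the length calculation, assuming the transferred length is the CCW's byte count minus the residual count reported for the completed IO. The RCD size and helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout of the data a READ cmd expects to receive. */
struct rcd_data {
	unsigned char bytes[128];	/* size is an assumption */
};

/* Bytes actually transferred: requested count minus the residual count. */
static size_t io_data_length(size_t ccw_count, size_t residual)
{
	assert(residual <= ccw_count);
	return ccw_count - residual;
}

/* A callback can now reject a short transfer instead of parsing garbage. */
static bool rcd_cb_ok(size_t data_length)
{
	return data_length >= sizeof(struct rcd_data);
}
```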
Suggested-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than fumbling with hard-coded offsets, use the proper struct to
access the retrieved RCD information.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Callbacks for a cmd reply run outside the protection of card->lock, to
allow for additional cmds to be issued & enqueued in parallel.
When qeth_send_control_data() bails out for a cmd without having
received a reply (eg. due to timeout), its callback may concurrently be
processing a reply that just arrived. In this case, the callback
potentially accesses a stale reply->reply_param area that eg. was
on-stack and has already been released.
To avoid this race, add some locking so that qeth_send_control_data()
can (1) wait for a concurrently running callback, and (2) zap any
pending callback that still wants to run.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
The cast type currently gets selected in .ndo_start_xmit, and is then
piped through several layers until it's stored into the HW header.
Push the selection down into qeth_l?_fill_header() to (1) reduce the
number of xmit-wide parameters, and (2) merge the two route validation
checks into just one.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As follow-up to commit 0cd6783d3c ("s390/qeth: check dst entry before use"),
consolidate the dst_check() logic into a single helper and add a wrapper
around the cast type selection.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
De-duplicate the pm callback implementations from the two sub-drivers,
replacing them with core helpers that delegate to the .set_online and
.set_offline callbacks.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that all cmds are dynamically allocated, the code for static cmd
buffers can go away entirely. This results in a nice reduction of
code/data size & complexity, and removes the risk that
qeth_clear_cmd_buffers() releases cmds that are still in-flight.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new wrapper that allocates DIAG cmds of the right size, and fills
in the common fields.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts the adapter, assist and bridgeport cmd paths to
dynamic allocation. Most of the work is about re-organizing the cmd
headers, calculating the correct cmd length, and filling in the right
value in the sub-cmd's length field.
Since we now also set the correct length for cmds that are not reflected
by a fixed struct (ie. SNMP), we can remove the work-around from
qeth_snmp_command().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For code that uses qeth_send_simple_setassparms_prot(), we currently
can't differentiate whether the cmd should contain (1) no parameter, or
(2) a 4-byte parameter with value 0.
At the moment this doesn't cause any trouble. But when using dynamically
allocated cmds, we need to know whether to allocate & transmit an
additional 4 bytes of zeroes.
So instead of the raw parameter value, pass a parameter pointer
(or NULL) to qeth_send_simple_setassparms_prot().
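A sketch of why a pointer is needed: with NULL meaning "no parameter", the builder can tell when to append 4 bytes of zeroes. Sizes and names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SETASSPARMS_BASE_LEN	16	/* hypothetical fixed cmd part */

/* Hypothetical cmd buffer: fixed part plus an optional 4-byte parameter. */
struct simple_cmd {
	size_t length;
	unsigned char data[SETASSPARMS_BASE_LEN + sizeof(uint32_t)];
};

/* @param == NULL: no parameter; otherwise append its 4 bytes (even if 0). */
static void build_simple_cmd(struct simple_cmd *cmd, const uint32_t *param)
{
	memset(cmd->data, 0, sizeof(cmd->data));
	cmd->length = SETASSPARMS_BASE_LEN;
	if (param) {
		memcpy(cmd->data + SETASSPARMS_BASE_LEN, param, sizeof(*param));
		cmd->length += sizeof(*param);
	}
}
```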
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch reduces the usage of the write channel's static cmd buffers,
by dynamically allocating all simple IPA cmds (eg. STARTLAN, SETVMAC).
It also converts the OSN path.
Doing so requires some changes to how we calculate the cmd length.
Currently when building IPA cmds, we're quite generous in how much data
we send down to the device (basically the size of the biggest cmd we
know). This is no real concern at the moment, since the static cmd
buffers are backed with zeroed pages. But for dynamic allocations, the
exact length matters. So this patch also adds the needed length
calculations to each cmd path.
Commands that have multiple subtypes (eg. SETADP) of differing length
will be converted with follow-up patches.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>