In order to implement byte queue limits (bql) in CAN drivers, the length of the
CAN frame needs to be passed into the networking stack after queueing and after
transmission completion.
To avoid to calculate this length twice, extend can_get_echo_skb() to return
that value. Convert all users of this function, too.
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20210111141930.693847-14-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Add a frame_len argument to can_put_echo_skb() which is used to save length of
the CAN frame into field frame_len of struct can_skb_priv so that it can be
later used after transmission completion. Convert all users of this function,
too.
Drivers which implement BQL call can_put_echo_skb() with the output of
can_skb_get_frame_len(skb) and drivers which do not simply pass zero as an
input (in the same way that NULL would be given to can_get_echo_skb()). This
way, we have a nice symmetry between the two echo functions.
Link: https://lore.kernel.org/r/20210111061335.39983-1-mailhol.vincent@wanadoo.fr
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/r/20210111141930.693847-13-mkl@pengutronix.de
Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
In the error handling in c_can_power_up(), there are two bugs:
1) c_can_pm_runtime_get_sync() will increase usage counter if device is not
empty. Forgetting to call c_can_pm_runtime_put_sync() will result in a
reference leak here.
2) c_can_reset_ram() operation will set start bit when enable is true. We
should clear it in the error handling.
We fix it by adding c_can_pm_runtime_put_sync() for 1), and
c_can_reset_ram(enable is false) for 2) in the error handling.
Fixes: 8212003260 ("can: c_can: Add d_can suspend resume support")
Fixes: 52cde85acc ("can: c_can: Add d_can raminit support")
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
Link: https://lore.kernel.org/r/20201128133922.3276973-2-zhangqilong3@huawei.com
[mkl: return "0" instead of "ret"]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The naming of can_dlc as element of struct can_frame and also as variable
name is misleading as it claims to be a 'data length CODE' but in reality
it always was a plain data length.
With the indroduction of a new 'len' element in struct can_frame we can now
remove can_dlc as name and make clear which of the former uses was a plain
length (-> 'len') or a data length code (-> 'dlc') value.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/r/20201120100444.3199-1-socketcan@hartkopp.net
[mkl: gs_usb: keep struct gs_host_frame::can_dlc as is]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The get_can_dlc() macro is used to ensure the payload length information of
the Classical CAN frame to be max 8 bytes (the CAN_MAX_DLEN).
Rename the macro and use the correct constant in preparation of the len/dlc
cleanup for Classical CAN frames.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/r/20201110101852.1973-3-socketcan@hartkopp.net
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
While the state is updated when the error counters increase and
decrease, there is no event when the bus recovers and the error counters
decrease again. So add that event as well.
Change the state going downward to be ERROR_PASSIVE -> ERROR_WARNING ->
ERROR_ACTIVE instead of directly to ERROR_ACTIVE again.
Signed-off-by: Jeroen Hofstee <jhofstee@victronenergy.com>
Acked-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be>
Tested-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
When the CAN interface is closed it the hardwre is put in power down
mode, but does not reset the error counters / state. Reset the D_CAN on
open, so the reported state and the actual state match.
According to [1], the C_CAN module doesn't have the software reset.
[1] http://www.bosch-semiconductors.com/media/ip_modules/pdf_2/c_can_fd8/users_manual_c_can_fd8_r210_1.pdf
Signed-off-by: Jeroen Hofstee <jhofstee@victronenergy.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
When the status register is read without the status IRQ pending, the
chip may not raise the interrupt line for an upcoming status interrupt
and the driver may miss a status interrupt.
It is critical that the BUSOFF status interrupt is forwarded to the
higher layers, since no more interrupts will follow without
intervention.
Thanks to Wolfgang and Joe for bringing up the first idea.
Signed-off-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Cc: Joe Burmeister <joe.burmeister@devtank.co.uk>
Fixes: fa39b54ccf ("can: c_can: Get rid of pointless interrupts")
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
napi_complete_done() allows to opt-in for gro_flush_timeout,
added back in linux-3.19, commit 3b47d30396
("net: gro: add a per device gro flush timer")
This allows for more efficient GRO aggregation without
sacrifying latencies.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When testing CAN write floods on Altera's CycloneV, the first 2 bytes
are sometimes 0x00, 0x00 or corrupted instead of the values sent. Also
observed bytes 4 & 5 were corrupted in some cases.
The D_CAN Data registers are 32 bits and changing from 16 bit writes to
32 bit writes fixes the problem.
Testing performed on Altera CycloneV (D_CAN). Requesting tests on other
C_CAN & D_CAN platforms.
Reported-by: Richard Andrysek <richard.andrysek@gomtec.de>
Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The assignment 'cf->data[2] |= CAN_ERR_PROT_UNSPEC' used at CAN error message
creation time is obsolete as CAN_ERR_PROT_UNSPEC is zero and cf->data[2] is
initialized with zero in alloc_can_err_skb() anyway.
So we could either assign 'cf->data[2] = CAN_ERR_PROT_UNSPEC' correctly or we
can remove the obsolete OR operation entirely.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
As Dan Carpenter reported in http://marc.info/?l=linux-can&m=144793696016187
the assignment of the error location in CAN error messages had some bit wise
overlaps. Indeed the value to be assigned in data[3] is no bitfield but defines
a single value which points to a location inside the CAN frame on the wire.
This patch fixes the assignments for the error locations in error messages.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The previous change 3973c526ae (net: can: c_can: Disable pins when CAN
interface is down) causes a slight glitch on the pinctrl settings when used.
Since commit ab78029 (drivers/pinctrl: grab default handles from device core),
the device core will automatically set the default pins. This causes the pins
to be momentarily set to the default and then to the sleep state in
register_c_can_dev(). By adding an optional "enable" state, boards can set the
default pin state to be disabled and avoid the glitch when the switch from
default to sleep first occurs. If the "enable" state is not available
c_can_pinctrl_select_state() falls back to using the "default" pinctrl state.
[Roger Q] - Forward port to v4.2 and use pinctrl_get_select().
Signed-off-by: J.D. Schroeder <jay.schroeder@garmin.com>
Signed-off-by: Roger Quadros <rogerq@ti.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Conflicts:
arch/arm/boot/dts/imx6sx-sdb.dts
net/sched/cls_bpf.c
Two simple sets of overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
Put controller into init mode in network stop to end pending transmissions. The
issue is observed in cases when transmitted frame is not acked.
Signed-off-by: Viktor Babrian <babrian.viktor@renyi.mta.hu>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
In order to be able to move the stats increment from can_bus_off() into
can_change_state(), the increment had to be moved back into code that was using
can_bus_off() but not can_change_state().
As a side-effect, this patch fixes the following bugs:
* Redundant call to can_bus_off() in c_can.
* Bus-off counted twice in xilinx_can.
Signed-off-by: Andri Yngvason <andri.yngvason@marel.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
DRA7 CAN IP suffers from a problem which causes it to be prevented
from fully turning OFF (i.e. stuck in transition) if the module was
disabled while there was traffic on the CAN_RX line.
To work around this issue we select the SLEEP pin state by default
on probe and use the DEFAULT pin state on CAN up and back to the
SLEEP pin state on CAN down.
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Conflicts:
drivers/net/bonding/bond_alb.c
drivers/net/ethernet/altera/altera_msgdma.c
drivers/net/ethernet/altera/altera_sgdma.c
net/ipv6/xfrm6_output.c
Several cases of overlapping changes.
The xfrm6_output.c has a bug fix which overlaps the renaming
of skb->local_df to skb->ignore_df.
In the Altera TSE driver cases, the register access cleanups
in net-next overlapped with bug fixes done in net.
Similarly a bug fix to send ALB packets in the bonding driver using
the right source address overlaps with cleanups in net-next.
Signed-off-by: David S. Miller <davem@davemloft.net>
Add helpers for 32-bit accesses and replace open-coded 32-bit access
with calls to helpers. Minimum changes are done to the pci case, as I
don't have access to that hardware.
Tested-by: Thor Thayer <tthayer@altera.com>
Signed-off-by: Thor Thayer <tthayer@altera.com>
Signed-off-by: Pavel Machek <pavel@denx.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
In 2b9aecdce2 ("can: c_can: Disable rx split as workaround") a new Kconfig
option was introduced as a workaround. The tests performed by Alexander Stein
confirmed this option to be obsolete with all the other cleanups and fixes
that had been discussed that time:
http://marc.info/?l=linux-can&m=139746476821294&w=2
Both (author and tester) agreed to remove this Kconfig option again:
http://marc.info/?l=linux-can&m=139883820714229&w=2
As some more cleanups took place since then a simple revert is not possible.
This patch removes the entire option as it would behave when disabled.
Further beautification’s can be done later.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
It's suffcient to kill the TXIE bit in the message control register
even if the documentation of C and D CAN says that it's not allowed to
do that while MSGVAL is set. Reality tells a different story and this
change gives us another 2% of CPU back for not waiting on I/O.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Mark suggested to use one IF for the softirq and the other for the
xmit function to avoid the xmit lock.
That requires to write the frame into the interface first, then handle
the echo skb and store the dlc before committing the TX request to the
message ram.
We use an atomic to handle the active buffers instead of reading the
MSGVAL register as thats way faster especially on PCH/x86.
Suggested-by: Mark <mark5@del-llc.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Instead of obfuscating the code by artificial 16 bit splits use the
proper 32 bit assignments and split the result when writing to the
interface.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Remove the MASK from the TX transfer side.
Make the code readable and get rid of the annoying IFX_WRITE_XXX_16BIT
macros which are just obfuscating the code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Sigh!
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Alexander reported that the new optimized handling of the RX fifo
causes random packet loss on Intel PCH C_CAN hardware.
After a few fruitless debugging sessions I got hold of a PCH (eg20t)
afflicted system. That machine does not have the CAN interface wired
up, but it was possible to reproduce the issue with the HW loopback
mode.
As Alexander observed correctly, clearing the NewDat flag along with
reading out the message buffer causes that issue on C_CAN, while D_CAN
handles that correctly.
Instead of restoring the original message buffer handling horror the
following workaround solves the issue:
transfer buffer to IF without clearing the NewDat
handle the message
clear NewDat bit
That's similar to the original code but conditional for C_CAN.
I really wonder why all user manuals (C_CAN, Intel PCH and some more)
recommend to clear the NewDat bit right away. The knows it all Oracle
operated by Gurgle does not unearth any useful information either. I
simply cannot believe that we are the first to uncover that HW issue.
Reported-and-tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The RX buffer split causes packet loss in the hardware:
What happens is:
RX Packet 1 --> message buffer 1 (newdat bit is not cleared)
RX Packet 2 --> message buffer 2 (newdat bit is not cleared)
RX Packet 3 --> message buffer 3 (newdat bit is not cleared)
RX Packet 4 --> message buffer 4 (newdat bit is not cleared)
RX Packet 5 --> message buffer 5 (newdat bit is not cleared)
RX Packet 6 --> message buffer 6 (newdat bit is not cleared)
RX Packet 7 --> message buffer 7 (newdat bit is not cleared)
RX Packet 8 --> message buffer 8 (newdat bit is not cleared)
Clear newdat bit in message buffer 1
Clear newdat bit in message buffer 2
Clear newdat bit in message buffer 3
Clear newdat bit in message buffer 4
Clear newdat bit in message buffer 5
Clear newdat bit in message buffer 6
Clear newdat bit in message buffer 7
Clear newdat bit in message buffer 8
Now if during that clearing of newdat bits, a new message comes in,
the HW gets confused and drops it.
It does not matter how many of them you clear. I put a delay between
clear of buffer 1 and buffer 2 which was long enough that the message
should have been queued either in buffer 1 or buffer 9. But it did not
show up anywhere. The next message ended up in buffer 1. So the
hardware lost a packet of course without telling it via one of the
error handlers.
That does not happen on all clear newdat bit events. I see one of 10k
packets dropped in the scenario which allows us to reproduce. But the
trace looks always the same.
Not splitting the RX Buffer avoids the packet loss but can cause
reordering. It's hard to trigger, but it CAN happen.
With that mode we use the HW as it was probably designed for. We read
from the buffer 1 upwards and clear the buffer as we get the
message. That's how all microcontrollers use it. So I assume that the
way we handle the buffers was never really tested. According to the
public documentation it should just work :)
Let the user decide which evil is the lesser one.
[ Oliver Hartkopp: Provided a sane config option and help text and
made me switch to favour potential and unlikely reordering over
packet loss ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The driver handles pointlessly TWO interrupts per packet. The reason
is that it enables the status interrupt which fires for each rx and tx
packet and it enables the per message object interrupts as well.
The status interrupt merily acks or in case of D_CAN ignores the TX/RX
state and then the message object interrupt fires.
The message objects interrupts are only useful if all message objects
have hardware filters activated.
But we don't have that and its not simple to implement in that driver
without rewriting it completely.
So we can ditch the message object interrupts and handle the RX/TX
right away from the status interrupt. Instead of TWO we handle ONE.
Note: We must keep the TXIE/RXIE bits in the message buffers because
the status interrupt alone is not reliable enough in corner cases.
If we ever have the need for HW filtering, then this code needs a
complete overhaul and we can think about it then. For now we prefer a
lower interrupt load.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
On D_CAN the RXOK, TXOK and LEC bits are cleared/set on read of the
status register. No need to update them.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Instead of writing to the message object we can simply clear the
NewDat bit with the get method.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
If the allocation of the error skb fails, we still want to see the
error statistics.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Reading the LEC type with
return (mode & ENABLED) && (status & LEC_MASK);
is not guaranteed to return (status & LEC_MASK) if the enabled bit in
mode is set. It's guaranteed to return 0 or !=0.
Remove the inline function and call unconditionally into the
berr_handling code and return early when the reporting is disabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
If the allocation of an error skb fails, the state change handling
returns w/o doing any work. That leaves the interface in a wreckaged
state as the internal status is wrong.
Split the interface handling and the skb handling.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
There is no guarantee that the skb is in the same state after calling
net_receive_skb(). It might be freed or reused. Not really harmful as
its a read access, except you turn on the proper debugging options
which catch a use after free.
The whole can subsystem is full of this. Copy and paste ....
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The state change handler is called with device interrupts disabled
already. So no point in disabling them again when we enter bus off
state.
But what's worse is that we reenable the interrupts at the end of NAPI
poll unconditionally. So c_can_start() which is called from the
restart timer can trigger interrupts which confuse the hell out of the
half reinitialized driver/hw.
Remove the pointless device interrupt disable in the BUS_OFF handler
and prevent reenabling the device interrupts at the end of the poll
routine when the current state is BUS_OFF.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
c_can_start() enables interrupts way too early. The first enabling
happens when setting the control mode in c_can_chip_config() and then
again at the end of the function.
But that happens before napi_enable() and that means that an interrupt
which comes in will disable interrupts again and call napi_schedule,
which ignores the request and the later napi_enable() is not making
thinks work either. So the interface is up with all device interrupts
disabled.
Move the device interrupt after napi_enable() and add it to the other
callsites of c_can_start() in c_can_set_mode() and c_can_power_up()
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Stein <alexander.stein@systec-electronic.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iEYEABECAAYFAlM6jfsACgkQjTAFq1RaXHOVTwCeOyOYOJ+Ze2x33WydGcGsvX8a
QPoAoJVAcekmL5MVbcMTTvc8QL1AebAm
=4aEp
-----END PGP SIGNATURE-----
Merge tag 'linux-can-fixes-for-3.15-20140401' of git://gitorious.org/linux-can/linux-can
linux-can-fixes-for-3.15-20140401
Marc Kleine-Budde says:
====================
this is a pull request of 16 patches for the 3.15 release cycle.
Bjorn Van Tilt contributes a patch which fixes a memory leak in usb_8dev's
usb_8dev_start_xmit()s error path. A patch by Robert Schwebel fixes a typo in
the can documentation. The remaining patches all target the c_can driver. Two
of them are by me; they add a missing netif_napi_del() and return value
checking. Thomas Gleixner contributes 12 patches, which address several
shortcomings in the driver like hardware initialisation, concurrency, message
ordering and poor performance.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no point to toggle the RX led for every packet. Especially if
we have a full FIFO we want to avoid everything we can.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The function loads the message object from the hardware to get the
payload length. The previous patch stores that information in an
array, so we can avoid the hardware access.
Remove the hardware access and move the led toggle outside of the
spinlocked region. Toggle the led only once when at least one packet
has been received.
Binary size shrinks along with the code
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
We can avoid the HW access in TX cleanup path for retrieving the DLC
of the sent package if we store the DLC in a private array.
Ideally this should be handled in the can_echo_skb functions, but I
leave that exercise to the CAN folks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
commit 4ce78a838c (can: c_can: Speed up rx_poll function) hyped a
performance improvement by reducing the access to the interrupt
pending register from a dual 16 bit to a single 16 bit access. Wow!
Thereby it crippled the driver to cast the 16 msg objects in stone,
which is completly braindead as contemporary hardware has up to 128
message objects. Supporting larger object buffers is a major surgery,
but it'd be definitely worth it especially as the driver does not
support HW message filtering ....
The logic of the "FIFO" implementation is to split the FIFO in half.
For the lower half we read the buffers and clear the interrupt pending
bit, but keep the newdat bit set, so the HW will queue above those
buffers.
When we read out the last low buffer then we reenable all the low half
buffers by clearing the newdat bit.
The upper half buffers clear the newdat and the interrupt pending bit
right away as we know that the lower half bits are clear and give us a
headstart against the hardware.
Now the implementation is:
transfer_message_object()
read_object_and_put_into_skb();
if (obj < END_OF_LOW_BUF)
clear_intpending(obj)
else if (obj > END_OF_LOW_BUF)
clear_intpending_and_newdat(obj)
else if (obj == END_OF_LOW_BUF)
clear_newdat_of_all_low_objects()
The hardware allows to avoid most of the mess simply because we can
tell the transfer_message_object() function to clear bits right away.
So we can be clever and do:
if (obj <= END_OF_LOW_BUF)
ctrl = TRANSFER_MSG | CLEAR_INTPND;
else
ctrl = TRANSFER_MSG | CLEAR_INTPND | CLEAR_NEWDAT;
transfer_message_object(ctrl)
read_object_and_put_into_skb();
if (obj == END_OF_LOW_BUF)
clear_newdat_of_all_low_objects()
So we save a complete control operation on all message objects except
the one which is the end of the low buffer. That's a few micro seconds
per object.
I'm not adding a boasting profile to that, simply because it's self
explaining.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[mkl: adjusted subject and commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
If every other line contains line breaks, that's a clear sign for
indentation level madness. Split out the inner loop and move the code
to a separate function. gcc creates slightly worse code for that, but
we'll fix that in the next step.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[mkl: adjusted subject]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The network core does not serialize the access to the hardware. The
xmit related code lets the following happen:
CPU0 CPU1
interrupt()
do_poll()
c_can_do_tx()
Fiddle with HW and xmit()
internal data Fiddle with HW and
internal data
due the complete lack of serialization.
Add proper locking.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
The rx_poll code has the following gem:
if (msg_ctrl_save & IF_MCONT_EOB)
return num_rx_pkts;
The EOB bit is the indicator for the hardware that this is the last
configured FIFO object. But this object can contain valid data, if we
manage to free up objects before the overrun case hits.
Now if the code exits due to the EOB bit set, then this buffer is
stale and the interrupt bit and NewDat bit of the buffer are still
set. Results in a nice interrupt storm unless we come into an overrun
situation where the MSGLST bit gets set.
ksoftirqd/0-3 [000] ..s. 79.124101: c_can_poll: rx_poll: val: 00008001 pend 00008001
ksoftirqd/0-3 [000] ..s. 79.124176: c_can_poll: rx_poll: val: 00008000 pend 00008000
ksoftirqd/0-3 [000] ..s. 79.124187: c_can_poll: rx_poll: val: 00008002 pend 00008002
ksoftirqd/0-3 [000] ..s. 79.124256: c_can_poll: rx_poll: val: 00008000 pend 00008000
ksoftirqd/0-3 [000] ..s. 79.124267: c_can_poll: rx_poll: val: 00008000 pend 00008000
The amazing thing is that the check of the MSGLST (aka overrun bit)
used to be after the check of the EOB bit. That was "fixed" in commit
5d0f801a2c(can: c_can: Fix RX message handling, handle lost message
before EOB). But the author of this "fix" did not even understand that
the EOB check is broken as well.
Again a simple solution: Remove
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[mkl: adjusted subject and commit message]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>