WSL2-Linux-Kernel/Documentation
Juergen Gross a5aabace5f locking/csd_lock: Add more data to CSD lock debugging
In order to help identifying problems with IPI handling and remote
function execution add some more data to IPI debugging code.

There have been multiple reports of CPUs looping long times (many
seconds) in smp_call_function_many() waiting for another CPU executing
a function like tlb flushing. Most of these reports have been for
cases where the kernel was running as a guest on top of KVM or Xen
(there are rumours of that happening under VMWare, too, and even on
bare metal).

Finding the root cause hasn't been successful yet, even after more than
2 years of chasing this bug by different developers.

Commit:

  35feb60474 ("kernel/smp: Provide CSD lock timeout diagnostics")

tried to address this by adding some debug code and by issuing another
IPI when a hang was detected. This helped mitigating the problem
(the repeated IPI unlocks the hang), but the root cause is still unknown.

Current available data suggests that either an IPI wasn't sent when it
should have been, or that the IPI didn't result in the target CPU
executing the queued function (due to the IPI not reaching the CPU,
the IPI handler not being called, or the handler not seeing the queued
request).

Try to add more diagnostic data by introducing a global atomic counter
which is being incremented when doing critical operations (before and
after queueing a new request, when sending an IPI, and when dequeueing
a request). The counter value is stored in percpu variables which can
be printed out when a hang is detected.

The data of the last event (consisting of sequence counter, source
CPU, target CPU, and event type) is stored in a global variable. When
a new event is to be traced, the data of the last event is stored in
the event related percpu location and the global data is updated with
the new event's data. This allows to track two events in one data
location: one by the value of the event data (the event before the
current one), and one by the location itself (the current event).

A typical printout with a detected hang will look like this:

csd: Detected non-responsive CSD lock (#1) on CPU#1, waiting 5000000003 ns for CPU#06 scf_handler_1+0x0/0x50(0xffffa2a881bb1410).
	csd: CSD lock (#1) handling prior scf_handler_1+0x0/0x50(0xffffa2a8813823c0) request.
        csd: cnt(00008cc): ffff->0000 dequeue (src cpu 0 == empty)
        csd: cnt(00008cd): ffff->0006 idle
        csd: cnt(0003668): 0001->0006 queue
        csd: cnt(0003669): 0001->0006 ipi
        csd: cnt(0003e0f): 0007->000a queue
        csd: cnt(0003e10): 0001->ffff ping
        csd: cnt(0003e71): 0003->0000 ping
        csd: cnt(0003e72): ffff->0006 gotipi
        csd: cnt(0003e73): ffff->0006 handle
        csd: cnt(0003e74): ffff->0006 dequeue (src cpu 0 == empty)
        csd: cnt(0003e7f): 0004->0006 ping
        csd: cnt(0003e80): 0001->ffff pinged
        csd: cnt(0003eb2): 0005->0001 noipi
        csd: cnt(0003eb3): 0001->0006 queue
        csd: cnt(0003eb4): 0001->0006 noipi
        csd: cnt now: 0003f00

The idea is to print only relevant entries. Those are all events which
are associated with the hang (so sender side events for the source CPU
of the hanging request, and receiver side events for the target CPU),
and the related events just before those (for adding data needed to
identify a possible race). Printing all available data would be
possible, but this would add large amounts of data printed on larger
configurations.

Signed-off-by: Juergen Gross <jgross@suse.com>
[ Minor readability edits. Breaks col80 but is far more readable. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20210301101336.7797-4-jgross@suse.com
2021-03-06 12:49:48 +01:00
..
ABI A handful of late-arriving documentation fixes, nothing all that notable. 2021-02-26 14:21:18 -08:00
PCI Documentation: PCI: Add PCI endpoint NTB function user guide 2021-02-23 14:15:45 -06:00
RCU It has been a relatively quiet cycle in docsland. 2021-02-22 10:57:46 -08:00
accounting
admin-guide locking/csd_lock: Add more data to CSD lock debugging 2021-03-06 12:49:48 +01:00
arm Documentation: ARM: fix reference to DT format documentation 2021-01-28 15:37:43 -07:00
arm64
block block/bfq: update comments and default value in docs for fifo_expire 2021-03-02 11:25:38 -07:00
bpf
cdrom
core-api Merge branch 'akpm' (patches from Andrew) 2021-02-24 16:20:38 -08:00
cpu-freq
crypto
dev-tools kasan: clarify that only first bug is reported in HW_TAGS 2021-02-26 09:41:03 -08:00
devicetree Devicetree fixes for v5.12-rc: 2021-03-05 12:12:28 -08:00
doc-guide docs: Document cross-referencing using relative path 2021-02-04 16:24:12 -07:00
driver-api Char/Misc driver patches for 5.12-rc1 2021-02-24 10:25:37 -08:00
fault-injection
fb
features Documentation: features: refresh feature list 2021-02-25 11:25:57 -07:00
filesystems Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-02-27 08:07:12 -08:00
firmware-guide Merge branch 'acpi-messages' 2021-02-15 17:04:53 +01:00
firmware_class
fpga
gpu It has been a relatively quiet cycle in docsland. 2021-02-22 10:57:46 -08:00
hid
hwmon hwmon: add Texas Instruments TPS23861 driver 2021-02-12 07:02:55 -08:00
i2c i2c: testunit: add support for block process calls 2021-02-12 11:11:04 +01:00
ia64
ide
iio
infiniband
input Documentation: input: define ABS_PRESSURE/ABS_MT_PRESSURE resolution as grams 2021-01-28 16:43:04 -07:00
isdn
kbuild Kbuild updates for v5.12 2021-02-25 10:17:31 -08:00
kernel-hacking docs: kernel-hacking: be more civil 2021-02-11 10:00:40 -07:00
leds
litmus-tests
livepatch
locking
m68k
maintainer
mhi
mips
misc-devices
netlabel
networking Staging/IIO driver patches for 5.12-rc1 2021-02-20 21:36:51 -08:00
nios2
nvdimm
openrisc
parisc
pcmcia
power It has been a relatively quiet cycle in docsland. 2021-02-22 10:57:46 -08:00
powerpc docs: powerpc: Fix tables in syscall64-abi.rst 2021-02-25 13:04:24 -07:00
process A handful of late-arriving documentation fixes, nothing all that notable. 2021-02-26 14:21:18 -08:00
riscv
s390
scheduler It has been a relatively quiet cycle in docsland. 2021-02-22 10:57:46 -08:00
scsi SCSI misc on 20210219 2021-02-22 10:24:58 -08:00
security Keyrings miscellany 2021-02-23 16:09:23 -08:00
sh
sound ALSA: jack: implement software jack injection via debugfs 2021-02-02 10:37:07 +01:00
sparc
sphinx docs: Enable usage of relative paths to docs on automarkup 2021-02-04 16:23:43 -07:00
sphinx-static
spi
staging
target
timers
trace Char/Misc driver patches for 5.12-rc1 2021-02-24 10:25:37 -08:00
translations A handful of late-arriving documentation fixes, nothing all that notable. 2021-02-26 14:21:18 -08:00
usb
userspace-api Char/Misc driver patches for 5.12-rc1 2021-02-24 10:25:37 -08:00
virt * Doc fixes 2021-03-04 11:26:17 -08:00
vm mm/debug_vm_pgtable/basic: add validation for dirtiness after write protect 2021-02-24 13:38:27 -08:00
w1
watchdog
x86 Documentation/x86/boot.rst: Correct the example of SETUP_INDIRECT 2021-01-28 15:25:31 -07:00
xtensa
.gitignore
COPYING-logo
Changes
CodingStyle
Kconfig
Makefile kbuild: remove PYTHON variable 2021-02-01 10:37:19 +09:00
SubmittingPatches
asm-annotations.rst
atomic_bitops.txt
atomic_t.txt
conf.py Fix unaesthetic indentation 2021-02-22 14:35:04 -07:00
docutils.conf
dontdiff
index.rst
logo.gif
memory-barriers.txt
watch_queue.rst