WSL2-Linux-Kernel/tools
Feng Tang fcd4f3a9d9 clocksource: Scale the watchdog read retries automatically
[ Upstream commit 2ed08e4bc53298db3f87b528cd804cb0cce066a9 ]

On a 8-socket server the TSC is wrongly marked as 'unstable' and disabled
during boot time on about one out of 120 boot attempts:

    clocksource: timekeeping watchdog on CPU227: wd-tsc-wd excessive read-back delay of 153560ns vs. limit of 125000ns,
    wd-wd read-back delay only 11440ns, attempt 3, marking tsc unstable
    tsc: Marking TSC unstable due to clocksource watchdog
    TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
    sched_clock: Marking unstable (119294969739, 159204297)<-(125446229205, -5992055152)
    clocksource: Checking clocksource tsc synchronization from CPU 319 to CPUs 0,99,136,180,210,542,601,896.
    clocksource: Switched to clocksource hpet

The reason is that for platform with a large number of CPUs, there are
sporadic big or huge read latencies while reading the watchog/clocksource
during boot or when system is under stress work load, and the frequency and
maximum value of the latency goes up with the number of online CPUs.

The cCurrent code already has logic to detect and filter such high latency
case by reading the watchdog twice and checking the two deltas. Due to the
randomness of the latency, there is a low probabilty that the first delta
(latency) is big, but the second delta is small and looks valid. The
watchdog code retries the readouts by default twice, which is not
necessarily sufficient for systems with a large number of CPUs.

There is a command line parameter 'max_cswd_read_retries' which allows to
increase the number of retries, but that's not user friendly as it needs to
be tweaked per system. As the number of required retries is proportional to
the number of online CPUs, this parameter can be calculated at runtime.

Scale and enlarge the number of retries according to the number of online
CPUs and remove the command line parameter completely.

[ tglx: Massaged change log and comments ]

Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jin Wang <jin1.wang@intel.com>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Waiman Long <longman@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20240221060859.1027450-1-feng.tang@intel.com
Stable-dep-of: f2655ac2c06a ("clocksource: Fix brown-bag boolean thinko in cs_watchdog_read()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-19 05:45:45 +02:00
..
accounting
arch x86/insn: Fix PUSH instruction in x86 instruction decoder opcode map 2024-06-16 13:39:31 +02:00
bootconfig bootconfig: Fix testcase to increase max node 2023-03-30 12:47:47 +02:00
bpf bpf: Fix potential integer overflow in resolve_btfids 2024-06-16 13:39:50 +02:00
build perf cs-etm: Bump minimum OpenCSD version to ensure a bugfix is present 2024-02-23 08:54:50 +01:00
cgroup
debugging
edid
firewire
firmware
gpio tools: gpio: fix debounce_period_us output of lsgpio 2023-06-21 15:59:12 +02:00
hv vmbus_testing: fix wrong python syntax for integer value comparison 2023-09-19 12:22:28 +02:00
iio tools: iio: replace seekdir() in iio_generic_buffer 2024-04-13 13:01:46 +02:00
include hugetlb_encode.h: fix undefined behaviour (34 << 26) 2024-07-05 09:14:23 +02:00
io_uring tools/io_uring/io_uring-cp: sync with liburing example 2021-08-13 08:58:11 -06:00
kvm/kvm_stat tools/kvm_stat: fix display of error when multiple processes are found 2022-08-11 13:07:51 +02:00
laptop
leds
lib libbpf: Fix no-args func prototype BTF dumping syntax 2024-08-19 05:45:23 +02:00
memory-model tools/memory-model: Fix bug in lock.cat 2024-08-19 05:45:15 +02:00
objtool exit: Rename module_put_and_exit to module_put_and_kthread_exit 2024-04-10 16:18:55 +02:00
pci tools: PCI: Zero-initialize param 2021-08-05 11:01:30 +01:00
pcmcia
perf perf intel-pt: Fix exclude_guest setting 2024-08-19 05:45:02 +02:00
power tools/power/cpupower: Fix Pstate frequency reporting on AMD Family 1Ah CPUs 2024-07-27 10:46:07 +02:00
rcu
scripts
spi
testing clocksource: Scale the watchdog read retries automatically 2024-08-19 05:45:45 +02:00
thermal/tmon tools/thermal: Fix possible path truncations 2022-08-17 14:24:15 +02:00
time
tracing tools/latency-collector: Fix -Wformat-security compile warns 2024-06-16 13:39:12 +02:00
usb usb: testusb: Fix for showing the connection speed 2021-09-14 10:31:41 +02:00
virtio tools/virtio: fix build 2024-03-01 13:21:52 +01:00
vm tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep" 2022-12-08 11:28:42 +01:00
wmi
Makefile