WSL2-Linux-Kernel/kernel
Nicholas Piggin 9d08fce64d workqueue: Improve scalability of workqueue watchdog touch
[ Upstream commit 98f887f820c993e05a12e8aa816c80b8661d4c87 ]

On a ~2000 CPU powerpc system, hard lockups have been observed in the
workqueue code when stop_machine runs (in this case due to CPU hotplug).
This is due to lots of CPUs spinning in multi_cpu_stop, calling
touch_nmi_watchdog() which ends up calling wq_watchdog_touch().
wq_watchdog_touch() writes to the global variable wq_watchdog_touched,
and that can find itself in the same cacheline as other important
workqueue data, which slows down operations to the point of lockups.

In the case of the following abridged trace, worker_pool_idr was in
the hot line, causing the lockups to always appear at idr_find.

  watchdog: CPU 1125 self-detected hard LOCKUP @ idr_find
  Call Trace:
  get_work_pool
  __queue_work
  call_timer_fn
  run_timer_softirq
  __do_softirq
  do_softirq_own_stack
  irq_exit
  timer_interrupt
  decrementer_common_virt
  * interrupt: 900 (timer) at multi_cpu_stop
  multi_cpu_stop
  cpu_stopper_thread
  smpboot_thread_fn
  kthread

Fix this by having wq_watchdog_touch() write to the line only if the
time since the last recorded touch exceeds 1/4 of the watchdog threshold.
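
The rate-limiting idea can be sketched in userspace C roughly as follows. This is a minimal illustration, not the kernel's code: `global_touched`, `ticks_after()`, `touch_rate_limited()` and `example_sequence()` are hypothetical names, C11 relaxed atomics stand in for the kernel's own accessors, and any per-CPU bookkeeping the real function does is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared timestamp that every CPU touches. */
static _Atomic unsigned long global_touched;

/* Wrap-safe "a is later than b" comparison on free-running tick counters. */
static bool ticks_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Record a touch at time `now` (in ticks).
 *
 * The shared timestamp is read first and only stored to when it is older
 * than a quarter of the threshold. Callers that arrive while the timestamp
 * is still fresh take a read-only path, so the cacheline can stay shared
 * between CPUs instead of bouncing on every call.
 *
 * Returns true if the shared timestamp was written.
 */
static bool touch_rate_limited(unsigned long now, unsigned long thresh_ticks)
{
	unsigned long last =
		atomic_load_explicit(&global_touched, memory_order_relaxed);

	if (!ticks_after(now, last + thresh_ticks / 4))
		return false;	/* fresh enough: no store */

	atomic_store_explicit(&global_touched, now, memory_order_relaxed);
	return true;
}

/* Usage example: a threshold of 400 ticks allows one store per 100 ticks. */
void example_sequence(void)
{
	atomic_store_explicit(&global_touched, 1000, memory_order_relaxed);

	assert(!touch_rate_limited(1050, 400));	/* 50 ticks old: skipped */
	assert(!touch_rate_limited(1100, 400));	/* exactly 1/4: skipped */
	assert(touch_rate_limited(1101, 400));	/* older than 1/4: stored */
	assert(!touch_rate_limited(1150, 400));	/* fresh again: skipped */
}
```

The trade-off in this sketch is that the shared timestamp may lag a real touch by up to a quarter of the threshold, which leaves three quarters of the threshold as margin before a stale timestamp could look like a stall.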

Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-09-12 11:07:52 +02:00
bpf bpf: Eliminate remaining "make W=1" warnings in kernel/bpf/btf.o 2024-08-19 05:44:59 +02:00
cgroup cgroup: Protect css->cgroup write under css_set_lock 2024-09-12 11:07:48 +02:00
configs
debug kdb: Use the passed prompt in kdb_position_cursor() 2024-08-19 05:45:22 +02:00
dma dma-mapping: benchmark: Don't starve others when doing the test 2024-09-12 11:07:48 +02:00
entry entry: Respect changes to system call number by trace_sys_enter() 2024-04-10 16:18:46 +02:00
events perf/aux: Fix AUX buffer serialization 2024-09-12 11:07:51 +02:00
futex
gcov gcov: add support for GCC 14 2024-07-05 09:14:34 +02:00
irq genirq/irqdesc: Honor caller provided affinity in alloc_desc() 2024-08-19 05:45:46 +02:00
kcsan
livepatch
locking rtmutex: Drop rt_mutex::wait_lock before scheduling 2024-09-12 11:07:43 +02:00
power PM: suspend: Set mem_sleep_current during kernel command line setup 2024-04-10 16:18:36 +02:00
printk printk: Update @console_may_schedule in console_trylock_spinning() 2024-04-10 16:18:47 +02:00
rcu rcu-tasks: Fix show_rcu_tasks_trace_gp_kthread buffer overflow 2024-09-12 11:07:42 +02:00
sched sched/smt: Fix unbalance sched_smt_present dec/inc 2024-08-19 05:45:47 +02:00
time hrtimer: Prevent queuing of hrtimer without a function callback 2024-09-04 13:23:28 +02:00
trace tracing: Avoid possible softlockup in tracing_iter_reset() 2024-09-12 11:07:44 +02:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
Makefile
acct.c
async.c
audit.c
audit.h
audit_fsnotify.c fsnotify: make allow_dups a property of the group 2024-04-10 16:19:02 +02:00
audit_tree.c fsnotify: pass flags argument to fsnotify_alloc_group() 2024-04-10 16:19:02 +02:00
audit_watch.c fsnotify: pass flags argument to fsnotify_alloc_group() 2024-04-10 16:19:02 +02:00
auditfilter.c ima: Avoid blocking in RCU read-side critical section 2024-07-18 13:07:34 +02:00
auditsc.c
backtracetest.c
bounds.c bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS 2024-05-02 16:24:50 +02:00
capability.c
cfi.c
compat.c
configs.c
context_tracking.c
cpu.c cpu/hotplug: Fix dynstate assignment in __cpuhp_setup_state_cpuslocked() 2024-07-05 09:14:48 +02:00
cpu_pm.c
crash_core.c
crash_dump.c
cred.c
delayacct.c
dma.c
exec_domain.c
exit.c mm: optimize the redundant loop of mm_update_owner_next() 2024-07-18 13:07:32 +02:00
extable.c
fail_function.c
fork.c
freezer.c
gen_kheaders.sh kheaders: explicitly define file modes for archived headers 2024-07-05 09:14:37 +02:00
groups.c
hung_task.c
iomem.c
irq_work.c
jump_label.c
kallsyms.c
kcmp.c
kcov.c kcov: properly check for softirq context 2024-08-19 05:45:46 +02:00
kexec.c
kexec_core.c
kexec_elf.c
kexec_file.c
kexec_internal.h
kheaders.c
kmod.c
kprobes.c kprobes: Fix to check symbol prefixes correctly 2024-08-19 05:45:42 +02:00
ksysfs.c
kthread.c exit: Implement kthread_exit 2024-04-10 16:18:55 +02:00
latencytop.c
module-internal.h
module.c NFSD: Remove svc_serv_ops::svo_module 2024-04-10 16:19:01 +02:00
module_signature.c
module_signing.c
notifier.c
nsproxy.c
padata.c padata: Fix possible divide-by-0 panic in padata_mt_helper() 2024-08-19 05:45:47 +02:00
panic.c panic: Flush kernel log buffer at the end 2024-04-13 13:01:43 +02:00
params.c
pid.c
pid_namespace.c zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING 2024-07-05 09:14:24 +02:00
profile.c profiling: remove profile=sleep support 2024-08-19 05:45:39 +02:00
ptrace.c
range.c
reboot.c
regset.c
relay.c
resource.c
resource_kunit.c
rseq.c
scftorture.c
scs.c
seccomp.c
signal.c kernel: rerun task_work while freezing in get_signal() 2024-08-19 05:45:22 +02:00
smp.c smp: Add missing destroy_work_on_stack() call in smp_call_on_cpu() 2024-09-12 11:07:49 +02:00
smpboot.c
smpboot.h
softirq.c softirq: Fix suspicious RCU usage in __do_softirq() 2024-06-16 13:39:15 +02:00
stackleak.c
stacktrace.c
static_call.c
static_call_inline.c
stop_machine.c
sys.c
sys_ni.c syscalls: fix compat_sys_io_pgetevents_time64 usage 2024-07-05 09:14:50 +02:00
sysctl-test.c
sysctl.c
task_work.c task_work: Introduce task_work_cancel() again 2024-08-19 05:45:13 +02:00
taskstats.c
test_kprobes.c
torture.c
tracepoint.c
tsacct.c
ucount.c
uid16.c
uid16.h
umh.c
up.c
user-return-notifier.c
user.c
user_namespace.c
usermode_driver.c
utsname.c
utsname_sysctl.c
watch_queue.c
watchdog.c
watchdog_hld.c watchdog/perf: properly initialize the turbo mode timestamp and rearm counter 2024-08-19 05:45:20 +02:00
workqueue.c workqueue: Improve scalability of workqueue watchdog touch 2024-09-12 11:07:52 +02:00
workqueue_internal.h