WSL2-Linux-Kernel/kernel
Valentin Schneider 1e0e63ad62 sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
[ Upstream commit 49bef33e4b ]

John reported that push_rt_task() can end up invoking
find_lowest_rq(rq->curr) when curr is not an RT task (in this case a CFS
one), which causes mayhem down convert_prio().

This can happen when current gets demoted to e.g. CFS when releasing an
rt_mutex, and the local CPU gets hit with an rto_push_work irqwork before
getting the chance to reschedule. Exactly who triggers this work isn't
entirely clear to me - switched_from_rt() only invokes rt_queue_pull_task()
if there are no RT tasks on the local RQ, which means the local CPU can't
be in the rto_mask.

My current suspected sequence is something along the lines of the below,
with the demoted task being current.

  mark_wakeup_next_waiter()
    rt_mutex_adjust_prio()
      rt_mutex_setprio() // deboost originally-CFS task
	check_class_changed()
	  switched_from_rt() // Only rt_queue_pull_task() if !rq->rt.rt_nr_running
	  switched_to_fair() // Sets need_resched
      __balance_callbacks() // if pull_rt_task(), tell_cpu_to_push() can't select local CPU per the above
      raw_spin_rq_unlock(rq)

       // need_resched is set, so task_woken_rt() can't
       // invoke push_rt_tasks(). Best I can come up with is
       // local CPU has rt_nr_migratory >= 2 after the demotion, so stays
       // in the rto_mask, and then:

       <some other CPU running rto_push_irq_work_func() queues rto_push_work on this CPU>
	 push_rt_task()
	   // breakage follows here as rq->curr is CFS

Move an existing check to check rq->curr vs the next pushable task's
priority before getting anywhere near find_lowest_rq(). While at it, add an
explicit sched_class of rq->curr check prior to invoking
find_lowest_rq(rq->curr). Align the DL logic to also reschedule regardless
of next_task's migratability.

Fixes: a7c81556ec ("sched: Fix migrate_disable() vs rt/dl balancing")
Reported-by: John Keeping <john@metanate.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: John Keeping <john@metanate.com>
Link: https://lore.kernel.org/r/20220127154059.974729-1-valentin.schneider@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08 14:23:11 +02:00
..
bpf bpf: Fix possible race in inc_misses_counter 2022-03-08 19:12:40 +01:00
cgroup cgroup-v1: Correct privileges check in release_agent writes 2022-03-02 11:47:47 +01:00
configs
debug kdb: Adopt scheduler's task classification 2021-11-18 19:17:06 +01:00
dma Revert "swiotlb: rework "fix info leak with DMA_FROM_DEVICE"" 2022-04-08 14:22:45 +02:00
entry signal: Replace force_fatal_sig with force_exit_sig when in doubt 2021-11-25 09:49:07 +01:00
events perf/core: Fix address filter parser for multiple filters 2022-04-08 14:23:10 +02:00
gcov
irq PCI/MSI: Move non-mask check back into low level accessors 2021-11-18 19:17:14 +01:00
kcsan LKMM updates: 2021-09-02 13:00:15 -07:00
livepatch
locking locking/lockdep: Avoid potential access of invalid memory in lock_class 2022-04-08 14:22:48 +02:00
power PM: suspend: fix return value of __setup handler 2022-04-08 14:23:07 +02:00
printk printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces 2021-11-25 09:48:45 +01:00
rcu rcu: Don't deboost before reporting expedited quiescent state 2022-03-28 09:58:45 +02:00
sched sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race 2022-04-08 14:23:11 +02:00
time clocksource: Avoid accidental unstable marking of clocksources 2022-01-27 11:04:08 +01:00
trace tracing: Have trace event string test handle zero length strings 2022-04-08 14:22:57 +02:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
Makefile
acct.c kernel/acct.c: use dedicated helper to access rlimit values 2021-09-08 11:50:26 -07:00
async.c Revert "module, async: async_synchronize_full() on module init iff async is used" 2022-02-23 12:03:07 +01:00
audit.c audit: improve audit queue handling when "audit=1" on cmdline 2022-02-08 18:34:03 +01:00
audit.h audit: log AUDIT_TIME_* records only from rules 2022-04-08 14:23:06 +02:00
audit_fsnotify.c
audit_tree.c audit: move put_tree() to avoid trim_trees refcount underflow and UAF 2021-08-24 18:52:36 -04:00
audit_watch.c
auditfilter.c
auditsc.c audit: log AUDIT_TIME_* records only from rules 2022-04-08 14:23:06 +02:00
backtracetest.c
bounds.c
capability.c
cfi.c
compat.c arch: remove compat_alloc_user_space 2021-09-08 15:32:35 -07:00
configs.c
context_tracking.c
cpu.c sched/scs: Reset task stack state in bringup_cpu() 2021-12-01 09:04:54 +01:00
cpu_pm.c
crash_core.c kernel/crash_core: suppress unknown crashkernel parameter warning 2021-12-29 12:28:49 +01:00
crash_dump.c
cred.c ucounts: Base set_cred_ucounts changes on the real user 2022-02-23 12:03:20 +01:00
delayacct.c
dma.c
exec_domain.c
exit.c io_uring: remove files pointer in cancellation functions 2021-08-23 13:10:37 -06:00
extable.c
fail_function.c
fork.c sched: Fix yet more sched_fork() races 2022-03-08 19:12:49 +01:00
freezer.c
futex.c futex: Remove unused variable 'vpid' in futex_proxy_trylock_atomic() 2021-09-03 23:00:22 +02:00
gen_kheaders.sh
groups.c
hung_task.c
iomem.c
irq_work.c
jump_label.c
kallsyms.c
kcmp.c
kcov.c
kexec.c kexec: avoid compat_alloc_user_space 2021-09-08 15:32:34 -07:00
kexec_core.c Merge branch 'rework/printk_safe-removal' into for-linus 2021-08-30 16:36:10 +02:00
kexec_elf.c
kexec_file.c
kexec_internal.h
kheaders.c
kmod.c
kprobes.c kprobes: Limit max data_size of the kretprobe instances 2021-12-08 09:04:41 +01:00
ksysfs.c
kthread.c
latencytop.c
module-internal.h
module.c Revert "module, async: async_synchronize_full() on module init iff async is used" 2022-02-23 12:03:07 +01:00
module_signature.c
module_signing.c
notifier.c
nsproxy.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
padata.c padata: Remove repeated verbose license text 2021-08-27 16:30:18 +08:00
panic.c Merge branch 'rework/printk_safe-removal' into for-linus 2021-08-30 16:36:10 +02:00
params.c
pid.c
pid_namespace.c memcg: enable accounting for new namesapces and struct nsproxy 2021-09-03 09:58:12 -07:00
profile.c profiling: fix shift-out-of-bounds bugs 2021-09-08 11:50:26 -07:00
ptrace.c ptrace: Check PTRACE_O_SUSPEND_SECCOMP permission on PTRACE_SEIZE 2022-04-08 14:22:50 +02:00
range.c
reboot.c
regset.c
relay.c
resource.c
resource_kunit.c
rseq.c rseq: Remove broken uapi field layout on 32-bit little endian 2022-04-08 14:23:10 +02:00
scftorture.c
scs.c scs: Release kasan vmalloc poison in scs_free process 2021-11-18 19:16:29 +01:00
seccomp.c seccomp: Invalidate seccomp mode to catch death failures 2022-02-16 12:56:38 +01:00
signal.c signal: In get_signal test for signal_group_exit every time through the loop 2022-03-08 19:12:34 +01:00
smp.c
smpboot.c
smpboot.h
softirq.c
stackleak.c gcc-plugins/stackleak: Use noinstr in favor of notrace 2022-02-23 12:03:07 +01:00
stacktrace.c
static_call.c
stop_machine.c
sys.c ucounts: Move RLIMIT_NPROC handling after set_user 2022-02-23 12:03:20 +01:00
sys_ni.c compat: remove some compat entry points 2021-09-08 15:32:35 -07:00
sysctl-test.c
sysctl.c x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting 2022-03-11 12:22:31 +01:00
task_work.c
taskstats.c
test_kprobes.c
torture.c
tracepoint.c
tsacct.c taskstats: Cleanup the use of task->exit_code 2022-01-27 11:05:35 +01:00
ucount.c ucounts: Handle wrapping in is_ucounts_overlimit 2022-02-23 12:03:20 +01:00
uid16.c
uid16.h
umh.c
up.c
user-return-notifier.c
user.c fs/epoll: use a per-cpu counter for user's watches count 2021-09-08 11:50:27 -07:00
user_namespace.c ucounts: Fix systemd LimitNPROC with private users regression 2022-03-08 19:12:42 +01:00
usermode_driver.c
utsname.c
utsname_sysctl.c
watch_queue.c watch_queue: Actually free the watch 2022-04-08 14:23:10 +02:00
watchdog.c
watchdog_hld.c
workqueue.c workqueue: Fix unbind_workers() VS wq_worker_running() race 2022-01-16 09:12:41 +01:00
workqueue_internal.h