WSL2-Linux-Kernel/kernel
Mel Gorman 1c0908d8e4 rtmutex: Add acquire semantics for rtmutex lock acquisition slow path
Jan Kara reported the following bug triggering on 6.0.5-rt14 running dbench
on XFS on arm64.

 kernel BUG at fs/inode.c:625!
 Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
 CPU: 11 PID: 6611 Comm: dbench Tainted: G            E   6.0.0-rt14-rt+ #1
 pc : clear_inode+0xa0/0xc0
 lr : clear_inode+0x38/0xc0
 Call trace:
  clear_inode+0xa0/0xc0
  evict+0x160/0x180
  iput+0x154/0x240
  do_unlinkat+0x184/0x300
  __arm64_sys_unlinkat+0x48/0xc0
  el0_svc_common.constprop.4+0xe4/0x2c0
  do_el0_svc+0xac/0x100
  el0_svc+0x78/0x200
  el0t_64_sync_handler+0x9c/0xc0
  el0t_64_sync+0x19c/0x1a0

It also affects 6.1-rc7-rt5 and affects a preempt-rt fork of 5.14 so this
is likely a bug that existed forever and only became visible when ARM
support was added to preempt-rt. The same problem does not occur on x86-64
and he also reported that converting sb->s_inode_wblist_lock to
raw_spinlock_t makes the problem disappear indicating that the RT spinlock
variant is the problem.

Which in turn means that RT mutexes on ARM64 and any other weakly ordered
architecture are affected by this independent of RT.

Will Deacon observed:

  "I'd be more inclined to be suspicious of the slowpath tbh, as we need to
   make sure that we have acquire semantics on all paths where the lock can
   be taken. Looking at the rtmutex code, this really isn't obvious to me
   -- for example, try_to_take_rt_mutex() appears to be able to return via
   the 'takeit' label without acquire semantics and it looks like we might
   be relying on the caller's subsequent _unlock_ of the wait_lock for
   ordering, but that will give us release semantics which aren't correct."

Sebastian Andrzej Siewior prototyped a fix that does work based on that
comment but it was a little bit overkill and added some fences that should
not be necessary.

The lock owner is updated with an IRQ-safe raw spinlock held, but the
spin_unlock does not provide acquire semantics which are needed when
acquiring a mutex.

Adds the necessary acquire semantics for lock owner updates in the slow path
acquisition and the waiter bit logic.

It successfully completed 10 iterations of the dbench workload while the
vanilla kernel fails on the first iteration.

[ bigeasy@linutronix.de: Initial prototype fix ]

Fixes: 700318d1d7 ("locking/rtmutex: Use acquire/release semantics")
Fixes: 23f78d4a03 ("[PATCH] pi-futex: rt mutex core")
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221202100223.6mevpbl7i6x5udfd@techsingularity.net
2022-12-12 19:55:56 +01:00
..
bpf bpf: Do not copy spin lock field from user in bpf_selem_alloc 2022-11-21 11:45:37 -08:00
cgroup memcg: Fix possible use-after-free in memcg_write_event_control() 2022-12-08 10:40:58 -08:00
configs Kbuild: add Rust support 2022-09-28 09:02:20 +02:00
debug mm: remove vmacache 2022-09-26 19:46:18 -07:00
dma - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
entry entry: kmsan: introduce kmsan_unpoison_entry_regs() 2022-10-03 14:03:25 -07:00
events - Fix a use-after-free case where the perf pending task callback would 2022-12-04 12:36:23 -08:00
futex freezer,sched: Rewrite core freezer logic 2022-09-07 21:53:50 +02:00
gcov gcov: clang: fix the buffer overflow issue 2022-11-22 18:50:41 -08:00
irq genirq: Provide generic_handle_domain_irq_safe(). 2022-09-19 15:08:38 +02:00
kcsan treewide: use get_random_bytes() when possible 2022-10-11 17:42:58 -06:00
livepatch Livepatching changes for 6.1 2022-10-10 11:36:19 -07:00
locking rtmutex: Add acquire semantics for rtmutex lock acquisition slow path 2022-12-12 19:55:56 +01:00
module module: tracking: Keep a record of tainted unloaded modules only 2022-10-10 12:16:19 -07:00
power PM: hibernate: Allow hybrid sleep to work with s2idle 2022-10-25 14:53:19 +02:00
printk printk: Mark __printk percpu data ready __ro_after_init 2022-09-29 15:20:52 +02:00
rcu rcu: Keep synchronize_rcu() from enabling irqs in early boot 2022-10-20 15:34:49 -07:00
sched Revert "cpufreq: schedutil: Move max CPU capacity to sugov_policy" 2022-11-22 19:56:52 +01:00
time treewide: use prandom_u32_max() when possible, part 1 2022-10-11 17:42:55 -06:00
trace tracing: Free buffers when a used dynamic event is removed 2022-11-23 19:07:12 -05:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt Revert "signal, x86: Delay calling signals in atomic on RT enabled kernels" 2022-03-31 10:36:55 +02:00
Makefile kmsan: disable instrumentation of unsupported common kernel code 2022-10-03 14:03:20 -07:00
acct.c acct: use VMA iterator instead of linked list 2022-09-26 19:46:22 -07:00
async.c
audit.c audit: use time_after to compare time 2022-08-29 19:47:03 -04:00
audit.h audit: remove selinux_audit_rule_update() declaration 2022-09-07 11:30:15 -04:00
audit_fsnotify.c audit: fix potential double free on error path from fsnotify_add_inode_mark 2022-08-22 18:50:06 -04:00
audit_tree.c audit: use fsnotify group lock helpers 2022-04-25 14:37:28 +02:00
audit_watch.c audit_init_parent(): constify path 2022-09-01 17:39:30 -04:00
auditfilter.c
auditsc.c audit/stable-6.1 PR 20221003 2022-10-04 11:05:43 -07:00
backtracetest.c
bounds.c mm: multi-gen LRU: minimal implementation 2022-09-26 19:46:09 -07:00
capability.c xfs: don't generate selinux audit messages for capability testing 2022-03-09 10:32:06 -08:00
cfi.c cfi: Switch to -fsanitize=kcfi 2022-09-26 10:13:13 -07:00
compat.c
configs.c
context_tracking.c MAINTAINERS: Add Paul as context tracking maintainer 2022-07-05 13:33:00 -07:00
cpu.c Intel Trust Domain Extensions 2022-05-23 17:51:12 -07:00
cpu_pm.c context_tracking: Take IRQ eqs entrypoints over RCU 2022-07-05 13:32:59 -07:00
crash_core.c vmcoreinfo: add kallsyms_num_syms symbol 2022-08-28 14:02:44 -07:00
crash_dump.c
cred.c x86: Mark __invalid_creds() __noreturn 2022-03-15 10:32:44 +01:00
delayacct.c delayacct: support re-entrance detection of thrashing accounting 2022-09-26 19:46:07 -07:00
dma.c
exec_domain.c
exit.c - hfs and hfsplus kmap API modernization from Fabio Francesco 2022-10-12 11:00:22 -07:00
extable.c context_tracking: Take NMI eqs entrypoints over RCU 2022-07-05 13:32:59 -07:00
fail_function.c fail_function: fix wrong use of fei_attr_remove() 2022-09-11 21:55:11 -07:00
fork.c - hfs and hfsplus kmap API modernization from Fabio Francesco 2022-10-12 11:00:22 -07:00
freezer.c freezer,sched: Rewrite core freezer logic 2022-09-07 21:53:50 +02:00
gen_kheaders.sh kbuild: build init/built-in.a just once 2022-09-29 04:40:15 +09:00
groups.c security: Add LSM hook to setgroups() syscall 2022-07-15 18:21:49 +00:00
hung_task.c sched: Fix more TASK_state comparisons 2022-09-30 16:50:39 +02:00
iomem.c
irq_work.c irq_work: use kasan_record_aux_stack_noalloc() record callstack 2022-04-15 14:49:55 -07:00
jump_label.c jump_label: make initial NOP patching the special case 2022-06-24 09:48:55 +02:00
kallsyms.c kcfi updates for v6.1-rc1 2022-10-03 17:11:07 -07:00
kallsyms_internal.h kallsyms: move declarations to internal header 2022-07-17 17:31:39 -07:00
kcmp.c
kcov.c kcov: kmsan: unpoison area->list in kcov_remote_area_put() 2022-10-03 14:03:23 -07:00
kexec.c panic, kexec: make __crash_kexec() NMI safe 2022-09-11 21:55:06 -07:00
kexec_core.c kexec: replace kmap() with kmap_local_page() 2022-09-11 21:55:08 -07:00
kexec_elf.c
kexec_file.c panic, kexec: make __crash_kexec() NMI safe 2022-09-11 21:55:06 -07:00
kexec_internal.h panic, kexec: make __crash_kexec() NMI safe 2022-09-11 21:55:06 -07:00
kheaders.c
kmod.c
kprobes.c kprobes: Skip clearing aggrprobe's post_handler in kprobe-on-ftrace case 2022-11-18 10:15:34 +09:00
ksysfs.c kexec: turn all kexec_mutex acquisitions into trylocks 2022-09-11 21:55:06 -07:00
kthread.c signal: break out of wait loops on kthread_stop() 2022-10-09 16:01:59 -07:00
latencytop.c latencytop: use the last element of latency_record of system 2022-09-11 21:55:12 -07:00
module_signature.c
notifier.c notifier: Add blocking/atomic_notifier_chain_register_unique_prio() 2022-05-19 19:30:30 +02:00
nsproxy.c Revert "fs/exec: allow to unshare a time namespace on vfork+exec" 2022-09-13 10:38:43 -07:00
padata.c
panic.c kernel/panic: Drop unblank_screen call 2022-09-01 16:55:35 +02:00
params.c
pid.c gfs2: Add glockfd debugfs file 2022-06-29 13:07:16 +02:00
pid_namespace.c kernel: pid_namespace: use NULL instead of using plain integer as pointer 2022-04-29 14:38:00 -07:00
profile.c kernel/profile.c: simplify duplicated code in profile_setup() 2022-09-11 21:55:12 -07:00
ptrace.c freezer,sched: Rewrite core freezer logic 2022-09-07 21:53:50 +02:00
range.c
reboot.c kernel/reboot: Add SYS_OFF_MODE_RESTART_PREPARE mode 2022-10-04 15:59:36 +02:00
regset.c
relay.c relay: use kvcalloc to alloc page array in relay_alloc_page_array 2022-10-03 14:21:43 -07:00
resource.c resource: Introduce alloc_free_mem_region() 2022-07-21 17:19:25 -07:00
resource_kunit.c
rseq.c rseq: Use pr_warn_once() when deprecated/unknown ABI flags are encountered 2022-11-14 09:58:32 +01:00
scftorture.c scftorture: Fix distribution of short handler delays 2022-04-11 17:07:29 -07:00
scs.c kasan, vmalloc: only tag normal vmalloc allocations 2022-03-24 19:06:48 -07:00
seccomp.c seccomp: Add wait_killable semantic to seccomp user notifier 2022-05-03 14:11:58 -07:00
signal.c Scheduler changes for v6.1: 2022-10-10 09:10:28 -07:00
smp.c bitmap patches for v6.1-rc1 2022-10-10 12:49:34 -07:00
smpboot.c smpboot: use atomic_try_cmpxchg in cpu_wait_death and cpu_report_death 2022-09-11 21:55:10 -07:00
smpboot.h
softirq.c context_tracking: Take IRQ eqs entrypoints over RCU 2022-07-05 13:32:59 -07:00
stackleak.c stackleak: add on/off stack variants 2022-05-08 01:33:09 -07:00
stacktrace.c
static_call.c static_call: Don't make __static_call_return0 static 2022-04-05 09:59:38 +02:00
static_call_inline.c static_call: Don't make __static_call_return0 static 2022-04-05 09:59:38 +02:00
stop_machine.c Scheduler changes in this cycle were: 2022-05-24 11:11:13 -07:00
sys.c Random number generator updates for Linux 6.1-rc1. 2022-10-10 10:41:21 -07:00
sys_ni.c kernel/sys_ni: add compat entry for fadvise64_64 2022-08-20 15:17:45 -07:00
sysctl-test.c kernel/sysctl-test: use SYSCTL_{ZERO/ONE_HUNDRED} instead of i_{zero/one_hundred} 2022-09-08 16:56:45 -07:00
sysctl.c proc: proc_skip_spaces() shouldn't think it is working on C strings 2022-12-05 12:09:06 -08:00
task_work.c task_work: use try_cmpxchg in task_work_add, task_work_cancel_match and task_work_run 2022-09-11 21:55:10 -07:00
taskstats.c genetlink: start to validate reserved header bytes 2022-08-29 12:47:15 +01:00
torture.c
tracepoint.c tracepoint: Optimize the critical region of mutex_lock in tracepoint_module_coming() 2022-09-26 13:01:18 -04:00
tsacct.c taskstats: version 12 with thread group and exe info 2022-04-29 14:38:03 -07:00
ucount.c ucounts: Split rlimit and ucount values and max values 2022-05-18 18:24:57 -05:00
uid16.c
uid16.h
umh.c freezer,sched: Rewrite core freezer logic 2022-09-07 21:53:50 +02:00
up.c
user-return-notifier.c
user.c
user_namespace.c ucounts: Split rlimit and ucount values and max values 2022-10-09 16:24:05 -07:00
usermode_driver.c blob_to_mnt(): kern_unmount() is needed to undo kern_mount() 2022-05-19 23:25:47 -04:00
utsname.c
utsname_sysctl.c kernel/utsname_sysctl.c: Fix hostname polling 2022-10-23 12:01:01 -07:00
watch_queue.c This was a moderately busy cycle for documentation, but nothing all that 2022-08-02 19:24:24 -07:00
watchdog.c powerpc updates for 6.0 2022-08-06 16:38:17 -07:00
watchdog_hld.c Revert "printk: add functions to prefer direct printing" 2022-06-23 18:41:40 +02:00
workqueue.c kcfi updates for v6.1-rc1 2022-10-03 17:11:07 -07:00
workqueue_internal.h