WSL2-Linux-Kernel/kernel/time
John Stultz 6fdda9a9c5 timekeeping: Avoid possible deadlock from clock_was_set_delayed
As part of normal operaions, the hrtimer subsystem frequently calls
into the timekeeping code, creating a locking order of
  hrtimer locks -> timekeeping locks

clock_was_set_delayed() was suppoed to allow us to avoid deadlocks
between the timekeeping the hrtimer subsystem, so that we could
notify the hrtimer subsytem the time had changed while holding
the timekeeping locks. This was done by scheduling delayed work
that would run later once we were out of the timekeeing code.

But unfortunately the lock chains are complex enoguh that in
scheduling delayed work, we end up eventually trying to grab
an hrtimer lock.

Sasha Levin noticed this in testing when the new seqlock lockdep
enablement triggered the following (somewhat abrieviated) message:

[  251.100221] ======================================================
[  251.100221] [ INFO: possible circular locking dependency detected ]
[  251.100221] 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 Not tainted
[  251.101967] -------------------------------------------------------
[  251.101967] kworker/10:1/4506 is trying to acquire lock:
[  251.101967]  (timekeeper_seq){----..}, at: [<ffffffff81160e96>] retrigger_next_event+0x56/0x70
[  251.101967]
[  251.101967] but task is already holding lock:
[  251.101967]  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
[  251.101967]
[  251.101967] which lock already depends on the new lock.
[  251.101967]
[  251.101967]
[  251.101967] the existing dependency chain (in reverse order) is:
[  251.101967]
-> #5 (hrtimer_bases.lock#11){-.-...}:
[snipped]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[snipped]
-> #3 (&rq->lock){-.-.-.}:
[snipped]
-> #2 (&p->pi_lock){-.-.-.}:
[snipped]
-> #1 (&(&pool->lock)->rlock){-.-...}:
[  251.101967]        [<ffffffff81194803>] validate_chain+0x6c3/0x7b0
[  251.101967]        [<ffffffff81194d9d>] __lock_acquire+0x4ad/0x580
[  251.101967]        [<ffffffff81194ff2>] lock_acquire+0x182/0x1d0
[  251.101967]        [<ffffffff84398500>] _raw_spin_lock+0x40/0x80
[  251.101967]        [<ffffffff81153e69>] __queue_work+0x1a9/0x3f0
[  251.101967]        [<ffffffff81154168>] queue_work_on+0x98/0x120
[  251.101967]        [<ffffffff81161351>] clock_was_set_delayed+0x21/0x30
[  251.101967]        [<ffffffff811c4bd1>] do_adjtimex+0x111/0x160
[  251.101967]        [<ffffffff811e2711>] compat_sys_adjtimex+0x41/0x70
[  251.101967]        [<ffffffff843a4b49>] ia32_sysret+0x0/0x5
[  251.101967]
-> #0 (timekeeper_seq){----..}:
[snipped]
[  251.101967] other info that might help us debug this:
[  251.101967]
[  251.101967] Chain exists of:
  timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock#11

[  251.101967]  Possible unsafe locking scenario:
[  251.101967]
[  251.101967]        CPU0                    CPU1
[  251.101967]        ----                    ----
[  251.101967]   lock(hrtimer_bases.lock#11);
[  251.101967]                                lock(&rt_b->rt_runtime_lock);
[  251.101967]                                lock(hrtimer_bases.lock#11);
[  251.101967]   lock(timekeeper_seq);
[  251.101967]
[  251.101967]  *** DEADLOCK ***
[  251.101967]
[  251.101967] 3 locks held by kworker/10:1/4506:
[  251.101967]  #0:  (events){.+.+.+}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
[  251.101967]  #1:  (hrtimer_work){+.+...}, at: [<ffffffff81154960>] process_one_work+0x200/0x530
[  251.101967]  #2:  (hrtimer_bases.lock#11){-.-...}, at: [<ffffffff81160e7c>] retrigger_next_event+0x3c/0x70
[  251.101967]
[  251.101967] stack backtrace:
[  251.101967] CPU: 10 PID: 4506 Comm: kworker/10:1 Not tainted 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053
[  251.101967] Workqueue: events clock_was_set_work

So the best solution is to avoid calling clock_was_set_delayed() while
holding the timekeeping lock, and instead using a flag variable to
decide if we should call clock_was_set() once we've released the locks.

This works for the case here, where the do_adjtimex() was the deadlock
trigger point. Unfortuantely, in update_wall_time() we still hold
the jiffies lock, which would deadlock with the ipi triggered by
clock_was_set(), preventing us from calling it even after we drop the
timekeeping lock. So instead call clock_was_set_delayed() at that point.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: stable <stable@vger.kernel.org> #3.10+
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
2013-12-23 11:53:33 -08:00
..
Kconfig nohz: Drop generic vtime obsolete dependency on CONFIG_64BIT 2013-09-30 15:37:01 +02:00
Makefile sched_clock: Make ARM's sched_clock generic for all architectures 2013-06-12 14:02:13 -07:00
alarmtimer.c alarmtimer: return EINVAL instead of ENOTSUPP if rtcdev doesn't exist 2013-10-18 16:23:58 -07:00
clockevents.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 10:36:00 +09:00
clocksource.c time: Fix signedness bug in sysfs_get_uname() and its callers 2013-10-18 16:45:58 -07:00
jiffies.c time: Kill xtime_lock, replacing it with jiffies_lock 2012-11-13 14:08:23 -05:00
ntp.c Linux 3.12-rc3 2013-10-03 07:52:21 +02:00
ntp_internal.h ntp: Rework do_adjtimex to take timespec and tai arguments 2013-04-04 13:18:15 -07:00
posix-clock.c kernel: Fix files explicitly needing EXPORT_SYMBOL infrastructure 2011-10-31 19:30:05 -04:00
sched_clock.c sched_clock: Remove sched_clock_func() hook 2013-10-09 16:54:39 -07:00
tick-broadcast.c nohz: Convert a few places to use local per cpu accesses 2013-12-02 20:39:30 +01:00
tick-common.c tick: Sanitize broadcast control logic 2013-07-02 14:26:45 +02:00
tick-internal.h nohz: Convert a few places to use local per cpu accesses 2013-12-02 20:39:30 +01:00
tick-oneshot.c clockevents: Make minimum delay adjustments configurable 2011-09-08 11:10:56 +02:00
tick-sched.c nohz: Convert a few places to use local per cpu accesses 2013-12-02 20:39:30 +01:00
timeconv.c time: add function to convert between calendar time and broken-down time for universal use 2009-09-24 07:20:56 -07:00
timekeeping.c timekeeping: Avoid possible deadlock from clock_was_set_delayed 2013-12-23 11:53:33 -08:00
timekeeping_debug.c power: Add option to log time spent in suspend 2013-05-29 12:57:34 -07:00
timekeeping_internal.h power: Add option to log time spent in suspend 2013-05-29 12:57:34 -07:00
timer_list.c timer_list: correct the iterator for timer_list 2013-08-28 19:26:38 -07:00
timer_stats.c timer stats: Add a 'Collection: active/inactive' line to timer usage statistics 2013-10-10 09:59:25 +02:00