workqueue: always clear WORKER_REBIND in busy_worker_rebind_fn()
busy_worker_rebind_fn() didn't clear WORKER_REBIND if rebinding failed
(CPU is down again). This used to be okay because the flag wasn't
used for anything else.
However, after 25511a477
"workqueue: reimplement CPU online rebinding
to handle idle workers", WORKER_REBIND is also used to command idle
workers to rebind. If not cleared, the worker may confuse the next
CPU_UP cycle by having REBIND spuriously set or oops / get stuck by
prematurely calling idle_worker_rebind().
WARNING: at /work/os/wq/kernel/workqueue.c:1323 worker_thread+0x4cd/0x5
00()
Hardware name: Bochs
Modules linked in: test_wq(O-)
Pid: 33, comm: kworker/1:1 Tainted: G O 3.6.0-rc1-work+ #3
Call Trace:
[<ffffffff8109039f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff810903fa>] warn_slowpath_null+0x1a/0x20
[<ffffffff810b3f1d>] worker_thread+0x4cd/0x500
[<ffffffff810bc16e>] kthread+0xbe/0xd0
[<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10
---[ end trace e977cf20f4661968 ]---
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff810b3db0>] worker_thread+0x360/0x500
PGD 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: test_wq(O-)
CPU 0
Pid: 33, comm: kworker/1:1 Tainted: G W O 3.6.0-rc1-work+ #3 Bochs Bochs
RIP: 0010:[<ffffffff810b3db0>] [<ffffffff810b3db0>] worker_thread+0x360/0x500
RSP: 0018:ffff88001e1c9de0 EFLAGS: 00010086
RAX: 0000000000000000 RBX: ffff88001e633e00 RCX: 0000000000004140
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
RBP: ffff88001e1c9ea0 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000002 R11: 0000000000000000 R12: ffff88001fc8d580
R13: ffff88001fc8d590 R14: ffff88001e633e20 R15: ffff88001e1c6900
FS: 0000000000000000(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000000130e8000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/1:1 (pid: 33, threadinfo ffff88001e1c8000, task ffff88001e1c6900)
Stack:
ffff880000000000 ffff88001e1c9e40 0000000000000001 ffff88001e1c8010
ffff88001e519c78 ffff88001e1c9e58 ffff88001e1c6900 ffff88001e1c6900
ffff88001e1c6900 ffff88001e1c6900 ffff88001fc8d340 ffff88001fc8d340
Call Trace:
[<ffffffff810bc16e>] kthread+0xbe/0xd0
[<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10
Code: b1 00 f6 43 48 02 0f 85 91 01 00 00 48 8b 43 38 48 89 df 48 8b 00 48 89 45 90 e8 ac f0 ff ff 3c 01 0f 85 60 01 00 00 48 8b 53 50 <8b> 02 83 e8 01 85 c0 89 02 0f 84 3b 01 00 00 48 8b 43 38 48 8b
RIP [<ffffffff810b3db0>] worker_thread+0x360/0x500
RSP <ffff88001e1c9de0>
CR2: 0000000000000000
There was no reason to keep WORKER_REBIND on failure in the first
place - WORKER_UNBOUND is guaranteed to be set in such cases
preventing incorrectly activating concurrency management. Always
clear WORKER_REBIND.
tj: Updated comment and description.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
This commit is contained in:
Родитель
ee378aa49b
Коммит
960bd11bf2
|
@ -1349,7 +1349,15 @@ static void busy_worker_rebind_fn(struct work_struct *work)
|
|||
struct worker *worker = container_of(work, struct worker, rebind_work);
|
||||
struct global_cwq *gcwq = worker->pool->gcwq;
|
||||
|
||||
if (worker_maybe_bind_and_lock(worker))
|
||||
worker_maybe_bind_and_lock(worker);
|
||||
|
||||
/*
|
||||
* %WORKER_REBIND must be cleared even if the above binding failed;
|
||||
* otherwise, we may confuse the next CPU_UP cycle or oops / get
|
||||
* stuck by calling idle_worker_rebind() prematurely. If CPU went
|
||||
* down again inbetween, %WORKER_UNBOUND would be set, so clearing
|
||||
* %WORKER_REBIND is always safe.
|
||||
*/
|
||||
worker_clr_flags(worker, WORKER_REBIND);
|
||||
|
||||
spin_unlock_irq(&gcwq->lock);
|
||||
|
|
Загрузка…
Ссылка в новой задаче