WSL2-Linux-Kernel/kernel/sched
Zhang Qiao 3869eecf05 kernel/sched: Fix sched_fork() access an invalid sched_task_group
[ Upstream commit 4ef0c5c6b5 ]

There is a small race between copy_process() and sched_fork()
where child->sched_task_group point to an already freed pointer.

	parent doing fork()      | someone moving the parent
				 | to another cgroup
  -------------------------------+-------------------------------
  copy_process()
      + dup_task_struct()<1>
				  parent move to another cgroup,
				  and free the old cgroup. <2>
      + sched_fork()
	+ __set_task_cpu()<3>
	+ task_fork_fair()
	  + sched_slice()<4>

In the worst case, this bug can lead to "use-after-free" and
cause panic as shown above:

  (1) parent copy its sched_task_group to child at <1>;

  (2) someone move the parent to another cgroup and free the old
      cgroup at <2>;

  (3) the sched_task_group and cfs_rq that belong to the old cgroup
      will be accessed at <3> and <4>, which cause a panic:

  [] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  [] PGD 8000001fa0a86067 P4D 8000001fa0a86067 PUD 2029955067 PMD 0
  [] Oops: 0000 [#1] SMP PTI
  [] CPU: 7 PID: 648398 Comm: ebizzy Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0.x86_64+ #1
  [] RIP: 0010:sched_slice+0x84/0xc0

  [] Call Trace:
  []  task_fork_fair+0x81/0x120
  []  sched_fork+0x132/0x240
  []  copy_process.part.5+0x675/0x20e0
  []  ? __handle_mm_fault+0x63f/0x690
  []  _do_fork+0xcd/0x3b0
  []  do_syscall_64+0x5d/0x1d0
  []  entry_SYSCALL_64_after_hwframe+0x65/0xca
  [] RIP: 0033:0x7f04418cd7e1

Between cgroup_can_fork() and cgroup_post_fork(), the cgroup
membership and thus sched_task_group can't change. So update child's
sched_task_group at sched_post_fork() and move task_fork() and
__set_task_cpu() (where accees the sched_task_group) from sched_fork()
to sched_post_fork().

Fixes: 8323f26ce3 ("sched: Fix race in task_group")
Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lkml.kernel.org/r/20210915064030.2231-1-zhangqiao22@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-18 19:16:32 +01:00
..
Makefile sched: Trivial core scheduling cookie management 2021-05-12 11:43:31 +02:00
autogroup.c sched/autogroup: Make autogroup_path() always available 2019-06-24 19:23:40 +02:00
autogroup.h sched/headers: Simplify and clean up header usage in the scheduler 2018-03-04 12:39:29 +01:00
clock.c sched: Fix various typos 2021-03-22 00:11:52 +01:00
completion.c completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all() 2020-03-23 18:40:25 +01:00
core.c kernel/sched: Fix sched_fork() access an invalid sched_task_group 2021-11-18 19:16:32 +01:00
core_sched.c sched: prctl() core-scheduling interface 2021-05-12 11:43:31 +02:00
cpuacct.c sched: Wrap rq::lock access 2021-05-12 11:43:26 +02:00
cpudeadline.c sched,rt: Use the full cpumask for balancing 2020-11-10 18:39:00 +01:00
cpudeadline.h sched/headers: Simplify and clean up header usage in the scheduler 2018-03-04 12:39:29 +01:00
cpufreq.c cpufreq: Avoid leaving stale IRQ work items during CPU offline 2019-12-12 17:59:43 +01:00
cpufreq_schedutil.c cpufreq: schedutil: Use kobject release() method to free sugov_tunables 2021-08-06 15:34:55 +02:00
cpupri.c sched: Fix various typos 2021-03-22 00:11:52 +01:00
cpupri.h sched/cpupri: Add CPUPRI_HIGHER 2020-10-29 11:00:30 +01:00
cputime.c Scheduler updates for this cycle are: 2021-04-28 13:33:57 -07:00
deadline.c sched/deadline: Fix missing clock update in migrate_task_rq_dl() 2021-08-06 14:25:24 +02:00
debug.c sched/fair: Null terminate buffer when updating tunable_scaling 2021-10-01 13:57:57 +02:00
fair.c sched/fair: Add ancestors of unthrottled undecayed cfs_rq 2021-10-01 13:57:57 +02:00
features.h sched: Warn on long periods of pending need_resched 2021-04-21 13:55:41 +02:00
idle.c sched/idle: Make the idle timer expire in hard interrupt context 2021-09-09 10:36:16 +02:00
isolation.c sched/isolation: Reconcile rcu_nocbs= and nohz_full= 2021-05-13 14:12:47 +02:00
loadavg.c sched: Make multiple runqueue task counters 32-bit 2021-05-12 21:34:17 +02:00
membarrier.c sched/membarrier: fix missing local execution of ipi_sync_rq_state() 2021-03-06 12:40:21 +01:00
pelt.c sched: Fix various typos 2021-03-22 00:11:52 +01:00
pelt.h Merge branch 'sched/urgent' into sched/core, to resolve conflicts 2021-06-18 11:31:25 +02:00
psi.c Merge branch 'for-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2021-07-01 17:22:14 -07:00
rt.c sched/rt: Fix RT utilization tracking during policy change 2021-06-22 16:41:59 +02:00
sched-pelt.h sched/fair: Fix "runnable_avg_yN_inv" not used warnings 2019-06-17 12:15:58 +02:00
sched.h Scheduler changes for v5.15 are: 2021-08-30 13:42:10 -07:00
smp.h sched/headers: Split out open-coded prototypes into kernel/sched/smp.h 2020-05-28 11:03:20 +02:00
stats.c sched: Fix various typos 2021-03-22 00:11:52 +01:00
stats.h sched: Introduce task_is_running() 2021-06-18 11:43:07 +02:00
stop_task.c sched: Introduce sched_class::pick_task() 2021-05-12 11:43:28 +02:00
swait.c sched/swait: Prepare usage in completions 2020-03-21 16:00:23 +01:00
topology.c sched/topology: Skip updating masks for non-online nodes 2021-08-20 12:32:57 +02:00
wait.c rq-qos: fix missed wake-ups in rq_qos_throttle try two 2021-06-08 15:12:57 -06:00
wait_bit.c sched/wait: fix ___wait_var_event(exclusive) 2019-12-17 13:32:50 +01:00