WSL2-Linux-Kernel/kernel/sched
Pierre Gondois 94aeee7c21 sched/fair: Use all little CPUs for CPU-bound workloads
commit 3af7524b14198f5159a86692d57a9f28ec9375ce upstream.

Running N CPU-bound tasks on an N CPUs platform:

- with asymmetric CPU capacity

- not being a DynamIq system (i.e. having a PKG level sched domain
  without the SD_SHARE_PKG_RESOURCES flag set)

.. might result in a task placement where two tasks run on a big CPU
and none on a little CPU. This placement could be more optimal by
using all CPUs.

Testing platform:

  Juno-r2:
    - 2 big CPUs (1-2), maximum capacity of 1024
    - 4 little CPUs (0,3-5), maximum capacity of 383

Testing workload ([1]):

  Spawn 6 CPU-bound tasks. During the first 100ms (step 1), each tasks
  is affine to a CPU, except for:

    - one little CPU which is left idle.
    - one big CPU which has 2 tasks affine.

  After the 100ms (step 2), remove the cpumask affinity.

Behavior before the patch:

  During step 2, the load balancer running from the idle CPU tags sched
  domains as:

  - little CPUs: 'group_has_spare'. Cf. group_has_capacity() and
    group_is_overloaded(), 3 CPU-bound tasks run on a 4 CPUs
    sched-domain, and the idle CPU provides enough spare capacity
    regarding the imbalance_pct

  - big CPUs: 'group_overloaded'. Indeed, 3 tasks run on a 2 CPUs
    sched-domain, so the following path is used:

      group_is_overloaded()
      \-if (sgs->sum_nr_running <= sgs->group_weight) return true;

    The following path which would change the migration type to
    'migrate_task' is not taken:

      calculate_imbalance()
      \-if (env->idle != CPU_NOT_IDLE && env->imbalance == 0)

    as the local group has some spare capacity, so the imbalance
    is not 0.

  The migration type requested is 'migrate_util' and the busiest
  runqueue is the big CPU's runqueue having 2 tasks (each having a
  utilization of 512). The idle little CPU cannot pull one of these
  task as its capacity is too small for the task. The following path
  is used:

   detach_tasks()
   \-case migrate_util:
     \-if (util > env->imbalance) goto next;

After the patch:

As the number of failed balancing attempts grows (with
'nr_balance_failed'), progressively make it easier to migrate
a big task to the idling little CPU. A similar mechanism is
used for the 'migrate_load' migration type.

Improvement:

Running the testing workload [1] with the step 2 representing
a ~10s load for a big CPU:

  Before patch: ~19.3s
  After patch:  ~18s (-6.7%)

Similar issue reported at:

  https://lore.kernel.org/lkml/20230716014125.139577-1-qyousef@layalina.io/

Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Qais Yousef <qyousef@layalina.io>
Link: https://lore.kernel.org/r/20231206090043.634697-1-pierre.gondois@arm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-19 05:45:12 +02:00
..
Makefile sched: Trivial core scheduling cookie management 2021-05-12 11:43:31 +02:00
autogroup.c sched/fair: Prevent dead task groups from regaining cfs_rq's 2021-11-25 09:48:32 +01:00
autogroup.h
clock.c sched: Fix various typos 2021-03-22 00:11:52 +01:00
completion.c completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all() 2020-03-23 18:40:25 +01:00
core.c sched/fair: set_load_weight() must also call reweight_task() for SCHED_IDLE tasks 2024-08-19 05:45:11 +02:00
core_sched.c sched: prctl() core-scheduling interface 2021-05-12 11:43:31 +02:00
cpuacct.c sched/cpuacct: Optimize away RCU read lock 2023-10-06 13:18:19 +02:00
cpudeadline.c sched/core: Introduce sched_asym_cpucap_active() 2022-12-31 13:14:01 +01:00
cpudeadline.h
cpufreq.c cpufreq: Avoid leaving stale IRQ work items during CPU offline 2019-12-12 17:59:43 +01:00
cpufreq_schedutil.c sched/uclamp: Fix iowait boost escaping uclamp restriction 2022-04-08 14:23:10 +02:00
cpupri.c sched/rt: Fix live lock between select_fallback_rq() and RT push 2023-10-06 13:18:22 +02:00
cpupri.h sched/cpupri: Add CPUPRI_HIGHER 2020-10-29 11:00:30 +01:00
cputime.c cputime, cpuacct: Include guest time in user time in cpuacct.stat 2022-01-27 11:05:09 +01:00
deadline.c sched: Fix stop_one_cpu_nowait() vs hotplug 2023-11-20 11:08:13 +01:00
debug.c sched: Fix DEBUG && !SCHEDSTATS warn 2023-05-11 23:00:40 +09:00
fair.c sched/fair: Use all little CPUs for CPU-bound workloads 2024-08-19 05:45:12 +02:00
features.h sched/fair: Introduce SIS_UTIL to search idle CPU based on sum of util_avg 2022-08-17 14:23:00 +02:00
idle.c Revert "kernel/sched: Modify initial boot task idle setup" 2023-10-19 23:05:38 +02:00
isolation.c sched/isolation: Reconcile rcu_nocbs= and nohz_full= 2021-05-13 14:12:47 +02:00
loadavg.c sched: Make multiple runqueue task counters 32-bit 2021-05-12 21:34:17 +02:00
membarrier.c sched/membarrier: reduce the ability to hammer on sys_membarrier 2024-02-23 08:55:14 +01:00
pelt.c sched: Fix various typos 2021-03-22 00:11:52 +01:00
pelt.h sched/fair: Fix cfs_rq_clock_pelt() for throttled cfs_rq 2022-06-09 10:22:48 +02:00
psi.c sched/psi: Fix use-after-free in ep_remove_wait_queue() 2023-02-22 12:57:06 +01:00
rt.c sched/rt: Disallow writing invalid values to sched_rt_period_us 2024-03-01 13:21:43 +01:00
sched-pelt.h sched/fair: Fix "runnable_avg_yN_inv" not used warnings 2019-06-17 12:15:58 +02:00
sched.h sched/fair: set_load_weight() must also call reweight_task() for SCHED_IDLE tasks 2024-08-19 05:45:11 +02:00
smp.h sched/headers: Split out open-coded prototypes into kernel/sched/smp.h 2020-05-28 11:03:20 +02:00
stats.c sched: Fix various typos 2021-03-22 00:11:52 +01:00
stats.h sched: Make struct sched_statistics independent of fair sched class 2023-05-11 23:00:34 +09:00
stop_task.c sched: Make struct sched_statistics independent of fair sched class 2023-05-11 23:00:34 +09:00
swait.c sched/swait: Prepare usage in completions 2020-03-21 16:00:23 +01:00
topology.c sched/fair: Allow disabling sched_balance_newidle with sched_relax_domain_level 2024-06-16 13:39:34 +02:00
wait.c wait: add wake_up_pollfree() 2021-12-14 10:57:15 +01:00
wait_bit.c sched/wait: fix ___wait_var_event(exclusive) 2019-12-17 13:32:50 +01:00