WSL2-Linux-Kernel/include/linux/cpufreq.h

1230 строки
36 KiB
C
Исходник Обычный вид История

/* SPDX-License-Identifier: GPL-2.0-only */
/*
* linux/include/linux/cpufreq.h
*
* Copyright (C) 2001 Russell King
* (C) 2002 - 2003 Dominik Brodowski <linux@brodo.de>
*/
#ifndef _LINUX_CPUFREQ_H
#define _LINUX_CPUFREQ_H
#include <linux/clk.h>
#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/completion.h>
#include <linux/kobject.h>
#include <linux/notifier.h>
#include <linux/of.h>
#include <linux/of_device.h>
#include <linux/pm_opp.h>
cpufreq: Use per-policy frequency QoS Replace the CPU device PM QoS used for the management of min and max frequency constraints in cpufreq (and its users) with per-policy frequency QoS to avoid problems with cpufreq policies covering more then one CPU. Namely, a cpufreq driver is registered with the subsys interface which calls cpufreq_add_dev() for each CPU, starting from CPU0, so currently the PM QoS notifiers are added to the first CPU in the policy (i.e. CPU0 in the majority of cases). In turn, when the cpufreq driver is unregistered, the subsys interface doing that calls cpufreq_remove_dev() for each CPU, starting from CPU0, and the PM QoS notifiers are only removed when cpufreq_remove_dev() is called for the last CPU in the policy, say CPUx, which as a rule is not CPU0 if the policy covers more than one CPU. Then, the PM QoS notifiers cannot be removed, because CPUx does not have them, and they are still there in the device PM QoS notifiers list of CPU0, which prevents new PM QoS notifiers from being registered for CPU0 on the next attempt to register the cpufreq driver. The same issue occurs when the first CPU in the policy goes offline before unregistering the driver. After this change it does not matter which CPU is the policy CPU at the driver registration time and whether or not it is online all the time, because the frequency QoS is per policy and not per CPU. Fixes: 67d874c3b2c6 ("cpufreq: Register notifiers with the PM QoS framework") Reported-by: Dmitry Osipenko <digetx@gmail.com> Tested-by: Dmitry Osipenko <digetx@gmail.com> Reported-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Diagnosed-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://lore.kernel.org/linux-pm/5ad2624194baa2f53acc1f1e627eb7684c577a19.1562210705.git.viresh.kumar@linaro.org/T/#md2d89e95906b8c91c15f582146173dce2e86e99f Link: https://lore.kernel.org/linux-pm/20191017094612.6tbkwoq4harsjcqv@vireshk-i7/T/#m30d48cc23b9a80467fbaa16e30f90b3828a5a29b Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2019-10-16 13:47:06 +03:00
#include <linux/pm_qos.h>
cpufreq: Make sure frequency transitions are serialized Whenever we change the frequency of a CPU, we call the PRECHANGE and POSTCHANGE notifiers. They must be serialized, i.e. PRECHANGE and POSTCHANGE notifiers should strictly alternate, thereby preventing two different sets of PRECHANGE or POSTCHANGE notifiers from interleaving arbitrarily. The following examples illustrate why this is important: Scenario 1: ----------- A thread reading the value of cpuinfo_cur_freq, will call __cpufreq_cpu_get()->cpufreq_out_of_sync()->cpufreq_notify_transition() The ondemand governor can decide to change the frequency of the CPU at the same time and hence it can end up sending the notifications via ->target(). If the notifiers are not serialized, the following sequence can occur: - PRECHANGE Notification for freq A (from cpuinfo_cur_freq) - PRECHANGE Notification for freq B (from target()) - Freq changed by target() to B - POSTCHANGE Notification for freq B - POSTCHANGE Notification for freq A We can see from the above that the last POSTCHANGE Notification happens for freq A but the hardware is set to run at freq B. Where would we break then?: adjust_jiffies() in cpufreq.c & cpufreq_callback() in arch/arm/kernel/smp.c (which also adjusts the jiffies). All the loops_per_jiffy calculations will get messed up. Scenario 2: ----------- The governor calls __cpufreq_driver_target() to change the frequency. At the same time, if we change scaling_{min|max}_freq from sysfs, it will end up calling the governor's CPUFREQ_GOV_LIMITS notification, which will also call __cpufreq_driver_target(). And hence we end up issuing concurrent calls to ->target(). Typically, platforms have the following logic in their ->target() routines: (Eg: cpufreq-cpu0, omap, exynos, etc) A. If new freq is more than old: Increase voltage B. Change freq C. If new freq is less than old: decrease voltage Now, if the two concurrent calls to ->target() are X and Y, where X is trying to increase the freq and Y is trying to decrease it, we get the following race condition: X.A: voltage gets increased for larger freq Y.A: nothing happens Y.B: freq gets decreased Y.C: voltage gets decreased X.B: freq gets increased X.C: nothing happens Thus we can end up setting a freq which is not supported by the voltage we have set. That will probably make the clock to the CPU unstable and the system might not work properly anymore. This patch introduces a set of synchronization primitives to serialize frequency transitions, which are to be used as shown below: cpufreq_freq_transition_begin(); //Perform the frequency change cpufreq_freq_transition_end(); The _begin() call sends the PRECHANGE notification whereas the _end() call sends the POSTCHANGE notification. Also, all the necessary synchronization is handled within these calls. In particular, even drivers which set the ASYNC_NOTIFICATION flag can also use these APIs for performing frequency transitions (ie., you can call _begin() from one task, and call the corresponding _end() from a different task). The actual synchronization underneath is not that complicated: The key challenge is to allow drivers to begin the transition from one thread and end it in a completely different thread (this is to enable drivers that do asynchronous POSTCHANGE notification from bottom-halves, to also use the same interface). To achieve this, a 'transition_ongoing' flag, a 'transition_lock' spinlock and a wait-queue are added per-policy. The flag and the wait-queue are used in conjunction to create an "uninterrupted flow" from _begin() to _end(). The spinlock is used to ensure that only one such "flow" is in flight at any given time. Put together, this provides us all the necessary synchronization. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-24 12:05:44 +04:00
#include <linux/spinlock.h>
#include <linux/sysfs.h>
/*********************************************************************
* CPUFREQ INTERFACE *
*********************************************************************/
/*
* Frequency values here are CPU kHz
*
* Maximum transition latency is in nanoseconds - if it's unknown,
* CPUFREQ_ETERNAL shall be used.
*/
#define CPUFREQ_ETERNAL (-1)
#define CPUFREQ_NAME_LEN 16
/* Print length for names. Extra 1 space for accommodating '\n' in prints */
#define CPUFREQ_NAME_PLEN (CPUFREQ_NAME_LEN + 1)
struct cpufreq_governor;
enum cpufreq_table_sorting {
CPUFREQ_TABLE_UNSORTED,
CPUFREQ_TABLE_SORTED_ASCENDING,
CPUFREQ_TABLE_SORTED_DESCENDING
};
struct cpufreq_cpuinfo {
unsigned int max_freq;
unsigned int min_freq;
/* in 10^(-9) s = nanoseconds */
unsigned int transition_latency;
};
struct cpufreq_policy {
/* CPUs sharing clock, require sw coordination */
cpumask_var_t cpus; /* Online CPUs only */
cpumask_var_t related_cpus; /* Online + Offline CPUs */
cpufreq: Avoid attempts to create duplicate symbolic links After commit 87549141d516 (cpufreq: Stop migrating sysfs files on hotplug) there is a problem with CPUs that share cpufreq policy objects with other CPUs and are initially offline. Say CPU1 shares a policy with CPU0 which is online and is registered first. As part of the registration process, cpufreq_add_dev() is called for it. It creates the policy object and a symbolic link to it from the CPU1's sysfs directory. If CPU1 is registered subsequently and it is offline at that time, cpufreq_add_dev() will attempt to create a symbolic link to the policy object for it, but that link is present already, so a warning about that will be triggered. To avoid that warning, make cpufreq use an additional CPU mask containing related CPUs that are actually present for each policy object. That mask is initialized when the policy object is populated after its creation (for the first online CPU using it) and it includes CPUs from the "policy CPUs" mask returned by the cpufreq driver's ->init() callback that are physically present at that time. Symbolic links to the policy are created only for the CPUs in that mask. If cpufreq_add_dev() is invoked for an offline CPU, it checks the new mask and only creates the symlink if the CPU was not in it (the CPU is added to the mask at the same time). In turn, cpufreq_remove_dev() drops the given CPU from the new mask, removes its symlink to the policy object and returns, unless it is the CPU owning the policy object. In that case, the policy object is moved to a new CPU's sysfs directory or deleted if the CPU being removed was the last user of the policy. While at it, notice that cpufreq_remove_dev() can't fail, because its return value is ignored, so make it ignore return values from __cpufreq_remove_dev_prepare() and __cpufreq_remove_dev_finish() and prevent these functions from aborting on errors returned by __cpufreq_governor(). Also drop the now unused sif argument from them. Fixes: 87549141d516 (cpufreq: Stop migrating sysfs files on hotplug) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reported-and-tested-by: Russell King <linux@arm.linux.org.uk> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2015-07-26 03:07:47 +03:00
cpumask_var_t real_cpus; /* Related and present */
unsigned int shared_type; /* ACPI: ANY or ALL affected CPUs
should set cpufreq */
unsigned int cpu; /* cpu managing this policy, must be online */
struct clk *clk;
struct cpufreq_cpuinfo cpuinfo;/* see above */
unsigned int min; /* in kHz */
unsigned int max; /* in kHz */
unsigned int cur; /* in kHz, only needed if cpufreq
* governors are used */
unsigned int suspend_freq; /* freq to set during suspend */
unsigned int policy; /* see above */
unsigned int last_policy; /* policy before unplug */
struct cpufreq_governor *governor; /* see below */
void *governor_data;
char last_governor[CPUFREQ_NAME_LEN]; /* last governor used */
struct work_struct update; /* if update_policy() needs to be
* called, but you're in IRQ context */
cpufreq: Use per-policy frequency QoS Replace the CPU device PM QoS used for the management of min and max frequency constraints in cpufreq (and its users) with per-policy frequency QoS to avoid problems with cpufreq policies covering more then one CPU. Namely, a cpufreq driver is registered with the subsys interface which calls cpufreq_add_dev() for each CPU, starting from CPU0, so currently the PM QoS notifiers are added to the first CPU in the policy (i.e. CPU0 in the majority of cases). In turn, when the cpufreq driver is unregistered, the subsys interface doing that calls cpufreq_remove_dev() for each CPU, starting from CPU0, and the PM QoS notifiers are only removed when cpufreq_remove_dev() is called for the last CPU in the policy, say CPUx, which as a rule is not CPU0 if the policy covers more than one CPU. Then, the PM QoS notifiers cannot be removed, because CPUx does not have them, and they are still there in the device PM QoS notifiers list of CPU0, which prevents new PM QoS notifiers from being registered for CPU0 on the next attempt to register the cpufreq driver. The same issue occurs when the first CPU in the policy goes offline before unregistering the driver. After this change it does not matter which CPU is the policy CPU at the driver registration time and whether or not it is online all the time, because the frequency QoS is per policy and not per CPU. Fixes: 67d874c3b2c6 ("cpufreq: Register notifiers with the PM QoS framework") Reported-by: Dmitry Osipenko <digetx@gmail.com> Tested-by: Dmitry Osipenko <digetx@gmail.com> Reported-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Diagnosed-by: Viresh Kumar <viresh.kumar@linaro.org> Link: https://lore.kernel.org/linux-pm/5ad2624194baa2f53acc1f1e627eb7684c577a19.1562210705.git.viresh.kumar@linaro.org/T/#md2d89e95906b8c91c15f582146173dce2e86e99f Link: https://lore.kernel.org/linux-pm/20191017094612.6tbkwoq4harsjcqv@vireshk-i7/T/#m30d48cc23b9a80467fbaa16e30f90b3828a5a29b Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2019-10-16 13:47:06 +03:00
struct freq_constraints constraints;
struct freq_qos_request *min_freq_req;
struct freq_qos_request *max_freq_req;
struct cpufreq_frequency_table *freq_table;
enum cpufreq_table_sorting freq_table_sorted;
struct list_head policy_list;
struct kobject kobj;
struct completion kobj_unregister;
/*
* The rules for this semaphore:
* - Any routine that wants to read from the policy structure will
* do a down_read on this semaphore.
* - Any routine that will write to the policy structure and/or may take away
* the policy altogether (eg. CPU hotplug), will hold this lock in write
* mode before doing so.
*/
struct rw_semaphore rwsem;
cpufreq: Make sure frequency transitions are serialized Whenever we change the frequency of a CPU, we call the PRECHANGE and POSTCHANGE notifiers. They must be serialized, i.e. PRECHANGE and POSTCHANGE notifiers should strictly alternate, thereby preventing two different sets of PRECHANGE or POSTCHANGE notifiers from interleaving arbitrarily. The following examples illustrate why this is important: Scenario 1: ----------- A thread reading the value of cpuinfo_cur_freq, will call __cpufreq_cpu_get()->cpufreq_out_of_sync()->cpufreq_notify_transition() The ondemand governor can decide to change the frequency of the CPU at the same time and hence it can end up sending the notifications via ->target(). If the notifiers are not serialized, the following sequence can occur: - PRECHANGE Notification for freq A (from cpuinfo_cur_freq) - PRECHANGE Notification for freq B (from target()) - Freq changed by target() to B - POSTCHANGE Notification for freq B - POSTCHANGE Notification for freq A We can see from the above that the last POSTCHANGE Notification happens for freq A but the hardware is set to run at freq B. Where would we break then?: adjust_jiffies() in cpufreq.c & cpufreq_callback() in arch/arm/kernel/smp.c (which also adjusts the jiffies). All the loops_per_jiffy calculations will get messed up. Scenario 2: ----------- The governor calls __cpufreq_driver_target() to change the frequency. At the same time, if we change scaling_{min|max}_freq from sysfs, it will end up calling the governor's CPUFREQ_GOV_LIMITS notification, which will also call __cpufreq_driver_target(). And hence we end up issuing concurrent calls to ->target(). Typically, platforms have the following logic in their ->target() routines: (Eg: cpufreq-cpu0, omap, exynos, etc) A. If new freq is more than old: Increase voltage B. Change freq C. If new freq is less than old: decrease voltage Now, if the two concurrent calls to ->target() are X and Y, where X is trying to increase the freq and Y is trying to decrease it, we get the following race condition: X.A: voltage gets increased for larger freq Y.A: nothing happens Y.B: freq gets decreased Y.C: voltage gets decreased X.B: freq gets increased X.C: nothing happens Thus we can end up setting a freq which is not supported by the voltage we have set. That will probably make the clock to the CPU unstable and the system might not work properly anymore. This patch introduces a set of synchronization primitives to serialize frequency transitions, which are to be used as shown below: cpufreq_freq_transition_begin(); //Perform the frequency change cpufreq_freq_transition_end(); The _begin() call sends the PRECHANGE notification whereas the _end() call sends the POSTCHANGE notification. Also, all the necessary synchronization is handled within these calls. In particular, even drivers which set the ASYNC_NOTIFICATION flag can also use these APIs for performing frequency transitions (ie., you can call _begin() from one task, and call the corresponding _end() from a different task). The actual synchronization underneath is not that complicated: The key challenge is to allow drivers to begin the transition from one thread and end it in a completely different thread (this is to enable drivers that do asynchronous POSTCHANGE notification from bottom-halves, to also use the same interface). To achieve this, a 'transition_ongoing' flag, a 'transition_lock' spinlock and a wait-queue are added per-policy. The flag and the wait-queue are used in conjunction to create an "uninterrupted flow" from _begin() to _end(). The spinlock is used to ensure that only one such "flow" is in flight at any given time. Put together, this provides us all the necessary synchronization. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-24 12:05:44 +04:00
/*
* Fast switch flags:
* - fast_switch_possible should be set by the driver if it can
* guarantee that frequency can be changed on any CPU sharing the
* policy and that the change will affect all of the policy CPUs then.
* - fast_switch_enabled is to be set by governors that support fast
* frequency switching with the help of cpufreq_enable_fast_switch().
*/
bool fast_switch_possible;
bool fast_switch_enabled;
/*
* Set if the CPUFREQ_GOV_STRICT_TARGET flag is set for the current
* governor.
*/
bool strict_target;
/*
* Set if inefficient frequencies were found in the frequency table.
* This indicates if the relation flag CPUFREQ_RELATION_E can be
* honored.
*/
bool efficiencies_available;
/*
* Preferred average time interval between consecutive invocations of
* the driver to set the frequency for this policy. To be set by the
* scaling driver (0, which is the default, means no preference).
*/
unsigned int transition_delay_us;
/*
* Remote DVFS flag (Not added to the driver structure as we don't want
* to access another structure from scheduler hotpath).
*
* Should be set if CPUs can do DVFS on behalf of other CPUs from
* different cpufreq policies.
*/
bool dvfs_possible_from_any_cpu;
/* Cached frequency lookup from cpufreq_driver_resolve_freq. */
unsigned int cached_target_freq;
unsigned int cached_resolved_idx;
cpufreq: Make sure frequency transitions are serialized Whenever we change the frequency of a CPU, we call the PRECHANGE and POSTCHANGE notifiers. They must be serialized, i.e. PRECHANGE and POSTCHANGE notifiers should strictly alternate, thereby preventing two different sets of PRECHANGE or POSTCHANGE notifiers from interleaving arbitrarily. The following examples illustrate why this is important: Scenario 1: ----------- A thread reading the value of cpuinfo_cur_freq, will call __cpufreq_cpu_get()->cpufreq_out_of_sync()->cpufreq_notify_transition() The ondemand governor can decide to change the frequency of the CPU at the same time and hence it can end up sending the notifications via ->target(). If the notifiers are not serialized, the following sequence can occur: - PRECHANGE Notification for freq A (from cpuinfo_cur_freq) - PRECHANGE Notification for freq B (from target()) - Freq changed by target() to B - POSTCHANGE Notification for freq B - POSTCHANGE Notification for freq A We can see from the above that the last POSTCHANGE Notification happens for freq A but the hardware is set to run at freq B. Where would we break then?: adjust_jiffies() in cpufreq.c & cpufreq_callback() in arch/arm/kernel/smp.c (which also adjusts the jiffies). All the loops_per_jiffy calculations will get messed up. Scenario 2: ----------- The governor calls __cpufreq_driver_target() to change the frequency. At the same time, if we change scaling_{min|max}_freq from sysfs, it will end up calling the governor's CPUFREQ_GOV_LIMITS notification, which will also call __cpufreq_driver_target(). And hence we end up issuing concurrent calls to ->target(). Typically, platforms have the following logic in their ->target() routines: (Eg: cpufreq-cpu0, omap, exynos, etc) A. If new freq is more than old: Increase voltage B. Change freq C. If new freq is less than old: decrease voltage Now, if the two concurrent calls to ->target() are X and Y, where X is trying to increase the freq and Y is trying to decrease it, we get the following race condition: X.A: voltage gets increased for larger freq Y.A: nothing happens Y.B: freq gets decreased Y.C: voltage gets decreased X.B: freq gets increased X.C: nothing happens Thus we can end up setting a freq which is not supported by the voltage we have set. That will probably make the clock to the CPU unstable and the system might not work properly anymore. This patch introduces a set of synchronization primitives to serialize frequency transitions, which are to be used as shown below: cpufreq_freq_transition_begin(); //Perform the frequency change cpufreq_freq_transition_end(); The _begin() call sends the PRECHANGE notification whereas the _end() call sends the POSTCHANGE notification. Also, all the necessary synchronization is handled within these calls. In particular, even drivers which set the ASYNC_NOTIFICATION flag can also use these APIs for performing frequency transitions (ie., you can call _begin() from one task, and call the corresponding _end() from a different task). The actual synchronization underneath is not that complicated: The key challenge is to allow drivers to begin the transition from one thread and end it in a completely different thread (this is to enable drivers that do asynchronous POSTCHANGE notification from bottom-halves, to also use the same interface). To achieve this, a 'transition_ongoing' flag, a 'transition_lock' spinlock and a wait-queue are added per-policy. The flag and the wait-queue are used in conjunction to create an "uninterrupted flow" from _begin() to _end(). The spinlock is used to ensure that only one such "flow" is in flight at any given time. Put together, this provides us all the necessary synchronization. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-24 12:05:44 +04:00
/* Synchronization for frequency transitions */
bool transition_ongoing; /* Tracks transition status */
spinlock_t transition_lock;
wait_queue_head_t transition_wait;
cpufreq: Catch double invocations of cpufreq_freq_transition_begin/end Some cpufreq drivers were redundantly invoking the _begin() and _end() APIs around frequency transitions, and this double invocation (one from the cpufreq core and the other from the cpufreq driver) used to result in a self-deadlock, leading to system hangs during boot. (The _begin() API makes contending callers wait until the previous invocation is complete. Hence, the cpufreq driver would end up waiting on itself!). Now all such drivers have been fixed, but debugging this issue was not very straight-forward (even lockdep didn't catch this). So let us add a debug infrastructure to the cpufreq core to catch such issues more easily in the future. We add a new field called 'transition_task' to the policy structure, to keep track of the task which is performing the frequency transition. Using this field, we make note of this task during _begin() and print a warning if we find a case where the same task is calling _begin() again, before completing the previous frequency transition using the corresponding _end(). We have left out ASYNC_NOTIFICATION drivers from this debug infrastructure for 2 reasons: 1. At the moment, we have no way to avoid a particular scenario where this debug infrastructure can emit false-positive warnings for such drivers. The scenario is depicted below: Task A Task B /* 1st freq transition */ Invoke _begin() { ... ... } Change the frequency /* 2nd freq transition */ Invoke _begin() { ... //waiting for B to ... //finish _end() for ... //the 1st transition ... | Got interrupt for successful ... | change of frequency (1st one). ... | ... | /* 1st freq transition */ ... | Invoke _end() { ... | ... ... V } ... ... } This scenario is actually deadlock-free because, once Task A changes the frequency, it is Task B's responsibility to invoke the corresponding _end() for the 1st frequency transition. Hence it is perfectly legal for Task A to go ahead and attempt another frequency transition in the meantime. (Of course it won't be able to proceed until Task B finishes the 1st _end(), but this doesn't cause a deadlock or a hang). The debug infrastructure cannot handle this scenario and will treat it as a deadlock and print a warning. To avoid this, we exclude such drivers from the purview of this code. 2. Luckily, we don't _need_ this infrastructure for ASYNC_NOTIFICATION drivers at all! The cpufreq core does not automatically invoke the _begin() and _end() APIs during frequency transitions in such drivers. Thus, the driver alone is responsible for invoking _begin()/_end() and hence there shouldn't be any conflicts which lead to double invocations. So, we can skip these drivers, since the probability that such drivers will hit this problem is extremely low, as outlined above. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-05-05 11:22:39 +04:00
struct task_struct *transition_task; /* Task which is doing the transition */
/* cpufreq-stats */
struct cpufreq_stats *stats;
/* For cpufreq driver's internal use */
void *driver_data;
/* Pointer to the cooling device if used for thermal mitigation */
struct thermal_cooling_device *cdev;
struct notifier_block nb_min;
struct notifier_block nb_max;
};
cpufreq: Avoid creating excessively large stack frames In the process of modifying a cpufreq policy, the cpufreq core makes a copy of it including all of the internals which is stored on the CPU stack. Because struct cpufreq_policy is relatively large, this may cause the size of the stack frame to exceed the 2 KB limit and so the GCC complains when -Wframe-larger-than= is used. In fact, it is not necessary to copy the entire policy structure in order to modify it, however. First, because cpufreq_set_policy() obtains the min and max policy limits from frequency QoS now, it is not necessary to pass the limits to it from the callers. The only things that need to be passed to it from there are the new governor pointer or (if there is a built-in governor in the driver) the "policy" value representing the governor choice. They both can be passed as individual arguments, though, so make cpufreq_set_policy() take them this way and rework its callers accordingly. This avoids making copies of cpufreq policies in the callers of cpufreq_set_policy(). Second, cpufreq_set_policy() still needs to pass the new policy data to the ->verify() callback of the cpufreq driver whose task is to sanitize the min and max policy limits. It still does not need to make a full copy of struct cpufreq_policy for this purpose, but it needs to pass a few items from it to the driver in case they are needed (different drivers have different needs in that respect and all of them have to be covered). For this reason, introduce struct cpufreq_policy_data to hold copies of the members of struct cpufreq_policy used by the existing ->verify() driver callbacks and pass a pointer to a temporary structure of that type to ->verify() (instead of passing a pointer to full struct cpufreq_policy to it). While at it, notice that intel_pstate and longrun don't really need to verify the "policy" value in struct cpufreq_policy, so drop those check from them to avoid copying "policy" into struct cpufreq_policy_data (which allows it to be slightly smaller). Also while at it fix up white space in a couple of places and make cpufreq_set_policy() static (as it can be so). Fixes: 3000ce3c52f8 ("cpufreq: Use per-policy frequency QoS") Link: https://lore.kernel.org/linux-pm/CAMuHMdX6-jb1W8uC2_237m8ctCpsnGp=JCxqt8pCWVqNXHmkVg@mail.gmail.com Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-01-27 01:40:11 +03:00
/*
* Used for passing new cpufreq policy data to the cpufreq driver's ->verify()
* callback for sanitization. That callback is only expected to modify the min
* and max values, if necessary, and specifically it must not update the
* frequency table.
*/
struct cpufreq_policy_data {
struct cpufreq_cpuinfo cpuinfo;
struct cpufreq_frequency_table *freq_table;
unsigned int cpu;
unsigned int min; /* in kHz */
unsigned int max; /* in kHz */
};
struct cpufreq_freqs {
struct cpufreq_policy *policy;
unsigned int old;
unsigned int new;
u8 flags; /* flags of cpufreq_driver, see below. */
};
/* Only for ACPI */
#define CPUFREQ_SHARED_TYPE_NONE (0) /* None */
#define CPUFREQ_SHARED_TYPE_HW (1) /* HW does needed coordination */
#define CPUFREQ_SHARED_TYPE_ALL (2) /* All dependent CPUs should set freq */
#define CPUFREQ_SHARED_TYPE_ANY (3) /* Freq can be set from any dependent CPU*/
#ifdef CONFIG_CPU_FREQ
struct cpufreq_policy *cpufreq_cpu_get_raw(unsigned int cpu);
struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu);
void cpufreq_cpu_put(struct cpufreq_policy *policy);
#else
static inline struct cpufreq_policy *cpufreq_cpu_get_raw(unsigned int cpu)
{
return NULL;
}
static inline struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu)
{
return NULL;
}
static inline void cpufreq_cpu_put(struct cpufreq_policy *policy) { }
#endif
static inline bool policy_is_inactive(struct cpufreq_policy *policy)
{
return cpumask_empty(policy->cpus);
}
static inline bool policy_is_shared(struct cpufreq_policy *policy)
{
return cpumask_weight(policy->cpus) > 1;
}
#ifdef CONFIG_CPU_FREQ
unsigned int cpufreq_get(unsigned int cpu);
unsigned int cpufreq_quick_get(unsigned int cpu);
unsigned int cpufreq_quick_get_max(unsigned int cpu);
unsigned int cpufreq_get_hw_max_freq(unsigned int cpu);
void disable_cpufreq(void);
u64 get_cpu_idle_time(unsigned int cpu, u64 *wall, int io_busy);
struct cpufreq_policy *cpufreq_cpu_acquire(unsigned int cpu);
void cpufreq_cpu_release(struct cpufreq_policy *policy);
int cpufreq_get_policy(struct cpufreq_policy *policy, unsigned int cpu);
void refresh_frequency_limits(struct cpufreq_policy *policy);
void cpufreq_update_policy(unsigned int cpu);
void cpufreq_update_limits(unsigned int cpu);
bool have_governor_per_policy(void);
bool cpufreq_supports_freq_invariance(void);
struct kobject *get_governor_parent_kobj(struct cpufreq_policy *policy);
void cpufreq_enable_fast_switch(struct cpufreq_policy *policy);
void cpufreq_disable_fast_switch(struct cpufreq_policy *policy);
#else
static inline unsigned int cpufreq_get(unsigned int cpu)
{
return 0;
}
static inline unsigned int cpufreq_quick_get(unsigned int cpu)
{
return 0;
}
static inline unsigned int cpufreq_quick_get_max(unsigned int cpu)
{
return 0;
}
static inline unsigned int cpufreq_get_hw_max_freq(unsigned int cpu)
{
return 0;
}
static inline bool cpufreq_supports_freq_invariance(void)
{
return false;
}
static inline void disable_cpufreq(void) { }
#endif
#ifdef CONFIG_CPU_FREQ_STAT
void cpufreq_stats_create_table(struct cpufreq_policy *policy);
void cpufreq_stats_free_table(struct cpufreq_policy *policy);
void cpufreq_stats_record_transition(struct cpufreq_policy *policy,
unsigned int new_freq);
#else
static inline void cpufreq_stats_create_table(struct cpufreq_policy *policy) { }
static inline void cpufreq_stats_free_table(struct cpufreq_policy *policy) { }
static inline void cpufreq_stats_record_transition(struct cpufreq_policy *policy,
unsigned int new_freq) { }
#endif /* CONFIG_CPU_FREQ_STAT */
/*********************************************************************
* CPUFREQ DRIVER INTERFACE *
*********************************************************************/
#define CPUFREQ_RELATION_L 0 /* lowest frequency at or above target */
#define CPUFREQ_RELATION_H 1 /* highest frequency below or at target */
#define CPUFREQ_RELATION_C 2 /* closest frequency to target */
/* relation flags */
#define CPUFREQ_RELATION_E BIT(2) /* Get if possible an efficient frequency */
#define CPUFREQ_RELATION_LE (CPUFREQ_RELATION_L | CPUFREQ_RELATION_E)
#define CPUFREQ_RELATION_HE (CPUFREQ_RELATION_H | CPUFREQ_RELATION_E)
#define CPUFREQ_RELATION_CE (CPUFREQ_RELATION_C | CPUFREQ_RELATION_E)
struct freq_attr {
struct attribute attr;
ssize_t (*show)(struct cpufreq_policy *, char *);
ssize_t (*store)(struct cpufreq_policy *, const char *, size_t count);
};
#define cpufreq_freq_attr_ro(_name) \
static struct freq_attr _name = \
__ATTR(_name, 0444, show_##_name, NULL)
#define cpufreq_freq_attr_ro_perm(_name, _perm) \
static struct freq_attr _name = \
__ATTR(_name, _perm, show_##_name, NULL)
#define cpufreq_freq_attr_rw(_name) \
static struct freq_attr _name = \
__ATTR(_name, 0644, show_##_name, store_##_name)
#define cpufreq_freq_attr_wo(_name) \
static struct freq_attr _name = \
__ATTR(_name, 0200, NULL, store_##_name)
#define define_one_global_ro(_name) \
static struct kobj_attribute _name = \
__ATTR(_name, 0444, show_##_name, NULL)
#define define_one_global_rw(_name) \
static struct kobj_attribute _name = \
__ATTR(_name, 0644, show_##_name, store_##_name)
struct cpufreq_driver {
char name[CPUFREQ_NAME_LEN];
u16 flags;
void *driver_data;
/* needed by all drivers */
int (*init)(struct cpufreq_policy *policy);
cpufreq: Avoid creating excessively large stack frames In the process of modifying a cpufreq policy, the cpufreq core makes a copy of it including all of the internals which is stored on the CPU stack. Because struct cpufreq_policy is relatively large, this may cause the size of the stack frame to exceed the 2 KB limit and so the GCC complains when -Wframe-larger-than= is used. In fact, it is not necessary to copy the entire policy structure in order to modify it, however. First, because cpufreq_set_policy() obtains the min and max policy limits from frequency QoS now, it is not necessary to pass the limits to it from the callers. The only things that need to be passed to it from there are the new governor pointer or (if there is a built-in governor in the driver) the "policy" value representing the governor choice. They both can be passed as individual arguments, though, so make cpufreq_set_policy() take them this way and rework its callers accordingly. This avoids making copies of cpufreq policies in the callers of cpufreq_set_policy(). Second, cpufreq_set_policy() still needs to pass the new policy data to the ->verify() callback of the cpufreq driver whose task is to sanitize the min and max policy limits. It still does not need to make a full copy of struct cpufreq_policy for this purpose, but it needs to pass a few items from it to the driver in case they are needed (different drivers have different needs in that respect and all of them have to be covered). For this reason, introduce struct cpufreq_policy_data to hold copies of the members of struct cpufreq_policy used by the existing ->verify() driver callbacks and pass a pointer to a temporary structure of that type to ->verify() (instead of passing a pointer to full struct cpufreq_policy to it). While at it, notice that intel_pstate and longrun don't really need to verify the "policy" value in struct cpufreq_policy, so drop those check from them to avoid copying "policy" into struct cpufreq_policy_data (which allows it to be slightly smaller). Also while at it fix up white space in a couple of places and make cpufreq_set_policy() static (as it can be so). Fixes: 3000ce3c52f8 ("cpufreq: Use per-policy frequency QoS") Link: https://lore.kernel.org/linux-pm/CAMuHMdX6-jb1W8uC2_237m8ctCpsnGp=JCxqt8pCWVqNXHmkVg@mail.gmail.com Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-01-27 01:40:11 +03:00
int (*verify)(struct cpufreq_policy_data *policy);
/* define one out of two */
int (*setpolicy)(struct cpufreq_policy *policy);
cpufreq: add support for intermediate (stable) frequencies Douglas Anderson, recently pointed out an interesting problem due to which udelay() was expiring earlier than it should. While transitioning between frequencies few platforms may temporarily switch to a stable frequency, waiting for the main PLL to stabilize. For example: When we transition between very low frequencies on exynos, like between 200MHz and 300MHz, we may temporarily switch to a PLL running at 800MHz. No CPUFREQ notification is sent for that. That means there's a period of time when we're running at 800MHz but loops_per_jiffy is calibrated at between 200MHz and 300MHz. And so udelay behaves badly. To get this fixed in a generic way, introduce another set of callbacks get_intermediate() and target_intermediate(), only for drivers with target_index() and CPUFREQ_ASYNC_NOTIFICATION unset. get_intermediate() should return a stable intermediate frequency platform wants to switch to, and target_intermediate() should set CPU to that frequency, before jumping to the frequency corresponding to 'index'. Core will take care of sending notifications and driver doesn't have to handle them in target_intermediate() or target_index(). NOTE: ->target_index() should restore to policy->restore_freq in case of failures as core would send notifications for that. Tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Doug Anderson <dianders@chromium.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-02 21:19:28 +04:00
int (*target)(struct cpufreq_policy *policy,
unsigned int target_freq,
unsigned int relation); /* Deprecated */
int (*target_index)(struct cpufreq_policy *policy,
unsigned int index);
unsigned int (*fast_switch)(struct cpufreq_policy *policy,
unsigned int target_freq);
/*
* ->fast_switch() replacement for drivers that use an internal
* representation of performance levels and can pass hints other than
* the target performance level to the hardware.
*/
void (*adjust_perf)(unsigned int cpu,
unsigned long min_perf,
unsigned long target_perf,
unsigned long capacity);
cpufreq: add support for intermediate (stable) frequencies Douglas Anderson, recently pointed out an interesting problem due to which udelay() was expiring earlier than it should. While transitioning between frequencies few platforms may temporarily switch to a stable frequency, waiting for the main PLL to stabilize. For example: When we transition between very low frequencies on exynos, like between 200MHz and 300MHz, we may temporarily switch to a PLL running at 800MHz. No CPUFREQ notification is sent for that. That means there's a period of time when we're running at 800MHz but loops_per_jiffy is calibrated at between 200MHz and 300MHz. And so udelay behaves badly. To get this fixed in a generic way, introduce another set of callbacks get_intermediate() and target_intermediate(), only for drivers with target_index() and CPUFREQ_ASYNC_NOTIFICATION unset. get_intermediate() should return a stable intermediate frequency platform wants to switch to, and target_intermediate() should set CPU to that frequency, before jumping to the frequency corresponding to 'index'. Core will take care of sending notifications and driver doesn't have to handle them in target_intermediate() or target_index(). NOTE: ->target_index() should restore to policy->restore_freq in case of failures as core would send notifications for that. Tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Doug Anderson <dianders@chromium.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-02 21:19:28 +04:00
/*
* Only for drivers with target_index() and CPUFREQ_ASYNC_NOTIFICATION
* unset.
*
* get_intermediate should return a stable intermediate frequency
* platform wants to switch to and target_intermediate() should set CPU
* to that frequency, before jumping to the frequency corresponding
cpufreq: add support for intermediate (stable) frequencies Douglas Anderson, recently pointed out an interesting problem due to which udelay() was expiring earlier than it should. While transitioning between frequencies few platforms may temporarily switch to a stable frequency, waiting for the main PLL to stabilize. For example: When we transition between very low frequencies on exynos, like between 200MHz and 300MHz, we may temporarily switch to a PLL running at 800MHz. No CPUFREQ notification is sent for that. That means there's a period of time when we're running at 800MHz but loops_per_jiffy is calibrated at between 200MHz and 300MHz. And so udelay behaves badly. To get this fixed in a generic way, introduce another set of callbacks get_intermediate() and target_intermediate(), only for drivers with target_index() and CPUFREQ_ASYNC_NOTIFICATION unset. get_intermediate() should return a stable intermediate frequency platform wants to switch to, and target_intermediate() should set CPU to that frequency, before jumping to the frequency corresponding to 'index'. Core will take care of sending notifications and driver doesn't have to handle them in target_intermediate() or target_index(). NOTE: ->target_index() should restore to policy->restore_freq in case of failures as core would send notifications for that. Tested-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Doug Anderson <dianders@chromium.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-06-02 21:19:28 +04:00
* to 'index'. Core will take care of sending notifications and driver
* doesn't have to handle them in target_intermediate() or
* target_index().
*
* Drivers can return '0' from get_intermediate() in case they don't
* wish to switch to intermediate frequency for some target frequency.
* In that case core will directly call ->target_index().
*/
unsigned int (*get_intermediate)(struct cpufreq_policy *policy,
unsigned int index);
int (*target_intermediate)(struct cpufreq_policy *policy,
unsigned int index);
/* should be defined, if possible */
unsigned int (*get)(unsigned int cpu);
/* Called to update policy limits on firmware notifications. */
void (*update_limits)(unsigned int cpu);
/* optional */
int (*bios_limit)(int cpu, unsigned int *limit);
int (*online)(struct cpufreq_policy *policy);
int (*offline)(struct cpufreq_policy *policy);
int (*exit)(struct cpufreq_policy *policy);
int (*suspend)(struct cpufreq_policy *policy);
int (*resume)(struct cpufreq_policy *policy);
/* Will be called after the driver is fully initialized */
void (*ready)(struct cpufreq_policy *policy);
struct freq_attr **attr;
/* platform specific boost support code */
bool boost_enabled;
int (*set_boost)(struct cpufreq_policy *policy, int state);
/*
* Set by drivers that want to register with the energy model after the
* policy is properly initialized, but before the governor is started.
*/
void (*register_em)(struct cpufreq_policy *policy);
};
/* flags */
cpufreq: Remove CPUFREQ_STICKY flag During cpufreq driver's registration, if the ->init() callback for all the CPUs fail then there is not much point in keeping the driver around as it will only account for more of unnecessary noise, for example cpufreq core will try to suspend/resume the driver which never got registered properly. The removal of such a driver is avoided if the driver carries the CPUFREQ_STICKY flag. This was added way back [1] in 2004 and perhaps no one should ever need it now. A lot of drivers do set this flag, probably because they just copied it from other drivers. This was added earlier for some platforms [2] because their cpufreq drivers were getting registered before the CPUs were registered with subsys framework. And hence they used to fail. The same isn't true anymore though. The current code flow in the kernel is: start_kernel() -> kernel_init() -> kernel_init_freeable() -> do_basic_setup() -> driver_init() -> cpu_dev_init() -> subsys_system_register() //For CPUs -> do_initcalls() -> cpufreq_register_driver() Clearly, the CPUs will always get registered with subsys framework before any cpufreq driver can get probed. Remove the flag and update the relevant drivers. Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/include/linux/cpufreq.h?id=7cc9f0d9a1ab04cedc60d64fd8dcf7df224a3b4d # [1] Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/arch/arm/mach-sa1100/cpu-sa1100.c?id=f59d3bbe35f6268d729f51be82af8325d62f20f5 # [2] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-02-02 07:55:11 +03:00
/*
* Set by drivers that need to update internal upper and lower boundaries along
cpufreq: Remove CPUFREQ_STICKY flag During cpufreq driver's registration, if the ->init() callback for all the CPUs fail then there is not much point in keeping the driver around as it will only account for more of unnecessary noise, for example cpufreq core will try to suspend/resume the driver which never got registered properly. The removal of such a driver is avoided if the driver carries the CPUFREQ_STICKY flag. This was added way back [1] in 2004 and perhaps no one should ever need it now. A lot of drivers do set this flag, probably because they just copied it from other drivers. This was added earlier for some platforms [2] because their cpufreq drivers were getting registered before the CPUs were registered with subsys framework. And hence they used to fail. The same isn't true anymore though. The current code flow in the kernel is: start_kernel() -> kernel_init() -> kernel_init_freeable() -> do_basic_setup() -> driver_init() -> cpu_dev_init() -> subsys_system_register() //For CPUs -> do_initcalls() -> cpufreq_register_driver() Clearly, the CPUs will always get registered with subsys framework before any cpufreq driver can get probed. Remove the flag and update the relevant drivers. Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/include/linux/cpufreq.h?id=7cc9f0d9a1ab04cedc60d64fd8dcf7df224a3b4d # [1] Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/arch/arm/mach-sa1100/cpu-sa1100.c?id=f59d3bbe35f6268d729f51be82af8325d62f20f5 # [2] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-02-02 07:55:11 +03:00
* with the target frequency and so the core and governors should also invoke
* the diver if the target frequency does not change, but the policy min or max
* may have changed.
*/
#define CPUFREQ_NEED_UPDATE_LIMITS BIT(0)
/* loops_per_jiffy or other kernel "constants" aren't affected by frequency transitions */
#define CPUFREQ_CONST_LOOPS BIT(1)
/*
* Set by drivers that want the core to automatically register the cpufreq
* driver as a thermal cooling device.
*/
#define CPUFREQ_IS_COOLING_DEV BIT(2)
/*
* This should be set by platforms having multiple clock-domains, i.e.
* supporting multiple policies. With this sysfs directories of governor would
* be created in cpu/cpu<num>/cpufreq/ directory and so they can use the same
* governor with different tunables for different clusters.
*/
#define CPUFREQ_HAVE_GOVERNOR_PER_POLICY BIT(3)
/*
* Driver will do POSTCHANGE notifications from outside of their ->target()
* routine and so must set cpufreq_driver->flags with this flag, so that core
* can handle them specially.
*/
#define CPUFREQ_ASYNC_NOTIFICATION BIT(4)
/*
* Set by drivers which want cpufreq core to check if CPU is running at a
* frequency present in freq-table exposed by the driver. For these drivers if
* CPU is found running at an out of table freq, we will try to set it to a freq
* from the table. And if that fails, we will stop further boot process by
* issuing a BUG_ON().
*/
#define CPUFREQ_NEED_INITIAL_FREQ_CHECK BIT(5)
/*
* Set by drivers to disallow use of governors with "dynamic_switching" flag
* set.
*/
#define CPUFREQ_NO_AUTO_DYNAMIC_SWITCHING BIT(6)
int cpufreq_register_driver(struct cpufreq_driver *driver_data);
int cpufreq_unregister_driver(struct cpufreq_driver *driver_data);
bool cpufreq_driver_test_flags(u16 flags);
const char *cpufreq_get_current_driver(void);
void *cpufreq_get_driver_data(void);
static inline int cpufreq_thermal_control_enabled(struct cpufreq_driver *drv)
{
return IS_ENABLED(CONFIG_CPU_THERMAL) &&
(drv->flags & CPUFREQ_IS_COOLING_DEV);
}
cpufreq: Avoid creating excessively large stack frames In the process of modifying a cpufreq policy, the cpufreq core makes a copy of it including all of the internals which is stored on the CPU stack. Because struct cpufreq_policy is relatively large, this may cause the size of the stack frame to exceed the 2 KB limit and so the GCC complains when -Wframe-larger-than= is used. In fact, it is not necessary to copy the entire policy structure in order to modify it, however. First, because cpufreq_set_policy() obtains the min and max policy limits from frequency QoS now, it is not necessary to pass the limits to it from the callers. The only things that need to be passed to it from there are the new governor pointer or (if there is a built-in governor in the driver) the "policy" value representing the governor choice. They both can be passed as individual arguments, though, so make cpufreq_set_policy() take them this way and rework its callers accordingly. This avoids making copies of cpufreq policies in the callers of cpufreq_set_policy(). Second, cpufreq_set_policy() still needs to pass the new policy data to the ->verify() callback of the cpufreq driver whose task is to sanitize the min and max policy limits. It still does not need to make a full copy of struct cpufreq_policy for this purpose, but it needs to pass a few items from it to the driver in case they are needed (different drivers have different needs in that respect and all of them have to be covered). For this reason, introduce struct cpufreq_policy_data to hold copies of the members of struct cpufreq_policy used by the existing ->verify() driver callbacks and pass a pointer to a temporary structure of that type to ->verify() (instead of passing a pointer to full struct cpufreq_policy to it). While at it, notice that intel_pstate and longrun don't really need to verify the "policy" value in struct cpufreq_policy, so drop those check from them to avoid copying "policy" into struct cpufreq_policy_data (which allows it to be slightly smaller). Also while at it fix up white space in a couple of places and make cpufreq_set_policy() static (as it can be so). Fixes: 3000ce3c52f8 ("cpufreq: Use per-policy frequency QoS") Link: https://lore.kernel.org/linux-pm/CAMuHMdX6-jb1W8uC2_237m8ctCpsnGp=JCxqt8pCWVqNXHmkVg@mail.gmail.com Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-01-27 01:40:11 +03:00
static inline void cpufreq_verify_within_limits(struct cpufreq_policy_data *policy,
unsigned int min,
unsigned int max)
{
if (policy->min < min)
policy->min = min;
if (policy->max < min)
policy->max = min;
if (policy->min > max)
policy->min = max;
if (policy->max > max)
policy->max = max;
if (policy->min > policy->max)
policy->min = policy->max;
return;
}
static inline void
cpufreq: Avoid creating excessively large stack frames In the process of modifying a cpufreq policy, the cpufreq core makes a copy of it including all of the internals which is stored on the CPU stack. Because struct cpufreq_policy is relatively large, this may cause the size of the stack frame to exceed the 2 KB limit and so the GCC complains when -Wframe-larger-than= is used. In fact, it is not necessary to copy the entire policy structure in order to modify it, however. First, because cpufreq_set_policy() obtains the min and max policy limits from frequency QoS now, it is not necessary to pass the limits to it from the callers. The only things that need to be passed to it from there are the new governor pointer or (if there is a built-in governor in the driver) the "policy" value representing the governor choice. They both can be passed as individual arguments, though, so make cpufreq_set_policy() take them this way and rework its callers accordingly. This avoids making copies of cpufreq policies in the callers of cpufreq_set_policy(). Second, cpufreq_set_policy() still needs to pass the new policy data to the ->verify() callback of the cpufreq driver whose task is to sanitize the min and max policy limits. It still does not need to make a full copy of struct cpufreq_policy for this purpose, but it needs to pass a few items from it to the driver in case they are needed (different drivers have different needs in that respect and all of them have to be covered). For this reason, introduce struct cpufreq_policy_data to hold copies of the members of struct cpufreq_policy used by the existing ->verify() driver callbacks and pass a pointer to a temporary structure of that type to ->verify() (instead of passing a pointer to full struct cpufreq_policy to it). While at it, notice that intel_pstate and longrun don't really need to verify the "policy" value in struct cpufreq_policy, so drop those check from them to avoid copying "policy" into struct cpufreq_policy_data (which allows it to be slightly smaller). Also while at it fix up white space in a couple of places and make cpufreq_set_policy() static (as it can be so). Fixes: 3000ce3c52f8 ("cpufreq: Use per-policy frequency QoS") Link: https://lore.kernel.org/linux-pm/CAMuHMdX6-jb1W8uC2_237m8ctCpsnGp=JCxqt8pCWVqNXHmkVg@mail.gmail.com Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-01-27 01:40:11 +03:00
cpufreq_verify_within_cpu_limits(struct cpufreq_policy_data *policy)
{
cpufreq_verify_within_limits(policy, policy->cpuinfo.min_freq,
cpufreq: Avoid creating excessively large stack frames In the process of modifying a cpufreq policy, the cpufreq core makes a copy of it including all of the internals which is stored on the CPU stack. Because struct cpufreq_policy is relatively large, this may cause the size of the stack frame to exceed the 2 KB limit and so the GCC complains when -Wframe-larger-than= is used. In fact, it is not necessary to copy the entire policy structure in order to modify it, however. First, because cpufreq_set_policy() obtains the min and max policy limits from frequency QoS now, it is not necessary to pass the limits to it from the callers. The only things that need to be passed to it from there are the new governor pointer or (if there is a built-in governor in the driver) the "policy" value representing the governor choice. They both can be passed as individual arguments, though, so make cpufreq_set_policy() take them this way and rework its callers accordingly. This avoids making copies of cpufreq policies in the callers of cpufreq_set_policy(). Second, cpufreq_set_policy() still needs to pass the new policy data to the ->verify() callback of the cpufreq driver whose task is to sanitize the min and max policy limits. It still does not need to make a full copy of struct cpufreq_policy for this purpose, but it needs to pass a few items from it to the driver in case they are needed (different drivers have different needs in that respect and all of them have to be covered). For this reason, introduce struct cpufreq_policy_data to hold copies of the members of struct cpufreq_policy used by the existing ->verify() driver callbacks and pass a pointer to a temporary structure of that type to ->verify() (instead of passing a pointer to full struct cpufreq_policy to it). While at it, notice that intel_pstate and longrun don't really need to verify the "policy" value in struct cpufreq_policy, so drop those check from them to avoid copying "policy" into struct cpufreq_policy_data (which allows it to be slightly smaller). Also while at it fix up white space in a couple of places and make cpufreq_set_policy() static (as it can be so). Fixes: 3000ce3c52f8 ("cpufreq: Use per-policy frequency QoS") Link: https://lore.kernel.org/linux-pm/CAMuHMdX6-jb1W8uC2_237m8ctCpsnGp=JCxqt8pCWVqNXHmkVg@mail.gmail.com Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-01-27 01:40:11 +03:00
policy->cpuinfo.max_freq);
}
#ifdef CONFIG_CPU_FREQ
void cpufreq_suspend(void);
void cpufreq_resume(void);
int cpufreq_generic_suspend(struct cpufreq_policy *policy);
#else
static inline void cpufreq_suspend(void) {}
static inline void cpufreq_resume(void) {}
#endif
/*********************************************************************
* CPUFREQ NOTIFIER INTERFACE *
*********************************************************************/
#define CPUFREQ_TRANSITION_NOTIFIER (0)
#define CPUFREQ_POLICY_NOTIFIER (1)
/* Transition notifiers */
#define CPUFREQ_PRECHANGE (0)
#define CPUFREQ_POSTCHANGE (1)
/* Policy Notifiers */
#define CPUFREQ_CREATE_POLICY (0)
#define CPUFREQ_REMOVE_POLICY (1)
#ifdef CONFIG_CPU_FREQ
int cpufreq_register_notifier(struct notifier_block *nb, unsigned int list);
int cpufreq_unregister_notifier(struct notifier_block *nb, unsigned int list);
cpufreq: Make sure frequency transitions are serialized Whenever we change the frequency of a CPU, we call the PRECHANGE and POSTCHANGE notifiers. They must be serialized, i.e. PRECHANGE and POSTCHANGE notifiers should strictly alternate, thereby preventing two different sets of PRECHANGE or POSTCHANGE notifiers from interleaving arbitrarily. The following examples illustrate why this is important: Scenario 1: ----------- A thread reading the value of cpuinfo_cur_freq, will call __cpufreq_cpu_get()->cpufreq_out_of_sync()->cpufreq_notify_transition() The ondemand governor can decide to change the frequency of the CPU at the same time and hence it can end up sending the notifications via ->target(). If the notifiers are not serialized, the following sequence can occur: - PRECHANGE Notification for freq A (from cpuinfo_cur_freq) - PRECHANGE Notification for freq B (from target()) - Freq changed by target() to B - POSTCHANGE Notification for freq B - POSTCHANGE Notification for freq A We can see from the above that the last POSTCHANGE Notification happens for freq A but the hardware is set to run at freq B. Where would we break then?: adjust_jiffies() in cpufreq.c & cpufreq_callback() in arch/arm/kernel/smp.c (which also adjusts the jiffies). All the loops_per_jiffy calculations will get messed up. Scenario 2: ----------- The governor calls __cpufreq_driver_target() to change the frequency. At the same time, if we change scaling_{min|max}_freq from sysfs, it will end up calling the governor's CPUFREQ_GOV_LIMITS notification, which will also call __cpufreq_driver_target(). And hence we end up issuing concurrent calls to ->target(). Typically, platforms have the following logic in their ->target() routines: (Eg: cpufreq-cpu0, omap, exynos, etc) A. If new freq is more than old: Increase voltage B. Change freq C. If new freq is less than old: decrease voltage Now, if the two concurrent calls to ->target() are X and Y, where X is trying to increase the freq and Y is trying to decrease it, we get the following race condition: X.A: voltage gets increased for larger freq Y.A: nothing happens Y.B: freq gets decreased Y.C: voltage gets decreased X.B: freq gets increased X.C: nothing happens Thus we can end up setting a freq which is not supported by the voltage we have set. That will probably make the clock to the CPU unstable and the system might not work properly anymore. This patch introduces a set of synchronization primitives to serialize frequency transitions, which are to be used as shown below: cpufreq_freq_transition_begin(); //Perform the frequency change cpufreq_freq_transition_end(); The _begin() call sends the PRECHANGE notification whereas the _end() call sends the POSTCHANGE notification. Also, all the necessary synchronization is handled within these calls. In particular, even drivers which set the ASYNC_NOTIFICATION flag can also use these APIs for performing frequency transitions (ie., you can call _begin() from one task, and call the corresponding _end() from a different task). The actual synchronization underneath is not that complicated: The key challenge is to allow drivers to begin the transition from one thread and end it in a completely different thread (this is to enable drivers that do asynchronous POSTCHANGE notification from bottom-halves, to also use the same interface). To achieve this, a 'transition_ongoing' flag, a 'transition_lock' spinlock and a wait-queue are added per-policy. The flag and the wait-queue are used in conjunction to create an "uninterrupted flow" from _begin() to _end(). The spinlock is used to ensure that only one such "flow" is in flight at any given time. Put together, this provides us all the necessary synchronization. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-03-24 12:05:44 +04:00
void cpufreq_freq_transition_begin(struct cpufreq_policy *policy,
struct cpufreq_freqs *freqs);
void cpufreq_freq_transition_end(struct cpufreq_policy *policy,
struct cpufreq_freqs *freqs, int transition_failed);
#else /* CONFIG_CPU_FREQ */
static inline int cpufreq_register_notifier(struct notifier_block *nb,
unsigned int list)
{
return 0;
}
static inline int cpufreq_unregister_notifier(struct notifier_block *nb,
unsigned int list)
{
return 0;
}
#endif /* !CONFIG_CPU_FREQ */
/**
* cpufreq_scale - "old * mult / div" calculation for large values (32-bit-arch
* safe)
* @old: old value
* @div: divisor
* @mult: multiplier
*
*
* new = old * mult / div
*/
static inline unsigned long cpufreq_scale(unsigned long old, u_int div,
u_int mult)
{
#if BITS_PER_LONG == 32
u64 result = ((u64) old) * ((u64) mult);
do_div(result, div);
return (unsigned long) result;
#elif BITS_PER_LONG == 64
unsigned long result = old * ((u64) mult);
result /= div;
return result;
#endif
}
/*********************************************************************
* CPUFREQ GOVERNORS *
*********************************************************************/
cpufreq: Avoid creating excessively large stack frames In the process of modifying a cpufreq policy, the cpufreq core makes a copy of it including all of the internals which is stored on the CPU stack. Because struct cpufreq_policy is relatively large, this may cause the size of the stack frame to exceed the 2 KB limit and so the GCC complains when -Wframe-larger-than= is used. In fact, it is not necessary to copy the entire policy structure in order to modify it, however. First, because cpufreq_set_policy() obtains the min and max policy limits from frequency QoS now, it is not necessary to pass the limits to it from the callers. The only things that need to be passed to it from there are the new governor pointer or (if there is a built-in governor in the driver) the "policy" value representing the governor choice. They both can be passed as individual arguments, though, so make cpufreq_set_policy() take them this way and rework its callers accordingly. This avoids making copies of cpufreq policies in the callers of cpufreq_set_policy(). Second, cpufreq_set_policy() still needs to pass the new policy data to the ->verify() callback of the cpufreq driver whose task is to sanitize the min and max policy limits. It still does not need to make a full copy of struct cpufreq_policy for this purpose, but it needs to pass a few items from it to the driver in case they are needed (different drivers have different needs in that respect and all of them have to be covered). For this reason, introduce struct cpufreq_policy_data to hold copies of the members of struct cpufreq_policy used by the existing ->verify() driver callbacks and pass a pointer to a temporary structure of that type to ->verify() (instead of passing a pointer to full struct cpufreq_policy to it). While at it, notice that intel_pstate and longrun don't really need to verify the "policy" value in struct cpufreq_policy, so drop those check from them to avoid copying "policy" into struct cpufreq_policy_data (which allows it to be slightly smaller). Also while at it fix up white space in a couple of places and make cpufreq_set_policy() static (as it can be so). Fixes: 3000ce3c52f8 ("cpufreq: Use per-policy frequency QoS") Link: https://lore.kernel.org/linux-pm/CAMuHMdX6-jb1W8uC2_237m8ctCpsnGp=JCxqt8pCWVqNXHmkVg@mail.gmail.com Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-01-27 01:40:11 +03:00
#define CPUFREQ_POLICY_UNKNOWN (0)
/*
* If (cpufreq_driver->target) exists, the ->governor decides what frequency
* within the limits is used. If (cpufreq_driver->setpolicy> exists, these
* two generic policies are available:
*/
#define CPUFREQ_POLICY_POWERSAVE (1)
#define CPUFREQ_POLICY_PERFORMANCE (2)
/*
* The polling frequency depends on the capability of the processor. Default
* polling frequency is 1000 times the transition latency of the processor. The
* ondemand governor will work on any processor with transition latency <= 10ms,
* using appropriate sampling rate.
*/
#define LATENCY_MULTIPLIER (1000)
struct cpufreq_governor {
char name[CPUFREQ_NAME_LEN];
int (*init)(struct cpufreq_policy *policy);
void (*exit)(struct cpufreq_policy *policy);
int (*start)(struct cpufreq_policy *policy);
void (*stop)(struct cpufreq_policy *policy);
void (*limits)(struct cpufreq_policy *policy);
ssize_t (*show_setspeed) (struct cpufreq_policy *policy,
char *buf);
int (*store_setspeed) (struct cpufreq_policy *policy,
unsigned int freq);
struct list_head governor_list;
struct module *owner;
u8 flags;
};
/* Governor flags */
/* For governors which change frequency dynamically by themselves */
#define CPUFREQ_GOV_DYNAMIC_SWITCHING BIT(0)
/* For governors wanting the target frequency to be set exactly */
#define CPUFREQ_GOV_STRICT_TARGET BIT(1)
/* Pass a target to the cpufreq driver */
unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
unsigned int target_freq);
void cpufreq_driver_adjust_perf(unsigned int cpu,
unsigned long min_perf,
unsigned long target_perf,
unsigned long capacity);
bool cpufreq_driver_has_adjust_perf(void);
int cpufreq_driver_target(struct cpufreq_policy *policy,
unsigned int target_freq,
unsigned int relation);
int __cpufreq_driver_target(struct cpufreq_policy *policy,
unsigned int target_freq,
unsigned int relation);
unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
unsigned int target_freq);
unsigned int cpufreq_policy_transition_delay_us(struct cpufreq_policy *policy);
int cpufreq_register_governor(struct cpufreq_governor *governor);
void cpufreq_unregister_governor(struct cpufreq_governor *governor);
cpufreq: intel_pstate: Implement passive mode with HWP enabled Allow intel_pstate to work in the passive mode with HWP enabled and make it set the HWP minimum performance limit (HWP floor) to the P-state value given by the target frequency supplied by the cpufreq governor, so as to prevent the HWP algorithm and the CPU scheduler from working against each other, at least when the schedutil governor is in use, and update the intel_pstate documentation accordingly. Among other things, this allows utilization clamps to be taken into account, at least to a certain extent, when intel_pstate is in use and makes it more likely that sufficient capacity for deadline tasks will be provided. After this change, the resulting behavior of an HWP system with intel_pstate in the passive mode should be close to the behavior of the analogous non-HWP system with intel_pstate in the passive mode, except that the HWP algorithm is generally allowed to make the CPU run at a frequency above the floor P-state set by intel_pstate in the entire available range of P-states, while without HWP a CPU can run in a P-state above the requested one if the latter falls into the range of turbo P-states (referred to as the turbo range) or if the P-states of all CPUs in one package are coordinated with each other at the hardware level. [Note that in principle the HWP floor may not be taken into account by the processor if it falls into the turbo range, in which case the processor has a license to choose any P-state, either below or above the HWP floor, just like a non-HWP processor in the case when the target P-state falls into the turbo range.] With this change applied, intel_pstate in the passive mode assumes complete control over the HWP request MSR and concurrent changes of that MSR (eg. via the direct MSR access interface) are overridden by it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2020-08-06 15:03:55 +03:00
int cpufreq_start_governor(struct cpufreq_policy *policy);
void cpufreq_stop_governor(struct cpufreq_policy *policy);
#define cpufreq_governor_init(__governor) \
static int __init __governor##_init(void) \
{ \
return cpufreq_register_governor(&__governor); \
} \
core_initcall(__governor##_init)
#define cpufreq_governor_exit(__governor) \
static void __exit __governor##_exit(void) \
{ \
return cpufreq_unregister_governor(&__governor); \
} \
module_exit(__governor##_exit)
struct cpufreq_governor *cpufreq_default_governor(void);
struct cpufreq_governor *cpufreq_fallback_governor(void);
static inline void cpufreq_policy_apply_limits(struct cpufreq_policy *policy)
{
if (policy->max < policy->cur)
__cpufreq_driver_target(policy, policy->max,
CPUFREQ_RELATION_HE);
else if (policy->min > policy->cur)
__cpufreq_driver_target(policy, policy->min,
CPUFREQ_RELATION_LE);
}
/* Governor attribute set */
struct gov_attr_set {
struct kobject kobj;
struct list_head policy_list;
struct mutex update_lock;
int usage_count;
};
/* sysfs ops for cpufreq governors */
extern const struct sysfs_ops governor_sysfs_ops;
static inline struct gov_attr_set *to_gov_attr_set(struct kobject *kobj)
{
return container_of(kobj, struct gov_attr_set, kobj);
}
void gov_attr_set_init(struct gov_attr_set *attr_set, struct list_head *list_node);
void gov_attr_set_get(struct gov_attr_set *attr_set, struct list_head *list_node);
unsigned int gov_attr_set_put(struct gov_attr_set *attr_set, struct list_head *list_node);
/* Governor sysfs attribute */
struct governor_attr {
struct attribute attr;
ssize_t (*show)(struct gov_attr_set *attr_set, char *buf);
ssize_t (*store)(struct gov_attr_set *attr_set, const char *buf,
size_t count);
};
/*********************************************************************
* FREQUENCY TABLE HELPERS *
*********************************************************************/
/* Special Values of .frequency field */
#define CPUFREQ_ENTRY_INVALID ~0u
#define CPUFREQ_TABLE_END ~1u
/* Special Values of .flags field */
#define CPUFREQ_BOOST_FREQ (1 << 0)
#define CPUFREQ_INEFFICIENT_FREQ (1 << 1)
struct cpufreq_frequency_table {
unsigned int flags;
unsigned int driver_data; /* driver specific data, not used by core */
unsigned int frequency; /* kHz - doesn't need to be in ascending
* order */
};
#if defined(CONFIG_CPU_FREQ) && defined(CONFIG_PM_OPP)
int dev_pm_opp_init_cpufreq_table(struct device *dev,
struct cpufreq_frequency_table **table);
void dev_pm_opp_free_cpufreq_table(struct device *dev,
struct cpufreq_frequency_table **table);
#else
static inline int dev_pm_opp_init_cpufreq_table(struct device *dev,
struct cpufreq_frequency_table
**table)
{
return -EINVAL;
}
static inline void dev_pm_opp_free_cpufreq_table(struct device *dev,
struct cpufreq_frequency_table
**table)
{
}
#endif
/*
* cpufreq_for_each_entry - iterate over a cpufreq_frequency_table
* @pos: the cpufreq_frequency_table * to use as a loop cursor.
* @table: the cpufreq_frequency_table * to iterate over.
*/
#define cpufreq_for_each_entry(pos, table) \
for (pos = table; pos->frequency != CPUFREQ_TABLE_END; pos++)
/*
* cpufreq_for_each_entry_idx - iterate over a cpufreq_frequency_table
* with index
* @pos: the cpufreq_frequency_table * to use as a loop cursor.
* @table: the cpufreq_frequency_table * to iterate over.
* @idx: the table entry currently being processed
*/
#define cpufreq_for_each_entry_idx(pos, table, idx) \
for (pos = table, idx = 0; pos->frequency != CPUFREQ_TABLE_END; \
pos++, idx++)
/*
* cpufreq_for_each_valid_entry - iterate over a cpufreq_frequency_table
* excluding CPUFREQ_ENTRY_INVALID frequencies.
* @pos: the cpufreq_frequency_table * to use as a loop cursor.
* @table: the cpufreq_frequency_table * to iterate over.
*/
#define cpufreq_for_each_valid_entry(pos, table) \
for (pos = table; pos->frequency != CPUFREQ_TABLE_END; pos++) \
if (pos->frequency == CPUFREQ_ENTRY_INVALID) \
continue; \
else
/*
* cpufreq_for_each_valid_entry_idx - iterate with index over a cpufreq
* frequency_table excluding CPUFREQ_ENTRY_INVALID frequencies.
* @pos: the cpufreq_frequency_table * to use as a loop cursor.
* @table: the cpufreq_frequency_table * to iterate over.
* @idx: the table entry currently being processed
*/
#define cpufreq_for_each_valid_entry_idx(pos, table, idx) \
cpufreq_for_each_entry_idx(pos, table, idx) \
if (pos->frequency == CPUFREQ_ENTRY_INVALID) \
continue; \
else
/**
* cpufreq_for_each_efficient_entry_idx - iterate with index over a cpufreq
* frequency_table excluding CPUFREQ_ENTRY_INVALID and
* CPUFREQ_INEFFICIENT_FREQ frequencies.
* @pos: the &struct cpufreq_frequency_table to use as a loop cursor.
* @table: the &struct cpufreq_frequency_table to iterate over.
* @idx: the table entry currently being processed.
* @efficiencies: set to true to only iterate over efficient frequencies.
*/
#define cpufreq_for_each_efficient_entry_idx(pos, table, idx, efficiencies) \
cpufreq_for_each_valid_entry_idx(pos, table, idx) \
if (efficiencies && (pos->flags & CPUFREQ_INEFFICIENT_FREQ)) \
continue; \
else
int cpufreq_frequency_table_cpuinfo(struct cpufreq_policy *policy,
struct cpufreq_frequency_table *table);
cpufreq: Avoid creating excessively large stack frames In the process of modifying a cpufreq policy, the cpufreq core makes a copy of it including all of the internals which is stored on the CPU stack. Because struct cpufreq_policy is relatively large, this may cause the size of the stack frame to exceed the 2 KB limit and so the GCC complains when -Wframe-larger-than= is used. In fact, it is not necessary to copy the entire policy structure in order to modify it, however. First, because cpufreq_set_policy() obtains the min and max policy limits from frequency QoS now, it is not necessary to pass the limits to it from the callers. The only things that need to be passed to it from there are the new governor pointer or (if there is a built-in governor in the driver) the "policy" value representing the governor choice. They both can be passed as individual arguments, though, so make cpufreq_set_policy() take them this way and rework its callers accordingly. This avoids making copies of cpufreq policies in the callers of cpufreq_set_policy(). Second, cpufreq_set_policy() still needs to pass the new policy data to the ->verify() callback of the cpufreq driver whose task is to sanitize the min and max policy limits. It still does not need to make a full copy of struct cpufreq_policy for this purpose, but it needs to pass a few items from it to the driver in case they are needed (different drivers have different needs in that respect and all of them have to be covered). For this reason, introduce struct cpufreq_policy_data to hold copies of the members of struct cpufreq_policy used by the existing ->verify() driver callbacks and pass a pointer to a temporary structure of that type to ->verify() (instead of passing a pointer to full struct cpufreq_policy to it). While at it, notice that intel_pstate and longrun don't really need to verify the "policy" value in struct cpufreq_policy, so drop those check from them to avoid copying "policy" into struct cpufreq_policy_data (which allows it to be slightly smaller). Also while at it fix up white space in a couple of places and make cpufreq_set_policy() static (as it can be so). Fixes: 3000ce3c52f8 ("cpufreq: Use per-policy frequency QoS") Link: https://lore.kernel.org/linux-pm/CAMuHMdX6-jb1W8uC2_237m8ctCpsnGp=JCxqt8pCWVqNXHmkVg@mail.gmail.com Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-01-27 01:40:11 +03:00
int cpufreq_frequency_table_verify(struct cpufreq_policy_data *policy,
struct cpufreq_frequency_table *table);
cpufreq: Avoid creating excessively large stack frames In the process of modifying a cpufreq policy, the cpufreq core makes a copy of it including all of the internals which is stored on the CPU stack. Because struct cpufreq_policy is relatively large, this may cause the size of the stack frame to exceed the 2 KB limit and so the GCC complains when -Wframe-larger-than= is used. In fact, it is not necessary to copy the entire policy structure in order to modify it, however. First, because cpufreq_set_policy() obtains the min and max policy limits from frequency QoS now, it is not necessary to pass the limits to it from the callers. The only things that need to be passed to it from there are the new governor pointer or (if there is a built-in governor in the driver) the "policy" value representing the governor choice. They both can be passed as individual arguments, though, so make cpufreq_set_policy() take them this way and rework its callers accordingly. This avoids making copies of cpufreq policies in the callers of cpufreq_set_policy(). Second, cpufreq_set_policy() still needs to pass the new policy data to the ->verify() callback of the cpufreq driver whose task is to sanitize the min and max policy limits. It still does not need to make a full copy of struct cpufreq_policy for this purpose, but it needs to pass a few items from it to the driver in case they are needed (different drivers have different needs in that respect and all of them have to be covered). For this reason, introduce struct cpufreq_policy_data to hold copies of the members of struct cpufreq_policy used by the existing ->verify() driver callbacks and pass a pointer to a temporary structure of that type to ->verify() (instead of passing a pointer to full struct cpufreq_policy to it). While at it, notice that intel_pstate and longrun don't really need to verify the "policy" value in struct cpufreq_policy, so drop those check from them to avoid copying "policy" into struct cpufreq_policy_data (which allows it to be slightly smaller). Also while at it fix up white space in a couple of places and make cpufreq_set_policy() static (as it can be so). Fixes: 3000ce3c52f8 ("cpufreq: Use per-policy frequency QoS") Link: https://lore.kernel.org/linux-pm/CAMuHMdX6-jb1W8uC2_237m8ctCpsnGp=JCxqt8pCWVqNXHmkVg@mail.gmail.com Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: 5.4+ <stable@vger.kernel.org> # 5.4+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2020-01-27 01:40:11 +03:00
int cpufreq_generic_frequency_table_verify(struct cpufreq_policy_data *policy);
int cpufreq_table_index_unsorted(struct cpufreq_policy *policy,
unsigned int target_freq,
unsigned int relation);
int cpufreq_frequency_table_get_index(struct cpufreq_policy *policy,
unsigned int freq);
ssize_t cpufreq_show_cpus(const struct cpumask *mask, char *buf);
#ifdef CONFIG_CPU_FREQ
int cpufreq_boost_trigger_state(int state);
int cpufreq_boost_enabled(void);
int cpufreq_enable_boost_support(void);
bool policy_has_boost_freq(struct cpufreq_policy *policy);
/* Find lowest freq at or above target in a table in ascending order */
static inline int cpufreq_table_find_index_al(struct cpufreq_policy *policy,
unsigned int target_freq,
bool efficiencies)
{
struct cpufreq_frequency_table *table = policy->freq_table;
struct cpufreq_frequency_table *pos;
unsigned int freq;
int idx, best = -1;
cpufreq_for_each_efficient_entry_idx(pos, table, idx, efficiencies) {
freq = pos->frequency;
if (freq >= target_freq)
return idx;
best = idx;
}
return best;
}
/* Find lowest freq at or above target in a table in descending order */
static inline int cpufreq_table_find_index_dl(struct cpufreq_policy *policy,
unsigned int target_freq,
bool efficiencies)
{
struct cpufreq_frequency_table *table = policy->freq_table;
struct cpufreq_frequency_table *pos;
unsigned int freq;
int idx, best = -1;
cpufreq_for_each_efficient_entry_idx(pos, table, idx, efficiencies) {
freq = pos->frequency;
if (freq == target_freq)
return idx;
if (freq > target_freq) {
best = idx;
continue;
}
/* No freq found above target_freq */
if (best == -1)
return idx;
return best;
}
return best;
}
/* Works only on sorted freq-tables */
static inline int cpufreq_table_find_index_l(struct cpufreq_policy *policy,
unsigned int target_freq,
bool efficiencies)
{
target_freq = clamp_val(target_freq, policy->min, policy->max);
if (policy->freq_table_sorted == CPUFREQ_TABLE_SORTED_ASCENDING)
return cpufreq_table_find_index_al(policy, target_freq,
efficiencies);
else
return cpufreq_table_find_index_dl(policy, target_freq,
efficiencies);
}
/* Find highest freq at or below target in a table in ascending order */
static inline int cpufreq_table_find_index_ah(struct cpufreq_policy *policy,
unsigned int target_freq,
bool efficiencies)
{
struct cpufreq_frequency_table *table = policy->freq_table;
struct cpufreq_frequency_table *pos;
unsigned int freq;
int idx, best = -1;
cpufreq_for_each_efficient_entry_idx(pos, table, idx, efficiencies) {
freq = pos->frequency;
if (freq == target_freq)
return idx;
if (freq < target_freq) {
best = idx;
continue;
}
/* No freq found below target_freq */
if (best == -1)
return idx;
return best;
}
return best;
}
/* Find highest freq at or below target in a table in descending order */
static inline int cpufreq_table_find_index_dh(struct cpufreq_policy *policy,
unsigned int target_freq,
bool efficiencies)
{
struct cpufreq_frequency_table *table = policy->freq_table;
struct cpufreq_frequency_table *pos;
unsigned int freq;
int idx, best = -1;
cpufreq_for_each_efficient_entry_idx(pos, table, idx, efficiencies) {
freq = pos->frequency;
if (freq <= target_freq)
return idx;
best = idx;
}
return best;
}
/* Works only on sorted freq-tables */
static inline int cpufreq_table_find_index_h(struct cpufreq_policy *policy,
unsigned int target_freq,
bool efficiencies)
{
target_freq = clamp_val(target_freq, policy->min, policy->max);
if (policy->freq_table_sorted == CPUFREQ_TABLE_SORTED_ASCENDING)
return cpufreq_table_find_index_ah(policy, target_freq,
efficiencies);
else
return cpufreq_table_find_index_dh(policy, target_freq,
efficiencies);
}
/* Find closest freq to target in a table in ascending order */
static inline int cpufreq_table_find_index_ac(struct cpufreq_policy *policy,
unsigned int target_freq,
bool efficiencies)
{
struct cpufreq_frequency_table *table = policy->freq_table;
struct cpufreq_frequency_table *pos;
unsigned int freq;
int idx, best = -1;
cpufreq_for_each_efficient_entry_idx(pos, table, idx, efficiencies) {
freq = pos->frequency;
if (freq == target_freq)
return idx;
if (freq < target_freq) {
best = idx;
continue;
}
/* No freq found below target_freq */
if (best == -1)
return idx;
/* Choose the closest freq */
if (target_freq - table[best].frequency > freq - target_freq)
return idx;
return best;
}
return best;
}
/* Find closest freq to target in a table in descending order */
static inline int cpufreq_table_find_index_dc(struct cpufreq_policy *policy,
unsigned int target_freq,
bool efficiencies)
{
struct cpufreq_frequency_table *table = policy->freq_table;
struct cpufreq_frequency_table *pos;
unsigned int freq;
int idx, best = -1;
cpufreq_for_each_efficient_entry_idx(pos, table, idx, efficiencies) {
freq = pos->frequency;
if (freq == target_freq)
return idx;
if (freq > target_freq) {
best = idx;
continue;
}
/* No freq found above target_freq */
if (best == -1)
return idx;
/* Choose the closest freq */
if (table[best].frequency - target_freq > target_freq - freq)
return idx;
return best;
}
return best;
}
/* Works only on sorted freq-tables */
static inline int cpufreq_table_find_index_c(struct cpufreq_policy *policy,
unsigned int target_freq,
bool efficiencies)
{
target_freq = clamp_val(target_freq, policy->min, policy->max);
if (policy->freq_table_sorted == CPUFREQ_TABLE_SORTED_ASCENDING)
return cpufreq_table_find_index_ac(policy, target_freq,
efficiencies);
else
return cpufreq_table_find_index_dc(policy, target_freq,
efficiencies);
}
static inline int cpufreq_frequency_table_target(struct cpufreq_policy *policy,
unsigned int target_freq,
unsigned int relation)
{
bool efficiencies = policy->efficiencies_available &&
(relation & CPUFREQ_RELATION_E);
int idx;
/* cpufreq_table_index_unsorted() has no use for this flag anyway */
relation &= ~CPUFREQ_RELATION_E;
if (unlikely(policy->freq_table_sorted == CPUFREQ_TABLE_UNSORTED))
return cpufreq_table_index_unsorted(policy, target_freq,
relation);
retry:
switch (relation) {
case CPUFREQ_RELATION_L:
idx = cpufreq_table_find_index_l(policy, target_freq,
efficiencies);
break;
case CPUFREQ_RELATION_H:
idx = cpufreq_table_find_index_h(policy, target_freq,
efficiencies);
break;
case CPUFREQ_RELATION_C:
idx = cpufreq_table_find_index_c(policy, target_freq,
efficiencies);
break;
default:
WARN_ON_ONCE(1);
return 0;
}
if (idx < 0 && efficiencies) {
efficiencies = false;
goto retry;
}
return idx;
}
static inline int cpufreq_table_count_valid_entries(const struct cpufreq_policy *policy)
{
struct cpufreq_frequency_table *pos;
int count = 0;
if (unlikely(!policy->freq_table))
return 0;
cpufreq_for_each_valid_entry(pos, policy->freq_table)
count++;
return count;
}
/**
* cpufreq_table_set_inefficient() - Mark a frequency as inefficient
* @policy: the &struct cpufreq_policy containing the inefficient frequency
* @frequency: the inefficient frequency
*
* The &struct cpufreq_policy must use a sorted frequency table
*
* Return: %0 on success or a negative errno code
*/
static inline int
cpufreq_table_set_inefficient(struct cpufreq_policy *policy,
unsigned int frequency)
{
struct cpufreq_frequency_table *pos;
/* Not supported */
if (policy->freq_table_sorted == CPUFREQ_TABLE_UNSORTED)
return -EINVAL;
cpufreq_for_each_valid_entry(pos, policy->freq_table) {
if (pos->frequency == frequency) {
pos->flags |= CPUFREQ_INEFFICIENT_FREQ;
policy->efficiencies_available = true;
return 0;
}
}
return -EINVAL;
}
static inline int parse_perf_domain(int cpu, const char *list_name,
const char *cell_name)
{
struct device_node *cpu_np;
struct of_phandle_args args;
int ret;
cpu_np = of_cpu_device_node_get(cpu);
if (!cpu_np)
return -ENODEV;
ret = of_parse_phandle_with_args(cpu_np, list_name, cell_name, 0,
&args);
if (ret < 0)
return ret;
of_node_put(cpu_np);
return args.args[0];
}
static inline int of_perf_domain_get_sharing_cpumask(int pcpu, const char *list_name,
const char *cell_name, struct cpumask *cpumask)
{
int target_idx;
int cpu, ret;
ret = parse_perf_domain(pcpu, list_name, cell_name);
if (ret < 0)
return ret;
target_idx = ret;
cpumask_set_cpu(pcpu, cpumask);
for_each_possible_cpu(cpu) {
if (cpu == pcpu)
continue;
ret = parse_perf_domain(cpu, list_name, cell_name);
if (ret < 0)
continue;
if (target_idx == ret)
cpumask_set_cpu(cpu, cpumask);
}
return target_idx;
}
#else
static inline int cpufreq_boost_trigger_state(int state)
{
return 0;
}
static inline int cpufreq_boost_enabled(void)
{
return 0;
}
static inline int cpufreq_enable_boost_support(void)
{
return -EINVAL;
}
static inline bool policy_has_boost_freq(struct cpufreq_policy *policy)
{
return false;
}
static inline int
cpufreq_table_set_inefficient(struct cpufreq_policy *policy,
unsigned int frequency)
{
return -EINVAL;
}
static inline int of_perf_domain_get_sharing_cpumask(int pcpu, const char *list_name,
const char *cell_name, struct cpumask *cpumask)
{
return -EOPNOTSUPP;
}
#endif
sched/topology: Make Energy Aware Scheduling depend on schedutil Energy Aware Scheduling (EAS) is designed with the assumption that frequencies of CPUs follow their utilization value. When using a CPUFreq governor other than schedutil, the chances of this assumption being true are small, if any. When schedutil is being used, EAS' predictions are at least consistent with the frequency requests. Although those requests have no guarantees to be honored by the hardware, they should at least guide DVFS in the right direction and provide some hope in regards to the EAS model being accurate. To make sure EAS is only used in a sane configuration, create a strong dependency on schedutil being used. Since having sugov compiled-in does not provide that guarantee, make CPUFreq call a scheduler function on governor changes hence letting it rebuild the scheduling domains, check the governors of the online CPUs, and enable/disable EAS accordingly. Signed-off-by: Quentin Perret <quentin.perret@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: adharmap@codeaurora.org Cc: chris.redpath@arm.com Cc: currojerez@riseup.net Cc: dietmar.eggemann@arm.com Cc: edubezval@gmail.com Cc: gregkh@linuxfoundation.org Cc: javi.merino@kernel.org Cc: joel@joelfernandes.org Cc: juri.lelli@redhat.com Cc: morten.rasmussen@arm.com Cc: patrick.bellasi@arm.com Cc: pkondeti@codeaurora.org Cc: skannan@codeaurora.org Cc: smuckle@google.com Cc: srinivas.pandruvada@linux.intel.com Cc: thara.gopinath@linaro.org Cc: tkjos@google.com Cc: valentin.schneider@arm.com Cc: vincent.guittot@linaro.org Cc: viresh.kumar@linaro.org Link: https://lkml.kernel.org/r/20181203095628.11858-9-quentin.perret@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-12-03 12:56:21 +03:00
#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
void sched_cpufreq_governor_change(struct cpufreq_policy *policy,
struct cpufreq_governor *old_gov);
#else
static inline void sched_cpufreq_governor_change(struct cpufreq_policy *policy,
struct cpufreq_governor *old_gov) { }
#endif
x86 / CPU: Always show current CPU frequency in /proc/cpuinfo After commit 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo on x86 can be either the nominal CPU frequency (which is constant) or the frequency most recently requested by a scaling governor in cpufreq, depending on the cpufreq configuration. That is somewhat inconsistent and is different from what it was before 4.13, so in order to restore the previous behavior, make it report the current CPU frequency like the scaling_cur_freq sysfs file in cpufreq. To that end, modify the /proc/cpuinfo implementation on x86 to use aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback registers, if available, and use their values to compute the CPU frequency to be reported as "cpu MHz". However, do that carefully enough to avoid accumulating delays that lead to unacceptable access times for /proc/cpuinfo on systems with many CPUs. Run aperfmperf_snapshot_khz() once on all CPUs asynchronously at the /proc/cpuinfo open time, add a single delay upfront (if necessary) at that point and simply compute the current frequency while running show_cpuinfo() for each individual CPU. Also, to avoid slowing down /proc/cpuinfo accesses too much, reduce the default delay between consecutive APERF and MPERF reads to 10 ms, which should be sufficient to get large enough numbers for the frequency computation in all cases. Fixes: 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@kernel.org>
2017-11-15 04:13:40 +03:00
extern void arch_freq_prepare_all(void);
x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF The goal of this change is to give users a uniform and meaningful result when they read /sys/...cpufreq/scaling_cur_freq on modern x86 hardware, as compared to what they get today. Modern x86 processors include the hardware needed to accurately calculate frequency over an interval -- APERF, MPERF, and the TSC. Here we provide an x86 routine to make this calculation on supported hardware, and use it in preference to any driver driver-specific cpufreq_driver.get() routine. MHz is computed like so: MHz = base_MHz * delta_APERF / delta_MPERF MHz is the average frequency of the busy processor over a measurement interval. The interval is defined to be the time between successive invocations of aperfmperf_khz_on_cpu(), which are expected to to happen on-demand when users read sysfs attribute cpufreq/scaling_cur_freq. As with previous methods of calculating MHz, idle time is excluded. base_MHz above is from TSC calibration global "cpu_khz". This x86 native method to calculate MHz returns a meaningful result no matter if P-states are controlled by hardware or firmware and/or if the Linux cpufreq sub-system is or is-not installed. When this routine is invoked more frequently, the measurement interval becomes shorter. However, the code limits re-computation to 10ms intervals so that average frequency remains meaningful. Discerning users are encouraged to take advantage of the turbostat(8) utility, which can gracefully handle concurrent measurement intervals of arbitrary length. Signed-off-by: Len Brown <len.brown@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-06-24 08:11:52 +03:00
extern unsigned int arch_freq_get_on_cpu(int cpu);
cpufreq,arm,arm64: restructure definitions of arch_set_freq_scale() Compared to other arch_* functions, arch_set_freq_scale() has an atypical weak definition that can be replaced by a strong architecture specific implementation. The more typical support for architectural functions involves defining an empty stub in a header file if the symbol is not already defined in architecture code. Some examples involve: - #define arch_scale_freq_capacity topology_get_freq_scale - #define arch_scale_freq_invariant topology_scale_freq_invariant - #define arch_scale_cpu_capacity topology_get_cpu_scale - #define arch_update_cpu_topology topology_update_cpu_topology - #define arch_scale_thermal_pressure topology_get_thermal_pressure - #define arch_set_thermal_pressure topology_set_thermal_pressure Bring arch_set_freq_scale() in line with these functions by renaming it to topology_set_freq_scale() in the arch topology driver, and by defining the arch_set_freq_scale symbol to point to the new function for arm and arm64. While there are other users of the arch_topology driver, this patch defines arch_set_freq_scale for arm and arm64 only, due to their existing definitions of arch_scale_freq_capacity. This is the getter function of the frequency invariance scale factor and without a getter function, the setter function - arch_set_freq_scale() has not purpose. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Sudeep Holla <sudeep.holla@arm.com> (BL_SWITCHER and topology parts) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2020-09-24 15:30:15 +03:00
#ifndef arch_set_freq_scale
static __always_inline
void arch_set_freq_scale(const struct cpumask *cpus,
unsigned long cur_freq,
unsigned long max_freq)
{
}
#endif
/* the following are really really optional */
extern struct freq_attr cpufreq_freq_attr_scaling_available_freqs;
extern struct freq_attr cpufreq_freq_attr_scaling_boost_freqs;
extern struct freq_attr *cpufreq_generic_attr[];
int cpufreq_table_validate_and_sort(struct cpufreq_policy *policy);
unsigned int cpufreq_generic_get(unsigned int cpu);
void cpufreq_generic_init(struct cpufreq_policy *policy,
struct cpufreq_frequency_table *table,
unsigned int transition_latency);
static inline void cpufreq_register_em_with_opp(struct cpufreq_policy *policy)
{
dev_pm_opp_of_register_em(get_cpu_device(policy->cpu),
policy->related_cpus);
}
#endif /* _LINUX_CPUFREQ_H */