WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Rafael J. Wysocki	f5c233c8fe	Merge branch 'pm-opp' into pm * pm-opp: (37 commits) PM / devfreq: Add required OPPs support to passive governor PM / devfreq: Cache OPP table reference in devfreq OPP: Add function to look up required OPP's for a given OPP opp: Replace ENOTSUPP with EOPNOTSUPP opp: Fix "foo * bar" should be "foo *bar" opp: Don't ignore clk_get() errors other than -ENOENT opp: Update bandwidth requirements based on scaling up/down opp: Allow lazy-linking of required-opps opp: Remove dev_pm_opp_set_bw() devfreq: tegra30: Migrate to dev_pm_opp_set_opp() drm: msm: Migrate to dev_pm_opp_set_opp() cpufreq: qcom: Migrate to dev_pm_opp_set_opp() opp: Implement dev_pm_opp_set_opp() opp: Update parameters of _set_opp_custom() opp: Allow _generic_set_opp_clk_only() to work for non-freq devices opp: Allow _generic_set_opp_regulator() to work for non-freq devices opp: Allow _set_opp() to work for non-freq devices opp: Split _set_opp() out of dev_pm_opp_set_rate() opp: Keep track of currently programmed OPP opp: No need to check clk for errors ...	2021-02-15 17:01:46 +01:00
Rafael J. Wysocki	8a3f1f181d	Merge back cpufreq updates for v5.12.	2021-02-10 19:11:06 +01:00
Rafael J. Wysocki	7ac839a0a7	Merge branch 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull ARM cpufreq changes for v5.12 from Viresh Kumar: "- Removal of Tango driver as the platform got removed (Arnd Bergmann). - Use resource managed APIs for tegra20 (Dmitry Osipenko). - Generic cleanups for brcmstb (Christophe JAILLET). - Enable boost support for qcom-hw (Shawn Guo)." * 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: cpufreq: remove tango driver cpufreq: brcmstb-avs-cpufreq: Fix resource leaks in ->remove() cpufreq: brcmstb-avs-cpufreq: Free resources in error path cpufreq: qcom-hw: enable boost support cpufreq: tegra20: Use resource-managed API	2021-02-08 13:54:58 +01:00
Rafael J. Wysocki	d11a1d08a0	cpufreq: ACPI: Update arch scale-invariance max perf ratio if CPPC is not there If the maximum performance level taken for computing the arch_max_freq_ratio value used in the x86 scale-invariance code is higher than the one corresponding to the cpuinfo.max_freq value coming from the acpi_cpufreq driver, the scale-invariant utilization falls below 100% even if the CPU runs at cpuinfo.max_freq or slightly faster, which causes the schedutil governor to select a frequency below cpuinfo.max_freq. That frequency corresponds to a frequency table entry below the maximum performance level necessary to get to the "boost" range of CPU frequencies which prevents "boost" frequencies from being used in some workloads. While this issue is related to scale-invariance, it may be amplified by commit `db865272d9` ("cpufreq: Avoid configuring old governors as default with intel_pstate") from the 5.10 development cycle which made it extremely easy to default to schedutil even if the preferred driver is acpi_cpufreq as long as intel_pstate is built too, because the mere presence of the latter effectively removes the ondemand governor from the defaults. Distro kernels are likely to include both intel_pstate and acpi_cpufreq on x86, so their users who cannot use intel_pstate or choose to use acpi_cpufreq may easily be affectecd by this issue. If CPPC is available, it can be used to address this issue by extending the frequency tables created by acpi_cpufreq to cover the entire available frequency range (including "boost" frequencies) for each CPU, but if CPPC is not there, acpi_cpufreq has no idea what the maximum "boost" frequency is and the frequency tables created by it cannot be extended in a meaningful way, so in that case make it ask the arch scale-invariance code to to use the "nominal" performance level for CPU utilization scaling in order to avoid the issue at hand. Fixes: `db865272d9` ("cpufreq: Avoid configuring old governors as default with intel_pstate") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>	2021-02-08 13:45:51 +01:00
Rafael J. Wysocki	3c55e94c0a	cpufreq: ACPI: Extend frequency tables to cover boost frequencies A severe performance regression on AMD EPYC processors when using the schedutil scaling governor was discovered by Phoronix.com and attributed to the following commits: `41ea667227` ("x86, sched: Calculate frequency invariance for AMD systems") `976df7e573` ("x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC") The source of the problem is that the maximum performance level taken for computing the arch_max_freq_ratio value used in the x86 scale- invariance code is higher than the one corresponding to the cpuinfo.max_freq value coming from the acpi_cpufreq driver. This effectively causes the scale-invariant utilization to fall below 100% even if the CPU runs at cpuinfo.max_freq or slightly faster, so the schedutil governor selects a frequency below cpuinfo.max_freq then. That frequency corresponds to a frequency table entry below the maximum performance level necessary to get to the "boost" range of CPU frequencies. However, if the cpuinfo.max_freq value coming from acpi_cpufreq was higher, the schedutil governor would select higher frequencies which in turn would allow acpi_cpufreq to set more adequate performance levels and to get to the "boost" range of CPU frequencies more often. This issue affects any systems where acpi_cpufreq is used and the "boost" (or "turbo") frequencies are enabled, not just AMD EPYC. Moreover, commit `db865272d9` ("cpufreq: Avoid configuring old governors as default with intel_pstate") from the 5.10 development cycle made it extremely easy to default to schedutil even if the preferred driver is acpi_cpufreq as long as intel_pstate is built too, because the mere presence of the latter effectively removes the ondemand governor from the defaults. Distro kernels are likely to include both intel_pstate and acpi_cpufreq on x86, so their users who cannot use intel_pstate or choose to use acpi_cpufreq may easily be affectecd by this issue. To address this issue, extend the frequency table constructed by acpi_cpufreq for each CPU to cover the entire range of available frequencies (including the "boost" ones) if CPPC is available and indicates that "boost" (or "turbo") frequencies are enabled. That causes cpuinfo.max_freq to become the maximum "boost" frequency of the given CPU (instead of the maximum frequency returned by the ACPI _PSS object that corresponds to the "nominal" performance level). Fixes: `41ea667227` ("x86, sched: Calculate frequency invariance for AMD systems") Fixes: `976df7e573` ("x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC") Fixes: `db865272d9` ("cpufreq: Avoid configuring old governors as default with intel_pstate") Link: https://www.phoronix.com/scan.php?page=article&item=linux511-amd-schedutil&num=1 Link: https://lore.kernel.org/linux-pm/20210203135321.12253-2-ggherdovich@suse.cz/ Reported-by: Michael Larabel <Michael@phoronix.com> Diagnosed-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz> Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz> Tested-by: Michael Larabel <Michael@phoronix.com>	2021-02-08 13:45:51 +01:00
Viresh Kumar	2f0531869f	cpufreq: Remove unused flag CPUFREQ_PM_NO_WARN This flag is set by one of the drivers but it isn't used in the code otherwise. Remove the unused flag and update the driver. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2021-02-04 19:25:47 +01:00
Viresh Kumar	5ae4a4b45d	cpufreq: Remove CPUFREQ_STICKY flag During cpufreq driver's registration, if the ->init() callback for all the CPUs fail then there is not much point in keeping the driver around as it will only account for more of unnecessary noise, for example cpufreq core will try to suspend/resume the driver which never got registered properly. The removal of such a driver is avoided if the driver carries the CPUFREQ_STICKY flag. This was added way back [1] in 2004 and perhaps no one should ever need it now. A lot of drivers do set this flag, probably because they just copied it from other drivers. This was added earlier for some platforms [2] because their cpufreq drivers were getting registered before the CPUs were registered with subsys framework. And hence they used to fail. The same isn't true anymore though. The current code flow in the kernel is: start_kernel() -> kernel_init() -> kernel_init_freeable() -> do_basic_setup() -> driver_init() -> cpu_dev_init() -> subsys_system_register() //For CPUs -> do_initcalls() -> cpufreq_register_driver() Clearly, the CPUs will always get registered with subsys framework before any cpufreq driver can get probed. Remove the flag and update the relevant drivers. Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/include/linux/cpufreq.h?id=7cc9f0d9a1ab04cedc60d64fd8dcf7df224a3b4d # [1] Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/arch/arm/mach-sa1100/cpu-sa1100.c?id=f59d3bbe35f6268d729f51be82af8325d62f20f5 # [2] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2021-02-04 19:23:20 +01:00
Viresh Kumar	8d25157f73	cpufreq: qcom: Migrate to dev_pm_opp_set_opp() dev_pm_opp_set_bw() is getting removed and dev_pm_opp_set_opp() should be used instead. Migrate to the new API. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Dmitry Osipenko <digetx@gmail.com>	2021-02-02 10:28:09 +05:30
Nigel Christian	75a8d877d6	cpufreq: intel_pstate: Remove repeated word In the comment for trace in passive mode there is an unnecessary "the". Eradicate it. Signed-off-by: Nigel Christian <nigel.l.christian@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2021-01-22 17:04:56 +01:00
Arnd Bergmann	7114ebffd3	cpufreq: remove tango driver The tango platform is getting removed, so the driver is no longer needed. Cc: Marc Gonzalez <marc.w.gonzalez@free.fr> Cc: Mans Rullgard <mans@mansr.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> [ Viresh: Update cpufreq-dt-platdev.c as well ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2021-01-21 09:34:46 +05:30
Christophe JAILLET	3657f729b6	cpufreq: brcmstb-avs-cpufreq: Fix resource leaks in ->remove() If 'cpufreq_unregister_driver()' fails, just WARN and continue, so that other resources are freed. Fixes: `de322e0859` ("cpufreq: brcmstb-avs-cpufreq: AVS CPUfreq driver for Broadcom STB SoCs") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> [ Viresh: Updated Subject ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2021-01-18 12:23:43 +05:30
Christophe JAILLET	05f456286f	cpufreq: brcmstb-avs-cpufreq: Free resources in error path If 'cpufreq_register_driver()' fails, we must release the resources allocated in 'brcm_avs_prepare_init()' as already done in the remove function. To do that, introduce a new function 'brcm_avs_prepare_uninit()' in order to avoid code duplication. This also makes the code more readable (IMHO). Fixes: `de322e0859` ("cpufreq: brcmstb-avs-cpufreq: AVS CPUfreq driver for Broadcom STB SoCs") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> [ Viresh: Updated Subject ] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2021-01-18 12:23:28 +05:30
Shawn Guo	266991721c	cpufreq: qcom-hw: enable boost support At least on sdm850, the `2956800` khz is detected as a boost frequency in function qcom_cpufreq_hw_read_lut(). Let's enable boost support by calling cpufreq_enable_boost_support(), so that we can get the boost frequency by switching it on via 'boost' sysfs entry like below. $ echo 1 > /sys/devices/system/cpu/cpufreq/boost Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Tested-by: Steev Klimaszewski <steev@kali.org> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2021-01-18 12:13:53 +05:30
Dmitry Osipenko	763ec5daae	cpufreq: tegra20: Use resource-managed API Switch cpufreq-tegra20 driver to use resource-managed API. This removes the need to get opp_table pointer using dev_pm_opp_get_opp_table() in order to release OPP table that was requested by dev_pm_opp_set_supported_hw(), making the code a bit more straightforward. Signed-off-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2021-01-18 12:02:53 +05:30
Chen Yu	6f67e06008	cpufreq: intel_pstate: Get per-CPU max freq via MSR_HWP_CAPABILITIES if available Currently, when turbo is disabled (either by BIOS or by the user), the intel_pstate driver reads the max non-turbo frequency from the package-wide MSR_PLATFORM_INFO(0xce) register. However, on asymmetric platforms it is possible in theory that small and big core with HWP enabled might have different max non-turbo CPU frequency, because MSR_HWP_CAPABILITIES is per-CPU scope according to Intel Software Developer Manual. The turbo max freq is already per-CPU in current code, so make similar change to the max non-turbo frequency as well. Reported-by: Wendy Wang <wendy.wang@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> [ rjw: Subject and changelog edits ] Cc: 4.18+ <stable@vger.kernel.org> # 4.18+: a45ee4d4e13b: cpufreq: intel_pstate: Change intel_pstate_get_hwp_max() argument Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2021-01-12 19:44:49 +01:00
Rafael J. Wysocki	597ffbc8d0	cpufreq: intel_pstate: Rename two functions Rename intel_cpufreq_adjust_hwp() and intel_cpufreq_adjust_perf_ctl() to intel_cpufreq_hwp_update() and intel_cpufreq_perf_ctl_update(), respectively, to avoid possible confusion with the ->adjist_perf() callback function, intel_cpufreq_adjust_perf(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Chen Yu <yu.c.chen@intel.com>	2021-01-12 19:34:05 +01:00
Rafael J. Wysocki	a45ee4d4e1	cpufreq: intel_pstate: Change intel_pstate_get_hwp_max() argument All of the callers of intel_pstate_get_hwp_max() access the struct cpudata object that corresponds to the given CPU already and the function itself needs to access that object (in order to update hwp_cap_cached), so modify the code to pass a struct cpudata pointer to it instead of the CPU number. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Chen Yu <yu.c.chen@intel.com>	2021-01-12 19:34:04 +01:00
Rafael J. Wysocki	9dd04ec6bc	cpufreq: intel_pstate: Always read hwp_cap_cached with READ_ONCE() Because intel_pstate_get_hwp_max() which updates hwp_cap_cached may run in parallel with the readers of it, annotate all of the read accesses to it with READ_ONCE(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Chen Yu <yu.c.chen@intel.com>	2021-01-12 19:34:04 +01:00
Lukas Bulwahn	c4151604f0	cpufreq: intel_pstate: remove obsolete functions percent_fp() was used in intel_pstate_pid_reset(), which was removed in commit `9d0ef7af1f` ("cpufreq: intel_pstate: Do not use PID-based P-state selection") and hence, percent_fp() is unused since then. percent_ext_fp() was last used in intel_pstate_update_perf_limits(), which was refactored in commit `1a4fe38add` ("cpufreq: intel_pstate: Remove max/min fractions to limit performance"), and hence, percent_ext_fp() is unused since then. make CC=clang W=1 points us those unused functions: drivers/cpufreq/intel_pstate.c:79:23: warning: unused function 'percent_fp' [-Wunused-function] static inline int32_t percent_fp(int percent) ^ drivers/cpufreq/intel_pstate.c:94:23: warning: unused function 'percent_ext_fp' [-Wunused-function] static inline int32_t percent_ext_fp(int percent) ^ Remove those obsolete functions. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2021-01-07 18:22:46 +01:00
Colin Ian King	943bdd0cec	cpufreq: powernow-k8: pass policy rather than use cpufreq_cpu_get() Currently there is an unlikely case where cpufreq_cpu_get() returns a NULL policy and this will cause a NULL pointer dereference later on. Fix this by passing the policy to transition_frequency_fidvid() from the caller and hence eliminating the need for the cpufreq_cpu_get() and cpufreq_cpu_put(). Thanks to Viresh Kumar for suggesting the fix. Addresses-Coverity: ("Dereference null return") Fixes: `b43a7ffbf3` ("cpufreq: Notify all policy->cpus in cpufreq_notify_transition()") Suggested-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2021-01-07 17:37:33 +01:00
Rafael J. Wysocki	17ffd35809	cpufreq: intel_pstate: Use HWP capabilities in intel_cpufreq_adjust_perf() If turbo P-states cannot be used, either due to the configuration of the processor, or because intel_pstate is not allowed to used them, the maximum available P-state with HWP enabled corresponds to the HWP_CAP.GUARANTEED value which is not static. It can be adjusted by an out-of-band agent or during an Intel Speed Select performance level change, so long as it remains less than or equal to HWP_CAP.MAX. However, if turbo P-states cannot be used, intel_cpufreq_adjust_perf() always uses pstate.max_pstate (set during the initialization of the driver only) as the maximum available P-state, so it may miss a change of the HWP_CAP.GUARANTEED value. Prevent that from happening by modifyig intel_cpufreq_adjust_perf() to always read the "guaranteed" and "maximum turbo" performance levels from the cached HWP_CAP value. Fixes: `a365ab6b9d` ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>	2021-01-07 17:34:32 +01:00
Rafael J. Wysocki	be1283454b	cpufreq: intel_pstate: Fix fast-switch fallback path When sugov_update_single_perf() falls back to the "frequency" path due to the missing scale-invariance, it will call cpufreq_driver_fast_switch() via sugov_fast_switch() and the driver's ->fast_switch() callback will be invoked, so it must not be NULL. However, after commit `a365ab6b9d` ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback") intel_pstate sets ->fast_switch() to NULL when it is going to use intel_cpufreq_adjust_perf(), which is a mistake, because on x86 the scale-invariance may be turned off dynamically, so modify it to retain the original ->adjust_perf() callback pointer. Fixes: `a365ab6b9d` ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback") Reported-by: Kenneth R. Crudup <kenny@panix.com> Tested-by: Kenneth R. Crudup <kenny@panix.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-12-30 18:22:17 +01:00
Rafael J. Wysocki	c3a74f8e25	Merge branch 'pm-cpufreq' * pm-cpufreq: cpufreq: intel_pstate: Use most recent guaranteed performance values cpufreq: intel_pstate: Implement the ->adjust_perf() callback cpufreq: Add special-purpose fast-switching callback for drivers cpufreq: schedutil: Add util to struct sg_cpu cppc_cpufreq: replace per-cpu data array with a list cppc_cpufreq: expose information on frequency domains cppc_cpufreq: clarify support for coordination types cppc_cpufreq: use policy->cpu as driver of frequency setting ACPI: processor: fix NONE coordination for domain mapping failure ACPI: processor: Drop duplicate setting of shared_cpu_map	2020-12-22 17:59:11 +01:00
Rafael J. Wysocki	e40ad84c26	cpufreq: intel_pstate: Use most recent guaranteed performance values When turbo has been disabled by the BIOS, but HWP_CAP.GUARANTEED is changed later, user space may want to take advantage of this increased guaranteed performance. HWP_CAP.GUARANTEED is not a static value. It can be adjusted by an out-of-band agent or during an Intel Speed Select performance level change. The HWP_CAP.MAX is still the maximum achievable performance with turbo disabled by the BIOS, so HWP_CAP.GUARANTEED can still change as long as it remains less than or equal to HWP_CAP.MAX. When HWP_CAP.GUARANTEED is changed, the sysfs base_frequency attribute shows the most recent guaranteed frequency value. This attribute can be used by user space software to update the scaling min/max limits of the CPU. Currently, the ->setpolicy() callback already uses the latest HWP_CAP values when setting HWP_REQ, but the ->verify() callback will restrict the user settings to the to old guaranteed performance value which prevents user space from making use of the extra CPU capacity theoretically available to it after increasing HWP_CAP.GUARANTEED. To address this, read HWP_CAP in intel_pstate_verify_cpu_policy() to obtain the maximum P-state that can be used and use that to confine the policy max limit instead of using the cached and possibly stale pstate.max_freq value for this purpose. For consistency, update intel_pstate_update_perf_limits() to use the maximum available P-state returned by intel_pstate_get_hwp_max() to compute the maximum frequency instead of using the return value of intel_pstate_get_max_freq() which, again, may be stale. This issue is a side-effect of fixing the scaling frequency limits in commit `eacc9c5a92` ("cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max() for turbo disabled") which corrected the setting of the reduced scaling frequency values, but caused stale HWP_CAP.GUARANTEED to be used in the case at hand. Fixes: `eacc9c5a92` ("cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max() for turbo disabled") Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 5.8+ <stable@vger.kernel.org> # 5.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-12-21 10:51:00 +01:00
Rafael J. Wysocki	a365ab6b9d	cpufreq: intel_pstate: Implement the ->adjust_perf() callback Make intel_pstate expose the ->adjust_perf() callback when it operates in the passive mode with HWP enabled which causes the schedutil governor to use that callback instead of ->fast_switch(). The minimum and target performance-level values passed by the governor to ->adjust_perf() are converted to HWP.REQ.MIN and HWP.REQ.DESIRED, respectively, which allows the processor to adjust its configuration to maximize energy-efficiency while providing sufficient capacity. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-12-15 19:24:19 +01:00
Rafael J. Wysocki	ee2cc4276b	cpufreq: Add special-purpose fast-switching callback for drivers First off, some cpufreq drivers (eg. intel_pstate) can pass hints beyond the current target frequency to the hardware and there are no provisions for doing that in the cpufreq framework. In particular, today the driver has to assume that it should not allow the frequency to fall below the one requested by the governor (or the required capacity may not be provided) which may not be the case and which may lead to excessive energy usage in some scenarios. Second, the hints passed by these drivers to the hardware need not be in terms of the frequency, so representing the utilization numbers coming from the scheduler as frequency before passing them to those drivers is not really useful. Address the two points above by adding a special-purpose replacement for the ->fast_switch callback, called ->adjust_perf, allowing the governor to pass abstract performance level (rather than frequency) values for the minimum (required) and target (desired) performance along with the CPU capacity to compare them to. Also update the schedutil governor to use the new callback instead of ->fast_switch if present and if the utilization mertics are frequency-invariant (that is requisite for the direct mapping between the utilization and the CPU performance levels to be a reasonable approximation). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-12-15 19:24:18 +01:00
Ionela Voinescu	a28b2bfc09	cppc_cpufreq: replace per-cpu data array with a list The cppc_cpudata per-cpu storage was inefficient (1) additional to causing functional issues (2) when CPUs are hotplugged out, due to per-cpu data being improperly initialised. (1) The amount of information needed for CPPC performance control in its cpufreq driver depends on the domain (PSD) coordination type: ANY: One set of CPPC control and capability data (e.g desired performance, highest/lowest performance, etc) applies to all CPUs in the domain. ALL: Same as ANY. To be noted that this type is not currently supported. When supported, information about which CPUs belong to a domain is needed in order for frequency change requests to be sent to each of them. HW: It's necessary to store CPPC control and capability information for all the CPUs. HW will then coordinate the performance state based on their limitations and requests. NONE: Same as HW. No HW coordination is expected. Despite this, the previous initialisation code would indiscriminately allocate memory for all CPUs (all_cpu_data) and unnecessarily duplicate performance capabilities and the domain sharing mask and type for each possible CPU. (2) With the current per-cpu structure, when having ANY coordination, the cppc_cpudata cpu information is not initialised (will remain 0) for all CPUs in a policy, other than policy->cpu. When policy->cpu is hotplugged out, the driver will incorrectly use the uninitialised (0) value of the other CPUs when making frequency changes. Additionally, the previous values stored in the perf_ctrls.desired_perf will be lost when policy->cpu changes. Therefore replace the array of per cpu data with a list. The memory for each structure is allocated at policy init, where a single structure can be allocated per policy, not per cpu. In order to accommodate the struct list_head node in the cppc_cpudata structure, the now unused cpu and cur_policy variables are removed. For example, on a arm64 Juno platform with 6 CPUs: (0, 1, 2, 3) in PSD1, (4, 5) in PSD2 - ANY coordination, the memory allocation comparison shows: Before patch: - ANY coordination: total slack req alloc/free caller 0 0 0 0/1 _kernel_size_le_hi32+0x0xffff800008ff7810 0 0 0 0/6 _kernel_size_le_hi32+0x0xffff800008ff7808 128 80 48 1/0 _kernel_size_le_hi32+0x0xffff800008ffc070 768 0 768 6/0 _kernel_size_le_hi32+0x0xffff800008ffc0e4 After patch: - ANY coordination: total slack req alloc/free caller 256 0 256 2/0 _kernel_size_le_hi32+0x0xffff800008fed410 0 0 0 0/2 _kernel_size_le_hi32+0x0xffff800008fed274 Additional notes: - A pointer to the policy's cppc_cpudata is stored in policy->driver_data - Driver registration is skipped if _CPC entries are not present. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Tested-by: Mian Yousaf Kaukab <ykaukab@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-12-15 19:19:32 +01:00
Ionela Voinescu	cfdc589f4b	cppc_cpufreq: expose information on frequency domains Use the existing sysfs attribute "freqdomain_cpus" to expose information to userspace about CPUs in the same frequency domain. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Mian Yousaf Kaukab <ykaukab@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-12-15 19:19:32 +01:00
Ionela Voinescu	bf76bb208f	cppc_cpufreq: clarify support for coordination types The previous coordination type handling in the cppc_cpufreq init code created some confusion: the comment mentioned "Support only SW_ANY for now" while only the SW_ALL/ALL case resulted in a failure. The other coordination types (HW_ALL/HW, NONE) were silently supported. Clarify support for coordination types while describing in comments the intended behavior. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Mian Yousaf Kaukab <ykaukab@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-12-15 19:19:32 +01:00
Ionela Voinescu	d2641a5c3d	cppc_cpufreq: use policy->cpu as driver of frequency setting Considering only the currently supported coordination types (ANY, HW, NONE), this change only makes a difference for the ANY type, when policy->cpu is hotplugged out. In that case the new policy->cpu will be different from ((struct cppc_cpudata )policy->driver_data)->cpu. While in this case the controls of ANY* CPU could be used to drive frequency changes, it's more consistent to use policy->cpu as the leading CPU, as used in all other cppc_cpufreq functions. Additionally, the debug prints in cppc_set_perf() would no longer create confusion when referring to a CPU that is hotplugged out. Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Mian Yousaf Kaukab <ykaukab@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-12-15 19:19:31 +01:00
Rafael J. Wysocki	e1f1320fc0	Merge branch 'pm-cpufreq' * pm-cpufreq: (31 commits) cpufreq: Fix cpufreq_online() return value on errors cpufreq: Fix up several kerneldoc comments cpufreq: stats: Use local_clock() instead of jiffies cpufreq: schedutil: Simplify sugov_update_next_freq() cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate() cpufreq: arm_scmi: Discover the power scale in performance protocol firmware: arm_scmi: Add power_scale_mw_get() interface cpufreq: tegra194: Rename tegra194_get_speed_common function cpufreq: tegra194: Remove unnecessary frequency calculation cpufreq: tegra186: Simplify cluster information lookup cpufreq: tegra186: Fix sparse 'incorrect type in assignment' warning cpufreq: imx: fix NVMEM_IMX_OCOTP dependency cpufreq: vexpress-spc: Add missing MODULE_ALIAS cpufreq: scpi: Add missing MODULE_ALIAS cpufreq: loongson1: Add missing MODULE_ALIAS cpufreq: sun50i: Add missing MODULE_DEVICE_TABLE cpufreq: st: Add missing MODULE_DEVICE_TABLE cpufreq: qcom: Add missing MODULE_DEVICE_TABLE cpufreq: mediatek: Add missing MODULE_DEVICE_TABLE cpufreq: highbank: Add missing MODULE_DEVICE_TABLE ...	2020-12-15 15:24:52 +01:00
Rafael J. Wysocki	30c768829a	Merge branch 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull ARM cpufreq updates for 5.11-rc1 from Viresh Kumar: "This contains the following updates: - Fix imx's NVMEM_IMX_OCOTP dependency (Arnd Bergmann). - Add support for mt8167 and blacklist mt8516 (Fabien Parent). - Some ->get() callback related cleanups to the tegra194 driver and some optimizations in tegra186 driver (Jon Hunter and Sumit Gupta). - Power scale improvements to arm_scmi driver (Lukasz Luba). - Add missing MODULE_DEVICE_TABLE and MODULE_ALIAS to several drivers (Pali Rohár). - Fix error path in mediatek driver (Qinglang Miao). - Fix memleak in ST's cpufreq driver (Yangtao Li)." * 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: (22 commits) cpufreq: arm_scmi: Discover the power scale in performance protocol firmware: arm_scmi: Add power_scale_mw_get() interface cpufreq: tegra194: Rename tegra194_get_speed_common function cpufreq: tegra194: Remove unnecessary frequency calculation cpufreq: tegra186: Simplify cluster information lookup cpufreq: tegra186: Fix sparse 'incorrect type in assignment' warning cpufreq: imx: fix NVMEM_IMX_OCOTP dependency cpufreq: vexpress-spc: Add missing MODULE_ALIAS cpufreq: scpi: Add missing MODULE_ALIAS cpufreq: loongson1: Add missing MODULE_ALIAS cpufreq: sun50i: Add missing MODULE_DEVICE_TABLE cpufreq: st: Add missing MODULE_DEVICE_TABLE cpufreq: qcom: Add missing MODULE_DEVICE_TABLE cpufreq: mediatek: Add missing MODULE_DEVICE_TABLE cpufreq: highbank: Add missing MODULE_DEVICE_TABLE cpufreq: ap806: Add missing MODULE_DEVICE_TABLE cpufreq: mediatek: add missing platform_driver_unregister() on error in mtk_cpufreq_driver_init cpufreq: tegra194: get consistent cpuinfo_cur_freq cpufreq: blacklist mt8516 in cpufreq-dt-platdev cpufreq: mediatek: Add support for mt8167 ...	2020-12-14 20:29:50 +01:00
Rafael J. Wysocki	f0f6dbaf06	Merge branch 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull OPP (Operating Performance Points) updates for 5.11-rc1 from Viresh Kumar: "This contains the following updates: - Allow empty (node-less) OPP tables in DT for passing just the dependency related information (Nicola Mazzucato). - Fix a potential lockdep in OPP core and other OPP core cleanups (Viresh Kumar). - Don't abuse dev_pm_opp_get_opp_table() to create an OPP table, fix cpufreq-dt driver for the same (Viresh Kumar). - dev_pm_opp_put_regulators() accepts a NULL argument now, updates to all the users as well (Viresh Kumar)." * 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: opp: of: Allow empty opp-table with opp-shared dt-bindings: opp: Allow empty OPP tables media: venus: dev_pm_opp_put_() accepts NULL argument drm/panfrost: dev_pm_opp_put_() accepts NULL argument drm/lima: dev_pm_opp_put_() accepts NULL argument PM / devfreq: exynos: dev_pm_opp_put_() accepts NULL argument cpufreq: qcom-cpufreq-nvmem: dev_pm_opp_put_() accepts NULL argument cpufreq: dt: dev_pm_opp_put_regulators() accepts NULL argument opp: Allow dev_pm_opp_put_() APIs to accept NULL opp_table opp: Don't create an OPP table from dev_pm_opp_get_opp_table() cpufreq: dt: Don't (ab)use dev_pm_opp_get_opp_table() to create OPP table opp: Reduce the size of critical section in _opp_kref_release() opp: Don't return opp_dev from _find_opp_dev() opp: Allocate the OPP table outside of opp_table_lock opp: Always add entries in dev_list with opp_table->lock held	2020-12-14 20:26:17 +01:00
Wang ShaoBo	b96f038432	cpufreq: Fix cpufreq_online() return value on errors Make cpufreq_online() return negative error codes on all errors that cause the policy to be destroyed, as appropriate. Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com> [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-12-11 19:54:17 +01:00
Rafael J. Wysocki	ec06e586ab	cpufreq: Fix up several kerneldoc comments Fix up the remaining kerneldoc comments that don't adhere to the expected format and clarify some of them a bit. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-12-11 19:54:08 +01:00
Viresh Kumar	7854c7520b	cpufreq: stats: Use local_clock() instead of jiffies local_clock() has better precision and accuracy as compared to jiffies, lets use it for time management in cpufreq stats. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-12-11 19:53:58 +01:00
Rafael J. Wysocki	2554c32f0b	cpufreq: intel_pstate: Simplify intel_cpufreq_update_pstate() Avoid doing the same assignment in both branches of a conditional, do it after the whole conditional instead. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-12-11 19:53:58 +01:00
Rafael J. Wysocki	42807537b6	Merge back cpufreq material for v5.11.	2020-12-11 19:52:52 +01:00
Viresh Kumar	2ff8fe13ac	cpufreq: qcom-cpufreq-nvmem: dev_pm_opp_put_() accepts NULL argument The dev_pm_opp_put_() APIs now accepts a NULL opp_table pointer and so there is no need for us to carry the extra checks. Drop them. Reviewed-by: Ilia Lin <ilia.lin@kernel.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-12-09 11:21:12 +05:30
Viresh Kumar	5f6ffb8d8f	cpufreq: dt: dev_pm_opp_put_regulators() accepts NULL argument The dev_pm_opp_put_*() APIs now accepts a NULL opp_table pointer and so there is no need for us to carry the extra checks. Drop them. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-12-09 11:21:11 +05:30
Viresh Kumar	873c9851eb	cpufreq: dt: Don't (ab)use dev_pm_opp_get_opp_table() to create OPP table Initially, the helper dev_pm_opp_get_opp_table() was supposed to be used only for the OPP core's internal use (it tries to find an existing OPP table and if it doesn't find one, then it allocates the OPP table). Sometime back, the cpufreq-dt driver started using it to make sure all the relevant resources required by the OPP core are available earlier during initialization process to properly propagate -EPROBE_DEFER. It worked but it also abused the API to create an OPP table, which should be created with the help of other helpers provided by the OPP core. The OPP core will be updated in a later commit to limit the scope of dev_pm_opp_get_opp_table() to only finding an existing OPP table and not create one. This commit updates the cpufreq-dt driver before that happens. Now the cpufreq-dt driver creates the OPP and cpufreq tables for all the CPUs from driver's init callback itself. Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-12-09 11:21:11 +05:30
Viresh Kumar	c8bb452054	Merge branch 'cpufreq/scmi' into cpufreq/arm/linux-next	2020-12-08 11:22:17 +05:30
Lukasz Luba	f9b0498d29	cpufreq: arm_scmi: Discover the power scale in performance protocol Add mechanism to discover the power scale present in the performance protocol for all domains. Provide this information to Energy Model, which then can be checked in other frameworks, e.g. thermal. Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-12-08 10:16:13 +05:30
Jon Hunter	f45f89a778	cpufreq: tegra194: Rename tegra194_get_speed_common function The function tegra194_get_speed_common() uses hardware timers to calculate the current CPUFREQ and so rename this function to be tegra194_calculate_speed() to reflect what it does. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-12-07 13:02:44 +05:30
Jon Hunter	93549516d4	cpufreq: tegra194: Remove unnecessary frequency calculation The Tegra194 CPUFREQ driver sets the CPUFREQ_NEED_INITIAL_FREQ_CHECK flag which means that the CPUFREQ framework will call the 'get' callback on boot to determine the current frequency of the CPUs. Therefore, it is not necessary for the Tegra194 CPUFREQ driver to internally call the tegra194_get_speed_common() during initialisation to query the current frequency as well. Fix this by removing the call to the tegra194_get_speed_common() during initialisation and simplify the code. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-12-07 13:02:44 +05:30
Jon Hunter	cfef4bcacc	cpufreq: tegra186: Simplify cluster information lookup The CPUFREQ driver framework references each individual CPUs when getting and setting the speed. Tegra186 has 3 clusters of A57 CPUs and 1 cluster of Denver CPUs. Hence, the Tegra186 CPUFREQ driver need to know which cluster a given CPU belongs to. The logic in the Tegra186 driver can be greatly simplified by storing the cluster ID associated with each CPU in the tegra186_cpufreq_cpu structure. This allow us to completely remove the Tegra cluster info structure from the driver and simplifiy the code. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-12-07 13:02:44 +05:30
Jon Hunter	b7b4e78552	cpufreq: tegra186: Fix sparse 'incorrect type in assignment' warning Sparse warns that the incorrect type is being assigned to the CPUFREQ driver_data variable in the Tegra186 CPUFREQ driver. The Tegra186 CPUFREQ driver is assigned a type of 'void __iomem ' to a pointer of type 'void ' ... drivers/cpufreq/tegra186-cpufreq.c:72:37: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected void driver_data @@ got void [noderef] __iomem @@ ... drivers/cpufreq/tegra186-cpufreq.c:87:40: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected void [noderef] __iomem edvd_reg @@ got void driver_data @@ The Tegra186 CPUFREQ driver is using the policy->driver_data variable to store and iomem pointer to a Tegra186 CPU register that is used to set the clock speed for the CPU. This is not necessary because the register base address is already stored in the driver data and the offset of the register for each CPU is static. Therefore, fix this by adding a new structure with the register offsets for each CPU and store this in the main driver data structure along with the register base address. Please note that a new structure has been added for storing the register offsets rather than a simple array, because this will permit further clean-ups and simplification of the driver. Signed-off-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-12-07 13:02:44 +05:30
Arnd Bergmann	fc928b901d	cpufreq: imx: fix NVMEM_IMX_OCOTP dependency A driver should not 'select' drivers from another subsystem. If NVMEM is disabled, this one results in a warning: WARNING: unmet direct dependencies detected for NVMEM_IMX_OCOTP Depends on [n]: NVMEM [=n] && (ARCH_MXC [=y] \|\| COMPILE_TEST [=y]) && HAS_IOMEM [=y] Selected by [y]: - ARM_IMX6Q_CPUFREQ [=y] && CPU_FREQ [=y] && (ARM \|\| ARM64 [=y]) && ARCH_MXC [=y] && REGULATOR_ANATOP [=y] Change the 'select' to 'depends on' to prevent it from going wrong, and allow compile-testing without that driver, since it is only a runtime dependency. Fixes: `2782ef34ed` ("cpufreq: imx: Select NVMEM_IMX_OCOTP") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-12-07 13:02:38 +05:30
Pali Rohár	d15183991c	cpufreq: vexpress-spc: Add missing MODULE_ALIAS This patch adds missing MODULE_ALIAS for automatic loading of this cpufreq driver when it is compiled as an external module. Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: `47ac9aa165` ("cpufreq: arm_big_little: add vexpress SPC interface driver") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-12-07 13:02:38 +05:30
Pali Rohár	c0382d049d	cpufreq: scpi: Add missing MODULE_ALIAS This patch adds missing MODULE_ALIAS for automatic loading of this cpufreq driver when it is compiled as an external module. Signed-off-by: Pali Rohár <pali@kernel.org> Fixes: `8def31034d` ("cpufreq: arm_big_little: add SCPI interface driver") Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>	2020-12-07 13:02:37 +05:30

1 2 3 4 5 ...

2979 Коммитов