Add trace event to monitor the performance value changes which is
controlled by cpu governors.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In some of Zen2 and Zen3 based processors, they are using the shared
memory that exposed from ACPI SBIOS. In this kind of the processors,
there is no MSR support, so we add acpi cppc function as the backend for
them.
It is using a module param (shared_mem) to enable related processors
manually. We will enable this by default once we address performance
issue on this solution.
Signed-off-by: Jinzhou Su <Jinzhou.Su@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Introduce the fast switch function for AMD P-State on the AMD processors
which support the full MSR register control. It's able to decrease the
latency on interrupt context.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
AMD P-State is the AMD CPU performance scaling driver that introduces a
new CPU frequency control mechanism on AMD Zen based CPU series in Linux
kernel. The new mechanism is based on Collaborative processor
performance control (CPPC) which is finer grain frequency management
than legacy ACPI hardware P-States. Current AMD CPU platforms are using
the ACPI P-states driver to manage CPU frequency and clocks with
switching only in 3 P-states. AMD P-State is to replace the ACPI
P-states controls, allows a flexible, low-latency interface for the
Linux kernel to directly communicate the performance hints to hardware.
AMD P-State leverages the Linux kernel governors such as *schedutil*,
*ondemand*, etc. to manage the performance hints which are provided by CPPC
hardware functionality. The first version for AMD P-State is to support one
of the Zen3 processors, and we will support more in future after we verify
the hardware and SBIOS functionalities.
There are two types of hardware implementations for AMD P-State: one is full
MSR support and another is shared memory support. It can use
X86_FEATURE_CPPC feature flag to distinguish the different types.
Using the new AMD P-State method + kernel governors (*schedutil*,
*ondemand*, ...) to manage the frequency update is the most appropriate
bridge between AMD Zen based hardware processor and Linux kernel, the
processor is able to adjust to the most efficiency frequency according to
the kernel scheduler loading.
Please check the detailed CPU feature and MSR register description in
Processor Programming Reference (PPR) for AMD Family 19h Model 51h,
Revision A1 Processors:
https://www.amd.com/system/files/TechDocs/56569-A1-PUB.zip
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull ARM cpufreq updates for 5.17-rc1 from Viresh Kumar:
"- Qcom cpufreq driver updates improve irq support (Ard Biesheuvel, Stephen Boyd,
and Vladimir Zapolskiy).
- Fixes double devm_remap for mediatek driver (Hector Yuan).
- Introduces thermal pressure helpers (Lukasz Luba)."
* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: mediatek-hw: Fix double devm_remap in hotplug case
cpufreq: qcom-hw: Use optional irq API
cpufreq: qcom-hw: Set CPU affinity of dcvsh interrupts
cpufreq: qcom-hw: Fix probable nested interrupt handling
cpufreq: qcom-cpufreq-hw: Avoid stack buffer for IRQ name
arch_topology: Remove unused topology_set_thermal_pressure() and related
cpufreq: qcom-cpufreq-hw: Use new thermal pressure update function
cpufreq: qcom-cpufreq-hw: Update offline CPUs per-cpu thermal pressure
thermal: cpufreq_cooling: Use new thermal pressure update function
arch_topology: Introduce thermal pressure update function
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field. Move the cpufreq code to use default_groups field which has been
the preferred way since aa30f47cf6 ("kobject: Add support for default
attribute groups to kobj_type") so that we can soon get rid of the
obsolete default_attrs field.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When hotpluging policy cpu, cpu policy init will be called multiple times.
Unplug CPU7 -> CPU6 -> CPU5 -> CPU4, then plug CPU4 again.
In this case, devm_remap will double remap and resource allocate fail.
So replace devm_remap to ioremap and release resources in cpu policy exit.
Signed-off-by: Hector.Yuan <hector.yuan@mediatek.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
With HWP enabled, when the turbo range of performance levels is
disabled by the platform firmware, the CPU capacity is given by
the "guaranteed performance" field in MSR_HWP_CAPABILITIES which
is generally dynamic. When it changes, the kernel receives an HWP
notification interrupt handled by notify_hwp_interrupt().
When the "guaranteed performance" value changes in the above
configuration, the CPU performance scaling needs to be adjusted so
as to use the new CPU capacity in computations, which means that
the cpuinfo.max_freq value needs to be updated for that CPU.
Accordingly, modify intel_pstate_notify_work() to read
MSR_HWP_CAPABILITIES and update cpuinfo.max_freq to reflect the
new configuration (this update can be carried out even if the
configuration doesn't actually change, because it simply doesn't
matter then and it takes less time to update it than to do extra
checks to decide whether or not a change has really occurred).
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The min and max frequency QoS requests in the cpufreq core are
initialized to whatever the current min and max frequency values are
at the init time, but if any of these values change later (for
example, cpuinfo.max_freq is updated by the driver), these initial
request values will be limiting the CPU frequency unnecessarily
unless they are changed by user space via sysfs.
To address this, initialize min_freq_req and max_freq_req to
FREQ_QOS_MIN_DEFAULT_VALUE and FREQ_QOS_MAX_DEFAULT_VALUE,
respectively, so they don't really limit anything until user
space updates them.
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is an expectation from users that they can get frequency specified
by cpufreq/cpuinfo_max_freq when conditions permit. But with AlderLake
mobile it may not be possible. This is possible that frequency is clipped
based on the system power-up EPP value. In this case users can update
cpufreq/energy_performance_preference to some performance oriented EPP to
limit clipping of frequencies.
To get out of box behavior as the prior generations of CPUs, update EPP
for AlderLake mobile CPUs on boot. On prior generations of CPUs EPP = 128
was enough to get maximum frequency, but with AlderLake mobile the
equivalent EPP is 102. Since EPP is model specific, this is possible that
they have different meaning on each generation of CPU.
The current EPP string "balance_performance" corresponds to EPP = 128.
Change the EPP corresponding to "balance_performance" to 102 for only
AlderLake mobile CPUs and update this on each CPU during boot.
To implement reuse epp_values[] array and update the modified EPP at the
index for BALANCE_PERFORMANCE. Add a dummy EPP_INDEX_DEFAULT to
epp_values[] to match indexes in the energy_perf_strings[].
After HWP PM is enabled also update EPP when "balance_performance" is
redefined for the very first time after the boot on each CPU. On
subsequent suspend/resume or offline/online the old EPP is restored,
so no specific action is needed.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is not necessary to call intel_pstate_get_hwp_cap() from
intel_pstate_update_perf_limits(), because it gets called from
intel_pstate_verify_cpu_policy() which is either invoked directly
right before intel_pstate_update_perf_limits(), in
intel_cpufreq_verify_policy() in the passive mode, or called
from driver callbacks in a sequence that causes it to be followed
by an immediate intel_pstate_update_perf_limits().
Namely, in the active mode intel_cpufreq_verify_policy() is called
by intel_pstate_verify_policy() which is the ->verify() callback
routine of intel_pstate and gets called by the cpufreq core right
before intel_pstate_set_policy(), which is the driver's ->setoplicy()
callback routine, where intel_pstate_update_perf_limits() is called.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use platform_get_irq_optional() to avoid a noisy error message when the
irq isn't specified. The irq is definitely optional given that we only
care about errors that are -EPROBE_DEFER here.
Cc: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Make the comment in blocking_notifier_call_chain() easier to
understand.
Signed-off-by: Tang Yizhou <tangyizhou@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When I hot added a CPU, I found 'cpufreq' directory was not created
below /sys/devices/system/cpu/cpuX/.
It is because get_cpu_device() failed in add_cpu_dev_symlink().
cpufreq_add_dev() is the .add_dev callback of a CPU subsys interface.
It will be called when the CPU device registered into the system.
The call chain is as follows:
register_cpu()
->device_register()
->device_add()
->bus_probe_device()
->cpufreq_add_dev()
But only after the CPU device has been registered, we can get the
CPU device by get_cpu_device(), otherwise it will return NULL.
Since we already have the CPU device in cpufreq_add_dev(), pass
it to add_cpu_dev_symlink().
I noticed that the 'kobj' of the CPU device has been added into
the system before cpufreq_add_dev().
Fixes: 2f0ba790df ("cpufreq: Fix creation of symbolic links to policy directories")
Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In runtime CPU cluster specific dcvsh interrupts may be handled on
unrelated CPU cores, it leads to an issue of too excessive number of
received and handled interrupts, but this is not observed, if CPU
affinity of the interrupt handler is set in accordance to CPU clusters.
The change reduces a number of received interrupts in about 10-100 times.
Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Re-enabling an interrupt from its own interrupt handler may cause
an interrupt storm, if there is a pending interrupt and because its
handling is disabled due to already done entrance into the handler
above in the stack.
Also, apparently it is improper to lock a mutex in an interrupt contex.
Fixes: 275157b367 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Registering an IRQ requires the string buffer containing the name to
remain allocated, as the name is not copied into another buffer.
So let's add a irq_name field to the data struct instead, which is
guaranteed to have the appropriate lifetime.
Cc: Thara Gopinath <thara.gopinath@linaro.org>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Cc: Andy Gross <agross@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org>
Tested-by: Steev Klimaszewski <steev@kali.org>
Signed-off-by: Vladimir Zapolskiy <vladimir.zapolskiy@linaro.org>
Reviewed-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
On systems with overclocking enabled, CPPC Highest Performance can be
hard coded to 0xff. In this case even if we have cores with different
highest performance, ITMT can't be enabled as the current implementation
depends on CPPC Highest Performance.
On such systems we can use MSR_HWP_CAPABILITIES maximum performance field
when CPPC.Highest Performance is 0xff.
Due to legacy reasons, we can't solely depend on MSR_HWP_CAPABILITIES as
in some older systems CPPC Highest Performance is the only way to identify
different performing cores.
Reported-by: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Michael Larabel <Michael@MichaelLarabel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After commit 4adcf2e582 ("cpufreq: intel_pstate: Add ->offline and
->online callbacks") the EPP value set by the "performance" scaling
algorithm in the active mode is not restored after an offline/online
cycle which replaces it with the saved EPP value coming from user
space.
Address this issue by forcing intel_pstate_hwp_set() to set a new
EPP value when it runs first time after online.
Fixes: 4adcf2e582 ("cpufreq: intel_pstate: Add ->offline and ->online callbacks")
Link: https://lore.kernel.org/linux-pm/adc7132c8655bd4d1c8b6129578e931a14fe1db2.camel@linux.intel.com/
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit fbdc21e9b0 ("cpufreq: intel_pstate: Add Icelake servers
support in no-HWP mode") enabled the use of Intel P-State driver
for Ice Lake servers.
But it doesn't cover the case when OS can't control P-States.
Therefore, for Ice Lake server, if MSR_MISC_PWR_MGMT bits 8 or 18
are enabled, then the Intel P-State driver should exit as OS can't
control P-States.
Fixes: fbdc21e9b0 ("cpufreq: intel_pstate: Add Icelake servers support in no-HWP mode")
Signed-off-by: Adamos Ttofari <attofari@amazon.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Thermal pressure provides a new API, which allows to use CPU frequency
as an argument. That removes the need of local conversion to capacity.
Use this new API and remove old local conversion code.
The new arch_update_thermal_pressure() also accepts boost frequencies,
which solves issue in the driver code with wrong reduced capacity
calculation. The reduced capacity was calculated wrongly due to
'policy->cpuinfo.max_freq' used as a divider. The value present there was
actually the boost frequency. Thus, even a normal maximum frequency value
which corresponds to max CPU capacity (arch_scale_cpu_capacity(cpu_id))
is not able to remove the capping.
The second side effect which is solved is that the reduced frequency wasn't
properly translated into the right reduced capacity,
e.g.
boost frequency = 3000MHz (stored in policy->cpuinfo.max_freq)
max normal frequency = 2500MHz (which is 1024 capacity)
2nd highest frequency = 2000MHz (which translates to 819 capacity)
Then in a scenario when the 'throttled_freq' max allowed frequency was
2000MHz the driver translated it into 682 capacity:
capacity = 1024 * 2000 / 3000 = 682
Then set the pressure value bigger than actually applied by the HW:
max_capacity - capacity => 1024 - 682 = 342 (<- thermal pressure)
Which was causing higher throttling and misleading task scheduler
about available CPU capacity.
A proper calculation in such case should be:
capacity = 1024 * 2000 / 2500 = 819
1024 - 819 = 205 (<- thermal pressure)
This patch relies on the new arch_update_thermal_pressure() handling
correctly such use case (with boost frequencies).
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The thermal pressure signal gives information to the scheduler about
reduced CPU capacity due to thermal. It is based on a value stored in
a per-cpu 'thermal_pressure' variable. The online CPUs will get the
new value there, while the offline won't. Unfortunately, when the CPU
is back online, the value read from per-cpu variable might be wrong
(stale data). This might affect the scheduler decisions, since it
sees the CPU capacity differently than what is actually available.
Fix it by making sure that all online+offline CPUs would get the
proper value in their per-cpu variable when there is throttling
or throttling is removed.
Fixes: 275157b367 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
It is possible that some performance excursions happened before OS boot
or enable HWP interrupts. So clear MSR_HWP_STATUS bits when we enable
HWP interrupt. In this way a next excursion will results in a HWP
interrupt.
The status bits of MSR_HWP_STATUS must be cleared (0) by software so
that a new status condition change will cause the hardware to set the
bit again and issue the notification.
Fixes: 57577c996d ("cpufreq: intel_pstate: Process HWP Guaranteed change notification")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is possible that on some platforms HWP interrupts are disabled. In
that case accessing MSR 0x773 will result in warning.
So check X86_FEATURE_HWP_NOTIFY feature to access MSR 0x773. The other
places in code where this MSR is accessed, already checks this feature
except during disable path called during cpufreq offline and suspend
callbacks.
Fixes: 57577c996d ("cpufreq: intel_pstate: Process HWP Guaranteed change notification")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit a365ab6b9d ("cpufreq: intel_pstate: Implement the
->adjust_perf() callback") caused intel_pstate to use nonzero HWP
desired values in certain usage scenarios, but it did not prevent
them from being leaked into the confugirations in which HWP desired
is expected to be 0.
The failing scenarios are switching the driver from the passive
mode to the active mode and starting a new kernel via kexec() while
intel_pstate is running in the passive mode.
To address this issue, ensure that HWP desired will be cleared on
offline and suspend/shutdown.
Fixes: a365ab6b9d ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback")
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Tested-by: Julia Lawall <julia.lawall@inria.fr>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Merge Energy Model and power capping updates for 5.16-rc1:
- Add support for inefficient operating performance points to the
Energy Model and modify cpufreq to use them properly (Vincent
Donnefort).
- Rearrange the DTPM framework code to simplify it and make it easier
to follow (Daniel Lezcano).
- Fix power intialization in DTPM (Daniel Lezcano).
- Add CPU load consideration when estimating the instaneous power
consumption in DTPM (Daniel Lezcano).
* pm-em:
cpufreq: mediatek-hw: Fix cpufreq_table_find_index_dl() call
PM: EM: Mark inefficiencies in CPUFreq
cpufreq: Use CPUFREQ_RELATION_E in DVFS governors
cpufreq: Introducing CPUFREQ_RELATION_E
cpufreq: Add an interface to mark inefficient frequencies
cpufreq: Make policy min/max hard requirements
PM: EM: Allow skipping inefficient states
PM: EM: Extend em_perf_domain with a flag field
PM: EM: Mark inefficient states
PM: EM: Fix inefficient states detection
* powercap:
powercap/drivers/dtpm: Fix power limit initialization
powercap/drivers/dtpm: Scale the power with the load
powercap/drivers/dtpm: Use container_of instead of a private data field
powercap/drivers/dtpm: Simplify the dtpm table
powercap/drivers/dtpm: Encapsulate even more the code
Pull ARM cpufreq updates for 5.16-rc1 from Viresh Kumar:
"- Fix tegra driver to handle BPMP errors properly (Mikko Perttunen).
- Fix the parameter usage of the newly added perf-domain API (Hector
Yuan).
- Minor cleanups to cppc, vexpress and s3c244x drivers (Han Wang,
Guenter Roeck, and Arnd Bergmann)."
* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: Fix parameter in parse_perf_domain()
cpufreq: tegra186/tegra194: Handle errors in BPMP response
cpufreq: remove useless INIT_LIST_HEAD()
cpufreq: s3c244x: add fallthrough comments for switch
cpufreq: vexpress: Drop unused variable
Fix a problem in active mode that cpu->pstate.turbo_freq is initialized
only if HWP-to-frequency scaling factor is refined.
In passive mode, this problem is not exposed, because
cpu->pstate.turbo_freq is set again, later in
intel_cpufreq_cpu_init()->intel_pstate_get_hwp_cap().
Fixes: eb3693f052 ("cpufreq: intel_pstate: hybrid: CPU-specific scaling factor")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The new cpufreq table flag RELATION_E introduced a new "efficient"
parameter for the cpufreq_table_find*() functions.
Fixes: 1f39fa0dcc (cpufreq: Introducing CPUFREQ_RELATION_E)
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Let the governors schedutil, conservative and ondemand to work, if possible
on efficient frequencies only.
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This newly introduced flag can be applied by a governor to a CPUFreq
relation, when looking for a frequency within the policy table. The
resolution would then only walk through efficient frequencies.
Even with the flag set, the policy max limit will still be honoured. If no
efficient frequencies can be found within the limits of the policy, an
inefficient one would be returned.
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When applying the policy min/max limits, the requested frequency is
simply clamped to not be out of range. It means, however, if one of the
boundaries isn't an available frequency, the frequency resolution can
return a value out of those limits, depending on the relation used.
e.g. freq{0,1,2} being available frequencies.
freq0 policy->min freq1 policy->max freq2
| | | | |
17kHz 18kHz 19kHz 20kHz 21kHz
__resolve_freq(21kHz, CPUFREQ_RELATION_L) -> 21kHz (out of bounds)
__resolve_freq(17kHz, CPUFREQ_RELATION_H) -> 17kHz (out of bounds)
If, during the policy init, we resolve the requested min/max to existing
frequencies, we ensure that any CPUFREQ_RELATION_* would resolve to a
frequency which is inside the policy min/max range.
Making the policy limits rigid helps to introduce the inefficient
frequencies support. Resolving an inefficient frequency to an efficient
one should not transgress policy->max (which can be set for thermal
reason) and having a value we can trust simplify this comparison.
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is possible that HWP guaranteed ratio is changed in response to
change in power and thermal limits. For example when Intel Speed Select
performance profile is changed or there is change in TDP, hardware can
send notifications. It is possible that the guaranteed ratio is
increased. This creates an issue when turbo is disabled, as the old
limits set in MSR_HWP_REQUEST are still lower and hardware will clip
to older limits.
This change enables HWP interrupt and process HWP interrupts. When
guaranteed is changed, calls cpufreq_update_policy() so that driver
callbacks are called to update to new HWP limits. This callback
is called from a delayed workqueue of 10ms to avoid frequent updates.
Although the scope of IA32_HWP_INTERRUPT is per logical cpu, on some
plaforms interrupt is generated on all CPUs. This is particularly a
problem during initialization, when the driver didn't allocated
data for other CPUs. So this change uses a cpumask of enabled CPUs and
process interrupts on those CPUs only.
When the cpufreq offline() or suspend() callback is called, HWP interrupt
is disabled on those CPUs and also cancels any pending work item.
Spin lock is used to protect data and processing shared with interrupt
handler. Here READ_ONCE(), WRITE_ONCE() macros are used to designate
shared data, even though spin lock act as an optimization barrier here.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: pablomh@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The return value from tegra_bpmp_transfer indicates the success or
failure of the IPC transaction with BPMP. If the transaction
succeeded, we also need to check the actual command's result code.
Add code to do this.
While at it, explicitly handle missing CPU clusters, which can
occur on floorswept chips. This worked before as well, but
possibly only by accident.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
list cpu_data_list has been inited staticly through LIST_HEAD,
so there's no need to call another INIT_LIST_HEAD. Simply remove
it from cppc_cpufreq_init.
Signed-off-by: Han Wang <zjuwanghan@outlook.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Apparently nobody has so far caught this warning, I hit it in randconfig
build testing:
drivers/cpufreq/s3c2440-cpufreq.c: In function 's3c2440_cpufreq_setdivs':
drivers/cpufreq/s3c2440-cpufreq.c:175:10: error: this statement may fall through [-Werror=implicit-fallthrough=]
camdiv |= S3C2440_CAMDIVN_HCLK3_HALF;
^
drivers/cpufreq/s3c2440-cpufreq.c:176:2: note: here
case 3:
^~~~
drivers/cpufreq/s3c2440-cpufreq.c:181:10: error: this statement may fall through [-Werror=implicit-fallthrough=]
camdiv |= S3C2440_CAMDIVN_HCLK4_HALF;
^
drivers/cpufreq/s3c2440-cpufreq.c:182:2: note: here
case 4:
^~~~
Both look like the fallthrough is intentional, so add the new
"fallthrough;" keyword.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
- Prevent intel_pstate from avoiding to use HWP, even if instructed
to do so via the kernel command line, when HWP has been enabled
already by the platform firmware (Doug Smythies).
- Prevent use-after-free from occurring in the schedutil cpufreq
governor on exit by fixing a core helper function that attempts
to access memory associated with a kobject after calling
kobject_put() on it (James Morse).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmFE2wsSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRx050P/0tsaaPhfNuguwORa/rV6UCWOmFtIuSA
Td9tmkwee7ff3WKDCOFqMU+peKC443C/jVGLy7lPs/v9vDfD5QaJTcfL4Ua1ifjP
gW+2Rf1omOdFQuB31JZNO9HV3/FRgo5KX0hUA+3qh+MsPGOF9BlWf96N1rDrwtyA
Vsgx66LcOmAJg67YX0MZhtNm/RMBV4aFVArU6GgEGaURoHH5OgVg4yuRgeKarZin
weyVPqY7rOEMCUq5gXl5qpMlLw6O8P89lsQJrqoIWeOh1zmz5CPrzcE5QT0pVWcd
6m704TYC5ajpqUc6XaMkgkQXSIwHbcPe3IDB5SvP1eQE/duhD+gw2osyHKqE5C3c
pVtS2z1PFnhByVnRbUP53znwaYyyp63vriyq7zGxs5SwNmgENEavMMu6/C0loTOp
fpfGvwZbVItxrRVHm+Y49pzSoSwy+aNQq5wTT0rVD670TqpZREKl/HcACihyZLQO
E23zvOH4OyAW5j8SdPmZMU9m/aJqjfVCduapf5iLe72pOhB28rVQ+DWupJ5395RP
5qoLKeGxPewy1qtc2PF41lztSfah9pDSrtxBv5ebVDu5uOLt71svaF1fdsixWBlo
+MHMS+njeJzq5C0iXbfLvUBGw7XCUauax6CkkNl5ooYaFZFp0Gb+YHEs6fiXA8Oh
ZlkT9hVE9ARj
=ytGW
-----END PGP SIGNATURE-----
Merge tag 'pm-5.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix two cpufreq issues, one in the intel_pstate driver and one
in the core.
Specifics:
- Prevent intel_pstate from avoiding to use HWP, even if instructed
to do so via the kernel command line, when HWP has been enabled
already by the platform firmware (Doug Smythies).
- Prevent use-after-free from occurring in the schedutil cpufreq
governor on exit by fixing a core helper function that attempts to
access memory associated with a kobject after calling kobject_put()
on it (James Morse)"
* tag 'pm-5.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: schedutil: Destroy mutex before kobject_put() frees the memory
cpufreq: intel_pstate: Override parameters if HWP forced by BIOS
Since commit e5c6b312ce ("cpufreq: schedutil: Use kobject release()
method to free sugov_tunables") kobject_put() has kfree()d the
attr_set before gov_attr_set_put() returns.
kobject_put() isn't the last user of attr_set in gov_attr_set_put(),
the subsequent mutex_destroy() triggers a use-after-free:
| BUG: KASAN: use-after-free in mutex_is_locked+0x20/0x60
| Read of size 8 at addr ffff000800ca4250 by task cpuhp/2/20
|
| CPU: 2 PID: 20 Comm: cpuhp/2 Not tainted 5.15.0-rc1 #12369
| Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development
| Platform, BIOS EDK II Jul 30 2018
| Call trace:
| dump_backtrace+0x0/0x380
| show_stack+0x1c/0x30
| dump_stack_lvl+0x8c/0xb8
| print_address_description.constprop.0+0x74/0x2b8
| kasan_report+0x1f4/0x210
| kasan_check_range+0xfc/0x1a4
| __kasan_check_read+0x38/0x60
| mutex_is_locked+0x20/0x60
| mutex_destroy+0x80/0x100
| gov_attr_set_put+0xfc/0x150
| sugov_exit+0x78/0x190
| cpufreq_offline.isra.0+0x2c0/0x660
| cpuhp_cpufreq_offline+0x14/0x24
| cpuhp_invoke_callback+0x430/0x6d0
| cpuhp_thread_fun+0x1b0/0x624
| smpboot_thread_fn+0x5e0/0xa6c
| kthread+0x3a0/0x450
| ret_from_fork+0x10/0x20
Swap the order of the calls.
Fixes: e5c6b312ce ("cpufreq: schedutil: Use kobject release() method to free sugov_tunables")
Cc: 4.7+ <stable@vger.kernel.org> # 4.7+
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If HWP has been already been enabled by BIOS, it may be
necessary to override some kernel command line parameters.
Once it has been enabled it requires a reset to be disabled.
Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Add new cpufreq driver for the MediaTek MT6779 platform called
mediatek-hw along with corresponding DT bindings (Hector.Yuan).
- Add DCVS interrupt support to the qcom-cpufreq-hw driver (Thara
Gopinath).
- Make the qcom-cpufreq-hw driver set the dvfs_possible_from_any_cpu
policy flag (Taniya Das).
- Blocklist more Qualcomm platforms in cpufreq-dt-platdev (Bjorn
Andersson).
- Make the vexpress cpufreq driver set the CPUFREQ_IS_COOLING_DEV
flag (Viresh Kumar).
- Add new cpufreq driver callback to allow drivers to register
with the Energy Model in a consistent way and make several
drivers use it (Viresh Kumar).
- Change the remaining users of the .ready() cpufreq driver callback
to move the code from it elsewhere and drop it from the cpufreq
core (Viresh Kumar).
- Revert recent intel_pstate change adding HWP guaranteed performance
change notification support to it that led to problems, because
the notification in question is triggered prematurely on some
systems (Rafael Wysocki).
- Convert the OPP DT bindings to DT schema and clean them up while
at it (Rob Herring).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmE41VgSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxCN4P+gMjMrrZmuU6gZsbbvpDlaBhCd2Xq3TD
xR/DMDi7znkh3TUX3uwL+xnr+k0krIH0jBIeUQUE7NeNIoT6wgbjJ4Ty5rFq76qB
AODmmZ4vO7lmnupSyqUQbHfYohDmyICSKiStf8UOEj1o+jSWNmrgUYUv0tDtDUH+
Cn0vByah8gJAnoZX8Y8BM1jmRc3YoNHWpvtTQIhIBPkVZ//+NOKvDZvwUUPZFb+M
1PzMSfX7WsIDiUrUHpdvtZsoBniaMk0WS1EqVBRvEprqUXad1eHF19yuhtLxeUPH
8xh/7o8kYzjqVJvs7blTT8DztxRDScWHeGKSVdwoEJupbCwc5R3qfGaD6PWyhI1x
9R5Swsp64nLptTCwH7ZmgdJbC9IqN3cz1Nadd5v2Q2wr21KvZnj7zI2ijkPKGnZo
kqYQHghqnDkGPFVjdls/RKUXGCQIXZFQb+FeuyCvpVlz9Ol8+DnTYtgFjjc6VcU1
kApIqsE8V8GQEyzmm/OIAf6xtsA+mEUUh3Qds16KNCwBCRglC8I6v5IWnBc8PEJz
a+wtwjx+tyKSSlAMEvcDNtrWVN+3JNrwCFG+Q+QjMrwLiAgACrtJZ6e6PmLj9sZv
FPbZM8rzbzN7Zqd7XVNY37KHjqNs7zzDScAnATTUCKThp8ijDgtmG3ZhExmOPayc
74aQLham4TBO
=WSpr
-----END PGP SIGNATURE-----
Merge tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These are mostly ARM cpufreq driver updates, including one new
MediaTek driver that has just passed all of the reviews, with the
addition of a revert of a recent intel_pstate commit, some core
cpufreq changes and a DT-related update of the operating performance
points (OPP) support code.
Specifics:
- Add new cpufreq driver for the MediaTek MT6779 platform called
mediatek-hw along with corresponding DT bindings (Hector.Yuan).
- Add DCVS interrupt support to the qcom-cpufreq-hw driver (Thara
Gopinath).
- Make the qcom-cpufreq-hw driver set the dvfs_possible_from_any_cpu
policy flag (Taniya Das).
- Blocklist more Qualcomm platforms in cpufreq-dt-platdev (Bjorn
Andersson).
- Make the vexpress cpufreq driver set the CPUFREQ_IS_COOLING_DEV
flag (Viresh Kumar).
- Add new cpufreq driver callback to allow drivers to register with
the Energy Model in a consistent way and make several drivers use
it (Viresh Kumar).
- Change the remaining users of the .ready() cpufreq driver callback
to move the code from it elsewhere and drop it from the cpufreq
core (Viresh Kumar).
- Revert recent intel_pstate change adding HWP guaranteed performance
change notification support to it that led to problems, because the
notification in question is triggered prematurely on some systems
(Rafael Wysocki).
- Convert the OPP DT bindings to DT schema and clean them up while at
it (Rob Herring)"
* tag 'pm-5.15-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
Revert "cpufreq: intel_pstate: Process HWP Guaranteed change notification"
cpufreq: mediatek-hw: Add support for CPUFREQ HW
cpufreq: Add of_perf_domain_get_sharing_cpumask
dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW
cpufreq: Remove ready() callback
cpufreq: sh: Remove sh_cpufreq_cpu_ready()
cpufreq: acpi: Remove acpi_cpufreq_cpu_ready()
cpufreq: qcom-hw: Set dvfs_possible_from_any_cpu cpufreq driver flag
cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev
cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support
cpufreq: scmi: Use .register_em() to register with energy model
cpufreq: vexpress: Use .register_em() to register with energy model
cpufreq: scpi: Use .register_em() to register with energy model
dt-bindings: opp: Convert to DT schema
dt-bindings: Clean-up OPP binding node names in examples
ARM: dts: omap: Drop references to opp.txt
cpufreq: qcom-cpufreq-hw: Use .register_em() to register with energy model
cpufreq: omap: Use .register_em() to register with energy model
cpufreq: mediatek: Use .register_em() to register with energy model
cpufreq: imx6q: Use .register_em() to register with energy model
...
The current HWP calibration for hybrid processors in intel_pstate is
fragile, because it depends too much on the information provided by
the platform firmware via CPPC which may not be reliable enough. It
also need not be so complicated.
In order to improve that mechanism and make it more resistant to
platform firmware issues, make it only use the CPPC nominal_perf
values to compute the HWP-to-frequency scaling factors for all
CPUs and possibly use the HWP_CAP highest_perf values to recompute
them if the ones derived from the CPPC nominal_perf values alone
appear to be too high.
Namely, fetch CPC.nominal_perf for all CPUs present in the system,
find the minimum one and use it as a reference for computing all of
the CPUs' scaling factors (using the observation that for the CPUs
having the minimum CPC.nominal_perf the HWP range of available
performance levels should be the same as the range of available
"legacy" P-states and so the HWP-to-frequency scaling factor for
them should be the same as the corresponding scaling factor used
for representing the P-state values in kHz).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Pull more ARM cpufreq changes for v5.15-rc1 from Viresh Kumar:
"This adds a new cpufreq driver for Mediatek, which had been going
through reviews since last one year."
* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: mediatek-hw: Add support for CPUFREQ HW
cpufreq: Add of_perf_domain_get_sharing_cpumask
dt-bindings: cpufreq: add bindings for MediaTek cpufreq HW
Revert commit d0e936adbd ("cpufreq: intel_pstate: Process HWP
Guaranteed change notification"), because it causes a NULL pointer
dereference to occur on Lenovo X1 gen9 laptops due to an HWP
guaranteed performance change interrupt arriving prematurely.
This feature will be revisited in the next cycle.
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Introduce cpufreq HW driver which can support
CPU frequency adjust in MT6779 platform.
Signed-off-by: Hector.Yuan <hector.yuan@mediatek.com>
[ Viresh: Massaged the patch and cleaned some stuff. ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
- Convert pseries & powernv to use MSI IRQ domains.
- Rework the pseries CPU numbering so that CPUs that are removed, and later re-added, are
given a CPU number on the same node as previously, when possible.
- Add support for a new more flexible device-tree format for specifying NUMA distances.
- Convert powerpc to GENERIC_PTDUMP.
- Retire sbc8548 and sbc8641d board support.
- Various other small features and fixes.
Thanks to: Alexey Kardashevskiy, Aneesh Kumar K.V, Anton Blanchard, Cédric Le Goater,
Christophe Leroy, Emmanuel Gil Peyrot, Fabiano Rosas, Fangrui Song, Finn Thain, Gautham R.
Shenoy, Hari Bathini, Joel Stanley, Jordan Niethe, Kajol Jain, Laurent Dufour, Leonardo
Bras, Lukas Bulwahn, Marc Zyngier, Masahiro Yamada, Michal Suchanek, Nathan Chancellor,
Nicholas Piggin, Parth Shah, Paul Gortmaker, Pratik R. Sampat, Randy Dunlap, Sebastian
Andrzej Siewior, Srikar Dronamraju, Wan Jiabing, Xiongwei Song, Zheng Yongjun.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmEyHTYTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDo3D/9aXMVP2wsEMNB0XhTiJ1UUdi311Uq9
PvkAaGZH14ZqZLVigeiD3gt6YzTH0cEuGj6qgwsJrPDjF8FESnMbBsprMLr5/qE1
itWRGMAMCFaeTcB9ogYVJkzwg6RN2ZgIqoq4NVswNSXoAQGWb+1bvXq3RnXXNuGR
TQmLL02poNC6nX0YbRaQoT1Xx4nfUTiKHhU+Aok9uOCMJIyYZVATR6Qafb7/j7tO
UvjwOHztbu84lcJOGmSnw4LcmwNORLuP9IwR0r+O1M3ijEZqDo9TPkvtSz8HZwjU
mxdJwhrUmN0euMcghuiFxW+1XG2eM49ugsdJugiezG2RaIijbIp0nAIvdeaKAgT1
OSSwvWCQ0fkTPyLXE+O6tVqMhlUMdqQlRcyNwmN9svIip9VnwGNq3vA4ePlJm6Fi
i0i/tLqVNlJwFokZ7blW5g8SRgGRuFfXd5XUYLFvy5Teez+/7b1mW95gPQZSJ8kV
Tbx2e0nHAPX4hCAxJ1AB3/zTlnjY+4+WJ9bD5XdgXkeVE8PPh1BEkulhMi1R1OMj
57D1W6OgsBu/Pze78wjAvwO8+NAb1T/2mv2Bd/LY6Q+7hNDqOOhuajyBTxbH41FG
sqx5bKjKOwgTybfV9A0Eo0e4FQBX07yXltBFHaPlyA4sOsIhM59+PxNrEwN1eZrQ
LVVsdBXg8pHxrw==
=EbN0
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Convert pseries & powernv to use MSI IRQ domains.
- Rework the pseries CPU numbering so that CPUs that are removed, and
later re-added, are given a CPU number on the same node as
previously, when possible.
- Add support for a new more flexible device-tree format for specifying
NUMA distances.
- Convert powerpc to GENERIC_PTDUMP.
- Retire sbc8548 and sbc8641d board support.
- Various other small features and fixes.
Thanks to Alexey Kardashevskiy, Aneesh Kumar K.V, Anton Blanchard,
Cédric Le Goater, Christophe Leroy, Emmanuel Gil Peyrot, Fabiano Rosas,
Fangrui Song, Finn Thain, Gautham R. Shenoy, Hari Bathini, Joel
Stanley, Jordan Niethe, Kajol Jain, Laurent Dufour, Leonardo Bras, Lukas
Bulwahn, Marc Zyngier, Masahiro Yamada, Michal Suchanek, Nathan
Chancellor, Nicholas Piggin, Parth Shah, Paul Gortmaker, Pratik R.
Sampat, Randy Dunlap, Sebastian Andrzej Siewior, Srikar Dronamraju, Wan
Jiabing, Xiongwei Song, and Zheng Yongjun.
* tag 'powerpc-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (154 commits)
powerpc/bug: Cast to unsigned long before passing to inline asm
powerpc/ptdump: Fix generic ptdump for 64-bit
KVM: PPC: Fix clearing never mapped TCEs in realmode
powerpc/pseries/iommu: Rename "direct window" to "dma window"
powerpc/pseries/iommu: Make use of DDW for indirect mapping
powerpc/pseries/iommu: Find existing DDW with given property name
powerpc/pseries/iommu: Update remove_dma_window() to accept property name
powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper
powerpc/pseries/iommu: Add ddw_property_create() and refactor enable_ddw()
powerpc/pseries/iommu: Allow DDW windows starting at 0x00
powerpc/pseries/iommu: Add ddw_list_new_entry() helper
powerpc/pseries/iommu: Add iommu_pseries_alloc_table() helper
powerpc/kernel/iommu: Add new iommu_table_in_use() helper
powerpc/pseries/iommu: Replace hard-coded page shift
powerpc/numa: Update cpu_cpu_map on CPU online/offline
powerpc/numa: Print debug statements only when required
powerpc/numa: convert printk to pr_xxx
powerpc/numa: Drop dbg in favour of pr_debug
powerpc/smp: Enable CACHE domain for shared processor
powerpc/smp: Update cpu_core_map on all PowerPc systems
...
This isn't used anymore, get rid of it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The ->ready() callback is going away and since we don't do any important
stuff in sh_cpufreq_cpu_ready(), remove it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The ready() callback was implemented earlier for acpi-cpufreq driver as
we wanted to use policy->cpuinfo.max_freq for which the policy was
required to be verified.
That is no longer the case and we can do the pr_warn() right from
->init() callback now. Remove acpi_cpufreq_cpu_ready().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull ARM cpufreq driver changes for v5.15 from Viresh Kumar:
"This contains:
- Update cpufreq-dt blocklist with more platforms (Bjorn Andersson).
- Allow freq changes from any CPU for qcom-hw driver (Taniya Das).
- Add DSVS interrupt's support for qcom-hw driver (Thara Gopinath).
- A new callback (->register_em()) to register EM at a more convenient
point of time."
* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: qcom-hw: Set dvfs_possible_from_any_cpu cpufreq driver flag
cpufreq: blocklist more Qualcomm platforms in cpufreq-dt-platdev
cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support
cpufreq: scmi: Use .register_em() to register with energy model
cpufreq: vexpress: Use .register_em() to register with energy model
cpufreq: scpi: Use .register_em() to register with energy model
cpufreq: qcom-cpufreq-hw: Use .register_em() to register with energy model
cpufreq: omap: Use .register_em() to register with energy model
cpufreq: mediatek: Use .register_em() to register with energy model
cpufreq: imx6q: Use .register_em() to register with energy model
cpufreq: dt: Use .register_em() to register with energy model
cpufreq: Add callback to register with energy model
cpufreq: vexpress: Set CPUFREQ_IS_COOLING_DEV flag
As remote cpufreq updates are supported on QCOM platforms, set
dvfs_possible_from_any_cpu cpufreq driver flag.
Signed-off-by: Taniya Das <tdas@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The Qualcomm sa8155p, sm6350, sm8250 and sm8350 platforms also uses the
qcom-cpufreq-hw driver, so add them to the cpufreq-dt-platdev driver's
blocklist.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Add interrupt support to notify the kernel of h/w initiated frequency
throttling by LMh. Convey this to scheduler via thermal presssure
interface.
Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
[Viresh: Added changes for arch_topology.c to fix build errors ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Set the newly added .register_em() callback to register with the EM
after the cpufreq policy is properly initialized.
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Set the newly added .register_em() callback with
cpufreq_register_em_with_opp() to register with the EM core.
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Set the newly added .register_em() callback with
cpufreq_register_em_with_opp() to register with the EM core.
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
It is possible that HWP guaranteed ratio is changed in response to
change in power and thermal limits. For example when Intel Speed Select
performance profile is changed or there is change in TDP, hardware can
send notifications. It is possible that the guaranteed ratio is
increased. This creates an issue when turbo is disabled, as the old
limits set in MSR_HWP_REQUEST are still lower and hardware will clip
to older limits.
This change enables HWP interrupt and process HWP interrupts. When
guaranteed is changed, calls cpufreq_update_policy() so that driver
callbacks are called to update to new HWP limits. This callback
is called from a delayed workqueue of 10ms to avoid frequent updates.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull ARM cpufreq fixes for v5.14 from Viresh Kumar:
"This contains:
- Addition of SoCs to blocklist for cpufreq-dt driver (Bjorn Andersson
and Thara Gopinath).
- Fix error path for scmi driver (Lukasz Luba).
- Temporarily disable highest frequency for armada, its unsafe and
breaks stuff."
* 'cpufreq/arm/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant
cpufreq: blocklist Qualcomm sm8150 in cpufreq-dt-platdev
cpufreq: arm_scmi: Fix error path when allocation failed
cpufreq: blacklist Qualcomm sc8180x in cpufreq-dt-platdev
In the numa=off kernel command-line configuration init_chip_info() loops
around the number of chips and attempts to copy the cpumask of that node
which is NULL for all iterations after the first chip.
Hence, store the cpu mask for each chip instead of derving cpumask from
node while populating the "chips" struct array and copy that to the
chips[i].mask
Fixes: 053819e0bf ("cpufreq: powernv: Handle throttling due to Pmax capping at chip level")
Cc: stable@vger.kernel.org # v4.3+
Reported-by: Shirisha Ganta <shirisha.ganta1@ibm.com>
Signed-off-by: Pratik R. Sampat <psampat@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
[mpe: Rename goto label to out_free_chip_cpu_mask]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210728120500.87549-2-psampat@linux.ibm.com
Set the newly added .register_em() callback with
cpufreq_register_em_with_opp() to register with the EM core.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Set the newly added .register_em() callback with
cpufreq_register_em_with_opp() to register with the EM core.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Set the newly added .register_em() callback with
cpufreq_register_em_with_opp() to register with the EM core.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Set the newly added .register_em() callback with
cpufreq_register_em_with_opp() to register with the EM core.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Set the newly added .register_em() callback with
cpufreq_register_em_with_opp() to register with the EM core.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Many cpufreq drivers register with the energy model for each policy and
do exactly the same thing. Follow the footsteps of thermal-cooling, to
get it done from the cpufreq core itself.
Provide a new callback, which will be called, if present, by the cpufreq
core at the right moment (more on that in the code's comment). Also
provide a generic implementation that uses dev_pm_opp_of_register_em().
This also allows us to register with the EM at a later point of time,
compared to ->init(), from where the EM core can access cpufreq policy
directly using cpufreq_cpu_get() type of helpers and perform other work,
like marking few frequencies inefficient, this will be done separately.
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reuse the cpufreq core's registration of cooling device by setting the
CPUFREQ_IS_COOLING_DEV flag. Set this only if bL switcher isn't enabled.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The 1.2 GHz variant of the Armada 3720 SOC is unstable with DVFS: when
the SOC boots, the WTMI firmware sets clocks and AVS values that work
correctly with 1.2 GHz CPU frequency, but random crashes occur once
cpufreq driver starts scaling.
We do not know currently what is the reason:
- it may be that the voltage value for L0 for 1.2 GHz variant provided
by the vendor in the OTP is simply incorrect when scaling is used,
- it may be that some delay is needed somewhere,
- it may be something else.
The most sane solution now seems to be to simply forbid the cpufreq
driver on 1.2 GHz variant.
Signed-off-by: Marek Behún <kabel@kernel.org>
Fixes: 92ce45fb87 ("cpufreq: Add DVFS support for Armada 37xx")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The Qualcomm sm8150 platform uses the qcom-cpufreq-hw driver, so
add it to the cpufreq-dt-platdev driver's blocklist.
Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().
Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Stop the initialization when cpumask allocation failed and return an
error.
Fixes: 80a064dbd5 ("scmi-cpufreq: Get opp_shared_cpus from opp-v2 for EM")
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The Qualcomm SC8180x platform uses the qcom-cpufreq-hw driver, so
it in the cpufreq-dt-platdev driver's blocklist.
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
In preparation to enable -Wimplicit-fallthrough for Clang, fix a
fallthrough warning by simply dropping the empty default case at
the bottom.
Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Pull ARM cpufreq updates for v5.14-rc1 from Viresh Kumar:
"- Add frequency invariance support for CPPC driver again and related
fixes/changes."
- Minor changes/cleanups for Meditak driver (Fabien Parent and Seiya
Wang), Qcom platform (Sibi Sankar), and SCMI driver (Christophe
JAILLET).
- New bindings for generic performance domains (Sudeep Holla).
- Rename black/white-lists (Viresh Kumar)."
* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: CPPC: Add support for frequency invariance
arch_topology: Avoid use-after-free for scale_freq_data
cpufreq: CPPC: Pass structure instance by reference
cpufreq: CPPC: Fix potential memleak in cppc_cpufreq_cpu_init
dt-bindings: cpufreq: update cpu type and clock name for MT8173 SoC
clk: mediatek: remove deprecated CLK_INFRA_CA57SEL for MT8173 SoC
cpufreq: dt: Rename black/white-lists
cpufreq: scmi: Fix an error message
cpufreq: mediatek: add support for mt8365
dt-bindings: dvfs: Add support for generic performance domains
cpufreq: blacklist SC7280 in cpufreq-dt-platdev
The Frequency Invariance Engine (FIE) is providing a frequency scaling
correction factor that helps achieve more accurate load-tracking.
Normally, this scaling factor can be obtained directly with the help of
the cpufreq drivers as they know the exact frequency the hardware is
running at. But that isn't the case for CPPC cpufreq driver.
Another way of obtaining that is using the arch specific counter
support, which is already present in kernel, but that hardware is
optional for platforms.
This patch updates the CPPC driver to register itself with the topology
core to provide its own implementation (cppc_scale_freq_tick()) of
topology_scale_freq_tick() which gets called by the scheduler on every
tick. Note that the arch specific counters have higher priority than
CPPC counters, if available, though the CPPC driver doesn't need to have
any special handling for that.
On an invocation of cppc_scale_freq_tick(), we schedule an irq work
(since we reach here from hard-irq context), which then schedules a
normal work item and cppc_scale_freq_workfn() updates the per_cpu
arch_freq_scale variable based on the counter updates since the last
tick.
To allow platforms to disable this CPPC counter-based frequency
invariance support, this is all done under CONFIG_ACPI_CPPC_CPUFREQ_FIE,
which is enabled by default.
This also exports sched_setattr_nocheck() as the CPPC driver can be
built as a module.
Cc: linux-acpi@vger.kernel.org
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
It's a classic example of memleak, we allocate something, we fail and
never free the resources.
Make sure we free all resources on policy ->init() failures.
Fixes: a28b2bfc09 ("cppc_cpufreq: replace per-cpu data array with a list")
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Commit e3c0623608 ("cpufreq: add cpufreq_driver_resolve_freq()")
introduced this callback, back in 2016, for drivers that provide the
->target() callback.
The kernel hasn't seen a single user of it in the past 5 years and
it is not likely to be used any time soon.
Remove it for now.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
__cpufreq_driver_target() open codes cpufreq_driver_resolve_freq(), lets
make the former reuse the later.
Separate out __resolve_freq() to accept relation as well as an argument
and use it at both the locations.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Now that all users of ->stop_cpu() have been migrated to using other
callbacks, drop it from the core.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Minor edits in the subject and changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 367dc4aa93 ("cpufreq: Add stop CPU callback to cpufreq_driver
interface") added the ->stop_cpu() callback to allow the drivers to do
clean up before the CPU is completely down and its state can't be
modified.
At that time the CPU hotplug framework used to call the cpufreq core's
registered notifier for different events like CPU_DOWN_PREPARE and
CPU_POST_DEAD. The ->stop_cpu() callback was called during the
CPU_DOWN_PREPARE event.
This is no longer the case, cpuhp_cpufreq_offline() is called only
once by the CPU hotplug core now and we don't really need two
separate callbacks for cpufreq drivers, i.e. ->stop_cpu() and
->exit(), as everything can be done from the ->exit() callback
itself.
Migrate to using the ->exit() callback instead of ->stop_cpu().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Minor changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 367dc4aa93 ("cpufreq: Add stop CPU callback to cpufreq_driver
interface") added the ->stop_cpu() callback to allow the drivers to do
clean up before the CPU is completely down and its state can't be
modified.
At that time the CPU hotplug framework used to call the cpufreq core's
registered notifier for different events like CPU_DOWN_PREPARE and
CPU_POST_DEAD. The ->stop_cpu() callback was called during the
CPU_DOWN_PREPARE event.
This is no longer the case, cpuhp_cpufreq_offline() is called only
once by the CPU hotplug core now and we don't really need two
separate callbacks for cpufreq drivers, i.e. ->stop_cpu() and
-<exit(), as everything can be done from the ->exit() callback
itself.
Migrate to using the ->exit() callback instead of ->stop_cpu().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Minor edits in the changelog and subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Combine the ->stop_cpu() and ->offline() callback routines for
intel_pstate in the active mode so as to avoid setting the
->stop_cpu callback pointer which is going to be dropped from
the framework.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
In the CPU removal path the ->offline() callback provided by the
driver is always invoked before ->exit(), but in the cpufreq_online()
error path it is not, so ->exit() is expected to somehow know the
context in which it has been called and act accordingly.
That is less than straightforward, so make cpufreq_online() invoke
the driver's ->offline() callback, if present, on errors before
->exit() too.
This only potentially affects intel_pstate.
Fixes: 91a12e91dc ("cpufreq: Allow light-weight tear down and bring up of CPUs")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
This reverts commit 4c38f2df71.
There are few races in the frequency invariance support for CPPC driver,
namely the driver doesn't stop the kthread_work and irq_work on policy
exit during suspend/resume or CPU hotplug.
A proper fix won't be possible for the 5.13-rc, as it requires a lot of
changes. Lets revert the patch instead for now.
Fixes: 4c38f2df71 ("cpufreq: CPPC: Add support for frequency invariance")
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since commit 759f534e93ac(CPUFREQ: Loongson2: drop set_cpus_allowed_ptr()),
the header <linux/sched.h> is useless in oongson2_cpufreq.c, so remove it.
Signed-off-by: Hailong Liu <liu.hailong6@zte.com.cn>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since commit '205dcc1ecbc5(cpufreq/sh: Replace racy task affinity logic)'
the header <linux/sched.h> is useless in sh-cpufreq.c, so remove it.
Signed-off-by: Hailong Liu <liu.hailong6@zte.com.cn>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Local variable 'count' will be initialized and 'ret' is also not
required, so remove the redundant initialization and get rid of
'ret'.
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
One of the previous commits introducing hybrid processor support to
intel_pstate broke build with CONFIG_ACPI unset.
Fix that and while at it make empty stubs of two functions related
to ACPI CPPC static inline and fix a spelling mistake in the name of
one of them.
Fixes: eb3693f052 ("cpufreq: intel_pstate: hybrid: CPU-specific scaling factor")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
Rename them in accordance with the coding guidelines.
Reviewed-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Quieten an implicit-fallthrough warning in sc520_freq.c:
../drivers/cpufreq/sc520_freq.c: In function 'sc520_freq_get_cpu_frequency':
../include/linux/printk.h:343:2: warning: this statement may fall through [-Wimplicit-fallthrough=]
printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
../drivers/cpufreq/sc520_freq.c:43:3: note: in expansion of macro 'pr_err'
pr_err("error: cpuctl register has unexpected value %02x\n",
../drivers/cpufreq/sc520_freq.c:45:2: note: here
case 0x01:
Fixes: bf6fc9fd2d ("[CPUFREQ] AMD Elan SC520 cpufreq driver.")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Users may disable HWP in firmware, in which case intel_pstate wouldn't load
unless the CPU model is explicitly supported.
See also commit d8de7a44e1 ("cpufreq: intel_pstate: Add Skylake servers
support").
Suggested-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Users may disable HWP in firmware, in which case intel_pstate wouldn't load
unless the CPU model is explicitly supported.
Add ICELAKE_X to the list of CPUs that can register intel_pstate while not
advertising the HWP capability. Without this change, an ICELAKE_X in no-HWP
mode could only use the acpi_cpufreq frequency scaling driver.
See also commit d8de7a44e1 ("cpufreq: intel_pstate: Add Skylake servers
support").
Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The scaling factor between HWP performance levels and CPU frequency
may be different for different types of CPUs in a hybrid processor
and in general the HWP performance levels need not correspond to
"P-states" representing values that would be written to
MSR_IA32_PERF_CTL if HWP was disabled.
However, the policy limits control in cpufreq is defined in terms
of CPU frequency, so it is necessary to map the frequency limits set
through that interface to HWP performance levels with reasonable
accuracy and the behavior of that interface on hybrid processors
has to be compatible with its behavior on non-hybrid ones.
To address this problem, use the observations that (1) on hybrid
processors the sysfs interface can operate by mapping frequency
to "P-states" and translating those "P-states" to specific HWP
performance levels of the given CPU and (2) the scaling factor
between the MSR_IA32_PERF_CTL "P-states" and CPU frequency can be
regarded as a known value. Moreover, the mapping between the
HWP performance levels and CPU frequency can be assumed to be
linear and such that HWP performance level 0 correspond to the
frequency value of 0, so it is only necessary to know the
frequency corresponding to one specific HWP performance level
to compute the scaling factor applicable to all of them.
One possibility is to take the nominal performance value from CPPC,
if available, and use cpu_khz as the corresponding frequency. If
the CPPC capabilities interface is not there or the nominal
performance value provided by it is out of range, though, something
else needs to be done.
Namely, the guaranteed performance level either from CPPC or from
MSR_HWP_CAPABILITIES can be used instead, but the corresponding
frequency needs to be determined. That can be done by computing the
product of the (known) scaling factor between the MSR_IA32_PERF_CTL
P-states and CPU frequency (the PERF_CTL scaling factor) and the
P-state value referred to as the "TDP ratio".
If the HWP-to-frequency scaling factor value obtained in one of the
ways above turns out to be euqal to the PERF_CTL scaling factor, it
can be assumed that the number of HWP performance levels is equal to
the number of P-states and the given CPU can be handled as though
this was not a hybrid processor.
Otherwise, one more adjustment may still need to be made, because the
HWP-to-frequency scaling factor computed so far may not be accurate
enough (e.g. because the CPPC information does not match the exact
behavior of the processor). Specifically, in that case the frequency
corresponding to the highest HWP performance value from
MSR_HWP_CAPABILITIES (computed as the product of that value and the
HWP-to-frequency scaling factor) cannot exceed the frequency that
corresponds to the maximum 1-core turbo P-state value from
MSR_TURBO_RATIO_LIMIT (computed as the procuct of that value and the
PERF_CTL scaling factor) and the HWP-to-frequency scaling factor may
need to be adjusted accordingly.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The turbo_pct and num_pstates sysfs attributes represent CPU
properties that may be different for differenty types of CPUs in
a hybrid processor, so avoid exposing them in that case.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
'ret' is known to be 0 here.
The last error code is stored in 'nr_opp', so use it in the error message.
Fixes: 71a37cd6a5 ("scmi-cpufreq: Remove deferred probe")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Add compatible stirng for MediaTek MT8365 SoC. Add also the
compatible in the blacklist of the cpufreq-dt-platdev driver.
Signed-off-by: Fabien Parent <fparent@baylibre.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Add SC7280 to cpufreq-dt-platdev blacklist since the actual scaling is
handled by the 'qcom-cpufreq-hw' driver.
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: Sibi Sankar <sibis@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmCffOARHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1iHBRAAm7p68+/sec2neJ2SxrOdl3kWU5yUXgM/
X2WUQiU8ERAI1IfaKcJBbJCDlIr7Pufwec31IvLpyM5my+pfNkuB9EcLxwQuUZ8y
2IZXF3HlaxWUEfwVqAQ/Dm1J1jExz20vSVzom/2TeE8H1kibdjs6EfouW17FZbwc
CXtZC5MWArU/Wt5cjm84Cn5JAx0Udw3RKv8O5o3w/gz0RMjTGCzxlS54QwF+j1fG
r1kRL+64yS1LPofnsEDSqfw52J/agSpVOgOiRtn7RUYPoTlmkYZ7l1JeZe/bukDi
YsF6uE8nfoRrjhdWVwOpvjEeTzP1hnNBT64piOY+G0wdoBJHmU+jzu5mJIyjxAeY
BnJqA7cH16F9cIKCPilmsifbptJtli+Y301036sxMBj8IlcbPKdHlW/qG9ibUCeN
r6IPZnONd5JaDeEUCQl91fhGxDn8JrSew5Bh6Yp8B2KsJ9cXirUoPORjqu7Fccfe
YRHNPfK8JpSPGv5SSXRrrr6bSdPBhEueqUemfItTGsPpZY/mD0iTIlecol6o0Wfc
A11rk6Hb1BMVveNSCTrH7VFJ9nsql1XI5C7rp0D4+9uEDEYRHsq9rInZSevbytsI
ocF03ineypbGmiiLT5cYiwR2+ucheX8WaS+BpGXlxjTwvAV+s0QdeTe9UyW9mySl
R1ly0Jwpd3Q=
=Ggm4
-----END PGP SIGNATURE-----
Merge tag 'sched-urgent-2021-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Fix an idle CPU selection bug, and an AMD Ryzen maximum frequency
enumeration bug"
* tag 'sched-urgent-2021-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, sched: Fix the AMD CPPC maximum performance value on certain AMD Ryzen generations
sched/fair: Fix clearing of has_idle_cores flag in select_idle_cpu()
Some AMD Ryzen generations has different calculation method on maximum
performance. 255 is not for all ASICs, some specific generations should use 166
as the maximum performance. Otherwise, it will report incorrect frequency value
like below:
~ → lscpu | grep MHz
CPU MHz: 3400.000
CPU max MHz: 7228.3198
CPU min MHz: 2200.0000
[ mingo: Tidied up whitespace use. ]
[ Alexander Monakov <amonakov@ispras.ru>: fix 225 -> 255 typo. ]
Fixes: 41ea667227 ("x86, sched: Calculate frequency invariance for AMD systems")
Fixes: 3c55e94c0a ("cpufreq: ACPI: Extend frequency tables to cover boost frequencies")
Reported-by: Jason Bagavatsingham <jason.bagavatsingham@gmail.com>
Fixed-by: Alexander Monakov <amonakov@ispras.ru>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jason Bagavatsingham <jason.bagavatsingham@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210425073451.2557394-1-ray.huang@amd.com
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211791
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It turns out that there are systems where HWP is enabled during
initialization by the platform firmware (BIOS), but HWP EPP support
is not advertised.
After commit 7aa1031223 ("cpufreq: intel_pstate: Avoid enabling HWP
if EPP is not supported") intel_pstate refuses to use HWP on those
systems, but the fallback PERF_CTL interface does not work on them
either because of enabled HWP, and once enabled, HWP cannot be
disabled. Consequently, the users of those systems cannot control
CPU performance scaling.
Address this issue by making intel_pstate use HWP unconditionally if
it is enabled already when the driver starts.
Fixes: 7aa1031223 ("cpufreq: intel_pstate: Avoid enabling HWP if EPP is not supported")
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+
- Add idle states table for IceLake-D to the intel_idle driver and
update IceLake-X C6 data in it (Artem Bityutskiy).
- Fix the C7 idle state on Tegra114 in the tegra cpuidle driver and
drop the unused do_idle() firmware call from it (Dmitry Osipenko).
- Fix cpuidle-qcom-spm Kconfig entry (He Ying).
- Fix handling of possible negative tick_nohz_get_next_hrtimer()
return values of in cpuidle governors (Rafael Wysocki).
- Add support for frequency-invariance to the ACPI CPPC cpufreq
driver and update the frequency-invariance engine (FIE) to use it
as needed (Viresh Kumar).
- Simplify the default delay_us setting in the ACPI CPPC cpufreq
driver (Tom Saeger).
- Clean up frequency-related computations in the intel_pstate
cpufreq driver (Rafael Wysocki).
- Fix TBG parent setting for load levels in the armada-37xx
cpufreq driver and drop the CPU PM clock .set_parent method for
armada-37xx (Marek Behún).
- Fix multiple issues in the armada-37xx cpufreq driver (Pali Rohár).
- Fix handling of dev_pm_opp_of_cpumask_add_table() return values
in cpufreq-dt to take the -EPROBE_DEFER one into acconut as
appropriate (Quanyang Wang).
- Fix format string in ia64-acpi-cpufreq (Sergei Trofimovich).
- Drop the unused for_each_policy() macro from cpufreq (Shaokun
Zhang).
- Simplify computations in the schedutil cpufreq governor to avoid
unnecessary overhead (Yue Hu).
- Fix typos in the s5pv210 cpufreq driver (Bhaskar Chowdhury).
- Fix cpufreq documentation links in Kconfig (Alexander Monakov).
- Fix PCI device power state handling in pci_enable_device_flags()
to avoid issuse in some cases when the device depends on an ACPI
power resource (Rafael Wysocki).
- Add missing documentation of pm_runtime_resume_and_get() (Alan
Stern).
- Add missing static inline stub for pm_runtime_has_no_callbacks()
to pm_runtime.h and drop the unused try_to_freeze_nowarn()
definition (YueHaibing).
- Drop duplicate struct device declaration from pm.h and fix a
structure type declaration in intel_rapl.h (Wan Jiabing).
- Use dev_set_name() instead of an open-coded equivalent of it in
the wakeup sources code and drop a redundant local variable
initialization from it (Andy Shevchenko, Colin Ian King).
- Use crc32 instead of md5 for e820 memory map integrity check
during resume from hibernation on x86 (Chris von Recklinghausen).
- Fix typos in comments in the system-wide and hibernation support
code (Lu Jialin).
- Modify the generic power domains (genpd) code to avoid resuming
devices in the "prepare" phase of system-wide suspend and
hibernation (Ulf Hansson).
- Add Hygon Fam18h RAPL support to the intel_rapl power capping
driver (Pu Wen).
- Add MAINTAINERS entry for the dynamic thermal power management
(DTPM) code (Daniel Lezcano).
- Add devm variants of operating performance points (OPP) API
functions and switch over some users of the OPP framework to
the new resource-managed API (Yangtao Li and Dmitry Osipenko).
- Update devfreq core:
* Register devfreq devices as cooling devices on demand (Daniel
Lezcano).
* Add missing unlock opeation in devfreq_add_device() (Lukasz
Luba).
* Use the next frequency as resume_freq instead of the previous
frequency when using the opp-suspend property (Dong Aisheng).
* Check get_dev_status in devfreq_update_stats() (Dong Aisheng).
* Fix set_freq path for the userspace governor in Kconfig (Dong
Aisheng).
* Remove invalid description of get_target_freq() (Dong Aisheng).
- Update devfreq drivers:
* imx8m-ddrc: Remove imx8m_ddrc_get_dev_status() and unneeded
of_match_ptr() (Dong Aisheng, Fabio Estevam).
* rk3399_dmc: dt-bindings: Add rockchip,pmu phandle and drop
references to undefined symbols (Enric Balletbo i Serra, Gaël
PORTAY).
* rk3399_dmc: Use dev_err_probe() to simplify the code (Krzysztof
Kozlowski).
* imx-bus: Remove unneeded of_match_ptr() (Fabio Estevam).
- Fix kernel-doc warnings in three places (Pierre-Louis Bossart).
- Fix typo in the pm-graph utility code (Ricardo Ribalda).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmCHAUISHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxAxMP/0tFjgxeaJ3chYaiqoPlk2QX/XdwqJvm
8OOu2qBMWbt2bubcIlAPpdlCNaERI4itF7E8za7t9alswdq7YPWGmNR9snCXUKhD
BzERuicZTeOcCk2P3DTgzLVc4EzF6wutA3lTdYYZIpf+LuuB+guG8zgMzScRHIsM
N3I83O+iLTA9ifIqN0/wH//a0ISvo6rSWtcbx+48d5bYvYixW7CsBmoxWHhGiYsw
4PJ4AzbdNJEhQp91SBYPIAmqwV88FZUPoYnRazXMxOSevMewhP9JuCHXAujC3gLV
l5d2TBaBV4EBYLD5tfCpJvHMXhv/yBpg6KRivjri+zEnY1TAqIlfR4vYiL7puVvQ
PdwjyvNFDNGyUSX/AAwYF6F4WCtIhw8hCahw6Dw2zcDz0plFdRZmWAiTdmIjECJK
8EvwJNlmdl27G1y+EBc6NnwzEFZQwiu9F5bUHUkmc3fF1M1aFTza8WDNDo30TC94
94c+uVBRL2fBePl4FfGZATfJbOMb8+vDIkroQxrIjQDT/7Ha3Mz75JZDRHItZo92
+4fES2eFdfZISCLIQMBY5TeXox3O8qsirC1k4qELwy71gPUE9CpP3FkxKIvyZLlv
+6fS9ttpUfyFBF7gqrEy+ziVk1Rm4oorLmWCtthz4xyerzj5+vtZQqKzcySk0GA5
hYkseZkedR6y
=t+SG
-----END PGP SIGNATURE-----
Merge tag 'pm-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These add some new hardware support (for example, IceLake-D idle
states in intel_idle), fix some issues (for example, the handling of
negative "sleep length" values in cpuidle governors), add new
functionality to the existing drivers (for example, scale-invariance
support in the ACPI CPPC cpufreq driver) and clean up code all over.
Specifics:
- Add idle states table for IceLake-D to the intel_idle driver and
update IceLake-X C6 data in it (Artem Bityutskiy).
- Fix the C7 idle state on Tegra114 in the tegra cpuidle driver and
drop the unused do_idle() firmware call from it (Dmitry Osipenko).
- Fix cpuidle-qcom-spm Kconfig entry (He Ying).
- Fix handling of possible negative tick_nohz_get_next_hrtimer()
return values of in cpuidle governors (Rafael Wysocki).
- Add support for frequency-invariance to the ACPI CPPC cpufreq
driver and update the frequency-invariance engine (FIE) to use it
as needed (Viresh Kumar).
- Simplify the default delay_us setting in the ACPI CPPC cpufreq
driver (Tom Saeger).
- Clean up frequency-related computations in the intel_pstate cpufreq
driver (Rafael Wysocki).
- Fix TBG parent setting for load levels in the armada-37xx cpufreq
driver and drop the CPU PM clock .set_parent method for armada-37xx
(Marek Behún).
- Fix multiple issues in the armada-37xx cpufreq driver (Pali Rohár).
- Fix handling of dev_pm_opp_of_cpumask_add_table() return values in
cpufreq-dt to take the -EPROBE_DEFER one into acconut as
appropriate (Quanyang Wang).
- Fix format string in ia64-acpi-cpufreq (Sergei Trofimovich).
- Drop the unused for_each_policy() macro from cpufreq (Shaokun
Zhang).
- Simplify computations in the schedutil cpufreq governor to avoid
unnecessary overhead (Yue Hu).
- Fix typos in the s5pv210 cpufreq driver (Bhaskar Chowdhury).
- Fix cpufreq documentation links in Kconfig (Alexander Monakov).
- Fix PCI device power state handling in pci_enable_device_flags() to
avoid issuse in some cases when the device depends on an ACPI power
resource (Rafael Wysocki).
- Add missing documentation of pm_runtime_resume_and_get() (Alan
Stern).
- Add missing static inline stub for pm_runtime_has_no_callbacks() to
pm_runtime.h and drop the unused try_to_freeze_nowarn() definition
(YueHaibing).
- Drop duplicate struct device declaration from pm.h and fix a
structure type declaration in intel_rapl.h (Wan Jiabing).
- Use dev_set_name() instead of an open-coded equivalent of it in the
wakeup sources code and drop a redundant local variable
initialization from it (Andy Shevchenko, Colin Ian King).
- Use crc32 instead of md5 for e820 memory map integrity check during
resume from hibernation on x86 (Chris von Recklinghausen).
- Fix typos in comments in the system-wide and hibernation support
code (Lu Jialin).
- Modify the generic power domains (genpd) code to avoid resuming
devices in the "prepare" phase of system-wide suspend and
hibernation (Ulf Hansson).
- Add Hygon Fam18h RAPL support to the intel_rapl power capping
driver (Pu Wen).
- Add MAINTAINERS entry for the dynamic thermal power management
(DTPM) code (Daniel Lezcano).
- Add devm variants of operating performance points (OPP) API
functions and switch over some users of the OPP framework to the
new resource-managed API (Yangtao Li and Dmitry Osipenko).
- Update devfreq core:
* Register devfreq devices as cooling devices on demand (Daniel
Lezcano).
* Add missing unlock opeation in devfreq_add_device() (Lukasz
Luba).
* Use the next frequency as resume_freq instead of the previous
frequency when using the opp-suspend property (Dong Aisheng).
* Check get_dev_status in devfreq_update_stats() (Dong Aisheng).
* Fix set_freq path for the userspace governor in Kconfig (Dong
Aisheng).
* Remove invalid description of get_target_freq() (Dong Aisheng).
- Update devfreq drivers:
* imx8m-ddrc: Remove imx8m_ddrc_get_dev_status() and unneeded
of_match_ptr() (Dong Aisheng, Fabio Estevam).
* rk3399_dmc: dt-bindings: Add rockchip,pmu phandle and drop
references to undefined symbols (Enric Balletbo i Serra, Gaël
PORTAY).
* rk3399_dmc: Use dev_err_probe() to simplify the code (Krzysztof
Kozlowski).
* imx-bus: Remove unneeded of_match_ptr() (Fabio Estevam).
- Fix kernel-doc warnings in three places (Pierre-Louis Bossart).
- Fix typo in the pm-graph utility code (Ricardo Ribalda)"
* tag 'pm-5.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (74 commits)
PM: wakeup: remove redundant assignment to variable retval
PM: hibernate: x86: Use crc32 instead of md5 for hibernation e820 integrity check
cpufreq: Kconfig: fix documentation links
PM: wakeup: use dev_set_name() directly
PM: runtime: Add documentation for pm_runtime_resume_and_get()
cpufreq: intel_pstate: Simplify intel_pstate_update_perf_limits()
cpufreq: armada-37xx: Fix module unloading
cpufreq: armada-37xx: Remove cur_frequency variable
cpufreq: armada-37xx: Fix determining base CPU frequency
cpufreq: armada-37xx: Fix driver cleanup when registration failed
clk: mvebu: armada-37xx-periph: Fix workaround for switching from L1 to L0
clk: mvebu: armada-37xx-periph: Fix switching CPU freq from 250 Mhz to 1 GHz
cpufreq: armada-37xx: Fix the AVS value for load L1
clk: mvebu: armada-37xx-periph: remove .set_parent method for CPU PM clock
cpufreq: armada-37xx: Fix setting TBG parent for load levels
cpuidle: Fix ARM_QCOM_SPM_CPUIDLE configuration
cpuidle: tegra: Remove do_idle firmware call
cpuidle: tegra: Fix C7 idling state on Tegra114
PM: sleep: fix typos in comments
cpufreq: Remove unused for_each_policy macro
...
Updates for SoC specific drivers include a few subsystems that
have their own maintainers but send them through the soc tree:
TEE/OP-TEE:
- Add tracepoints around calls to secure world
Memory controller drivers:
- Minor fixes for Renesas, Exynos, Mediatek and Tegra platforms
- Add debug statistics to Tegra20 memory controller
- Update Tegra bindings and convert to dtschema
ARM SCMI Firmware:
- Support for modular SCMI protocols and vendor specific extensions
- New SCMI IIO driver
- Per-cpu DVFS
The other driver changes are all from the platform maintainers
directly and reflect the drivers that don't fit into any other
subsystem as well as treewide changes for a particular platform.
SoCFPGA:
- Various cleanups contributed by Krzysztof Kozlowski
Mediatek:
- add MT8183 support to mutex driver
- MMSYS: use per SoC array to describe the possible routing
- add MMSYS support for MT8183 and MT8167
- add support for PMIC wrapper with integrated arbiter
- add support for MT8192/MT6873
Tegra:
- Bug fixes to PMC and clock drivers
NXP/i.MX:
- Update SCU power domain driver to keep console domain power on.
- Add missing ADC1 power domain to SCU power domain driver.
- Update comments for single global power domain in SCU power domain
driver.
- Add i.MX51/i.MX53 unique id support to i.MX SoC driver.
NXP/FSL SoC driver updates for v5.13
- Add ACPI support for RCPM driver
- Use generic io{read,write} for QE drivers after performance optimized
for PowerPC
- Fix QBMAN probe to cleanup HW states correctly for kexec
- Various cleanup and style fix for QBMAN/QE/GUTS drivers
OMAP:
- Preparation to use devicetree for genpd
- ti-sysc needs iorange check improved when the interconnect target module
has no control registers listed
- ti-sysc needs to probe l4_wkup and l4_cfg interconnects first to avoid
issues with missing resources and unnecessary deferred probe
- ti-sysc debug option can now detect more devices
- ti-sysc now warns if an old incomplete devicetree data is found as we
now rely on it being complete for am3 and 4
- soc init code needs to check for prcm and prm nodes for omap4/5 and dra7
- omap-prm driver needs to enable autoidle retention support for omap4
- omap5 clocks are missing gpmc and ocmc clock registers
- pci-dra7xx now needs to use builtin_platform_driver instead of using
builtin_platform_driver_probe for deferred probe to work
Raspberry Pi:
- Fix-up all RPi firmware drivers so as for unbind to happen in an
orderly fashion
- Support for RPi's PoE hat PWM bus
Qualcomm
- Improved detection for SCM calling conventions
- Support for OEM specific wifi firmware path
- Added drivers for SC7280/SM8350: RPMH, LLCC< AOSS QMP
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmCC2JwACgkQmmx57+YA
GNkgRg//cBtq2NyDbjiNABxFSkmGCfcc0w0C2wjVzr4cfg6BLTbuvvlpZxI912pu
P1G2sbsdfQJ8sSeIyZos+PilWK0zHrqlaGZfKI19US45dMjpteDBgsPd7wNZwBjQ
jbops3YLjztZK1HpY4dIdvMnfxt7yRqhBWaTbPuCwQ35c5KsOM8NHB3cP3BUINWK
x1uuBCv9svppzwdDiPxneV93WKEzabOUo+WBMPyh5vnyvmW17Iif4BA/VKQxzymm
mWUi8HHpKBpvntJOKwAD2hnLAdpR3SwX20SLOpyLhnJMotbzNUEqq3LdRxDNPdHk
ry+rarJ78JGlYfpcfegf2bLf5ITNMfOyRGkjtzeYpcZIXPjufOg9DA9YtAy37k0u
L0T/9gQ+tQ01WGMca77OyUtIqJKdblZrQMfuH/yGlR99bqFQMV7rNc7GNlX1MXp/
zw4aOYrRWGtGEeAjx5JJWcYydvMSJpCrqxTz3YhgeJECHB2iA6YkV3NROR4TLW//
tfxaKqxR/KmSqE6hoVOAuuQ0BLXNlql/+4EE6MKsAOBiKPJclvmJg4CyuY8G21ev
9Su0zJnXMzai7gNu32v1pizGj26+AOhxCEgAG0mGgk2jlQSn24CKgm5e7kCUewcF
j/1XksNPT95v/K8MsLpXe5xGvF3jhA1BlFfvjJNZOrcZywBXRxg=
=iidq
-----END PGP SIGNATURE-----
Merge tag 'arm-drivers-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC driver updates from Arnd Bergmann:
"Updates for SoC specific drivers include a few subsystems that have
their own maintainers but send them through the soc tree:
TEE/OP-TEE:
- Add tracepoints around calls to secure world
Memory controller drivers:
- Minor fixes for Renesas, Exynos, Mediatek and Tegra platforms
- Add debug statistics to Tegra20 memory controller
- Update Tegra bindings and convert to dtschema
ARM SCMI Firmware:
- Support for modular SCMI protocols and vendor specific extensions
- New SCMI IIO driver
- Per-cpu DVFS
The other driver changes are all from the platform maintainers
directly and reflect the drivers that don't fit into any other
subsystem as well as treewide changes for a particular platform.
SoCFPGA:
- Various cleanups contributed by Krzysztof Kozlowski
Mediatek:
- add MT8183 support to mutex driver
- MMSYS: use per SoC array to describe the possible routing
- add MMSYS support for MT8183 and MT8167
- add support for PMIC wrapper with integrated arbiter
- add support for MT8192/MT6873
Tegra:
- Bug fixes to PMC and clock drivers
NXP/i.MX:
- Update SCU power domain driver to keep console domain power on.
- Add missing ADC1 power domain to SCU power domain driver.
- Update comments for single global power domain in SCU power domain
driver.
- Add i.MX51/i.MX53 unique id support to i.MX SoC driver.
NXP/FSL SoC driver updates for v5.13
- Add ACPI support for RCPM driver
- Use generic io{read,write} for QE drivers after performance
optimized for PowerPC
- Fix QBMAN probe to cleanup HW states correctly for kexec
- Various cleanup and style fix for QBMAN/QE/GUTS drivers
OMAP:
- Preparation to use devicetree for genpd
- ti-sysc needs iorange check improved when the interconnect target
module has no control registers listed
- ti-sysc needs to probe l4_wkup and l4_cfg interconnects first to
avoid issues with missing resources and unnecessary deferred probe
- ti-sysc debug option can now detect more devices
- ti-sysc now warns if an old incomplete devicetree data is found as
we now rely on it being complete for am3 and 4
- soc init code needs to check for prcm and prm nodes for omap4/5 and
dra7
- omap-prm driver needs to enable autoidle retention support for
omap4
- omap5 clocks are missing gpmc and ocmc clock registers
- pci-dra7xx now needs to use builtin_platform_driver instead of
using builtin_platform_driver_probe for deferred probe to work
Raspberry Pi:
- Fix-up all RPi firmware drivers so as for unbind to happen in an
orderly fashion
- Support for RPi's PoE hat PWM bus
Qualcomm
- Improved detection for SCM calling conventions
- Support for OEM specific wifi firmware path
- Added drivers for SC7280/SM8350: RPMH, LLCC< AOSS QMP"
* tag 'arm-drivers-5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (165 commits)
soc: aspeed: fix a ternary sign expansion bug
memory: mtk-smi: Add device-link between smi-larb and smi-common
memory: samsung: exynos5422-dmc: handle clk_set_parent() failure
memory: renesas-rpc-if: fix possible NULL pointer dereference of resource
clk: socfpga: fix iomem pointer cast on 64-bit
soc: aspeed: Adapt to new LPC device tree layout
pinctrl: aspeed-g5: Adapt to new LPC device tree layout
ipmi: kcs: aspeed: Adapt to new LPC DTS layout
ARM: dts: Remove LPC BMC and Host partitions
dt-bindings: aspeed-lpc: Remove LPC partitioning
soc: fsl: enable acpi support in RCPM driver
soc: qcom: mdt_loader: Detect truncated read of segments
soc: qcom: mdt_loader: Validate that p_filesz < p_memsz
soc: qcom: pdr: Fix error return code in pdr_register_listener
firmware: qcom_scm: Fix kernel-doc function names to match
firmware: qcom_scm: Suppress sysfs bind attributes
firmware: qcom_scm: Workaround lack of "is available" call on SC7180
firmware: qcom_scm: Reduce locking section for __get_convention()
firmware: qcom_scm: Make __qcom_scm_is_call_available() return bool
Revert "soc: fsl: qe: introduce qe_io{read,write}* wrappers"
...
User documentation for cpufreq governors and drivers has been moved to
admin-guide; adjust references from Kconfig entries accordingly.
Remove references from undocumented cpufreq drivers, as well as the
'userspace' cpufreq governor, for which no additional details are
provided in the admin-guide text.
Fixes: 2a0e492798 ("cpufreq: User/admin documentation update and consolidation")
Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull ARM cpufreq updates for v5.13 from Viresh Kumar:
"- Fix typos in s5pv210 cpufreq driver (Bhaskar Chowdhury).
- Armada 37xx: Fix cpufreq changing base CPU speed to 800 MHz from
1000 MHz (Pali Rohár and Marek Behún).
- cpufreq-dt: Return -EPROBE_DEFER on failure to add table (Quanyang
Wang).
- Minor cleanup in cppc driver (Tom Saeger).
- Add frequency invariance support for CPPC driver and generalize
freq invariance support arch-topology driver (Viresh Kumar)."
* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: armada-37xx: Fix module unloading
cpufreq: armada-37xx: Remove cur_frequency variable
cpufreq: armada-37xx: Fix determining base CPU frequency
cpufreq: armada-37xx: Fix driver cleanup when registration failed
clk: mvebu: armada-37xx-periph: Fix workaround for switching from L1 to L0
clk: mvebu: armada-37xx-periph: Fix switching CPU freq from 250 Mhz to 1 GHz
cpufreq: armada-37xx: Fix the AVS value for load L1
clk: mvebu: armada-37xx-periph: remove .set_parent method for CPU PM clock
cpufreq: armada-37xx: Fix setting TBG parent for load levels
cpufreq: dt: dev_pm_opp_of_cpumask_add_table() may return -EPROBE_DEFER
cpufreq: cppc: simplify default delay_us setting
cpufreq: Rudimentary typos fix in the file s5pv210-cpufreq.c
cpufreq: CPPC: Add support for frequency invariance
arch_topology: Export arch_freq_scale and helpers
arch_topology: Allow multiple entities to provide sched_freq_tick() callback
arch_topology: Rename freq_scale as arch_freq_scale
Because pstate.max_freq is always equal to the product of
pstate.max_pstate and pstate.scaling and, analogously,
pstate.turbo_freq is always equal to the product of
pstate.turbo_pstate and pstate.scaling, the result of the
max_policy_perf computation in intel_pstate_update_perf_limits() is
always equal to the quotient of policy_max and pstate.scaling,
regardless of whether or not turbo is disabled. Analogously, the
result of min_policy_perf in intel_pstate_update_perf_limits() is
always equal to the quotient of policy_min and pstate.scaling.
Accordingly, intel_pstate_update_perf_limits() need not check
whether or not turbo is enabled at all and in order to compute
max_policy_perf and min_policy_perf it can always divide policy_max
and policy_min, respectively, by pstate.scaling. Make it do so.
While at it, move the definition and initialization of the
turbo_max local variable to the code branch using it.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
This driver is missing module_exit hook. Add proper driver exit function
which unregisters the platform device and cleans up the data.
Signed-off-by: Pali Rohár <pali@kernel.org>
Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Tested-by: Anders Trier Olesen <anders.trier.olesen@gmail.com>
Tested-by: Philip Soares <philips@netisense.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
When current CPU load is not L0 then loading armada-37xx-cpufreq.ko driver
fails with following error:
# modprobe armada-37xx-cpufreq
[ 502.702097] Unsupported CPU frequency 250 MHz
This issue was partially fixed by commit 8db8256345 ("cpufreq:
armada-37xx: fix frequency calculation for opp"), but only for calculating
CPU frequency for opp.
Fix this also for determination of base CPU frequency.
Signed-off-by: Pali Rohár <pali@kernel.org>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Tested-by: Anders Trier Olesen <anders.trier.olesen@gmail.com>
Tested-by: Philip Soares <philips@netisense.com>
Fixes: 92ce45fb87 ("cpufreq: Add DVFS support for Armada 37xx")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Commit 8db8256345 ("cpufreq: armada-37xx: fix frequency calculation for
opp") changed calculation of frequency passed to the dev_pm_opp_add()
function call. But the code for dev_pm_opp_remove() function call was not
updated, so the driver cleanup phase does not work when registration fails.
This fixes the issue by using the same frequency in both calls.
Signed-off-by: Pali Rohár <pali@kernel.org>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Tested-by: Anders Trier Olesen <anders.trier.olesen@gmail.com>
Tested-by: Philip Soares <philips@netisense.com>
Fixes: 8db8256345 ("cpufreq: armada-37xx: fix frequency calculation for opp")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The original CPU voltage value for load L1 is too low for Armada 37xx SoC
when base CPU frequency is 1000 or 1200 MHz. It leads to instabilities
where CPU gets stuck soon after dynamic voltage scaling from load L1 to L0.
Update the CPU voltage value for load L1 accordingly when base frequency is
1000 or 1200 MHz. The minimal L1 value for base CPU frequency 1000 MHz is
updated from the original 1.05V to 1.108V and for 1200 MHz is updated to
1.155V. This minimal L1 value is used only in the case when it is lower
than value for L0.
This change fixes CPU instability issues on 1 GHz and 1.2 GHz variants of
Espressobin and 1 GHz Turris Mox.
Marvell previously for 1 GHz variant of Espressobin provided a patch [1]
suitable only for their Marvell Linux kernel 4.4 fork which workarounded
this issue. Patch forced CPU voltage value to 1.108V in all loads. But
such change does not fix CPU instability issues on 1.2 GHz variants of
Armada 3720 SoC.
During testing we come to the conclusion that using 1.108V as minimal
value for L1 load makes 1 GHz variants of Espressobin and Turris Mox boards
stable. And similarly 1.155V for 1.2 GHz variant of Espressobin.
These two values 1.108V and 1.155V are documented in Armada 3700 Hardware
Specifications as typical initial CPU voltage values.
Discussion about this issue is also at the Armbian forum [2].
[1] - dc33b62c90
[2] - https://forum.armbian.com/topic/10429-how-to-make-espressobin-v7-stable/
Signed-off-by: Pali Rohár <pali@kernel.org>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Tested-by: Anders Trier Olesen <anders.trier.olesen@gmail.com>
Tested-by: Philip Soares <philips@netisense.com>
Fixes: 1c3528232f ("cpufreq: armada-37xx: Add AVS support")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
With CPU frequency determining software [1] we have discovered that
after this driver does one CPU frequency change, the base frequency of
the CPU is set to the frequency of TBG-A-P clock, instead of the TBG
that is parent to the CPU.
This can be reproduced on EspressoBIN and Turris MOX:
cd /sys/devices/system/cpu/cpufreq/policy0
echo powersave >scaling_governor
echo performance >scaling_governor
Running the mhz tool before this driver is loaded reports 1000 MHz, and
after loading the driver and executing commands above the tool reports
800 MHz.
The change of TBG clock selector is supposed to happen in function
armada37xx_cpufreq_dvfs_setup. Before the function returns, it does
this:
parent = clk_get_parent(clk);
clk_set_parent(clk, parent);
The armada-37xx-periph clock driver has the .set_parent method
implemented correctly for this, so if the method was actually called,
this would work. But since the introduction of the common clock
framework in commit b2476490ef ("clk: introduce the common clock..."),
the clk_set_parent function checks whether the parent is actually
changing, and if the requested new parent is same as the old parent
(which is obviously the case for the code above), the .set_parent method
is not called at all.
This patch fixes this issue by filling the correct TBG clock selector
directly in the armada37xx_cpufreq_dvfs_setup during the filling of
other registers at the same address. But the determination of CPU TBG
index cannot be done via the common clock framework, therefore we need
to access the North Bridge Peripheral Clock registers directly in this
driver.
[1] https://github.com/wtarreau/mhz
Signed-off-by: Marek Behún <kabel@kernel.org>
Acked-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Tested-by: Pali Rohár <pali@kernel.org>
Tested-by: Tomasz Maciej Nowak <tmn505@gmail.com>
Tested-by: Anders Trier Olesen <anders.trier.olesen@gmail.com>
Tested-by: Philip Soares <philips@netisense.com>
Fixes: 92ce45fb87 ("cpufreq: Add DVFS support for Armada 37xx")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Macro 'for_each_policy' has become unused since commit
f963735a3c ("cpufreq: Create for_each_{in}active_policy()"), so
remove it.
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Port driver to the new SCMI perf interface based on protocol handles
and common devm_get_ops().
Link: https://lore.kernel.org/r/20210316124903.35011-13-cristian.marussi@arm.com
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
The function names in the comment blocks for the functions
scaling_available_frequencies_show() and
scaling_boost_frequencies_show() do not match the actual names.
Fixes: 6f19efc0a1 ("cpufreq: Add boost frequency support in core")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The function dev_pm_opp_of_cpumask_add_table() may return -EPROBE_DEFER,
which needs to be propagated to the caller to try probing the driver
later on.
Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com>
[ Viresh: Massage changelog/subject, improve code. ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Notice that some computations related to frequency in intel_pstate
can be simplified if (a) intel_pstate_get_hwp_max() updates the
relevant members of struct cpudata by itself and (b) the "turbo
disabled" check is moved from it to its callers, so modify the code
accordingly and while at it rename intel_pstate_get_hwp_max() to
intel_pstate_get_hwp_cap() which better reflects its purpose and
provide a simplified variat of it, __intel_pstate_get_hwp_cap(),
suitable for the initialization path.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Simplify case when setting default in cppc_cpufreq_get_transition_delay_us.
Signed-off-by: Tom Saeger <tom.saeger@oracle.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The Frequency Invariance Engine (FIE) is providing a frequency scaling
correction factor that helps achieve more accurate load-tracking.
Normally, this scaling factor can be obtained directly with the help of
the cpufreq drivers as they know the exact frequency the hardware is
running at. But that isn't the case for CPPC cpufreq driver.
Another way of obtaining that is using the arch specific counter
support, which is already present in kernel, but that hardware is
optional for platforms.
This patch updates the CPPC driver to register itself with the topology
core to provide its own implementation (cppc_scale_freq_tick()) of
topology_scale_freq_tick() which gets called by the scheduler on every
tick. Note that the arch specific counters have higher priority than
CPPC counters, if available, though the CPPC driver doesn't need to have
any special handling for that.
On an invocation of cppc_scale_freq_tick(), we schedule an irq work
(since we reach here from hard-irq context), which then schedules a
normal work item and cppc_scale_freq_workfn() updates the per_cpu
arch_freq_scale variable based on the counter updates since the last
tick.
To allow platforms to disable this CPPC counter-based frequency
invariance support, this is all done under CONFIG_ACPI_CPPC_CPUFREQ_FIE,
which is enabled by default.
This also exports sched_setattr_nocheck() as the CPPC driver can be
built as a module.
Cc: linux-acpi@vger.kernel.org
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Fix warning with %lx / s64 mismatch:
CC [M] drivers/cpufreq/ia64-acpi-cpufreq.o
drivers/cpufreq/ia64-acpi-cpufreq.c: In function 'processor_get_pstate':
warning: format '%lx' expects argument of type 'long unsigned int',
but argument 3 has type 's64' {aka 'long long int'} [-Wformat=]
Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
By design, SCMI performance domains define the granularity of
performance controls, they do not describe any underlying hardware
dependencies (although they may match in many cases).
It is therefore possible to have some platforms where hardware may have
the ability to control CPU performance at different granularity and choose
to describe fine-grained performance control through SCMI.
In such situations, the energy model would be provided with inaccurate
information based on controls, while it still needs to know the
performance boundaries.
To restore correct functionality, retrieve information of CPUs under the
same performance domain from operating-points-v2 in DT, and pass it on to
EM.
Link: https://lore.kernel.org/r/20210218222326.15788-3-nicola.mazzucato@arm.com
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Nicola Mazzucato <nicola.mazzucato@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
The current implementation of the scmi_cpufreq_init() function returns
-EPROBE_DEFER when the OPP table is not populated. In practice the
cpufreq core cannot handle this error code.
Therefore, fix the return value and clarify the error message.
Link: https://lore.kernel.org/r/20210218222326.15788-2-nicola.mazzucato@arm.com
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Nicola Mazzucato <nicola.mazzucato@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Add "arm,vexpress" to cpufreq-dt-platdev blacklist since the actual
scaling is handled by the firmware cpufreq drivers(scpi, scmi and
vexpress-spc).
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
In case of error, the function ioremap() returns NULL pointer
not ERR_PTR(). The IS_ERR() test in the return value check
should be replaced with NULL test.
Fixes: 67fc209b52 ("cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooks")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Drop support for depercated platforms using SFI, drop the entire
support for SFI that has been long deprecated too and make some
janitorial changes on top of that (Andy Shevchenko).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmA2ZukSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxKcAP/RAkbRVFndhQIZYTCu74O64v86FjTBcS
3vvcKevVkBJiPJL1l10Yo3UMEYAbJIRZY00jkUjX7pq4eurELu6LwdMtJlHwh0p5
ZP5QeSdq1xN+9UGwBGXlnka2ypmD8fjbQyxHKErYgvmOl4ltFm40PyUC9GCVFLnW
/1o83t/dcmTtaOGPYWTW3HuCsbYqANG/x8PYAFeAk5dBxoSaNV69gAEuCYr1JC5N
Nie4x2m2I5v9egJFhy6rmRrpHPBvocCho+FipJFagSKWHPCI2rBSKESVOj23zWt2
eIWhK5T/ZR3OqQb9tZN6uAPJmBAerc3l7ZHZ1oFBP68MjUJJJhduQ+hNxljOyLLw
CVx0UhuancIWZdyJon5f7E9S9STZLIZ/3usx3K+7AZK+PSmH8d/UEIeXfkC0FcAr
eO3gwalB9KuhhXbVvihW79RkfkV5pTaMvVS7l1BffN4WE1dB9PKtJ8/MKFbGaTUF
4Rev6BdAEDqJrw6OIARvNcI6TAEhbKe5yIghzhQWn+fZ7oEm6f6fvFObBzD0KvQP
4RwYJhXU0gtK5yo/Ib1sUqjVQn8Jgqb7Xq46WZsP07Yc6O2Ws/86qCpX1GSCv5FU
1CZEJLGLGTbjDYOyMaUDfO/tI5kXG11e0Ss7Q+snWH4Iyhg0aNEYChKjOAFIxIxg
JJYOH8O5p2IP
=jlPz
-----END PGP SIGNATURE-----
Merge tag 'sfi-removal-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull Simple Firmware Interface (SFI) support removal from Rafael Wysocki:
"Drop support for depercated platforms using SFI, drop the entire
support for SFI that has been long deprecated too and make some
janitorial changes on top of that (Andy Shevchenko)"
* tag 'sfi-removal-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
x86/platform/intel-mid: Update Copyright year and drop file names
x86/platform/intel-mid: Remove unused header inclusion in intel-mid.h
x86/platform/intel-mid: Drop unused __intel_mid_cpu_chip and Co.
x86/platform/intel-mid: Get rid of intel_scu_ipc_legacy.h
x86/PCI: Describe @reg for type1_access_ok()
x86/PCI: Get rid of custom x86 model comparison
sfi: Remove framework for deprecated firmware
cpufreq: sfi-cpufreq: Remove driver for deprecated firmware
media: atomisp: Remove unused header
mfd: intel_msic: Remove driver for deprecated platform
x86/apb_timer: Remove driver for deprecated platform
x86/platform/intel-mid: Remove unused leftovers (vRTC)
x86/platform/intel-mid: Remove unused leftovers (msic)
x86/platform/intel-mid: Remove unused leftovers (msic_thermal)
x86/platform/intel-mid: Remove unused leftovers (msic_power_btn)
x86/platform/intel-mid: Remove unused leftovers (msic_gpio)
x86/platform/intel-mid: Remove unused leftovers (msic_battery)
x86/platform/intel-mid: Remove unused leftovers (msic_ocd)
x86/platform/intel-mid: Remove unused leftovers (msic_audio)
platform/x86: intel_scu_wdt: Drop mistakenly added const
* pm-cpufreq:
cpufreq: Fix typo in kerneldoc comment
cpufreq: schedutil: Remove update_lock comment from struct sugov_policy definition
cpufreq: schedutil: Remove needless sg_policy parameter from ignore_dl_rate_limit()
cpufreq: ACPI: Set cpuinfo.max_freq directly if max boost is known
cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooks
* pm-opp:
opp: Don't skip freq update for different frequency
Change 'Terget' to 'Target'.
Should be Target.
Signed-off-by: Yue Hu <huyue2@yulong.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull ARM cpufreq fix for 5.12 from Viresh Kumar:
"Single patch to fix issue with cpu hotplug and policy recreation for
qcom-cpufreq-hw driver."
* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: qcom-hw: drop devm_xxx() calls from init/exit hooks
Commit 3c55e94c0a ("cpufreq: ACPI: Extend frequency tables to cover
boost frequencies") attempted to address a performance issue involving
acpi-cpufreq, the schedutil governor and scale-invariance on x86 by
extending the frequency tables created by acpi-cpufreq to cover the
entire range of "turbo" (or "boost") frequencies, but that caused
frequencies reported via /proc/cpuinfo and the scaling_cur_freq
attribute in sysfs to change which may confuse users and monitoring
tools.
For this reason, revert the part of commit 3c55e94c0a adding the
extra entry to the frequency table and use the observation that
in principle cpuinfo.max_freq need not be equal to the maximum
frequency listed in the frequency table for the given policy.
Namely, modify cpufreq_frequency_table_cpuinfo() to allow cpufreq
drivers to set their own cpuinfo.max_freq above that frequency and
change acpi-cpufreq to set cpuinfo.max_freq to the maximum boost
frequency found via CPPC.
This should be sufficient to let all of the cpufreq subsystem know
the real maximum frequency of the CPU without changing frequency
reporting.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=211305
Fixes: 3c55e94c0a ("cpufreq: ACPI: Extend frequency tables to cover boost frequencies")
Reported-by: Matt McDonald <gardotd426@gmail.com>
Tested-by: Matt McDonald <gardotd426@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Tested-by: Michael Larabel <Michael@phoronix.com>
Cc: 5.11+ <stable@vger.kernel.org> # 5.11+
Commit f17b3e4432 ("cpufreq: qcom-hw: Use
devm_platform_ioremap_resource() to simplify code") introduces
a regression on platforms using the driver, by failing to initialise
a policy, when one is created post hotplug.
When all the CPUs of a policy are hoptplugged out, the call to .exit()
and later to devm_iounmap() does not release the memory region that was
requested during devm_platform_ioremap_resource(). Therefore,
a subsequent call to .init() will result in the following error, which
will prevent a new policy to be initialised:
[ 3395.915416] CPU4: shutdown
[ 3395.938185] psci: CPU4 killed (polled 0 ms)
[ 3399.071424] CPU5: shutdown
[ 3399.094316] psci: CPU5 killed (polled 0 ms)
[ 3402.139358] CPU6: shutdown
[ 3402.161705] psci: CPU6 killed (polled 0 ms)
[ 3404.742939] CPU7: shutdown
[ 3404.765592] psci: CPU7 killed (polled 0 ms)
[ 3411.492274] Detected VIPT I-cache on CPU4
[ 3411.492337] GICv3: CPU4: found redistributor 400 region 0:0x0000000017ae0000
[ 3411.492448] CPU4: Booted secondary processor 0x0000000400 [0x516f802d]
[ 3411.503654] qcom-cpufreq-hw 17d43000.cpufreq: can't request region for resource [mem 0x17d45800-0x17d46bff]
With that being said, the original code was tricky and skipping memory
region request intentionally to hide this issue. The true cause is that
those devm_xxx() device managed functions shouldn't be used for cpufreq
init/exit hooks, because &pdev->dev is alive across the hooks and will
not trigger auto resource free-up. Let's drop the use of device managed
functions and manually allocate/free resources, so that the issue can be
fixed properly.
Cc: v5.10+ <stable@vger.kernel.org> # v5.10+
Fixes: f17b3e4432 ("cpufreq: qcom-hw: Use devm_platform_ioremap_resource() to simplify code")
Suggested-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
SFI-based platforms are gone. So does this driver.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* pm-opp: (37 commits)
PM / devfreq: Add required OPPs support to passive governor
PM / devfreq: Cache OPP table reference in devfreq
OPP: Add function to look up required OPP's for a given OPP
opp: Replace ENOTSUPP with EOPNOTSUPP
opp: Fix "foo * bar" should be "foo *bar"
opp: Don't ignore clk_get() errors other than -ENOENT
opp: Update bandwidth requirements based on scaling up/down
opp: Allow lazy-linking of required-opps
opp: Remove dev_pm_opp_set_bw()
devfreq: tegra30: Migrate to dev_pm_opp_set_opp()
drm: msm: Migrate to dev_pm_opp_set_opp()
cpufreq: qcom: Migrate to dev_pm_opp_set_opp()
opp: Implement dev_pm_opp_set_opp()
opp: Update parameters of _set_opp_custom()
opp: Allow _generic_set_opp_clk_only() to work for non-freq devices
opp: Allow _generic_set_opp_regulator() to work for non-freq devices
opp: Allow _set_opp() to work for non-freq devices
opp: Split _set_opp() out of dev_pm_opp_set_rate()
opp: Keep track of currently programmed OPP
opp: No need to check clk for errors
...
Pull ARM cpufreq changes for v5.12 from Viresh Kumar:
"- Removal of Tango driver as the platform got removed (Arnd Bergmann).
- Use resource managed APIs for tegra20 (Dmitry Osipenko).
- Generic cleanups for brcmstb (Christophe JAILLET).
- Enable boost support for qcom-hw (Shawn Guo)."
* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
cpufreq: remove tango driver
cpufreq: brcmstb-avs-cpufreq: Fix resource leaks in ->remove()
cpufreq: brcmstb-avs-cpufreq: Free resources in error path
cpufreq: qcom-hw: enable boost support
cpufreq: tegra20: Use resource-managed API
If the maximum performance level taken for computing the
arch_max_freq_ratio value used in the x86 scale-invariance code is
higher than the one corresponding to the cpuinfo.max_freq value
coming from the acpi_cpufreq driver, the scale-invariant utilization
falls below 100% even if the CPU runs at cpuinfo.max_freq or slightly
faster, which causes the schedutil governor to select a frequency
below cpuinfo.max_freq. That frequency corresponds to a frequency
table entry below the maximum performance level necessary to get to
the "boost" range of CPU frequencies which prevents "boost"
frequencies from being used in some workloads.
While this issue is related to scale-invariance, it may be amplified
by commit db865272d9 ("cpufreq: Avoid configuring old governors as
default with intel_pstate") from the 5.10 development cycle which
made it extremely easy to default to schedutil even if the preferred
driver is acpi_cpufreq as long as intel_pstate is built too, because
the mere presence of the latter effectively removes the ondemand
governor from the defaults. Distro kernels are likely to include
both intel_pstate and acpi_cpufreq on x86, so their users who cannot
use intel_pstate or choose to use acpi_cpufreq may easily be
affectecd by this issue.
If CPPC is available, it can be used to address this issue by
extending the frequency tables created by acpi_cpufreq to cover the
entire available frequency range (including "boost" frequencies) for
each CPU, but if CPPC is not there, acpi_cpufreq has no idea what
the maximum "boost" frequency is and the frequency tables created by
it cannot be extended in a meaningful way, so in that case make it
ask the arch scale-invariance code to to use the "nominal" performance
level for CPU utilization scaling in order to avoid the issue at hand.
Fixes: db865272d9 ("cpufreq: Avoid configuring old governors as default with intel_pstate")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
A severe performance regression on AMD EPYC processors when using
the schedutil scaling governor was discovered by Phoronix.com and
attributed to the following commits:
41ea667227 ("x86, sched: Calculate frequency invariance for AMD
systems")
976df7e573 ("x86, sched: Use midpoint of max_boost and max_P for
frequency invariance on AMD EPYC")
The source of the problem is that the maximum performance level taken
for computing the arch_max_freq_ratio value used in the x86 scale-
invariance code is higher than the one corresponding to the
cpuinfo.max_freq value coming from the acpi_cpufreq driver.
This effectively causes the scale-invariant utilization to fall below
100% even if the CPU runs at cpuinfo.max_freq or slightly faster, so
the schedutil governor selects a frequency below cpuinfo.max_freq
then. That frequency corresponds to a frequency table entry below
the maximum performance level necessary to get to the "boost" range
of CPU frequencies.
However, if the cpuinfo.max_freq value coming from acpi_cpufreq was
higher, the schedutil governor would select higher frequencies which
in turn would allow acpi_cpufreq to set more adequate performance
levels and to get to the "boost" range of CPU frequencies more often.
This issue affects any systems where acpi_cpufreq is used and the
"boost" (or "turbo") frequencies are enabled, not just AMD EPYC.
Moreover, commit db865272d9 ("cpufreq: Avoid configuring old
governors as default with intel_pstate") from the 5.10 development
cycle made it extremely easy to default to schedutil even if the
preferred driver is acpi_cpufreq as long as intel_pstate is built
too, because the mere presence of the latter effectively removes the
ondemand governor from the defaults. Distro kernels are likely to
include both intel_pstate and acpi_cpufreq on x86, so their users
who cannot use intel_pstate or choose to use acpi_cpufreq may
easily be affectecd by this issue.
To address this issue, extend the frequency table constructed by
acpi_cpufreq for each CPU to cover the entire range of available
frequencies (including the "boost" ones) if CPPC is available and
indicates that "boost" (or "turbo") frequencies are enabled. That
causes cpuinfo.max_freq to become the maximum "boost" frequency of
the given CPU (instead of the maximum frequency returned by the ACPI
_PSS object that corresponds to the "nominal" performance level).
Fixes: 41ea667227 ("x86, sched: Calculate frequency invariance for AMD systems")
Fixes: 976df7e573 ("x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC")
Fixes: db865272d9 ("cpufreq: Avoid configuring old governors as default with intel_pstate")
Link: https://www.phoronix.com/scan.php?page=article&item=linux511-amd-schedutil&num=1
Link: https://lore.kernel.org/linux-pm/20210203135321.12253-2-ggherdovich@suse.cz/
Reported-by: Michael Larabel <Michael@phoronix.com>
Diagnosed-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Reviewed-by: Giovanni Gherdovich <ggherdovich@suse.cz>
Tested-by: Michael Larabel <Michael@phoronix.com>
This flag is set by one of the drivers but it isn't used in the code
otherwise. Remove the unused flag and update the driver.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
During cpufreq driver's registration, if the ->init() callback for all
the CPUs fail then there is not much point in keeping the driver around
as it will only account for more of unnecessary noise, for example
cpufreq core will try to suspend/resume the driver which never got
registered properly.
The removal of such a driver is avoided if the driver carries the
CPUFREQ_STICKY flag. This was added way back [1] in 2004 and perhaps no
one should ever need it now. A lot of drivers do set this flag, probably
because they just copied it from other drivers.
This was added earlier for some platforms [2] because their cpufreq
drivers were getting registered before the CPUs were registered with
subsys framework. And hence they used to fail.
The same isn't true anymore though. The current code flow in the kernel
is:
start_kernel()
-> kernel_init()
-> kernel_init_freeable()
-> do_basic_setup()
-> driver_init()
-> cpu_dev_init()
-> subsys_system_register() //For CPUs
-> do_initcalls()
-> cpufreq_register_driver()
Clearly, the CPUs will always get registered with subsys framework
before any cpufreq driver can get probed. Remove the flag and update the
relevant drivers.
Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/include/linux/cpufreq.h?id=7cc9f0d9a1ab04cedc60d64fd8dcf7df224a3b4d # [1]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/arch/arm/mach-sa1100/cpu-sa1100.c?id=f59d3bbe35f6268d729f51be82af8325d62f20f5 # [2]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
dev_pm_opp_set_bw() is getting removed and dev_pm_opp_set_opp() should
be used instead. Migrate to the new API.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
In the comment for trace in passive mode there is an
unnecessary "the". Eradicate it.
Signed-off-by: Nigel Christian <nigel.l.christian@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The tango platform is getting removed, so the driver is no
longer needed.
Cc: Marc Gonzalez <marc.w.gonzalez@free.fr>
Cc: Mans Rullgard <mans@mansr.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[ Viresh: Update cpufreq-dt-platdev.c as well ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
If 'cpufreq_unregister_driver()' fails, just WARN and continue, so that
other resources are freed.
Fixes: de322e0859 ("cpufreq: brcmstb-avs-cpufreq: AVS CPUfreq driver for Broadcom STB SoCs")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
[ Viresh: Updated Subject ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
If 'cpufreq_register_driver()' fails, we must release the resources
allocated in 'brcm_avs_prepare_init()' as already done in the remove
function.
To do that, introduce a new function 'brcm_avs_prepare_uninit()' in order
to avoid code duplication. This also makes the code more readable (IMHO).
Fixes: de322e0859 ("cpufreq: brcmstb-avs-cpufreq: AVS CPUfreq driver for Broadcom STB SoCs")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
[ Viresh: Updated Subject ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
At least on sdm850, the 2956800 khz is detected as a boost frequency in
function qcom_cpufreq_hw_read_lut(). Let's enable boost support by
calling cpufreq_enable_boost_support(), so that we can get the boost
frequency by switching it on via 'boost' sysfs entry like below.
$ echo 1 > /sys/devices/system/cpu/cpufreq/boost
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Tested-by: Steev Klimaszewski <steev@kali.org>
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Switch cpufreq-tegra20 driver to use resource-managed API.
This removes the need to get opp_table pointer using
dev_pm_opp_get_opp_table() in order to release OPP table that
was requested by dev_pm_opp_set_supported_hw(), making the code
a bit more straightforward.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Currently, when turbo is disabled (either by BIOS or by the user),
the intel_pstate driver reads the max non-turbo frequency from the
package-wide MSR_PLATFORM_INFO(0xce) register.
However, on asymmetric platforms it is possible in theory that small
and big core with HWP enabled might have different max non-turbo CPU
frequency, because MSR_HWP_CAPABILITIES is per-CPU scope according
to Intel Software Developer Manual.
The turbo max freq is already per-CPU in current code, so make
similar change to the max non-turbo frequency as well.
Reported-by: Wendy Wang <wendy.wang@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
[ rjw: Subject and changelog edits ]
Cc: 4.18+ <stable@vger.kernel.org> # 4.18+: a45ee4d4e13b: cpufreq: intel_pstate: Change intel_pstate_get_hwp_max() argument
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Rename intel_cpufreq_adjust_hwp() and intel_cpufreq_adjust_perf_ctl()
to intel_cpufreq_hwp_update() and intel_cpufreq_perf_ctl_update(),
respectively, to avoid possible confusion with the ->adjist_perf()
callback function, intel_cpufreq_adjust_perf().
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
All of the callers of intel_pstate_get_hwp_max() access the struct
cpudata object that corresponds to the given CPU already and the
function itself needs to access that object (in order to update
hwp_cap_cached), so modify the code to pass a struct cpudata pointer
to it instead of the CPU number.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Because intel_pstate_get_hwp_max() which updates hwp_cap_cached
may run in parallel with the readers of it, annotate all of the
read accesses to it with READ_ONCE().
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
percent_fp() was used in intel_pstate_pid_reset(), which was removed in
commit 9d0ef7af1f ("cpufreq: intel_pstate: Do not use PID-based P-state
selection") and hence, percent_fp() is unused since then.
percent_ext_fp() was last used in intel_pstate_update_perf_limits(), which
was refactored in commit 1a4fe38add ("cpufreq: intel_pstate: Remove
max/min fractions to limit performance"), and hence, percent_ext_fp() is
unused since then.
make CC=clang W=1 points us those unused functions:
drivers/cpufreq/intel_pstate.c:79:23: warning: unused function 'percent_fp' [-Wunused-function]
static inline int32_t percent_fp(int percent)
^
drivers/cpufreq/intel_pstate.c:94:23: warning: unused function 'percent_ext_fp' [-Wunused-function]
static inline int32_t percent_ext_fp(int percent)
^
Remove those obsolete functions.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently there is an unlikely case where cpufreq_cpu_get() returns a
NULL policy and this will cause a NULL pointer dereference later on.
Fix this by passing the policy to transition_frequency_fidvid() from
the caller and hence eliminating the need for the cpufreq_cpu_get()
and cpufreq_cpu_put().
Thanks to Viresh Kumar for suggesting the fix.
Addresses-Coverity: ("Dereference null return")
Fixes: b43a7ffbf3 ("cpufreq: Notify all policy->cpus in cpufreq_notify_transition()")
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If turbo P-states cannot be used, either due to the configuration of
the processor, or because intel_pstate is not allowed to used them,
the maximum available P-state with HWP enabled corresponds to the
HWP_CAP.GUARANTEED value which is not static. It can be adjusted by
an out-of-band agent or during an Intel Speed Select performance
level change, so long as it remains less than or equal to
HWP_CAP.MAX.
However, if turbo P-states cannot be used, intel_cpufreq_adjust_perf()
always uses pstate.max_pstate (set during the initialization of the
driver only) as the maximum available P-state, so it may miss a change
of the HWP_CAP.GUARANTEED value.
Prevent that from happening by modifyig intel_cpufreq_adjust_perf()
to always read the "guaranteed" and "maximum turbo" performance
levels from the cached HWP_CAP value.
Fixes: a365ab6b9d ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
When sugov_update_single_perf() falls back to the "frequency"
path due to the missing scale-invariance, it will call
cpufreq_driver_fast_switch() via sugov_fast_switch()
and the driver's ->fast_switch() callback will be invoked,
so it must not be NULL.
However, after commit a365ab6b9d ("cpufreq: intel_pstate: Implement
the ->adjust_perf() callback") intel_pstate sets ->fast_switch() to
NULL when it is going to use intel_cpufreq_adjust_perf(), which is a
mistake, because on x86 the scale-invariance may be turned off
dynamically, so modify it to retain the original ->adjust_perf()
callback pointer.
Fixes: a365ab6b9d ("cpufreq: intel_pstate: Implement the ->adjust_perf() callback")
Reported-by: Kenneth R. Crudup <kenny@panix.com>
Tested-by: Kenneth R. Crudup <kenny@panix.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* pm-cpufreq:
cpufreq: intel_pstate: Use most recent guaranteed performance values
cpufreq: intel_pstate: Implement the ->adjust_perf() callback
cpufreq: Add special-purpose fast-switching callback for drivers
cpufreq: schedutil: Add util to struct sg_cpu
cppc_cpufreq: replace per-cpu data array with a list
cppc_cpufreq: expose information on frequency domains
cppc_cpufreq: clarify support for coordination types
cppc_cpufreq: use policy->cpu as driver of frequency setting
ACPI: processor: fix NONE coordination for domain mapping failure
ACPI: processor: Drop duplicate setting of shared_cpu_map
When turbo has been disabled by the BIOS, but HWP_CAP.GUARANTEED is
changed later, user space may want to take advantage of this increased
guaranteed performance.
HWP_CAP.GUARANTEED is not a static value. It can be adjusted by an
out-of-band agent or during an Intel Speed Select performance level
change. The HWP_CAP.MAX is still the maximum achievable performance
with turbo disabled by the BIOS, so HWP_CAP.GUARANTEED can still
change as long as it remains less than or equal to HWP_CAP.MAX.
When HWP_CAP.GUARANTEED is changed, the sysfs base_frequency
attribute shows the most recent guaranteed frequency value. This
attribute can be used by user space software to update the scaling
min/max limits of the CPU.
Currently, the ->setpolicy() callback already uses the latest
HWP_CAP values when setting HWP_REQ, but the ->verify() callback will
restrict the user settings to the to old guaranteed performance value
which prevents user space from making use of the extra CPU capacity
theoretically available to it after increasing HWP_CAP.GUARANTEED.
To address this, read HWP_CAP in intel_pstate_verify_cpu_policy()
to obtain the maximum P-state that can be used and use that to
confine the policy max limit instead of using the cached and
possibly stale pstate.max_freq value for this purpose.
For consistency, update intel_pstate_update_perf_limits() to use the
maximum available P-state returned by intel_pstate_get_hwp_max() to
compute the maximum frequency instead of using the return value of
intel_pstate_get_max_freq() which, again, may be stale.
This issue is a side-effect of fixing the scaling frequency limits in
commit eacc9c5a92 ("cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max()
for turbo disabled") which corrected the setting of the reduced scaling
frequency values, but caused stale HWP_CAP.GUARANTEED to be used in
the case at hand.
Fixes: eacc9c5a92 ("cpufreq: intel_pstate: Fix intel_pstate_get_hwp_max() for turbo disabled")
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 5.8+ <stable@vger.kernel.org> # 5.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Make intel_pstate expose the ->adjust_perf() callback when it
operates in the passive mode with HWP enabled which causes the
schedutil governor to use that callback instead of ->fast_switch().
The minimum and target performance-level values passed by the
governor to ->adjust_perf() are converted to HWP.REQ.MIN and
HWP.REQ.DESIRED, respectively, which allows the processor to
adjust its configuration to maximize energy-efficiency while
providing sufficient capacity.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
First off, some cpufreq drivers (eg. intel_pstate) can pass hints
beyond the current target frequency to the hardware and there are no
provisions for doing that in the cpufreq framework. In particular,
today the driver has to assume that it should not allow the frequency
to fall below the one requested by the governor (or the required
capacity may not be provided) which may not be the case and which may
lead to excessive energy usage in some scenarios.
Second, the hints passed by these drivers to the hardware need not be
in terms of the frequency, so representing the utilization numbers
coming from the scheduler as frequency before passing them to those
drivers is not really useful.
Address the two points above by adding a special-purpose replacement
for the ->fast_switch callback, called ->adjust_perf, allowing the
governor to pass abstract performance level (rather than frequency)
values for the minimum (required) and target (desired) performance
along with the CPU capacity to compare them to.
Also update the schedutil governor to use the new callback instead
of ->fast_switch if present and if the utilization mertics are
frequency-invariant (that is requisite for the direct mapping
between the utilization and the CPU performance levels to be a
reasonable approximation).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
The cppc_cpudata per-cpu storage was inefficient (1) additional to causing
functional issues (2) when CPUs are hotplugged out, due to per-cpu data
being improperly initialised.
(1) The amount of information needed for CPPC performance control in its
cpufreq driver depends on the domain (PSD) coordination type:
ANY: One set of CPPC control and capability data (e.g desired
performance, highest/lowest performance, etc) applies to all
CPUs in the domain.
ALL: Same as ANY. To be noted that this type is not currently
supported. When supported, information about which CPUs
belong to a domain is needed in order for frequency change
requests to be sent to each of them.
HW: It's necessary to store CPPC control and capability
information for all the CPUs. HW will then coordinate the
performance state based on their limitations and requests.
NONE: Same as HW. No HW coordination is expected.
Despite this, the previous initialisation code would indiscriminately
allocate memory for all CPUs (all_cpu_data) and unnecessarily
duplicate performance capabilities and the domain sharing mask and type
for each possible CPU.
(2) With the current per-cpu structure, when having ANY coordination,
the cppc_cpudata cpu information is not initialised (will remain 0)
for all CPUs in a policy, other than policy->cpu. When policy->cpu is
hotplugged out, the driver will incorrectly use the uninitialised (0)
value of the other CPUs when making frequency changes. Additionally,
the previous values stored in the perf_ctrls.desired_perf will be
lost when policy->cpu changes.
Therefore replace the array of per cpu data with a list. The memory for
each structure is allocated at policy init, where a single structure
can be allocated per policy, not per cpu. In order to accommodate the
struct list_head node in the cppc_cpudata structure, the now unused cpu
and cur_policy variables are removed.
For example, on a arm64 Juno platform with 6 CPUs: (0, 1, 2, 3) in PSD1,
(4, 5) in PSD2 - ANY coordination, the memory allocation comparison shows:
Before patch:
- ANY coordination:
total slack req alloc/free caller
0 0 0 0/1 _kernel_size_le_hi32+0x0xffff800008ff7810
0 0 0 0/6 _kernel_size_le_hi32+0x0xffff800008ff7808
128 80 48 1/0 _kernel_size_le_hi32+0x0xffff800008ffc070
768 0 768 6/0 _kernel_size_le_hi32+0x0xffff800008ffc0e4
After patch:
- ANY coordination:
total slack req alloc/free caller
256 0 256 2/0 _kernel_size_le_hi32+0x0xffff800008fed410
0 0 0 0/2 _kernel_size_le_hi32+0x0xffff800008fed274
Additional notes:
- A pointer to the policy's cppc_cpudata is stored in policy->driver_data
- Driver registration is skipped if _CPC entries are not present.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Tested-by: Mian Yousaf Kaukab <ykaukab@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use the existing sysfs attribute "freqdomain_cpus" to expose
information to userspace about CPUs in the same frequency domain.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Mian Yousaf Kaukab <ykaukab@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The previous coordination type handling in the cppc_cpufreq init code
created some confusion: the comment mentioned "Support only SW_ANY for
now" while only the SW_ALL/ALL case resulted in a failure. The other
coordination types (HW_ALL/HW, NONE) were silently supported.
Clarify support for coordination types while describing in comments the
intended behavior.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Mian Yousaf Kaukab <ykaukab@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Considering only the currently supported coordination types (ANY, HW,
NONE), this change only makes a difference for the ANY type, when
policy->cpu is hotplugged out. In that case the new policy->cpu will
be different from ((struct cppc_cpudata *)policy->driver_data)->cpu.
While in this case the controls of *ANY* CPU could be used to drive
frequency changes, it's more consistent to use policy->cpu as the
leading CPU, as used in all other cppc_cpufreq functions. Additionally,
the debug prints in cppc_set_perf() would no longer create confusion
when referring to a CPU that is hotplugged out.
Signed-off-by: Ionela Voinescu <ionela.voinescu@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Mian Yousaf Kaukab <ykaukab@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull OPP (Operating Performance Points) updates for 5.11-rc1 from
Viresh Kumar:
"This contains the following updates:
- Allow empty (node-less) OPP tables in DT for passing just the
dependency related information (Nicola Mazzucato).
- Fix a potential lockdep in OPP core and other OPP core cleanups
(Viresh Kumar).
- Don't abuse dev_pm_opp_get_opp_table() to create an OPP table, fix
cpufreq-dt driver for the same (Viresh Kumar).
- dev_pm_opp_put_regulators() accepts a NULL argument now, updates to
all the users as well (Viresh Kumar)."
* 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
opp: of: Allow empty opp-table with opp-shared
dt-bindings: opp: Allow empty OPP tables
media: venus: dev_pm_opp_put_*() accepts NULL argument
drm/panfrost: dev_pm_opp_put_*() accepts NULL argument
drm/lima: dev_pm_opp_put_*() accepts NULL argument
PM / devfreq: exynos: dev_pm_opp_put_*() accepts NULL argument
cpufreq: qcom-cpufreq-nvmem: dev_pm_opp_put_*() accepts NULL argument
cpufreq: dt: dev_pm_opp_put_regulators() accepts NULL argument
opp: Allow dev_pm_opp_put_*() APIs to accept NULL opp_table
opp: Don't create an OPP table from dev_pm_opp_get_opp_table()
cpufreq: dt: Don't (ab)use dev_pm_opp_get_opp_table() to create OPP table
opp: Reduce the size of critical section in _opp_kref_release()
opp: Don't return opp_dev from _find_opp_dev()
opp: Allocate the OPP table outside of opp_table_lock
opp: Always add entries in dev_list with opp_table->lock held
Make cpufreq_online() return negative error codes on all errors that
cause the policy to be destroyed, as appropriate.
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix up the remaining kerneldoc comments that don't adhere to the
expected format and clarify some of them a bit.
No functional changes.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
local_clock() has better precision and accuracy as compared to jiffies,
lets use it for time management in cpufreq stats.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Avoid doing the same assignment in both branches of a conditional,
do it after the whole conditional instead.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The dev_pm_opp_put_*() APIs now accepts a NULL opp_table pointer and so
there is no need for us to carry the extra checks. Drop them.
Reviewed-by: Ilia Lin <ilia.lin@kernel.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The dev_pm_opp_put_*() APIs now accepts a NULL opp_table pointer and so
there is no need for us to carry the extra checks. Drop them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Initially, the helper dev_pm_opp_get_opp_table() was supposed to be used
only for the OPP core's internal use (it tries to find an existing OPP
table and if it doesn't find one, then it allocates the OPP table).
Sometime back, the cpufreq-dt driver started using it to make sure all
the relevant resources required by the OPP core are available earlier
during initialization process to properly propagate -EPROBE_DEFER.
It worked but it also abused the API to create an OPP table, which
should be created with the help of other helpers provided by the OPP
core.
The OPP core will be updated in a later commit to limit the scope of
dev_pm_opp_get_opp_table() to only finding an existing OPP table and not
create one. This commit updates the cpufreq-dt driver before that
happens.
Now the cpufreq-dt driver creates the OPP and cpufreq tables for all the
CPUs from driver's init callback itself.
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Add mechanism to discover the power scale present in the performance
protocol for all domains. Provide this information to Energy Model,
which then can be checked in other frameworks, e.g. thermal.
Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The function tegra194_get_speed_common() uses hardware timers to
calculate the current CPUFREQ and so rename this function to be
tegra194_calculate_speed() to reflect what it does.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The Tegra194 CPUFREQ driver sets the CPUFREQ_NEED_INITIAL_FREQ_CHECK
flag which means that the CPUFREQ framework will call the 'get' callback
on boot to determine the current frequency of the CPUs. Therefore, it is
not necessary for the Tegra194 CPUFREQ driver to internally call the
tegra194_get_speed_common() during initialisation to query the current
frequency as well. Fix this by removing the call to the
tegra194_get_speed_common() during initialisation and simplify the code.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The CPUFREQ driver framework references each individual CPUs when
getting and setting the speed. Tegra186 has 3 clusters of A57 CPUs and
1 cluster of Denver CPUs. Hence, the Tegra186 CPUFREQ driver need to
know which cluster a given CPU belongs to. The logic in the Tegra186
driver can be greatly simplified by storing the cluster ID associated
with each CPU in the tegra186_cpufreq_cpu structure. This allow us to
completely remove the Tegra cluster info structure from the driver and
simplifiy the code.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Sparse warns that the incorrect type is being assigned to the CPUFREQ
driver_data variable in the Tegra186 CPUFREQ driver. The Tegra186
CPUFREQ driver is assigned a type of 'void __iomem *' to a pointer of
type 'void *' ...
drivers/cpufreq/tegra186-cpufreq.c:72:37: sparse: sparse: incorrect
type in assignment (different address spaces) @@
expected void *driver_data @@ got void [noderef] __iomem * @@
...
drivers/cpufreq/tegra186-cpufreq.c:87:40: sparse: sparse: incorrect
type in initializer (different address spaces) @@
expected void [noderef] __iomem *edvd_reg @@ got void *driver_data @@
The Tegra186 CPUFREQ driver is using the policy->driver_data variable to
store and iomem pointer to a Tegra186 CPU register that is used to set
the clock speed for the CPU. This is not necessary because the register
base address is already stored in the driver data and the offset of the
register for each CPU is static. Therefore, fix this by adding a new
structure with the register offsets for each CPU and store this in the
main driver data structure along with the register base address. Please
note that a new structure has been added for storing the register
offsets rather than a simple array, because this will permit further
clean-ups and simplification of the driver.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
A driver should not 'select' drivers from another subsystem.
If NVMEM is disabled, this one results in a warning:
WARNING: unmet direct dependencies detected for NVMEM_IMX_OCOTP
Depends on [n]: NVMEM [=n] && (ARCH_MXC [=y] || COMPILE_TEST [=y]) && HAS_IOMEM [=y]
Selected by [y]:
- ARM_IMX6Q_CPUFREQ [=y] && CPU_FREQ [=y] && (ARM || ARM64 [=y]) && ARCH_MXC [=y] && REGULATOR_ANATOP [=y]
Change the 'select' to 'depends on' to prevent it from going wrong,
and allow compile-testing without that driver, since it is only
a runtime dependency.
Fixes: 2782ef34ed ("cpufreq: imx: Select NVMEM_IMX_OCOTP")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
This patch adds missing MODULE_ALIAS for automatic loading of this cpufreq
driver when it is compiled as an external module.
Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: 47ac9aa165 ("cpufreq: arm_big_little: add vexpress SPC interface driver")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
This patch adds missing MODULE_ALIAS for automatic loading of this cpufreq
driver when it is compiled as an external module.
Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: 8def31034d ("cpufreq: arm_big_little: add SCPI interface driver")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
This patch adds missing MODULE_ALIAS for automatic loading of this cpufreq
driver when it is compiled as an external module.
Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: a0a22cf144 ("cpufreq: Loongson1: Add cpufreq driver for Loongson1B")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this cpufreq driver when it is
compiled as an external module.
Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: f328584f7b ("cpufreq: Add sun50i nvmem based CPU scaling driver")
Reviewed-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this cpufreq driver when it is
compiled as an external module.
Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: ab0ea257fc ("cpufreq: st: Provide runtime initialised driver for ST's platforms")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this cpufreq driver when it is
compiled as an external module.
Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: 46e2856b8e ("cpufreq: Add Kryo CPU scaling driver")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this cpufreq driver when it is
compiled as an external module.
Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: 501c574f4e ("cpufreq: mediatek: Add support of cpufreq to MT2701/MT7623 SoC")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this cpufreq driver when it is
compiled as an external module.
Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: 6754f55610 ("cpufreq / highbank: add support for highbank cpufreq")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
This patch adds missing MODULE_DEVICE_TABLE definition which generates
correct modalias for automatic loading of this cpufreq driver when it is
compiled as an external module.
Signed-off-by: Pali Rohár <pali@kernel.org>
Fixes: f525a67053 ("cpufreq: ap806: add cpufreq driver for Armada 8K")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Add the missing platform_driver_unregister() before return from
mtk_cpufreq_driver_init in the error handling case when failed
to register mtk-cpufreq platform device
Signed-off-by: Qinglang Miao <miaoqinglang@huawei.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Frequency returned by 'cpuinfo_cur_freq' using counters is not fixed
and keeps changing slightly. This change returns a consistent value
from freq_table. If the reconstructed frequency has acceptable delta
from the last written value, then return the frequency corresponding
to the last written ndiv value from freq_table. Otherwise, print a
warning and return the reconstructed freq.
Signed-off-by: Sumit Gupta <sumitg@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Add MT8516 to cpufreq-dt-platdev blacklist since the actual scaling is
handled by the 'mediatek-cpufreq' driver.
Signed-off-by: Fabien Parent <fparent@baylibre.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Use dev_pm_opp_put_prop_name() to avoid mem leak, which free opp_table.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: Yangtao Li <frank@allwinnertech.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>