Get rid of the __call_single_node union and cleanup the API a little
to avoid external code relying on the structure layout as much.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Annotate tegra_pm_set[clear]_cpu_in_lp2() with RCU_NONIDLE in order to
fix lockdep warning about suspicious RCU usage of a spinlock during late
idling phase.
WARNING: suspicious RCU usage
...
include/trace/events/lock.h:13 suspicious rcu_dereference_check() usage!
...
(dump_stack) from (lock_acquire)
(lock_acquire) from (_raw_spin_lock)
(_raw_spin_lock) from (tegra_pm_set_cpu_in_lp2)
(tegra_pm_set_cpu_in_lp2) from (tegra_cpuidle_enter)
(tegra_cpuidle_enter) from (cpuidle_enter_state)
(cpuidle_enter_state) from (cpuidle_enter_state_coupled)
(cpuidle_enter_state_coupled) from (cpuidle_enter)
(cpuidle_enter) from (do_idle)
...
Tested-by: Peter Geis <pgwipeout@gmail.com>
Reported-by: Peter Geis <pgwipeout@gmail.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
To select domain idlestates for cpuidle-psci when OSI mode has been
enabled, the PM domains via genpd are being managed through runtime PM.
This works fine for the regular idlepath, but it doesn't during system wide
suspend. More precisely, the domain idlestates becomes temporarily
disabled, which is because the PM core disables runtime PM for devices
during system wide suspend.
Later in the system suspend phase, genpd intends to deal with this from its
->suspend_noirq() callback, but this doesn't work as expected for a device
corresponding to a CPU, because the domain idlestates needs to be selected
on a per CPU basis (the PM core doesn't invoke the callbacks like that).
To address this problem, let's enable the syscore flag for the
corresponding CPU device that becomes successfully attached to its PM
domain (applicable only in OSI mode). This informs the PM core to skip
invoke the system wide suspend/resume callbacks for the device, thus also
prevents genpd from screwing up its internal state of it.
Moreover, to properly select a domain idlestate for the CPUs during
suspend-to-idle, let's assign a specific ->enter_s2idle() callback for the
corresponding domain idlestate (applicable only in OSI mode). From that
callback, let's invoke dev_pm_genpd_suspend|resume(), as this allows a
domain idlestate to be selected for the current CPU by genpd.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This driver always worked properly only on the Exynos 5420/5800 based
Chromebooks (Peach-Pit/Pi), so change the required compatible string to
the 'google,peach', to avoid enabling it on the other Exynos 542x/5800
boards, which hangs in such case. The main difference between Peach-Pit/Pi
and other Exynos 542x/5800 boards is the firmware - Peach platform doesn't
use secure firmware at all.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
- A series from Nick adding ARCH_WANT_IRQS_OFF_ACTIVATE_MM & selecting it for
powerpc, as well as a related fix for sparc.
- Remove support for PowerPC 601.
- Some fixes for watchpoints & addition of a new ptrace flag for detecting ISA
v3.1 (Power10) watchpoint features.
- A fix for kernels using 4K pages and the hash MMU on bare metal Power9
systems with > 16TB of RAM, or RAM on the 2nd node.
- A basic idle driver for shallow stop states on Power10.
- Tweaks to our sched domains code to better inform the scheduler about the
hardware topology on Power9/10, where two SMT4 cores can be presented by
firmware as an SMT8 core.
- A series doing further reworks & cleanups of our EEH code.
- Addition of a filter for RTAS (firmware) calls done via sys_rtas(), to
prevent root from overwriting kernel memory.
- Other smaller features, fixes & cleanups.
Thanks to:
Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Athira Rajeev, Biwen
Li, Cameron Berkenpas, Cédric Le Goater, Christophe Leroy, Christoph Hellwig,
Colin Ian King, Daniel Axtens, David Dai, Finn Thain, Frederic Barrat, Gautham
R. Shenoy, Greg Kurz, Gustavo Romero, Ira Weiny, Jason Yan, Joel Stanley,
Jordan Niethe, Kajol Jain, Konrad Rzeszutek Wilk, Laurent Dufour, Leonardo
Bras, Liu Shixin, Luca Ceresoli, Madhavan Srinivasan, Mahesh Salgaonkar,
Nathan Lynch, Nicholas Mc Guire, Nicholas Piggin, Nick Desaulniers, Oliver
O'Halloran, Pedro Miraglia Franco de Carvalho, Pratik Rajesh Sampat, Qian Cai,
Qinglang Miao, Ravi Bangoria, Russell Currey, Satheesh Rajendran, Scott
Cheloha, Segher Boessenkool, Srikar Dronamraju, Stan Johnson, Stephen Kitt,
Stephen Rothwell, Thiago Jung Bauermann, Tyrel Datwyler, Vaibhav Jain,
Vaidyanathan Srinivasan, Vasant Hegde, Wang Wensheng, Wolfram Sang, Yang
Yingliang, zhengbin.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl+JBQoTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgJJAD/0e3tsFP+9rFlxKSJlDcMW3w7kXDRXE
tG40F1ubYFLU8wtFVR0De3njTRsz5HyaNU6SI8CwPq48mCa7OFn1D1OeHonHXDX9
w6v3GE2S1uXXQnjm+czcfdjWQut0IwWBLx007/S23WcPff3Abc2irupKLNu+Gx29
b/yxJHZSRJVX59jSV94HkdJS75mDHQ3oUOlFGXtuGcUZDufpD1ynRcQOjr0V/8JU
F4WAblFSe7hiczHGqIvfhFVJ+OikEhnj2aEMAL8U7vxzrAZ7RErKCN9s/0Tf0Ktx
FzNEFNLHZGqh+qNDpKKmM+RnaeO2Lcoc9qVn7vMHOsXPzx9F5LJwkI/DgPjtgAq/
mFvGnQB/FapATnQeMluViC/qhEe5bQXLUfPP5i2+QOjK0QqwyFlUMgaVNfsY8jRW
0Q/sNA72Opzst4WUTveCd4SOInlUuat09e5nLooCRLW7u7/jIiXNRSFNvpOiwkfF
EcIPJsi6FUQ4SNbqpRSNEO9fK5JZrrUtmr0pg8I7fZhHYGcxEjqPR6IWCs3DTsak
4/KhjhhTnP/IWJRw6qKAyNhEyEwpWqYZ97SIQbvSb1g/bS47AIdQdJRb0eEoRjhx
sbbnnYFwPFkG4c1yQSIFanT9wNDQ2hFx/c/mRfbd7J+ordx9JsoqXjqrGuhsU/pH
GttJLmkJ5FH+pQ==
=akeX
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- A series from Nick adding ARCH_WANT_IRQS_OFF_ACTIVATE_MM & selecting
it for powerpc, as well as a related fix for sparc.
- Remove support for PowerPC 601.
- Some fixes for watchpoints & addition of a new ptrace flag for
detecting ISA v3.1 (Power10) watchpoint features.
- A fix for kernels using 4K pages and the hash MMU on bare metal
Power9 systems with > 16TB of RAM, or RAM on the 2nd node.
- A basic idle driver for shallow stop states on Power10.
- Tweaks to our sched domains code to better inform the scheduler about
the hardware topology on Power9/10, where two SMT4 cores can be
presented by firmware as an SMT8 core.
- A series doing further reworks & cleanups of our EEH code.
- Addition of a filter for RTAS (firmware) calls done via sys_rtas(),
to prevent root from overwriting kernel memory.
- Other smaller features, fixes & cleanups.
Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V,
Athira Rajeev, Biwen Li, Cameron Berkenpas, Cédric Le Goater, Christophe
Leroy, Christoph Hellwig, Colin Ian King, Daniel Axtens, David Dai, Finn
Thain, Frederic Barrat, Gautham R. Shenoy, Greg Kurz, Gustavo Romero,
Ira Weiny, Jason Yan, Joel Stanley, Jordan Niethe, Kajol Jain, Konrad
Rzeszutek Wilk, Laurent Dufour, Leonardo Bras, Liu Shixin, Luca
Ceresoli, Madhavan Srinivasan, Mahesh Salgaonkar, Nathan Lynch, Nicholas
Mc Guire, Nicholas Piggin, Nick Desaulniers, Oliver O'Halloran, Pedro
Miraglia Franco de Carvalho, Pratik Rajesh Sampat, Qian Cai, Qinglang
Miao, Ravi Bangoria, Russell Currey, Satheesh Rajendran, Scott Cheloha,
Segher Boessenkool, Srikar Dronamraju, Stan Johnson, Stephen Kitt,
Stephen Rothwell, Thiago Jung Bauermann, Tyrel Datwyler, Vaibhav Jain,
Vaidyanathan Srinivasan, Vasant Hegde, Wang Wensheng, Wolfram Sang, Yang
Yingliang, zhengbin.
* tag 'powerpc-5.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (228 commits)
Revert "powerpc/pci: unmap legacy INTx interrupts when a PHB is removed"
selftests/powerpc: Fix eeh-basic.sh exit codes
cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_reboot_notifier
powerpc/time: Make get_tb() common to PPC32 and PPC64
powerpc/time: Make get_tbl() common to PPC32 and PPC64
powerpc/time: Remove get_tbu()
powerpc/time: Avoid using get_tbl() and get_tbu() internally
powerpc/time: Make mftb() common to PPC32 and PPC64
powerpc/time: Rename mftbl() to mftb()
powerpc/32s: Remove #ifdef CONFIG_PPC_BOOK3S_32 in head_book3s_32.S
powerpc/32s: Rename head_32.S to head_book3s_32.S
powerpc/32s: Setup the early hash table at all time.
powerpc/time: Remove ifdef in get_dec() and set_dec()
powerpc: Remove get_tb_or_rtc()
powerpc: Remove __USE_RTC()
powerpc: Tidy up a bit after removal of PowerPC 601.
powerpc: Remove support for PowerPC 601
powerpc: Remove PowerPC 601
powerpc: Drop SYNC_601() ISYNC_601() and SYNC()
powerpc: Remove CONFIG_PPC601_SYNC_FIX
...
CPUs may fail to enter the chosen idle state if there was a
pending interrupt, causing the cpuidle driver to return an error
value.
Record that and export it via sysfs along with the other idle state
statistics.
This could prove useful in understanding behavior of the governor
and the system during usecases that involve multiple CPUs.
Signed-off-by: Lina Iyer <ilina@codeaurora.org>
[ rjw: Changelog and documentation edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The commit 1098582a0f ("sched,idle,rcu: Push rcu_idle deeper into the
idle path"), moved the calls rcu_idle_enter|exit() into the cpuidle core.
However, it forgot to remove a couple of comments in enter_s2idle_proper()
about why RCU_NONIDLE earlier was needed. So, let's drop them as they have
become a bit misleading.
Fixes: 1098582a0f ("sched,idle,rcu: Push rcu_idle deeper into the idle path")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If the PSCI OSI mode isn't supported or fails to be enabled, the PM domain
topology with the genpd providers isn't initialized. This is perfectly fine
from cpuidle-psci point of view.
However, since the PM domain topology in the DTS files is a description of
the HW, no matter of whether the PSCI OSI mode is supported or not, other
consumers besides the CPUs may rely on it.
Therefore, let's always allow the initialization of the PM domain topology
to succeed, independently of whether the PSCI OSI mode is supported.
Consequentially we need to track if we succeed to enable the OSI mode, as
to know when a domain idlestate can be selected.
Note that, CPU devices are still not being attached to the PM domain
topology, unless the PSCI OSI mode is supported.
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The current user (cpuidle-psci) of psci_set_osi_mode() only needs to enable
the PSCI OSI mode. Although, as subsequent changes shows, there is a need
to be able to reset back into the PSCI PC mode.
Therefore, let's extend psci_set_osi_mode() to take a bool as in-parameter,
to let the user indicate whether to enable OSI or to switch back to PC
mode.
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The enter() callback of CPUIDLE drivers returns index of the entered idle
state on success or a negative value on failure. The negative value could
any negative value, i.e. it doesn't necessarily needs to be a error code.
That's because CPUIDLE core only cares about the fact of failure and not
about the reason of the enter() failure.
Like every other enter() callback, the arm_cpuidle_simple_enter() returns
the entered idle-index on success. Unlike some of other drivers, it never
fails. It happened that TEGRA_C1=index=err=0 in the code of cpuidle-tegra
driver, and thus, there is no problem for the cpuidle-tegra driver created
by the typo in the code which assumes that the arm_cpuidle_simple_enter()
returns a error code.
The arm_cpuidle_simple_enter() also may return a -ENODEV error if CPU_IDLE
is disabled in a kernel's config, but all CPUIDLE drivers are disabled if
CPU_IDLE is disabled, including the cpuidle-tegra driver. So we can't ever
see the error code from arm_cpuidle_simple_enter() today.
Of course the code may get some changes in the future and then the
typo may transform into a real bug, so let's correct the typo! The
tegra_cpuidle_state_enter() is now changed to make it return the entered
idle-index on success and negative error code on fail, which puts it on
par with the arm_cpuidle_simple_enter(), making code consistent in regards
to the error handling.
This patch fixes a minor typo in the code, it doesn't fix any bugs.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Reviewed-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The commit eb1f00237a ("lockdep,trace: Expose tracepoints"), started to
expose us for tracepoints. This lead to the following RCU splat on an ARM64
Qcom board.
[ 5.529634] WARNING: suspicious RCU usage
[ 5.537307] sdhci-pltfm: SDHCI platform and OF driver helper
[ 5.541092] 5.9.0-rc3 #86 Not tainted
[ 5.541098] -----------------------------
[ 5.541105] ../include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage!
[ 5.541110]
[ 5.541110] other info that might help us debug this:
[ 5.541110]
[ 5.541116]
[ 5.541116] rcu_scheduler_active = 2, debug_locks = 1
[ 5.541122] RCU used illegally from extended quiescent state!
[ 5.541129] no locks held by swapper/0/0.
[ 5.541134]
[ 5.541134] stack backtrace:
[ 5.541143] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #86
[ 5.541149] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 5.541157] Call trace:
[ 5.568185] sdhci_msm 7864900.sdhci: Got CD GPIO
[ 5.574186] dump_backtrace+0x0/0x1c8
[ 5.574206] show_stack+0x14/0x20
[ 5.574229] dump_stack+0xe8/0x154
[ 5.574250] lockdep_rcu_suspicious+0xd4/0xf8
[ 5.574269] lock_acquire+0x3f0/0x460
[ 5.574292] _raw_spin_lock_irqsave+0x80/0xb0
[ 5.574314] __pm_runtime_suspend+0x4c/0x188
[ 5.574341] psci_enter_domain_idle_state+0x40/0xa0
[ 5.574362] cpuidle_enter_state+0xc0/0x610
[ 5.646487] cpuidle_enter+0x38/0x50
[ 5.650651] call_cpuidle+0x18/0x40
[ 5.654467] do_idle+0x228/0x278
[ 5.657678] cpu_startup_entry+0x24/0x70
[ 5.661153] rest_init+0x1a4/0x278
[ 5.665061] arch_call_rest_init+0xc/0x14
[ 5.668272] start_kernel+0x508/0x540
Following the path in pm_runtime_put_sync_suspend() from
psci_enter_domain_idle_state(), it seems like we end up using the RCU.
Therefore, let's simply silence the splat by informing the RCU about it
with RCU_NONIDLE.
Note that, this is a temporary solution. Instead we should strive to avoid
using RCU_NONIDLE (and similar), but rather push rcu_idle_enter|exit()
further down, closer to the arch specific code. However, as the CPU PM
notifiers are also using the RCU, additional rework is needed.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Opt us out of the DEBUG_VM_PGTABLE support for now as it's causing crashes.
Fix a long standing bug in our DMA mask handling that was hidden until recently,
and which caused problems with some drivers.
Fix a boot failure on systems with large amounts of RAM, and no hugepage support
and using Radix MMU, only seen in the lab.
A few other minor fixes.
Thanks to:
Alexey Kardashevskiy, Aneesh Kumar K.V, Gautham R. Shenoy, Hari Bathini, Ira
Weiny, Nick Desaulniers, Shirisha Ganta, Vaibhav Jain, Vaidyanathan
Srinivasan.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl9kk4UTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgLM5D/42Wuuq6hOpGEfE2XjBjsOxCR07SCw9
CQ/c72jw/1tDLe0YclPVYAZc8BmT8uLo6tE2Ot+0vI+1Y06rRvC5g5uQBwp1zD/t
MOwC0d4zf8a7WzBCcVaBv9HMHVOaKaTBwQc2R4k6NzYtARIf5m0evMPOWINioRsv
/x4+Np8aeQd1WiVn6PBdqL8w1yRhk8LsVDvX35lFzQlgZvH09umXSGjw9K442xdE
lr1PrV9GKd4DeudwLHPkMNs8Ul1QTxmY5vKIAklsJ5g3dBySfM7+GMrTzYOHYkWr
aqGfGH6ojdFSQZRo7QwFhO52Kni7JN7AIoUPEBDqLb1fR10w8wesdjCs8JQMXIMc
8Eo210EbiSsq6kG/LzqZuStLUAup3rQd20+wWua7jo8HbcZOLDH7pPGwNtJInPTr
gwH7sALYhTzFUAO4LzqVVE+yA8wndFPHoz+QSkO6LZJKuszON2LJ0r+IEx1l9Wmr
ClZYujK36/N+ih42xBFcBZi2lL0VKzlkn7u7NDeZQBONgoJMCoWkGDkEHnMI9zdT
1iYcVZyY+IpJLcwxH+NP8weJKHIZbP4kvuFXR/QIpHdrDWKaTT95BvBhAK1KMYBo
lLo81zMTzJCz2wWDg/+3sLuEzHdGr//uxcVXxjqE2vyEckeuZjiCyz0dRNEY4Sw3
vB2p3Zl7L0J4pQ==
=cj6B
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Some more powerpc fixes for 5.9:
- Opt us out of the DEBUG_VM_PGTABLE support for now as it's causing
crashes.
- Fix a long standing bug in our DMA mask handling that was hidden
until recently, and which caused problems with some drivers.
- Fix a boot failure on systems with large amounts of RAM, and no
hugepage support and using Radix MMU, only seen in the lab.
- A few other minor fixes.
Thanks to Alexey Kardashevskiy, Aneesh Kumar K.V, Gautham R. Shenoy,
Hari Bathini, Ira Weiny, Nick Desaulniers, Shirisha Ganta, Vaibhav
Jain, and Vaidyanathan Srinivasan"
* tag 'powerpc-5.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/papr_scm: Limit the readability of 'perf_stats' sysfs attribute
cpuidle: pseries: Fix CEDE latency conversion from tb to us
powerpc/dma: Fix dma_map_ops::get_required_mask
Revert "powerpc/build: vdso linker warning for orphan sections"
powerpc/mm: Remove DEBUG_VM_PGTABLE support on powerpc
selftests/powerpc: Skip PROT_SAO test in guests/LPARS
powerpc/book3s64/radix: Fix boot failure with large amount of guest memory
Some drivers have to do significant work, some of which relies on RCU
still being active. Instead of using RCU_NONIDLE in the drivers and
flipping RCU back on, allow drivers to take over RCU-idle duty.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This driver does not restore stop > 3 state, so it limits itself
to states which do not lose full state or TB.
The POWER10 SPRs are sufficiently different from P9 that it seems
easier to split out the P10 code. The POWER10 deep sleep code
(e.g., the BHRB restore) has been taken out, but it can be re-added
when stop > 3 support is added.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Pratik Rajesh Sampat<psampat@linux.ibm.com>
Tested-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Reviewed-by: Pratik Rajesh Sampat<psampat@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200819094700.493399-1-npiggin@gmail.com
Commit d947fb4c96 ("cpuidle: pseries: Fixup exit latency for
CEDE(0)") sets the exit latency of CEDE(0) based on the latency values
of the Extended CEDE states advertised by the platform. The values
advertised by the platform are in timebase ticks. However the cpuidle
framework requires the latency values in microseconds.
If the tb-ticks value advertised by the platform correspond to a value
smaller than 1us, during the conversion from tb-ticks to microseconds,
in the current code, the result becomes zero. This is incorrect as it
puts a CEDE state on par with the snooze state.
This patch fixes this by rounding up the result obtained while
converting the latency value from tb-ticks to microseconds. It also
prints a warning in case we discover an extended-cede state with
wakeup latency to be 0. In such a case, ensure that CEDE(0) has a
non-zero wakeup latency.
Fixes: d947fb4c96 ("cpuidle: pseries: Fixup exit latency for CEDE(0)")
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1599125247-28488-1-git-send-email-ego@linux.vnet.ibm.com
This allows moving the leave_mm() call into generic code before
rcu_idle_enter(). Gets rid of more trace_*_rcuidle() users.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200821085348.369441600@infradead.org
Lots of things take locks, due to a wee bug, rcu_lockdep didn't notice
that the locking tracepoints were using RCU.
Push rcu_idle_{enter,exit}() as deep as possible into the idle paths,
this also resolves a lot of _rcuidle()/RCU_NONIDLE() usage.
Specifically, sched_clock_idle_wakeup_event() will use ktime which
will use seqlocks which will tickle lockdep, and
stop_critical_timings() uses lock.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200821085348.310943801@infradead.org
Match the pattern elsewhere in this file.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200821085348.251340558@infradead.org
- Add support for (optionally) using queued spinlocks & rwlocks.
- Support for a new faster system call ABI using the scv instruction on Power9
or later.
- Drop support for the PROT_SAO mmap/mprotect flag as it will be unsupported on
Power10 and future processors, leaving us with no way to implement the
functionality it requests. This risks breaking userspace, though we believe
it is unused in practice.
- A bug fix for, and then the removal of, our custom stack expansion checking.
We now allow stack expansion up to the rlimit, like other architectures.
- Remove the remnants of our (previously disabled) topology update code, which
tried to react to NUMA layout changes on virtualised systems, but was prone
to crashes and other problems.
- Add PMU support for Power10 CPUs.
- A change to our signal trampoline so that we don't unbalance the link stack
(branch return predictor) in the signal delivery path.
- Lots of other cleanups, refactorings, smaller features and so on as usual.
Thanks to:
Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey Kardashevskiy,
Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju T Sudhakar, Anton
Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan S, Bharata B Rao, Bill
Wendling, Bin Meng, Cédric Le Goater, Chris Packham, Christophe Leroy,
Christoph Hellwig, Daniel Axtens, Dan Williams, David Lamparter, Desnes A.
Nunes do Rosario, Erhard F., Finn Thain, Frederic Barrat, Ganesh Goudar,
Gautham R. Shenoy, Geoff Levand, Greg Kurz, Gustavo A. R. Silva, Hari Bathini,
Harish, Imre Kaloz, Joel Stanley, Joe Perches, John Crispin, Jordan Niethe,
Kajol Jain, Kamalesh Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li
RongQing, Madhavan Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal
Suchanek, Milton Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan
Chancellor, Nathan Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver
O'Halloran, Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe
Bergheaud, Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy
Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh
Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar Dronamraju,
Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza Cascardo, Thiago Jung
Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov, Wei Yongjun, Wen Xiong,
YueHaibing.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl8tOxATHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgDQfEAClXHWf6hnxB84bEu39D51NkVotL1IG
BRWFvyix+xHuUkHIouBPAAMl6ngY5X6wkYd+Z+CY9zHNtdSDoVlJE30YXdMQA/dE
L/rYxR1884yGR/uU/3wusboO68ReXwcKQPmKOymUfh0zH7ujyJsSWLpXFK1YDC5d
2TVVTi0Q+P5ucMHDh0L+AHirIxZvtZSp43+J7xLtywsj+XAxJWCTGo5WCJbdgbCA
Qbv3aOkVyUa3EgsbdM/STPpv82ebqT+PHxeSIO4Jw6ZODtKRH0R5YsWCApuY9eZ+
ebY9RLmgv9ZAhJqB2fv9A5NDcMoGpZNmjM7HrWpXwULKQpkBGHCzJ9FcSdHVMOx8
nbVMFjt4uzLwV1w8lFYslQ2tNH/uH2o9BlryV1RLpiiKokDAJO/NOsWN9y0u/I4J
EmAM5DSX2LgVvvas96IlGK8KX4xkOkf8FLX/H5UDvvAfloH8J4CZXk/CWCab/nqY
KEHPnMmYvQZ1w9SzyZg9sO/1p6Bl1Gmm75Jv2F1lBiRW/42VcGBI/qLsJ4lC59Fc
KbwufYNYYG38wbxDLW1HAPJhRonxIcaZj3EEqk7aTiLZ55nNbu8e2k32CpNXTGqt
npOhzJHimcq7L6+878ZW+xpbZwogIEUdRSsmwb6aT8za3ShnYwSA2Q3LYxh9xyGH
j3GifvPq6Efp3Q==
=QMY1
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add support for (optionally) using queued spinlocks & rwlocks.
- Support for a new faster system call ABI using the scv instruction on
Power9 or later.
- Drop support for the PROT_SAO mmap/mprotect flag as it will be
unsupported on Power10 and future processors, leaving us with no way
to implement the functionality it requests. This risks breaking
userspace, though we believe it is unused in practice.
- A bug fix for, and then the removal of, our custom stack expansion
checking. We now allow stack expansion up to the rlimit, like other
architectures.
- Remove the remnants of our (previously disabled) topology update
code, which tried to react to NUMA layout changes on virtualised
systems, but was prone to crashes and other problems.
- Add PMU support for Power10 CPUs.
- A change to our signal trampoline so that we don't unbalance the link
stack (branch return predictor) in the signal delivery path.
- Lots of other cleanups, refactorings, smaller features and so on as
usual.
Thanks to: Abhishek Goel, Alastair D'Silva, Alexander A. Klimov, Alexey
Kardashevskiy, Alistair Popple, Andrew Donnellan, Aneesh Kumar K.V, Anju
T Sudhakar, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Balamuruhan
S, Bharata B Rao, Bill Wendling, Bin Meng, Cédric Le Goater, Chris
Packham, Christophe Leroy, Christoph Hellwig, Daniel Axtens, Dan
Williams, David Lamparter, Desnes A. Nunes do Rosario, Erhard F., Finn
Thain, Frederic Barrat, Ganesh Goudar, Gautham R. Shenoy, Geoff Levand,
Greg Kurz, Gustavo A. R. Silva, Hari Bathini, Harish, Imre Kaloz, Joel
Stanley, Joe Perches, John Crispin, Jordan Niethe, Kajol Jain, Kamalesh
Babulal, Kees Cook, Laurent Dufour, Leonardo Bras, Li RongQing, Madhavan
Srinivasan, Mahesh Salgaonkar, Mark Cave-Ayland, Michal Suchanek, Milton
Miller, Mimi Zohar, Murilo Opsfelder Araujo, Nathan Chancellor, Nathan
Lynch, Naveen N. Rao, Nayna Jain, Nicholas Piggin, Oliver O'Halloran,
Palmer Dabbelt, Pedro Miraglia Franco de Carvalho, Philippe Bergheaud,
Pingfan Liu, Pratik Rajesh Sampat, Qian Cai, Qinglang Miao, Randy
Dunlap, Ravi Bangoria, Sachin Sant, Sam Bobroff, Sandipan Das, Santosh
Sivaraj, Satheesh Rajendran, Shirisha Ganta, Sourabh Jain, Srikar
Dronamraju, Stan Johnson, Stephen Rothwell, Thadeu Lima de Souza
Cascardo, Thiago Jung Bauermann, Tom Lane, Vaibhav Jain, Vladis Dronov,
Wei Yongjun, Wen Xiong, YueHaibing.
* tag 'powerpc-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (337 commits)
selftests/powerpc: Fix pkey syscall redefinitions
powerpc: Fix circular dependency between percpu.h and mmu.h
powerpc/powernv/sriov: Fix use of uninitialised variable
selftests/powerpc: Skip vmx/vsx/tar/etc tests on older CPUs
powerpc/40x: Fix assembler warning about r0
powerpc/papr_scm: Add support for fetching nvdimm 'fuel-gauge' metric
powerpc/papr_scm: Fetch nvdimm performance stats from PHYP
cpuidle: pseries: Fixup exit latency for CEDE(0)
cpuidle: pseries: Add function to parse extended CEDE records
cpuidle: pseries: Set the latency-hint before entering CEDE
selftests/powerpc: Fix online CPU selection
powerpc/perf: Consolidate perf_callchain_user_[64|32]()
powerpc/pseries/hotplug-cpu: Remove double free in error path
powerpc/pseries/mobility: Add pr_debug() for device tree changes
powerpc/pseries/mobility: Set pr_fmt()
powerpc/cacheinfo: Warn if cache object chain becomes unordered
powerpc/cacheinfo: Improve diagnostics about malformed cache lists
powerpc/cacheinfo: Use name@unit instead of full DT path in debug messages
powerpc/cacheinfo: Set pr_fmt()
powerpc: fix function annotations to avoid section mismatch warnings with gcc-10
...
We are currently assuming that CEDE(0) has exit latency 10us, since
there is no way for us to query from the platform. However, if the
wakeup latency of an Extended CEDE state is smaller than 10us, then we
can be sure that the exit latency of CEDE(0) cannot be more than that.
In this patch, we fix the exit latency of CEDE(0) if we discover an
Extended CEDE state with wakeup latency smaller than 10us.
Benchmark results:
On POWER8, this patch does not have any impact since the advertized
latency of Extended CEDE (1) is 30us which is higher than the default
latency of CEDE (0) which is 10us.
On POWER9 we see improvement the single-threaded performance of
ebizzy, and no regression in the wakeup latency or the number of
context-switches.
ebizzy:
2 ebizzy threads bound to the same big-core. 25% improvement in the
avg records/s with patch.
x without_patch
* with_patch
N Min Max Median Avg Stddev
x 10 2491089 5834307 5398375 4244335 1596244.9
* 10 2893813 5834474 5832448 5327281.3 1055941.4
context_switch2:
There is no major regression observed with this patch as seen from the
context_switch2 benchmark.
context_switch2 across CPU0 CPU1 (Both belong to same big-core, but
different small cores). We observe a minor 0.14% regression in the
number of context-switches (higher is better).
x without_patch
* with_patch
N Min Max Median Avg Stddev
x 500 348872 362236 354712 354745.69 2711.827
* 500 349422 361452 353942 354215.4 2576.9258
Difference at 99.0% confidence
-530.288 +/- 430.963
-0.149484% +/- 0.121485%
(Student's t, pooled s = 2645.24)
context_switch2 across CPU0 CPU8 (Different big-cores). We observe a
0.37% improvement in the number of context-switches (higher is
better).
x without_patch
* with_patch
N Min Max Median Avg Stddev
x 500 287956 294940 288896 288977.23 646.59295
* 500 288300 294646 289582 290064.76 1161.9992
Difference at 99.0% confidence
1087.53 +/- 153.194
0.376337% +/- 0.0530125%
(Student's t, pooled s = 940.299)
schbench:
No major difference could be seen until the 99.9th percentile.
Without-patch:
Latency percentiles (usec)
50.0th: 29
75.0th: 39
90.0th: 49
95.0th: 59
*99.0th: 13104
99.5th: 14672
99.9th: 15824
min=0, max=17993
With-patch:
Latency percentiles (usec)
50.0th: 29
75.0th: 40
90.0th: 50
95.0th: 61
*99.0th: 13648
99.5th: 14768
99.9th: 15664
min=0, max=29812
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
[mpe: Minor formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1596087177-30329-4-git-send-email-ego@linux.vnet.ibm.com
Currently we use CEDE with latency-hint 0 as the only other idle state
on a dedicated LPAR apart from the polling "snooze" state.
The platform might support additional extended CEDE idle states, which
can be discovered through the "ibm,get-system-parameter" rtas-call
made with CEDE_LATENCY_TOKEN.
This patch adds a function to obtain information about the extended
CEDE idle states from the platform and parse the contents to populate
an array of extended CEDE states. These idle states thus discovered
will be added to the cpuidle framework in the next patch.
dmesg on a POWER8 and POWER9 LPAR, demonstrating the output of parsing
the extended CEDE latency parameters are as follows
POWER8
[ 10.093279] xcede : xcede_record_size = 10
[ 10.093285] xcede : Record 0 : hint = 1, latency = 0x3c00 tb ticks, Wake-on-irq = 1
[ 10.093291] xcede : Record 1 : hint = 2, latency = 0x4e2000 tb ticks, Wake-on-irq = 0
[ 10.093297] cpuidle : Skipping the 2 Extended CEDE idle states
POWER9
[ 5.913180] xcede : xcede_record_size = 10
[ 5.913183] xcede : Record 0 : hint = 1, latency = 0x400 tb ticks, Wake-on-irq = 1
[ 5.913188] xcede : Record 1 : hint = 2, latency = 0x3e8000 tb ticks, Wake-on-irq = 0
[ 5.913193] cpuidle : Skipping the 2 Extended CEDE idle states
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
[mpe: Make space for 16 records, drop memset, minor cleanup & formatting]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1596087177-30329-3-git-send-email-ego@linux.vnet.ibm.com
As per the PAPR, each H_CEDE call is associated with a latency-hint to
be passed in the VPA field "cede_latency_hint". The CEDE states that
we were implicitly entering so far is CEDE with latency-hint = 0.
This patch explicitly sets the latency hint corresponding to the CEDE
state that we are currently entering. While at it, we save the
previous hint, to be restored once we wakeup from CEDE. This will be
required in the future when we expose extended-cede states through the
cpuidle framework, where each of them will have a different
cede-latency hint.
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
[mpe: Make cede_latency_hint static]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1596087177-30329-2-git-send-email-ego@linux.vnet.ibm.com
Control Flow Integrity(CFI) is a security mechanism that disallows
changes to the original control flow graph of a compiled binary,
making it significantly harder to perform such attacks.
init_state_node() assign same function callback to different
function pointer declarations.
static int init_state_node(struct cpuidle_state *idle_state,
const struct of_device_id *matches,
struct device_node *state_node) { ...
idle_state->enter = match_id->data; ...
idle_state->enter_s2idle = match_id->data; }
Function declarations:
struct cpuidle_state { ...
int (*enter) (struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index);
void (*enter_s2idle) (struct cpuidle_device *dev,
struct cpuidle_driver *drv,
int index); };
In this case, either enter() or enter_s2idle() would cause CFI check
failed since they use same callee.
Align function prototype of enter() since it needs return value for
some use cases. The return value of enter_s2idle() is no
need currently.
Signed-off-by: Neal Liu <neal.liu@mediatek.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Depending on the SoC/platform, additional devices may be part of the PSCI
PM domain topology. This is the case with 'qcom,rpmh-rsc' device, for
example, even if this is not yet visible in the corresponding DTS-files.
Without going into too much details, a device like the 'qcom,rpmh-rsc' may
have HW constraints that needs to be obeyed to, before a domain idlestate
can be picked.
Therefore, let's implement the ->sync_state() callback to receive a
notification when all consumers of the PSCI PM domain providers have been
attached/probed to it. In this way, we can make sure all constraints from
all relevant devices, are taken into account before allowing a domain
idlestate to be picked.
Acked-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
To enable support for deferred probing and to allow implementation of the
->sync_state() callback from subsequent changes, let's convert into a
platform driver.
Reviewed-by: Lina Iyer <ilina@codeaurora.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The current error paths for the cpuidle-psci driver, may leak memory or
possibly leave CPU devices attached to their PM domains. These are quite
harmless issues, but still deserves to be taken care of.
Although, rather than fixing them by keeping track of allocations that
needs to be freed, which tends to become a bit messy, let's convert into a
platform driver. In this way, it gets easier to fix the memory leaks as we
can rely on the devm_* functions.
Moreover, converting to a platform driver also enables support for deferred
probe, which subsequent changes takes benefit from.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently we allow the cpuidle driver registration to succeed, even if we
failed to enable the OSI mode when the hierarchical DT layout is used. This
means running in a degraded mode, by using the available idle states per
CPU, while also preventing the domain idle states.
Moving forward, this behaviour looks quite questionable to maintain, as
complexity seems to grow around it, especially when trying to add support
for deferred probe, for example.
Therefore, let's make the cpuidle driver registration to fail in this
situation, thus relying on the default architectural cpuidle backend for
WFI to be used.
Reviewed-by: Lina Iyer <ilina@codeaurora.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The combined build object for the PSCI cpuidle driver and the PSCI PM
domain, is a bit messy. Therefore let's split it up by adding a new Kconfig
ARM_PSCI_CPUIDLE_DOMAIN and convert into two separate objects.
Reviewed-by: Lina Iyer <ilina@codeaurora.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The sparse tool complains as follows:
drivers/cpuidle/cpuidle-pseries.c:25:23: warning:
symbol 'pseries_idle_driver' was not declared. Should it be static?
'pseries_idle_driver' is not used outside of this file, so marks
it static.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200714142424.66648-1-weiyongjun1@huawei.com
Implement call_cpuidle_s2idle() in analogy with call_cpuidle()
for the s2idle-specific idle state entry and invoke it from
cpuidle_idle_call() to make the s2idle-specific idle entry code
path look more similar to the "regular" idle entry one.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Chen Yu <yu.c.chen@intel.com>
Suspend to idle was found to not work on Goldmont CPU recently.
The issue happens due to:
1. On Goldmont the CPU in idle can only be woken up via IPIs,
not POLLING mode, due to commit 08e237fa56 ("x86/cpu: Add
workaround for MONITOR instruction erratum on Goldmont based
CPUs")
2. When the CPU is entering suspend to idle process, the
_TIF_POLLING_NRFLAG remains on, because cpuidle_enter_s2idle()
doesn't match call_cpuidle() exactly.
3. Commit b2a02fc43a ("smp: Optimize send_call_function_single_ipi()")
makes use of _TIF_POLLING_NRFLAG to avoid sending IPIs to idle
CPUs.
4. As a result, some IPIs related functions might not work
well during suspend to idle on Goldmont. For example, one
suspected victim:
tick_unfreeze() -> timekeeping_resume() -> hrtimers_resume()
-> clock_was_set() -> on_each_cpu() might wait forever,
because the IPIs will not be sent to the CPUs which are
sleeping with _TIF_POLLING_NRFLAG set, and Goldmont CPU
could not be woken up by only setting _TIF_NEED_RESCHED
on the monitor address.
To avoid that, clear the _TIF_POLLING_NRFLAG flag before invoking
enter_s2idle_proper() in cpuidle_enter_s2idle() in analogy with the
call_cpuidle() code flow.
Fixes: b2a02fc43a ("smp: Optimize send_call_function_single_ipi()")
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
[ rjw: Subject / changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Thermal framework cleanups (self-encapsulation, pointless stubs,
private structures) (Daniel Lezcano)
- Use the PM QoS frequency changes for the devfreq cooling device (Matthias
Kaehlcke)
- Remove duplicate error messages from platform_get_irq() error handling
(Markus Elfring)
- Add support for the bandgap sensors (Keerthy)
- Statically initialize .get_mode/.set_mode ops (Andrzej Pietrasiewicz)
- Add Renesas R-Car maintainer entry (Niklas Söderlund)
- Fix error checking after calling ti_bandgap_get_sensor_data() for the TI SoC
thermal (Sudip Mukherjee)
- Add latency constraint for the idle injection, the DT binding and the change
the registering function (Daniel Lezcano)
- Convert the thermal framework binding to the Yaml schema (Amit Kucheria)
- Replace zero-length array with flexible-array on i.MX 8MM (Gustavo A. R. Silva)
- Thermal framework cleanups (alphabetic order for heads, replace module.h by
export.h, make file naming consistent) (Amit Kucheria)
- Merge tsens-common into the tsens driver (Amit Kucheria)
- Fix platform dependency for the Qoriq driver (Geert Uytterhoeven)
- Clean up the rcar_thermal_update_temp() function in the rcar thermal driver
(Niklas Söderlund)
- Fix the TMSAR register for the TMUv2 on the Qoriq platform (Yuantian Tang)
- Export GDDV, OEM vendor variables, and don't require IDSP for the int340x
thermal driver - trivial conflicts fixed (Matthew Garrett)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAl7jra8ACgkQqDIjiipP
6E+ugAgApBF6FsHoonWIvoSrzBrrbU2oqhEJA42Mx+iY/UnXi01I79vZ/8WpZt7M
D1J01Kf0PUhRbywoKaoCX3Oh9ZO9PKq4N9ZC8yqdoD6GLl+rC9Wmr7Ui+c80klcv
M9rYhpPYfNXTFj0saSbbFWNNhP4TvhzGsNj8foYVQDKyhjbSmNE5ipZlbmP23jlr
O53SmJAwS5zxLOd8QA5nfSWP9FYYMuCR2AHj8BUCmxiAjXZLPNB/Hz2RRBr7q0MF
zRo/4HJ04mSQYp0kluP/EBhz9g2wM/htIPyWRveB/ByKEYt3UNKjB++PJmPbu5UG
dS3aXZhRfaPqpdsWrMB9fY7ll+oyfw==
=T+RI
-----END PGP SIGNATURE-----
Merge tag 'thermal-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux
Pull thermal updates from Daniel Lezcano:
- Add the hwmon support on the i.MX SC (Anson Huang)
- Thermal framework cleanups (self-encapsulation, pointless stubs,
private structures) (Daniel Lezcano)
- Use the PM QoS frequency changes for the devfreq cooling device
(Matthias Kaehlcke)
- Remove duplicate error messages from platform_get_irq() error
handling (Markus Elfring)
- Add support for the bandgap sensors (Keerthy)
- Statically initialize .get_mode/.set_mode ops (Andrzej Pietrasiewicz)
- Add Renesas R-Car maintainer entry (Niklas Söderlund)
- Fix error checking after calling ti_bandgap_get_sensor_data() for the
TI SoC thermal (Sudip Mukherjee)
- Add latency constraint for the idle injection, the DT binding and the
change the registering function (Daniel Lezcano)
- Convert the thermal framework binding to the Yaml schema (Amit
Kucheria)
- Replace zero-length array with flexible-array on i.MX 8MM (Gustavo A.
R. Silva)
- Thermal framework cleanups (alphabetic order for heads, replace
module.h by export.h, make file naming consistent) (Amit Kucheria)
- Merge tsens-common into the tsens driver (Amit Kucheria)
- Fix platform dependency for the Qoriq driver (Geert Uytterhoeven)
- Clean up the rcar_thermal_update_temp() function in the rcar thermal
driver (Niklas Söderlund)
- Fix the TMSAR register for the TMUv2 on the Qoriq platform (Yuantian
Tang)
- Export GDDV, OEM vendor variables, and don't require IDSP for the
int340x thermal driver - trivial conflicts fixed (Matthew Garrett)
* tag 'thermal-v5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (48 commits)
thermal/int340x_thermal: Don't require IDSP to exist
thermal/int340x_thermal: Export OEM vendor variables
thermal/int340x_thermal: Export GDDV
thermal: qoriq: Update the settings for TMUv2
thermal: rcar_thermal: Clean up rcar_thermal_update_temp()
thermal: qoriq: Add platform dependencies
drivers: thermal: tsens: Merge tsens-common.c into tsens.c
thermal/of: Rename of-thermal.c
thermal/governors: Prefix all source files with gov_
thermal/drivers/user_space: Sort headers alphabetically
thermal/drivers/of-thermal: Sort headers alphabetically
thermal/drivers/cpufreq_cooling: Replace module.h with export.h
thermal/drivers/cpufreq_cooling: Sort headers alphabetically
thermal/drivers/clock_cooling: Include export.h
thermal/drivers/clock_cooling: Sort headers alphabetically
thermal/drivers/thermal_hwmon: Include export.h
thermal/drivers/thermal_hwmon: Sort headers alphabetically
thermal/drivers/thermal_helpers: Include export.h
thermal/drivers/thermal_helpers: Sort headers alphabetically
thermal/core: Replace module.h with export.h
...
- Support for userspace to send requests directly to the on-chip GZIP
accelerator on Power9.
- Rework of our lockless page table walking (__find_linux_pte()) to make it
safe against parallel page table manipulations without relying on an IPI for
serialisation.
- A series of fixes & enhancements to make our machine check handling more
robust.
- Lots of plumbing to add support for "prefixed" (64-bit) instructions on
Power10.
- Support for using huge pages for the linear mapping on 8xx (32-bit).
- Remove obsolete Xilinx PPC405/PPC440 support, and an associated sound driver.
- Removal of some obsolete 40x platforms and associated cruft.
- Initial support for booting on Power10.
- Lots of other small features, cleanups & fixes.
Thanks to:
Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan, Andrey Abramov,
Aneesh Kumar K.V, Balamuruhan S, Bharata B Rao, Bulent Abali, Cédric Le
Goater, Chen Zhou, Christian Zigotzky, Christophe JAILLET, Christophe Leroy,
Dmitry Torokhov, Emmanuel Nicolet, Erhard F., Gautham R. Shenoy, Geoff Levand,
George Spelvin, Greg Kurz, Gustavo A. R. Silva, Gustavo Walbon, Haren Myneni,
Hari Bathini, Joel Stanley, Jordan Niethe, Kajol Jain, Kees Cook, Leonardo
Bras, Madhavan Srinivasan., Mahesh Salgaonkar, Markus Elfring, Michael
Neuling, Michal Simek, Nathan Chancellor, Nathan Lynch, Naveen N. Rao,
Nicholas Piggin, Oliver O'Halloran, Paul Mackerras, Pingfan Liu, Qian Cai, Ram
Pai, Raphael Moreira Zinsly, Ravi Bangoria, Sam Bobroff, Sandipan Das, Segher
Boessenkool, Stephen Rothwell, Sukadev Bhattiprolu, Tyrel Datwyler, Wolfram
Sang, Xiongfeng Wang.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl7aYZ8THG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgPiKD/9zNCuZLFMAFrIdbm0HlYA2RGYZFT75
GUHsqYyei1pxA7PgM3KwJiXELVODsBv0eQbgNh1tbecKrxPRegN/cywd1KLjPZ7I
v5/qweQP8MvR0RhzjbhvUcO0jq/f8u2LbJr5mUfVzjU6tAvrvcWo3oZqDElsekCS
kgyOH3r1vZ2PLTMiGFhb0gWi2iqc+6BHU1AFCGPCMjB1Vu5d5+54VvZ/6lllGsOF
yg9CBXmmVvQ+Bn6tH4zdEB78FYxnAIwBqlbmL79i5ca+HQJ0Sw6HuPRy9XYq35p6
2EiXS4Wrgp7i7+1TN3HO362u5Onb8TSyQU7NS6yCFPoJ6JQxcJMBIw6mHhnXOPuZ
CrjgcdwUMjx8uDoKmX1Epbfuex2w+AysW+4yBHPFiSgl3klKC3D0wi95mR485w2F
rN8uzJtrDeFKcYZJG7IoB/cgFCCPKGf9HaXr8q0S/jBKMffx91ul3cfzlfdIXOCw
FDNw/+ZX7UD6ddFEG12ZTO+vdL8yf1uCRT/DIZwUiDMIA0+M6F4nc7j3lfyZfoO1
65f9UlhoLxScq7VH2fKH4UtZatO9cPID2z1CmiY4UbUIPtFDepSuYClgLF+Duf4b
rkfxhKU0+Ja1zNH5XNc+L+Bc5/W4lFiJXz02dYIjtHoUpWkc1aToOETVwzggYFNM
G3PXIBOI0jRgRw==
=o0WU
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Support for userspace to send requests directly to the on-chip GZIP
accelerator on Power9.
- Rework of our lockless page table walking (__find_linux_pte()) to
make it safe against parallel page table manipulations without
relying on an IPI for serialisation.
- A series of fixes & enhancements to make our machine check handling
more robust.
- Lots of plumbing to add support for "prefixed" (64-bit) instructions
on Power10.
- Support for using huge pages for the linear mapping on 8xx (32-bit).
- Remove obsolete Xilinx PPC405/PPC440 support, and an associated sound
driver.
- Removal of some obsolete 40x platforms and associated cruft.
- Initial support for booting on Power10.
- Lots of other small features, cleanups & fixes.
Thanks to: Alexey Kardashevskiy, Alistair Popple, Andrew Donnellan,
Andrey Abramov, Aneesh Kumar K.V, Balamuruhan S, Bharata B Rao, Bulent
Abali, Cédric Le Goater, Chen Zhou, Christian Zigotzky, Christophe
JAILLET, Christophe Leroy, Dmitry Torokhov, Emmanuel Nicolet, Erhard F.,
Gautham R. Shenoy, Geoff Levand, George Spelvin, Greg Kurz, Gustavo A.
R. Silva, Gustavo Walbon, Haren Myneni, Hari Bathini, Joel Stanley,
Jordan Niethe, Kajol Jain, Kees Cook, Leonardo Bras, Madhavan
Srinivasan., Mahesh Salgaonkar, Markus Elfring, Michael Neuling, Michal
Simek, Nathan Chancellor, Nathan Lynch, Naveen N. Rao, Nicholas Piggin,
Oliver O'Halloran, Paul Mackerras, Pingfan Liu, Qian Cai, Ram Pai,
Raphael Moreira Zinsly, Ravi Bangoria, Sam Bobroff, Sandipan Das, Segher
Boessenkool, Stephen Rothwell, Sukadev Bhattiprolu, Tyrel Datwyler,
Wolfram Sang, Xiongfeng Wang.
* tag 'powerpc-5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (299 commits)
powerpc/pseries: Make vio and ibmebus initcalls pseries specific
cxl: Remove dead Kconfig options
powerpc: Add POWER10 architected mode
powerpc/dt_cpu_ftrs: Add MMA feature
powerpc/dt_cpu_ftrs: Enable Prefixed Instructions
powerpc/dt_cpu_ftrs: Advertise support for ISA v3.1 if selected
powerpc: Add support for ISA v3.1
powerpc: Add new HWCAP bits
powerpc/64s: Don't set FSCR bits in INIT_THREAD
powerpc/64s: Save FSCR to init_task.thread.fscr after feature init
powerpc/64s: Don't let DT CPU features set FSCR_DSCR
powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()
powerpc/32s: Fix another build failure with CONFIG_PPC_KUAP_DEBUG
powerpc/module_64: Use special stub for _mcount() with -mprofile-kernel
powerpc/module_64: Simplify check for -mprofile-kernel ftrace relocations
powerpc/module_64: Consolidate ftrace code
powerpc/32: Disable KASAN with pages bigger than 16k
powerpc/uaccess: Don't set KUEP by default on book3s/32
powerpc/uaccess: Don't set KUAP by default on book3s/32
powerpc/8xx: Reduce time spent in allow_user_access() and friends
...
These are updates to SoC specific drivers that did not have
another subsystem maintainer tree to go through for some
reason:
- Some bus and memory drivers for the MIPS P5600 based
Baikal-T1 SoC that is getting added through the MIPS tree.
- There are new soc_device identification drivers for TI K3,
Qualcomm MSM8939
- New reset controller drivers for NXP i.MX8MP, Renesas
RZ/G1H, and Hisilicon hi6220
- The SCMI firmware interface can now work across ARM SMC/HVC
as a transport.
- Mediatek platforms now use a new driver for their "MMSYS"
hardware block that controls clocks and some other aspects
in behalf of the media and gpu drivers.
- Some Tegra processors have improved power management
support, including getting woken up by the PMIC and cluster
power down during idle.
- A new v4l staging driver for Tegra is added.
- Cleanups and minor bugfixes for TI, NXP, Hisilicon,
Mediatek, and Tegra.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAl7XvgAACgkQmmx57+YA
GNmj/hAAnAJ/hYehLfgCe711HUntgeRkaoTVpCt8BJNMdxsa23sn3V6k5+WYn1uG
PtlgpefZEMHLUEEVDegR4nZXLG0Pzu1SR12KW34YPcQKkNo/+vlQ9zYUajnJ/KX6
10zdLSIzHfk1VtXKvvQQ8xFyE+S/trGmjC57E6gfoCUT3rl1maD+ccVXUBaz9oob
wuMxGXQAl57mio5yT1OfSk6Fev39xRE2dN1hzP7KUYhsemZajBwBBW5wVJZCsCB8
LCGmxVkavM7BV4r2NokbBDs5rlfedBl/P/IPd9Is5a5tuGUkSsVRG9zqShxYLGM3
S06az6POQFwXKFJoUKW0dK/Koy0D7BK+vhUBPzFv4HZ8iDCVf6Jju2MJ02GMqHPj
OOrXaCbLYrvN/edVUWeeFywqwMbYTRwC4DxyTq5m7HxEB004xTOhs0rX5aR0u4n1
bbsR97LguolwH9iEMzd3F3jCiKBcMecH3lAh5WcrtwlFIRrNhbWoGDoA/4TuORFS
b11rgsTRIJ5Vc++D1HnSnx0ZZvUzyluMvygdALnSgVah6xYe6KVw9Kg/wioAJ04G
uSTidqP3qRhsyET2HQo7CxdVfZbKfP25iKCKrdhQziztKvhF8qrUmZKloXOodRw+
ewYSRmv8c324OYYit1X43oAdW8dntq1XbSIauaqxEb4JC7x/xRY=
=44Jc
-----END PGP SIGNATURE-----
Merge tag 'arm-drivers-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM/SoC driver updates from Arnd Bergmann:
"These are updates to SoC specific drivers that did not have another
subsystem maintainer tree to go through for some reason:
- Some bus and memory drivers for the MIPS P5600 based Baikal-T1 SoC
that is getting added through the MIPS tree.
- There are new soc_device identification drivers for TI K3, Qualcomm
MSM8939
- New reset controller drivers for NXP i.MX8MP, Renesas RZ/G1H, and
Hisilicon hi6220
- The SCMI firmware interface can now work across ARM SMC/HVC as a
transport.
- Mediatek platforms now use a new driver for their "MMSYS" hardware
block that controls clocks and some other aspects in behalf of the
media and gpu drivers.
- Some Tegra processors have improved power management support,
including getting woken up by the PMIC and cluster power down
during idle.
- A new v4l staging driver for Tegra is added.
- Cleanups and minor bugfixes for TI, NXP, Hisilicon, Mediatek, and
Tegra"
* tag 'arm-drivers-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (155 commits)
clk: sprd: fix compile-testing
bus: bt1-axi: Build the driver into the kernel
bus: bt1-apb: Build the driver into the kernel
bus: bt1-axi: Use sysfs_streq instead of strncmp
bus: bt1-axi: Optimize the return points in the driver
bus: bt1-apb: Use sysfs_streq instead of strncmp
bus: bt1-apb: Use PTR_ERR_OR_ZERO to return from request-regs method
bus: bt1-apb: Fix show/store callback identations
bus: bt1-apb: Include linux/io.h
dt-bindings: memory: Add Baikal-T1 L2-cache Control Block binding
memory: Add Baikal-T1 L2-cache Control Block driver
bus: Add Baikal-T1 APB-bus driver
bus: Add Baikal-T1 AXI-bus driver
dt-bindings: bus: Add Baikal-T1 APB-bus binding
dt-bindings: bus: Add Baikal-T1 AXI-bus binding
staging: tegra-video: fix V4L2 dependency
tee: fix crypto select
drivers: soc: ti: knav_qmss_queue: Make knav_gp_range_ops static
soc: ti: add k3 platforms chipid module driver
dt-bindings: soc: ti: add binding for k3 platforms chipid module
...
kobject_init_and_add() takes reference even when it fails.
If this function returns an error, kobject_put() must be called to
properly clean up the memory associated with the object.
Previous commit "b8eb718348b8" fixed a similar problem.
Signed-off-by: Qiushi Wu <wu000273@umn.edu>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The Qualcomm SPM cpuidle driver seems to be the last driver still
using the generic ARM CPUidle infrastructure.
Converting it actually allows us to simplify the driver,
and we end up being able to remove more lines than adding new ones:
- We can parse the CPUidle states in the device tree directly
with dt_idle_states (and don't need to duplicate that
functionality into the spm driver).
- Each "saw" device managed by the SPM driver now directly
registers its own cpuidle driver, removing the need for
any global (per cpu) state.
The device tree binding is the same, so the driver stays
compatible with all old device trees.
Signed-off-by: Stephan Gerhold <stephan@gerhold.net>
Reviewed-by: Lina Iyer <ilina@codeaurora.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since the cpuidle governor can be switched via sysfs in default,
remove sysfs_switch and cpuidle_switch_attrs.
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
For now cpuidle governor can be switched via sysfs only when the
boot option "cpuidle_sysfs_switch" is passed, but it's important
to switch the governor to adapt to different workloads, especially
after TEO and haltpoll governor were introduced.
Add available_governors and current_governor into the default
attributes, but reserve the current_governor_ro for compatiblity.
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CPUIDLE_NAME_LEN is 16, so it's possible to accept governor name
with 15 characters, but now store_current_governor() rejects
governor name with 15 characters as it returns -EINVAL if count
equals CPUIDLE_NAME_LEN.
Refactor the code to accept such case and simplify the code.
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When showing the available governors, it's "%s " in scnprintf(),
not "%s", so if the governor name has 15 characters, it will
overlap with the later one, fix it by adding one more for the
size.
While we are at it, fix the minor coding style issue and remove
the "/sizeof(char)" since sizeof(char) always equals 1.
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpuidle driver can be used as a cooling device by injecting idle
cycles.
When the property is set, register the cpuidle driver with the idle
state node pointer as a cooling device. The thermal framework will do
the association automatically with the thermal zone via the
cooling-device defined in the device tree cooling-maps section.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Link: https://lore.kernel.org/r/20200429103644.5492-4-daniel.lezcano@linaro.org
Moving forward, platforms are going to need to execute specific "last-man"
operations before a domain idle state can be entered. In one way or the
other, these operations needs to be triggered while walking the
hierarchical topology via runtime PM and genpd, as it's at that point the
last-man becomes known.
Moreover, executing last-man operations needs to be done after the CPU PM
notifications are sent through cpu_pm_enter(), as otherwise it's likely
that some notifications would fail. Therefore, let's re-order the sequence
in psci_enter_domain_idle_state(), so cpu_pm_enter() gets called prior
pm_runtime_put_sync().
Fixes: ce85aef570 ("cpuidle: psci: Manage runtime PM in the idle path")
Reported-by: Lina Iyer <ilina@codeaurora.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The new Tegra CPU Idle driver now has a unified code path for the coupled
CC6 (LP2) state, this allows to enable the deepest idling state on Tegra30
SoC where the whole CPU cluster is power-gated.
Tested-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Tested-by: Jasper Korten <jja2000@gmail.com>
Tested-by: David Heidelberg <david@ixit.cz>
Tested-by: Peter Geis <pgwipeout@gmail.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Currently when CPU goes idle, we take a snapshot of PURR via
pseries_idle_prolog() which is used at the CPU idle exit to compute
the idle PURR cycles via the function pseries_idle_epilog(). Thus,
the value of idle PURR cycle thus read before pseries_idle_prolog() and
after pseries_idle_epilog() is always correct.
However, if we were to read the idle PURR cycles from an interrupt
context between pseries_idle_prolog() and pseries_idle_epilog() (this
will be done in a future patch), then, the value of the idle PURR thus
read will not include the cycles spent in the most recent idle period.
Thus, in that interrupt context, we will need access to the snapshot
of the PURR before going idle, in order to compute the idle PURR
cycles for the latest idle duration.
In this patch, we save the snapshot of PURR in pseries_idle_prolog()
in a per-cpu variable, instead of on the stack, so that it can be
accessed from an interrupt context.
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1586249263-14048-3-git-send-email-ego@linux.vnet.ibm.com
Currently prior to entering an idle state on a Linux Guest, the
pseries cpuidle driver implement an idle_loop_prolog() and
idle_loop_epilog() functions which ensure that idle_purr is correctly
computed, and the hypervisor is informed that the CPU cycles have been
donated.
These prolog and epilog functions are also required in the default
idle call, i.e pseries_lpar_idle(). Hence move these accessor
functions to a common header file and call them from
pseries_lpar_idle(). Since the existing header files such as
asm/processor.h have enough clutter, create a new header file
asm/idle.h. Finally rename idle_loop_prolog() and idle_loop_epilog()
to pseries_idle_prolog() and pseries_idle_epilog() as they are only
relavent for on pseries guests.
Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1586249263-14048-2-git-send-email-ego@linux.vnet.ibm.com
The define_one_ro and define_one_rw macros are not used,
remove it.
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The code changes are mostly for 32-bit platforms and include:
- Lots of updates for the Nvidia Tegra platform, including
cpuidle, pmc, and dt-binding changes
- Microchip at91 power management updates for the recently added
sam9x60 SoC
- Treewide setup_irq deprecation by afzal mohammed
- STMicroelectronics stm32 gains earlycon support
- Renesas platforms with Cortex-A9 can now use the global timer
- Some TI OMAP2+ platforms gain cpuidle support
- Various cleanups for the i.MX6 and Orion platforms, as well as
Kconfig files across all platforms
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAl6EaLsACgkQmmx57+YA
GNkg/Q/9EmLGT/bznqi6P/8V75qPY6j4swife5HlI43EdCEyN+iME1w1rFfrNilf
A/QyXzxhYgUZoyIt7K4qjWc8fPNlJZ8X10UqeEftQFxi82cmX2+OaT2fy6OdHVRJ
SAGb1pw7463TQ6LKA7LC7rztFjahsjnE/9szlgXQT4v5OzayGyxd3OKy2Q1zASEi
rwN+85Lh+xJvWhhenPKVvs2Dei+up8Y9uyPfJkj2QudCB+zx5k5stkk4EiQLBd1W
SHzhijbSU3MrAUz2Cnp9Qa+86DdGPvFPfpzQy3pSYU9nNrC0aKpS8YmHT99SIWVN
6R1YJF7Htmui5I9+O0baejJMEqvzGUysqe+rQdCofD7ooVKWU7WKYj5HxZxyBCEN
dvlN3KRmS6l5KLsZARSxuBUw2MPTgjsxczZ84NKMLj8czw6yXyrePZ1RWiEZ4HIu
4GiFNLYSMmAynD/dLuC9USDsjlPsQKnQ3e8hDf3a6oK5OHUIkr3uvguhqWa5WJsi
cy2DUeWJkXgJDhlxcfr8MiPpPJRo3N/8O8PYci8dkdDFRs32j/5Qf22vywDdUHaZ
I9Pl+VOGOSGiqRc9gmay6DNXpJusfuv72omrz4rL4kGMahpk0LeV5w+a9TdrkreY
sLM67wtshOjOVMc/ebQ5Q8RR17Go+MFK+Co9Yc1ybbQ6lkSzeuY=
=UTk1
-----END PGP SIGNATURE-----
Merge tag 'arm-soc-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC updates from Arnd Bergmann:
"The code changes are mostly for 32-bit platforms and include:
- Lots of updates for the Nvidia Tegra platform, including cpuidle,
pmc, and dt-binding changes
- Microchip at91 power management updates for the recently added
sam9x60 SoC
- Treewide setup_irq deprecation by afzal mohammed
- STMicroelectronics stm32 gains earlycon support
- Renesas platforms with Cortex-A9 can now use the global timer
- Some TI OMAP2+ platforms gain cpuidle support
- Various cleanups for the i.MX6 and Orion platforms, as well as
Kconfig files across all platforms"
* tag 'arm-soc-5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (75 commits)
ARM: qcom: Add support for IPQ40xx
ARM: mmp: replace setup_irq() by request_irq()
ARM: cns3xxx: replace setup_irq() by request_irq()
ARM: spear: replace setup_irq() by request_irq()
ARM: ep93xx: Replace setup_irq() by request_irq()
ARM: iop32x: replace setup_irq() by request_irq()
arm: mach-dove: Mark dove_io_desc as __maybe_unused
ARM: orion: replace setup_irq() by request_irq()
ARM: debug: stm32: add UART early console support for STM32MP1
ARM: debug: stm32: add UART early console support for STM32H7
ARM: debug: stm32: add UART early console configuration for STM32F7
ARM: debug: stm32: add UART early console configuration for STM32F4
cpuidle: tegra: Disable CC6 state if LP2 unavailable
cpuidle: tegra: Squash Tegra114 driver into the common driver
cpuidle: tegra: Squash Tegra30 driver into the common driver
cpuidle: Refactor and move out NVIDIA Tegra20 driver into drivers/cpuidle
ARM: tegra: cpuidle: Remove unnecessary memory barrier
ARM: tegra: cpuidle: Make abort_flag atomic
ARM: tegra: cpuidle: Handle case where secondary CPU hangs on entering LP2
ARM: tegra: Make outer_disable() open-coded
...
* pm-cpuidle:
cpuidle: haltpoll: allow force loading on hosts without the REALTIME hint
intel_idle: Update copyright notice, known limitations and version
intel_idle: Define CPUIDLE_FLAG_TLB_FLUSHED as BIT(16)
intel_idle: Clean up kerneldoc comments for multiple functions
intel_idle: Reorder declarations of static variables
intel_idle: Annotate init time data structures
intel_idle: Add __initdata annotations to init time variables
intel_idle: Relocate definitions of cpuidle callbacks
intel_idle: Clean up definitions of cpuidle callbacks
intel_idle: Simplify LAPIC timer reliability checks
To make the code a bit more readable, let's move the OSI specific
initialization out of the psci_dt_cpu_init_idle() and into a separate
function.
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Before commit 1328edca4a ("cpuidle-haltpoll: Enable kvm guest polling
when dedicated physical CPUs are available") the cpuidle-haltpoll driver
could also be used in scenarios when the host does not advertise the
KVM_HINTS_REALTIME hint.
While the behavior introduced by the aforementioned commit makes sense as
the default there are cases where the old behavior is desired, for example,
when other kernel changes triggered by presence by this hint are unwanted,
for some workloads where the latency benefit from polling overweights the
loss from idle CPU capacity that otherwise would be available, or just when
running under older Qemu versions that lack this hint.
Let's provide a typical "force" module parameter that allows restoring the
old behavior.
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
LP2 suspending could be unavailable, for example if it is disabled in a
device-tree. CC6 cpuidle state won't work in that case.
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Tegra20/30/114/124 SoCs have common idling states, thus there is no much
point in having separate drivers for a similar hardware. This patch moves
Tegra114/124 arch/ drivers into the common driver without any functional
changes. The CC6 state is kept disabled on Tegra114/124 because the core
Tegra PM code needs some more work in order to support that state.
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Tegra20 and Terga30 SoCs have common C1 and CC6 idling states and thus
share the same code paths, there is no point in having separate drivers
for a similar hardware. This patch merely moves functionality of the old
driver into the new, although the CC6 state is kept disabled for now since
old driver had a rudimentary support for this state (allowing to enter
into CC6 only when secondary CPUs are put offline), while new driver can
provide a full-featured support. The new feature will be enabled by
another patch.
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Jasper Korten <jja2000@gmail.com>
Tested-by: David Heidelberg <david@ixit.cz>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
The driver's code is refactored in a way that will make it easy to
support Tegra30/114/124 SoCs by this unified driver later on. The
current functionality is equal to the old Tegra20 driver, only the
code's structure changed a tad. This is also a proper platform driver
now.
Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Call cpu_latency_qos_limit() instead of pm_qos_request(), because the
latter is going to be dropped.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
Notice that pm_qos_remove_notifier() is not used at all and the only
caller of pm_qos_add_notifier() is the cpuidle core, which only needs
the PM_QOS_CPU_DMA_LATENCY notifier to invoke wake_up_all_idle_cpus()
upon changes of the PM_QOS_CPU_DMA_LATENCY target value.
First, to ensure that wake_up_all_idle_cpus() will be called
whenever the PM_QOS_CPU_DMA_LATENCY target value changes, modify the
pm_qos_add/update/remove_request() family of functions to check if
the effective constraint for the PM_QOS_CPU_DMA_LATENCY has changed
and call wake_up_all_idle_cpus() directly in that case.
Next, drop the PM_QOS_CPU_DMA_LATENCY notifier from cpuidle as it is
not necessary any more.
Finally, drop both pm_qos_add_notifier() and pm_qos_remove_notifier(),
as they have no callers now, along with cpu_dma_lat_notifier which is
only used by them.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Tested-by: Amit Kucheria <amit.kucheria@linaro.org>
Various driver updates for platforms:
- Nvidia: Fuse support for Tegra194, continued memory controller pieces
for Tegra30
- NXP/FSL: Refactorings of QuickEngine drivers to support ARM/ARM64/PPC
- NXP/FSL: i.MX8MP SoC driver pieces
- TI Keystone: ring accelerator driver
- Qualcomm: SCM driver cleanup/refactoring + support for new SoCs.
- Xilinx ZynqMP: feature checking interface for firmware. Mailbox
communication for power management
- Overall support patch set for cpuidle on more complex hierarchies
(PSCI-based)
+ Misc cleanups, refactorings of Marvell, TI, other platforms.
-----BEGIN PGP SIGNATURE-----
iQJDBAABCAAtFiEElf+HevZ4QCAJmMQ+jBrnPN6EHHcFAl4+lTYPHG9sb2ZAbGl4
b20ubmV0AAoJEIwa5zzehBx3nQcQAJm91+6hZbmMjlBySGS7ISjYvOcrI/hMgiOl
uhhEP0Dcylvf9A9x3wcIbLwixe+2pvie9DQh2u5F80ShYimidtFi/2xCfuTb9fKu
sxxKjrXWyVKhkpW0z+tedY08ftVhkwwcyD4m2C7uVl6AwTP7c367vFeU7XjF2APn
drfgmgbjm8U3XbSyAqv+k6z6tyqaCnFM7vbPupSKHgHJ3mfByxOa+XyBN2RdgBbs
0KrVfbXGv80zFIFrMPwaWG7G52bu7K68nVdgy44MpKdRZ6QTjhnR+kerFxHsYgV4
bM55Fya52nTCSTGdKaQakDtKwbAUdCDTSkxgOHGcQoyFi0R/VaEUJtcysnvLbI6c
+n/yFIzGyEdXcvIzfv2SoDYhogw19I6RR/M9K5Ni29eazkDVYx2z3rI+2QYeqCiF
u7cq52gW6JLP0SI/9kuUrRFiR8v19Ixap7qokAxgqQwYB3NzT8a7WsYPkzdpDZGQ
ETSDFMyBWT6UvBe/HWkQluBabbet53rG8BF0OHFrQuMK0u/ieKgSGuTB9XN2djEW
PHMOMz2vhi+8XTfpkskhF2tTxlA/k4R6QwCdIMpIkMRVnVQCh1XdPr3Fi2NrgB+S
kIXHD4vV6zLYh04zHyKewSPHAXWgraFpg2qKnvL5+KWMTnW6QH+RNjOt9xKDNXOd
+iDXpOad
=ONtb
-----END PGP SIGNATURE-----
Merge tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC-related driver updates from Olof Johansson:
"Various driver updates for platforms:
- Nvidia: Fuse support for Tegra194, continued memory controller
pieces for Tegra30
- NXP/FSL: Refactorings of QuickEngine drivers to support
ARM/ARM64/PPC
- NXP/FSL: i.MX8MP SoC driver pieces
- TI Keystone: ring accelerator driver
- Qualcomm: SCM driver cleanup/refactoring + support for new SoCs.
- Xilinx ZynqMP: feature checking interface for firmware. Mailbox
communication for power management
- Overall support patch set for cpuidle on more complex hierarchies
(PSCI-based)
and misc cleanups, refactorings of Marvell, TI, other platforms"
* tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (166 commits)
drivers: soc: xilinx: Use mailbox IPI callback
dt-bindings: power: reset: xilinx: Add bindings for ipi mailbox
drivers: soc: ti: knav_qmss_queue: Pass lockdep expression to RCU lists
MAINTAINERS: Add brcmstb PCIe controller entry
soc/tegra: fuse: Unmap registers once they are not needed anymore
soc/tegra: fuse: Correct straps' address for older Tegra124 device trees
soc/tegra: fuse: Warn if straps are not ready
soc/tegra: fuse: Cache values of straps and Chip ID registers
memory: tegra30-emc: Correct error message for timed out auto calibration
memory: tegra30-emc: Firm up hardware programming sequence
memory: tegra30-emc: Firm up suspend/resume sequence
soc/tegra: regulators: Do nothing if voltage is unchanged
memory: tegra: Correct reset value of xusb_hostr
soc/tegra: fuse: Add APB DMA dependency for Tegra20
bus: tegra-aconnect: Remove PM_CLK dependency
dt-bindings: mediatek: add MT6765 power dt-bindings
soc: mediatek: cmdq: delete not used define
memory: tegra: Add support for the Tegra194 memory controller
memory: tegra: Only include support for enabled SoCs
memory: tegra: Support DVFS on Tegra186 and later
...
Merge changes updating the ACPI processor driver in order to export
acpi_processor_evaluate_cst() to the code outside of it and adding
ACPI support to the intel_idle driver based on that.
* intel_idle+acpi:
Documentation: admin-guide: PM: Add intel_idle document
intel_idle: Use ACPI _CST on server systems
intel_idle: Add module parameter to prevent ACPI _CST from being used
intel_idle: Allow ACPI _CST to be used for selected known processors
cpuidle: Allow idle states to be disabled by default
intel_idle: Use ACPI _CST for processor models without C-state tables
intel_idle: Refactor intel_idle_cpuidle_driver_init()
ACPI: processor: Export acpi_processor_evaluate_cst()
ACPI: processor: Make ACPI_PROCESSOR_CSTATE depend on ACPI_PROCESSOR
ACPI: processor: Clean up acpi_processor_evaluate_cst()
ACPI: processor: Introduce acpi_processor_evaluate_cst()
ACPI: processor: Export function to claim _CST control
Fix cpuidle_find_deepest_state() kernel documentation to avoid
warnings when compiling with W=1.
Signed-off-by: Benjamin Gaignard <benjamin.gaignard@st.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix kernel documentation comments to remove warnings when
compiling with W=1.
Signed-off-by: Benjamin Gaignard <benjamin.gaignard@st.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix warnings that show up when compiling with W=1
Signed-off-by: Benjamin Gaignard <benjamin.gaignard@st.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some of cpuidle drivers for ARMv7 can be compile tested on this
architecture because they do not depend on mach-specific bits. Enable
compile testing for big.LITTLE, Kirkwood, Zynq, AT91, Exynos and mvebu
cpuidle drivers.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix a simple bug in rotating array index.
Fixes: b26bf6ab71 ("cpuidle: New timer events oriented governor for tickless systems")
Signed-off-by: Ikjoon Jang <ikjn@chromium.org>
Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpuidle_driver_ref() and cpuidle_driver_unref() functions are not
used and the refcnt field in struct cpuidle_driver operated by them
is not updated anywhere else (so it is permanently equal to 0), so
drop both of them along with refcnt.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
When the hierarchical CPU topology layout is used in DT and the PSCI OSI
mode is supported by the PSCI FW, let's initialize a corresponding PM
domain topology by using genpd. This enables a CPU and a group of CPUs,
when attached to the topology, to be power-managed accordingly.
To trigger the attempt to initialize the genpd data structures let's use a
subsys_initcall, which should be early enough to allow CPUs, but also other
devices to be attached.
The initialization consists of parsing the PSCI OF node for the topology
and the "domain idle states" DT bindings. In case the idle states are
compatible with "domain-idle-state", the initialized genpd becomes
responsible of selecting an idle state for the PM domain, via assigning it
a genpd governor.
Note that, a successful initialization of the genpd data structures, is
followed by a call to psci_set_osi_mode(), as to try to enable the OSI mode
in the PSCI FW. In case this fails, we fall back into a degraded mode
rather than bailing out and returning error codes.
Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
Signed-off-by: Lina Iyer <lina.iyer@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
When the hierarchical CPU topology is used and when a CPU is put offline,
that CPU prevents its PM domain from being powered off, which is because
genpd observes the corresponding attached device as being active from a
runtime PM point of view. Furthermore, any potential master PM domains are
also prevented from being powered off.
To address this limitation, let's add add a new CPU hotplug state
(CPUHP_AP_CPU_PM_STARTING) and register up/down callbacks for it, which
allows us to deal with runtime PM accordingly.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
In case we have succeeded to attach a CPU to its PM domain, let's deploy
runtime PM support for the corresponding attached device, to allow the CPU
to be powered-managed accordingly.
The triggering point for when runtime PM reference counting should be done,
has been selected to the deepest idle state for the CPU. However, from the
hierarchical point view, there may be good reasons to do runtime PM
reference counting even on shallower idle states, but at this point this
isn't supported, mainly due to limitations set by the generic PM domain.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
The per CPU variable psci_power_state, contains an array of fixed values,
which reflects the corresponding arm,psci-suspend-param parsed from DT, for
each of the available CPU idle states.
This isn't sufficient when using the hierarchical CPU topology in DT, in
combination with having PSCI OS initiated (OSI) mode enabled. More
precisely, in OSI mode, Linux is responsible of telling the PSCI FW what
idle state the cluster (a group of CPUs) should enter, while in PSCI
Platform Coordinated (PC) mode, each CPU independently votes for an idle
state of the cluster.
For this reason, introduce a per CPU variable called domain_state and
implement two helper functions to read/write its value. Then let the
domain_state take precedence over the regular selected state, when entering
and idle state.
To avoid executing the above OSI specific code in the ->enter() callback,
while operating in the default PSCI Platform Coordinated mode, let's also
add a new enter-function and use it for OSI.
Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
Signed-off-by: Lina Iyer <lina.iyer@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
In order to enable a CPU to be power managed through its PM domain, let's
try to attach it by calling psci_dt_attach_cpu() during the cpuidle
initialization.
psci_dt_attach_cpu() returns a pointer to the attached struct device, which
later should be used for runtime PM, hence we need to store it somewhere.
Rather than adding yet another per CPU variable, let's create a per CPU
struct to collect the relevant per CPU variables.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Introduce a PSCI DT helper function, psci_dt_attach_cpu(), which takes a
CPU number as an in-parameter and tries to attach the CPU's struct device
to its corresponding PM domain.
Let's makes use of dev_pm_domain_attach_by_name(), as it allows us to
specify "psci" as the "name" of the PM domain to attach to. Additionally,
let's also prepare the attached device to be power managed via runtime PM.
Note that, the implementation of the new helper function is in a new
separate c-file, which may seems a bit too much at this point. However,
subsequent changes that implements the remaining part of the PM domain
support for cpuidle-psci, helps to justify this split.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Currently CPU's idle states are represented using the flattened model.
Let's add support for the hierarchical layout, via converting to use
of_get_cpu_state_node().
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Iterating through the idle state nodes in DT, to find out the number of
states that needs to be allocated is unnecessary, as it has already been
done from dt_init_idle_driver(). Therefore, drop the iteration and use the
number we already have at hand.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Currently CPU's idle states are represented using the flattened model.
Let's add support for the hierarchical layout, via converting to use
of_get_cpu_state_node().
Suggested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Lina Iyer <lina.iyer@linaro.org>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Co-developed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
Instead of allocating 'n-1' states in psci_power_state to manage 'n'
idle states which include "ARM WFI" state, it would be simpler to have
1:1 mapping between psci_power_state and cpuidle driver states.
ARM WFI state(i.e. idx == 0) is handled specially in the generic macro
CPU_PM_CPU_IDLE_ENTER_PARAM and hence state[-1] is not possible. However
for sake of code readability, it is better to have 1:1 mapping and not
use [idx - 1] to access psci_power_state corresponding to driver cpuidle
state for idx.
psci_power_state[0] is default initialised to 0 and is never accessed
while entering WFI state.
Reported-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Rafael J. Wysocki <rafael@kernel.org>
In certain situations it may be useful to prevent some idle states
from being used by default while allowing user space to enable them
later on.
For this purpose, introduce a new state flag, CPUIDLE_FLAG_OFF, to
mark idle states that should be disabled by default, make the core
set CPUIDLE_STATE_DISABLED_BY_USER for those states at the
initialization time and add a new state attribute in sysfs,
"default_status", to inform user space of the initial status of
the given idle state ("disabled" if CPUIDLE_FLAG_OFF is set for it,
"enabled" otherwise).
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use devm_platform_ioremap_resource() to simplify code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use devm_platform_ioremap_resource() to simplify code.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The data type of the target_residency_ns field in struct cpuidle_state
is u64, so it does not need to be cast into u64.
Get rid of the unnecessary type cast.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It turns out that cpuidle_driver_state_disabled() can be called
before registering the cpufreq driver on some platforms, which
was not expected when it was introduced and which leads to a NULL
pointer dereference when trying to walk the CPUs associated with
the given cpuidle driver.
Fix the problem by making cpuidle_driver_state_disabled() check if
the driver's mask of CPUs associated with it is present and to set
CPUIDLE_FLAG_UNUSABLE for the given idle state in the driver's states
list if that is not the case to cause __cpuidle_register_device() to
set CPUIDLE_STATE_DISABLED_BY_DRIVER for that state for all cpuidle
devices registered by it later.
Fixes: cbda56d5fe ("cpuidle: Introduce cpuidle_driver_state_disabled() for driver quirks")
Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 259231a045 ("cpuidle: add poll_limit_ns to cpuidle_device
structure") changed, by mistake, the target residency from the first
available sleep state to the last available sleep state (which should
be longer).
This might cause excessive polling.
Fixes: 259231a045 ("cpuidle: add poll_limit_ns to cpuidle_device structure")
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
End sentences in help text with a period (aka full stop).
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After recent cpuidle updates the "disabled" field in struct
cpuidle_state is only used by two drivers (intel_idle and shmobile
cpuidle) for marking unusable idle states, but that may as well be
achieved with the help of a state flag, so define an "unusable" idle
state flag, CPUIDLE_FLAG_UNUSABLE, make the drivers in question use
it instead of the "disabled" field and make the core set
CPUIDLE_STATE_DISABLED_BY_DRIVER for the idle states with that flag
set.
After the above changes, the "disabled" field in struct cpuidle_state
is not used any more, so drop it.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Adjust indentation from spaces to tab (+optional two spaces) as in
coding style with command like:
$ sed -e 's/^ /\t/' -i */Kconfig
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Modify cpuidle_use_deepest_state() to take an additional exit latency
limit argument to be passed to find_deepest_idle_state() and make
cpuidle_idle_call() pass dev->forced_idle_latency_limit_ns to it for
forced idle.
Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
[ rjw: Rebase and rearrange code, subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In some cases it may be useful to specify an exit latency limit for
the idle state to be used during CPU idle time injection.
Instead of duplicating the information in struct cpuidle_device
or propagating the latency limit in the call stack, replace the
use_deepest_state field with forced_latency_limit_ns to represent
that limit, so that the deepest idle state with exit latency within
that limit is forced (i.e. no governors) when it is set.
A zero exit latency limit for forced idle means to use governors in
the usual way (analogous to use_deepest_state equal to "false" before
this change).
Additionally, add play_idle_precise() taking two arguments, the
duration of forced idle and the idle state exit latency limit, both
in nanoseconds, and redefine play_idle() as a wrapper around that
new function.
This change is preparatory, no functional impact is expected.
Suggested-by: Rafael J. Wysocki <rafael@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
[ rjw: Subject, changelog, cpuidle_use_deepest_state() kerneldoc, whitespace ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 99e98d3fb1 ("cpuidle: Consolidate disabled state checks")
overlooked the fact that the imx6q and tegra20 cpuidle drivers use
the "disabled" field in struct cpuidle_state for quirks which trigger
after the initialization of cpuidle, so reading the initial value of
that field is not sufficient for those drivers.
In order to allow them to implement the quirks without using the
"disabled" field in struct cpuidle_state, introduce a new helper
function and modify them to use it.
Fixes: 99e98d3fb1 ("cpuidle: Consolidate disabled state checks")
Reported-by: Len Brown <lenb@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There are three places in teo_select() where a given amount of time
is compared with TICK_NSEC if tick_nohz_tick_stopped() returns true,
which is a bit of duplicated code.
Avoid that code duplication by defining a helper function to do the
check and using it in all of the places in question.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If the current state with the maximum "early hits" metric in
teo_select() is also the one "matching" the expected idle duration,
it will be used as the candidate one for selection even if its
"misses" metric is greater than its "hits" metric, which is not
correct.
In that case, the candidate state should be shallower than the
current one and its "early hits" metric should be the maximum
among the idle states shallower than the current one.
To make that happen, modify teo_select() to save the index of
the state whose "early hits" metric is the maximum for the
range of states below the current one and go back to that state
if it turns out that the current one should be rejected.
Fixes: 159e48560f ("cpuidle: teo: Fix "early hits" handling for disabled idle states")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
One purpose of the computations in teo_update() is to determine
whether or not the (saved) time till the next timer event and the
measured idle duration fall into the same "bin", so avoid using
values that include the cpuidle overhead to obtain the latter.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, the cpuidle subsystem uses microseconds as the unit of
time which (among other things) causes the idle loop to incur some
integer division overhead for no clear benefit.
In order to allow cpuidle to measure time in nanoseconds, add two
new fields, exit_latency_ns and target_residency_ns, to represent the
exit latency and target residency of an idle state in nanoseconds,
respectively, to struct cpuidle_state and initialize them with the
help of the corresponding values in microseconds provided by drivers.
Additionally, change cpuidle_governor_latency_req() to return the
idle state exit latency constraint in nanoseconds.
Also meeasure idle state residency (last_residency_ns in struct
cpuidle_device and time_ns in struct cpuidle_driver) in nanoseconds
and update the cpuidle core and governors accordingly.
However, the menu governor still computes typical intervals in
microseconds to avoid integer overflows.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
There are two reasons why CPU idle states may be disabled: either
because the driver has disabled them or because they have been
disabled by user space via sysfs.
In the former case, the state's "disabled" flag is set once during
the initialization of the driver and it is never cleared later (it
is read-only effectively). In the latter case, the "disable" field
of the given state's cpuidle_state_usage struct is set and it may be
changed via sysfs. Thus checking whether or not an idle state has
been disabled involves reading these two flags every time.
In order to avoid the additional check of the state's "disabled" flag
(which is effectively read-only anyway), use the value of it at the
init time to set a (new) flag in the "disable" field of that state's
cpuidle_state_usage structure and use the sysfs interface to
manipulate another (new) flag in it. This way the state is disabled
whenever the "disable" field of its cpuidle_state_usage structure is
nonzero, whatever the reason, and it is the only place to look into
to check whether or not the state has been disabled.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Fix __cpuidle_set_driver() to check if any of the CPUs in the mask has
a driver different from drv already and, if so, return -EBUSY before
updating any cpuidle_drivers per-CPU pointers.
Fixes: 82467a5a88 ("cpuidle: simplify multiple driver support")
Cc: 3.11+ <stable@vger.kernel.org> # 3.11+
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currenly haltpoll isn't aware of the 'idle=' override, the priority is
'idle=poll' > haltpoll > 'idle=halt'. When 'idle=poll' is used, cpuidle
driver is bypassed but current_driver in sys still shows 'haltpoll'.
When 'idle=halt' is used, haltpoll takes precedence and makes
'idle=halt' have no effect.
Add a check to prevent the haltpoll driver from loading if 'idle=' is
present.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The TEO governor uses idle duration "bins" defined in accordance with
the CPU idle states table provided by the driver, so that each "bin"
covers the idle duration range between the target residency of the
idle state corresponding to it and the target residency of the closest
deeper idle state. The governor collects statistics for each bin
regardless of whether or not the idle state corresponding to it is
currently enabled.
In particular, the "early hits" metric measures the likelihood of a
situation in which the idle duration measured after wakeup falls into
to given bin, but the time till the next timer (sleep length) falls
into a bin corresponding to one of the deeper idle states. It is
used when the "hits" and "misses" metrics indicate that the state
"matching" the sleep length should not be selected, so that the state
with the maximum "early hits" value is selected instead of it.
If the idle state corresponding to the given bin is disabled, it
cannot be selected and if it turns out to be the one that should be
selected, a shallower idle state needs to be used instead of it.
Nevertheless, the metrics collected for the bin corresponding to it
are still valid and need to be taken into account as though that
state had not been disabled.
As far as the "early hits" metric is concerned, teo_select() tries to
take disabled states into account, but the state index corresponding
to the maximum "early hits" value computed by it may be incorrect.
Namely, it always uses the index of the previous maximum "early hits"
state then, but there may be enabled idle states closer to the
disabled one in question. In particular, if the current candidate
state (whose index is the idx value) is closer to the disabled one
and the "early hits" value of the disabled state is greater than the
current maximum, the index of the current candidate state (idx)
should replace the "maximum early hits state" index.
Modify the code to handle that case correctly.
Fixes: b26bf6ab71 ("cpuidle: New timer events oriented governor for tickless systems")
Reported-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
The TEO governor uses idle duration "bins" defined in accordance with
the CPU idle states table provided by the driver, so that each "bin"
covers the idle duration range between the target residency of the
idle state corresponding to it and the target residency of the closest
deeper idle state. The governor collects statistics for each bin
regardless of whether or not the idle state corresponding to it is
currently enabled.
In particular, the "hits" and "misses" metrics measure the likelihood
of a situation in which both the time till the next timer (sleep
length) and the idle duration measured after wakeup fall into the
given bin. Namely, if the "hits" value is greater than the "misses"
one, that situation is more likely than the one in which the sleep
length falls into the given bin, but the idle duration measured after
wakeup falls into a bin corresponding to one of the shallower idle
states.
If the idle state corresponding to the given bin is disabled, it
cannot be selected and if it turns out to be the one that should be
selected, a shallower idle state needs to be used instead of it.
Nevertheless, the metrics collected for the bin corresponding to it
are still valid and need to be taken into account as though that
state had not been disabled.
For this reason, make teo_select() always use the "hits" and "misses"
values of the idle duration range that the sleep length falls into
even if the specific idle state corresponding to it is disabled and
if the "hits" values is greater than the "misses" one, select the
closest enabled shallower idle state in that case.
Fixes: b26bf6ab71 ("cpuidle: New timer events oriented governor for tickless systems")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
Rename a local variable in teo_select() in preparation for subsequent
code modifications, no intentional impact.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
Prevent disabled CPU idle state with target residencies beyond the
anticipated idle duration from being taken into account by the TEO
governor.
Fixes: b26bf6ab71 ("cpuidle: New timer events oriented governor for tickless systems")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
- Rework the main suspend-to-idle control flow to avoid repeating
"noirq" device resume and suspend operations in case of spurious
wakeups from the ACPI EC and decouple the ACPI EC wakeups support
from the LPS0 _DSM support (Rafael Wysocki).
- Extend the wakeup sources framework to expose wakeup sources as
device objects in sysfs (Tri Vo, Stephen Boyd).
- Expose system suspend statistics in sysfs (Kalesh Singh).
- Introduce a new haltpoll cpuidle driver and a new matching
governor for virtualized guests wanting to do guest-side polling
in the idle loop (Marcelo Tosatti, Joao Martins, Wanpeng Li,
Stephen Rothwell).
- Fix the menu and teo cpuidle governors to allow the scheduler tick
to be stopped if PM QoS is used to limit the CPU idle state exit
latency in some cases (Rafael Wysocki).
- Increase the resolution of the play_idle() argument to microseconds
for more fine-grained injection of CPU idle cycles (Daniel Lezcano).
- Switch over some users of cpuidle notifiers to the new QoS-based
frequency limits and drop the CPUFREQ_ADJUST and CPUFREQ_NOTIFY
policy notifier events (Viresh Kumar).
- Add new cpufreq driver based on nvmem for sun50i (Yangtao Li).
- Add support for MT8183 and MT8516 to the mediatek cpufreq driver
(Andrew-sh.Cheng, Fabien Parent).
- Add i.MX8MN support to the imx-cpufreq-dt cpufreq driver (Anson
Huang).
- Add qcs404 to cpufreq-dt-platdev blacklist (Jorge Ramirez-Ortiz).
- Update the qcom cpufreq driver (among other things, to make it
easier to extend and to use kryo cpufreq for other nvmem-based
SoCs) and add qcs404 support to it (Niklas Cassel, Douglas
RAILLARD, Sibi Sankar, Sricharan R).
- Fix assorted issues and make assorted minor improvements in the
cpufreq code (Colin Ian King, Douglas RAILLARD, Florian Fainelli,
Gustavo Silva, Hariprasad Kelam).
- Add new devfreq driver for NVidia Tegra20 (Dmitry Osipenko, Arnd
Bergmann).
- Add new Exynos PPMU events to devfreq events and extend that
mechanism (Lukasz Luba).
- Fix and clean up the exynos-bus devfreq driver (Kamil Konieczny).
- Improve devfreq documentation and governor code, fix spelling
typos in devfreq (Ezequiel Garcia, Krzysztof Kozlowski, Leonard
Crestez, MyungJoo Ham, Gaël PORTAY).
- Add regulators enable and disable to the OPP (operating performance
points) framework (Kamil Konieczny).
- Update the OPP framework to support multiple opp-suspend properties
(Anson Huang).
- Fix assorted issues and make assorted minor improvements in the OPP
code (Niklas Cassel, Viresh Kumar, Yue Hu).
- Clean up the generic power domains (genpd) framework (Ulf Hansson).
- Clean up assorted pieces of power management code and documentation
(Akinobu Mita, Amit Kucheria, Chuhong Yuan).
- Update the pm-graph tool to version 5.5 including multiple fixes
and improvements (Todd Brandt).
- Update the cpupower utility (Benjamin Weis, Geert Uytterhoeven,
Sébastien Szymanski).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl2ArZ4SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxgfYQAK80hs43vWQDmp7XKrN4pQe8+qYULAGO
fBfrFl+NG9y/cnuqnt3NtA8MoyNsMMkMLkpkEDMfSbYqqH5ehEzX5+uGJWiWx8+Y
oH5KU8MH7Tj/utYaalGzDt0AHfHZDIGC0NCUNQJVtE/4mOANFabwsCwscp4MrD5Q
WjFN8U4BrsmWgJdZ/U9QIWcDZ0I+1etCF+rZG2yxSv31FMq2Zk/Qm4YyobqCvQFl
TR9rxl08wqUmIYIz5cDjt/3AKH7NLLDqOTstbCL7cmufM5XPFc1yox69xc89UrIa
4AMgmDp7SMwFG/gdUPof0WQNmx7qxmiRAPleAOYBOZW/8jPNZk2y+RhM5NeF72m7
AFqYiuxqatkSb4IsT8fLzH9IUZOdYr8uSmoMQECw+MHdApaKFjFV8Lb/qx5+AwkD
y7pwys8dZSamAjAf62eUzJDWcEwkNrujIisGrIXrVHb7ISbweskMOmdAYn9p4KgP
dfRzpJBJ45IaMIdbaVXNpg3rP7Apfs7X1X+/ZhG6f+zHH3zYwr8Y81WPqX8WaZJ4
qoVCyxiVWzMYjY2/1lzjaAdqWojPWHQ3or3eBaK52DouyG3jY6hCDTLwU7iuqcCX
jzAtrnqrNIKufvaObEmqcmYlIIOFT7QaJCtGUSRFQLfSon8fsVSR7LLeXoAMUJKT
JWQenuNaJngK
=TBDQ
-----END PGP SIGNATURE-----
Merge tag 'pm-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These include a rework of the main suspend-to-idle code flow (related
to the handling of spurious wakeups), a switch over of several users
of cpufreq notifiers to QoS-based limits, a new devfreq driver for
Tegra20, a new cpuidle driver and governor for virtualized guests, an
extension of the wakeup sources framework to expose wakeup sources as
device objects in sysfs, and more.
Specifics:
- Rework the main suspend-to-idle control flow to avoid repeating
"noirq" device resume and suspend operations in case of spurious
wakeups from the ACPI EC and decouple the ACPI EC wakeups support
from the LPS0 _DSM support (Rafael Wysocki).
- Extend the wakeup sources framework to expose wakeup sources as
device objects in sysfs (Tri Vo, Stephen Boyd).
- Expose system suspend statistics in sysfs (Kalesh Singh).
- Introduce a new haltpoll cpuidle driver and a new matching governor
for virtualized guests wanting to do guest-side polling in the idle
loop (Marcelo Tosatti, Joao Martins, Wanpeng Li, Stephen Rothwell).
- Fix the menu and teo cpuidle governors to allow the scheduler tick
to be stopped if PM QoS is used to limit the CPU idle state exit
latency in some cases (Rafael Wysocki).
- Increase the resolution of the play_idle() argument to microseconds
for more fine-grained injection of CPU idle cycles (Daniel
Lezcano).
- Switch over some users of cpuidle notifiers to the new QoS-based
frequency limits and drop the CPUFREQ_ADJUST and CPUFREQ_NOTIFY
policy notifier events (Viresh Kumar).
- Add new cpufreq driver based on nvmem for sun50i (Yangtao Li).
- Add support for MT8183 and MT8516 to the mediatek cpufreq driver
(Andrew-sh.Cheng, Fabien Parent).
- Add i.MX8MN support to the imx-cpufreq-dt cpufreq driver (Anson
Huang).
- Add qcs404 to cpufreq-dt-platdev blacklist (Jorge Ramirez-Ortiz).
- Update the qcom cpufreq driver (among other things, to make it
easier to extend and to use kryo cpufreq for other nvmem-based
SoCs) and add qcs404 support to it (Niklas Cassel, Douglas
RAILLARD, Sibi Sankar, Sricharan R).
- Fix assorted issues and make assorted minor improvements in the
cpufreq code (Colin Ian King, Douglas RAILLARD, Florian Fainelli,
Gustavo Silva, Hariprasad Kelam).
- Add new devfreq driver for NVidia Tegra20 (Dmitry Osipenko, Arnd
Bergmann).
- Add new Exynos PPMU events to devfreq events and extend that
mechanism (Lukasz Luba).
- Fix and clean up the exynos-bus devfreq driver (Kamil Konieczny).
- Improve devfreq documentation and governor code, fix spelling typos
in devfreq (Ezequiel Garcia, Krzysztof Kozlowski, Leonard Crestez,
MyungJoo Ham, Gaël PORTAY).
- Add regulators enable and disable to the OPP (operating performance
points) framework (Kamil Konieczny).
- Update the OPP framework to support multiple opp-suspend properties
(Anson Huang).
- Fix assorted issues and make assorted minor improvements in the OPP
code (Niklas Cassel, Viresh Kumar, Yue Hu).
- Clean up the generic power domains (genpd) framework (Ulf Hansson).
- Clean up assorted pieces of power management code and documentation
(Akinobu Mita, Amit Kucheria, Chuhong Yuan).
- Update the pm-graph tool to version 5.5 including multiple fixes
and improvements (Todd Brandt).
- Update the cpupower utility (Benjamin Weis, Geert Uytterhoeven,
Sébastien Szymanski)"
* tag 'pm-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (126 commits)
cpuidle-haltpoll: Enable kvm guest polling when dedicated physical CPUs are available
cpuidle-haltpoll: do not set an owner to allow modunload
cpuidle-haltpoll: return -ENODEV on modinit failure
cpuidle-haltpoll: set haltpoll as preferred governor
cpuidle: allow governor switch on cpuidle_register_driver()
PM: runtime: Documentation: add runtime_status ABI document
pm-graph: make setVal unbuffered again for python2 and python3
powercap: idle_inject: Use higher resolution for idle injection
cpuidle: play_idle: Increase the resolution to usec
cpuidle-haltpoll: vcpu hotplug support
cpufreq: Add qcs404 to cpufreq-dt-platdev blacklist
cpufreq: qcom: Add support for qcs404 on nvmem driver
cpufreq: qcom: Refactor the driver to make it easier to extend
cpufreq: qcom: Re-organise kryo cpufreq to use it for other nvmem based qcom socs
dt-bindings: opp: Add qcom-opp bindings with properties needed for CPR
dt-bindings: opp: qcom-nvmem: Support pstates provided by a power domain
Documentation: cpufreq: Update policy notifier documentation
cpufreq: Remove CPUFREQ_ADJUST and CPUFREQ_NOTIFY policy notifier events
PM / Domains: Verify PM domain type in dev_pm_genpd_set_performance_state()
PM / Domains: Simplify genpd_lookup_dev()
...
The downside of guest side polling is that polling is performed even
with other runnable tasks in the host. However, even if poll in kvm
can aware whether or not other runnable tasks in the same pCPU, it
can still incur extra overhead in over-subscribe scenario. Now we can
just enable guest polling when dedicated pCPUs are available.
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpuidle-haltpoll can be built as a module to allow optional late load.
Given we are setting @owner to THIS_MODULE, cpuidle will attempt to grab a
module reference every time a cpuidle_device is registered -- so
essentially all online cpus get a reference.
This prevents for the module to be unloaded later, which makes the
module_exit callback entirely unused. Thus remove the @owner and allow
module to be unloaded.
Fixes: fa86ee90eb ("add cpuidle-haltpoll driver")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When a user loads cpuidle-haltpoll on a non KVM guest the module will
successfully load, even though idle driver registration didn't take
place.
We should instead return -ENODEV signaling the user that the driver can't
be loaded, like other error paths in haltpoll_init(). An example of such
error paths is when we return -EBUSY when attempting to register an idle
driver when it had one already (e.g. intel_idle loads at boot and then we
attempt to insert module cpuidle-haltpoll).
Fixes: fa86ee90eb ("add cpuidle-haltpoll driver")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Right now, guest current governors have the following ratings:
* ladder -> 10
* teo -> 19
* menu -> 20
* haltpoll -> 21
* ladder + nohz=off -> 25
haltpoll governor got introduced and it is now the default governor given
its highest rating -- with ladder+nohz being the exception -- regardless of
idle driver in the guest. An example of an undesirable case is x86 KVM
guests with MWAIT which have intel_idle registered first, and consequently
will have haltpoll be used as governor which would get limited to a poll
state and state 1 and the other states wouldn't get used.
To keep the previous defaults we decrease rating of governor to 9 (below
current lowest rating) and thus rely on @governor switch on
cpuidle_register_driver() to tie in haltpoll idle driver and governor
together.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The recently introduced haltpoll driver is largely only useful with
haltpoll governor. To allow drivers to associate with a particular idle
behaviour, add a @governor property to 'struct cpuidle_driver' and thus
allow a cpuidle driver to switch to a *preferred* governor on idle driver
registration. We save the previous governor, and when an idle driver is
unregistered we switch back to that.
The @governor can be overridden by cpuidle.governor= boot param or
alternatively be ignored if the governor doesn't exist.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When cpus != maxcpus cpuidle-haltpoll will fail to register all vcpus
past the online ones and thus fail to register the idle driver.
This is because cpuidle_add_sysfs() will return with -ENODEV as a
consequence from get_cpu_device() return no device for a non-existing
CPU.
Instead switch to cpuidle_register_driver() and manually register each
of the present cpus through cpuhp_setup_state() callbacks and future
ones that get onlined or offlined. This mimmics similar logic that
intel_idle does.
Fixes: fa86ee90eb ("add cpuidle-haltpoll driver")
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Notice that setting measured_us to UINT_MAX in teo_update() earlier
doesn't change the behavior of the following code, so do that and
eliminate a redundant check used for setting measured_us to UINT_MAX.
This change is not expected to alter functionality.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Current PSCI code handles idle state entry through the
psci_cpu_suspend_enter() API, that takes an idle state index as a
parameter and convert the index into a previously initialized
power_state parameter before calling the PSCI.CPU_SUSPEND() with it.
This is unwieldly, since it forces the PSCI firmware layer to keep track
of power_state parameter for every idle state so that the
index->power_state conversion can be made in the PSCI firmware layer
instead of the CPUidle driver implementations.
Move the power_state handling out of drivers/firmware/psci
into the respective ACPI/DT PSCI CPUidle backends and convert
the psci_cpu_suspend_enter() API to get the power_state
parameter as input, which makes it closer to its firmware
interface PSCI.CPU_SUSPEND() API.
A notable side effect is that the PSCI ACPI/DT CPUidle backends
now can directly handle (and if needed update) power_state
parameters before handing them over to the PSCI firmware
interface to trigger PSCI.CPU_SUSPEND() calls.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Will Deacon <will@kernel.org>
Allow selection of the PSCI CPUidle in the kernel by updating
the respective Kconfig entry.
Remove PSCI callbacks from ARM/ARM64 generic CPU ops
to prevent the PSCI idle driver from clashing with the generic
ARM CPUidle driver initialization, that relies on CPU ops
to initialize and enter idle states.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Will Deacon <will@kernel.org>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Will Deacon <will@kernel.org>
PSCI firmware is the standard power management control for
all ARM64 based platforms and it is also deployed on some
ARM 32 bit platforms to date.
Idle state entry in PSCI is currently achieved by calling
arm_cpuidle_init() and arm_cpuidle_suspend() in a generic
idle driver, which in turn relies on ARM/ARM64 CPUidle back-end
to relay the call into PSCI firmware if PSCI is the boot method.
Given that PSCI is the standard idle entry method on ARM64 systems
(which means that no other CPUidle driver are expected on ARM64
platforms - so PSCI is already a generic idle driver), in order to
simplify idle entry and code maintenance, it makes sense to have a PSCI
specific idle driver so that idle code that it is currently living in
drivers/firmware directory can be hoisted out of it and moved
where it belongs, into a full-fledged PSCI driver, leaving PSCI code
in drivers/firmware as a pure firmware interface, as it should be.
Implement a PSCI CPUidle driver. By default it is a silent Kconfig entry
which is left unselected, since it selection would clash with the
generic ARM CPUidle driver that provides a PSCI based idle driver
through the arm/arm64 arches back-ends CPU operations.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Will Deacon <will@kernel.org>
CPUidle back-end operations are not implemented in some platforms
but this should not be considered an error serious enough to be
logged. Check the arm_cpuidle_init() return value to detect whether
the failure must be reported or not in the kernel log and do
not log it if the platform does not support CPUidle operations.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Signed-off-by: Will Deacon <will@kernel.org>
The TEO goveror prevents the scheduler tick from being stopped (unless
stopped already) if there is a PM QoS latency constraint for the given
CPU and the target residency of the deepest idle state matching that
constraint is below the tick boundary.
However, that is problematic if CPUs with PM QoS latency constraints
are idle for long times, because it effectively causes the tick to
run on them all the time which is wasteful. [It is also confusing
and questionable if they are full dynticks CPUs.]
To address that issue, modify the TEO governor to carry out the
entire search for the most suitable idle state (from the target
residency perspective) even if a latency constraint is present,
to allow it to determine the expected idle duration in all cases.
Also, when using the last several measured idle duration values
to refine the idle state selection, make it compare those values
with the current expected idle duration value (instead of
comparing them with the target residency of the idle state
selected so far) which should prevent the tick from being
retained when it makes sense to stop it sometimes (especially
in the presence of PM QoS latency constraints).
Fixes: b26bf6ab71 ("cpuidle: New timer events oriented governor for tickless systems")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After commit 554c8aa8ec ("sched: idle: Select idle state before
stopping the tick") the menu governor prevents the scheduler tick from
being stopped (unless stopped already) if there is a PM QoS latency
constraint for the given CPU and the target residency of the deepest
idle state matching that constraint is below the tick boundary.
However, that is problematic if CPUs with PM QoS latency constraints
are idle for long times, because it effectively causes the tick to
run on them all the time which is wasteful. [It is also confusing
and questionable if they are full dynticks CPUs.]
To address that issue, make the menu governor allow the tick to be
stopped only if the idle duration predicted by it is beyond the tick
boundary, except when the shallowest idle state is selected upfront
and it is not a "polling" one.
Fixes: 554c8aa8ec ("sched: idle: Select idle state before stopping the tick")
Link: https://lore.kernel.org/lkml/79b247b3-e056-610e-9a07-e685dfdaa6c9@gmail.com/
Reported-by: Thomas Lindroth <thomas.lindroth@gmail.com>
Tested-by: Thomas Lindroth <thomas.lindroth@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When performing guest side polling, it is not necessary to
also perform host side polling.
So disable host side polling, via the new MSR interface,
when loading cpuidle-haltpoll driver.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The cpuidle_haltpoll governor, in conjunction with the haltpoll cpuidle
driver, allows guest vcpus to poll for a specified amount of time before
halting.
This provides the following benefits to host side polling:
1) The POLL flag is set while polling is performed, which allows
a remote vCPU to avoid sending an IPI (and the associated
cost of handling the IPI) when performing a wakeup.
2) The VM-exit cost can be avoided.
The downside of guest side polling is that polling is performed
even with other runnable tasks in the host.
Results comparing halt_poll_ns and server/client application
where a small packet is ping-ponged:
host --> 31.33
halt_poll_ns=300000 / no guest busy spin --> 33.40 (93.8%)
halt_poll_ns=0 / guest_halt_poll_ns=300000 --> 32.73 (95.7%)
For the SAP HANA benchmarks (where idle_spin is a parameter
of the previous version of the patch, results should be the
same):
hpns == halt_poll_ns
idle_spin=0/ idle_spin=800/ idle_spin=0/
hpns=200000 hpns=0 hpns=800000
DeleteC06T03 (100 thread) 1.76 1.71 (-3%) 1.78 (+1%)
InsertC16T02 (100 thread) 2.14 2.07 (-3%) 2.18 (+1.8%)
DeleteC00T01 (1 thread) 1.34 1.28 (-4.5%) 1.29 (-3.7%)
UpdateC00T03 (1 thread) 4.72 4.18 (-12%) 4.53 (-5%)
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since this field is shared by all governors, move it to
cpuidle device structure.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a poll_limit_ns variable to cpuidle_device structure.
Calculate and configure it in the new cpuidle_poll_time
function, in case its zero.
Individual governors are allowed to override this value.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a cpuidle driver that calls the architecture default_idle routine.
To be used in conjunction with the haltpoll governor.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* pm-cpufreq:
cpufreq: Make cpufreq_generic_init() return void
cpufreq: imx-cpufreq-dt: Add i.MX8MN support
cpufreq: Add QoS requests for userspace constraints
cpufreq: intel_pstate: Reuse refresh_frequency_limits()
cpufreq: Register notifiers with the PM QoS framework
PM / QoS: Add support for MIN/MAX frequency constraints
PM / QOS: Pass request type to dev_pm_qos_read_value()
PM / QOS: Rename __dev_pm_qos_read_value() and dev_pm_qos_raw_read_value()
PM / QOS: Pass request type to dev_pm_qos_{add|remove}_notifier()
dev_pm_qos_read_value() will soon need to support more constraint types
(min/max frequency) and will have another argument to it, i.e. type of
the constraint. While that is fine for the existing users of
dev_pm_qos_read_value(), but not that optimal for the callers of
__dev_pm_qos_read_value() and dev_pm_qos_raw_read_value() as all the
callers of these two routines are only looking for resume latency
constraint.
Lets make these two routines care only about the resume latency
constraint and rename them to __dev_pm_qos_resume_latency() and
dev_pm_qos_raw_resume_latency().
Suggested-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 4122 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this file is released under the gplv2
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 68 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Armijn Hemel <armijn@tjaldur.nl>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190531190114.292346262@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this code is licenced under the gpl version 2 as described in the
copying file that acompanies the linux kernel
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 1 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190528171439.466585205@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms and conditions of the gnu general public license
version 2 as published by the free software foundation this program
is distributed in the hope it will be useful but without any
warranty without even the implied warranty of merchantability or
fitness for a particular purpose see the gnu general public license
for more details you should have received a copy of the gnu general
public license along with this program if not see http www gnu org
licenses
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 228 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Steve Winslow <swinslow@gmail.com>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190528171438.107155473@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 3 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version this program is distributed in the
hope that it will be useful but without any warranty without even
the implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version [author] [kishon] [vijay] [abraham]
[i] [kishon]@[ti] [com] this program is distributed in the hope that
it will be useful but without any warranty without even the implied
warranty of merchantability or fitness for a particular purpose see
the gnu general public license for more details
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version [author] [graeme] [gregory]
[gg]@[slimlogic] [co] [uk] [author] [kishon] [vijay] [abraham] [i]
[kishon]@[ti] [com] [based] [on] [twl6030]_[usb] [c] [author] [hema]
[hk] [hemahk]@[ti] [com] this program is distributed in the hope
that it will be useful but without any warranty without even the
implied warranty of merchantability or fitness for a particular
purpose see the gnu general public license for more details
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 1105 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070033.202006027@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license as published by
the free software foundation either version 2 of the license or at
your option any later version
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 3029 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add SPDX license identifiers to all Make/Kconfig files which:
- Have no license information of any form
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To be able to predict the sleep duration for a CPU entering idle, it
is essential to know the expiration time of the next timer. Both the
teo and the menu cpuidle governors already use this information for
CPU idle state selection.
Moving forward, a similar prediction needs to be made for a group of
idle CPUs rather than for a single one and the following changes
implement a new genpd governor for that purpose.
In order to support that feature, add a new function called
tick_nohz_get_next_hrtimer() that will return the next hrtimer
expiration time of a given CPU to be invoked after deciding
whether or not to stop the scheduler tick on that CPU.
Make the cpuidle core call tick_nohz_get_next_hrtimer() right
before invoking the ->enter() callback provided by the cpuidle
driver for the given state and store its return value in the
per-CPU struct cpuidle_device, so as to make it available to code
outside of cpuidle.
Note that at the point when cpuidle calls tick_nohz_get_next_hrtimer(),
the governor's ->select() callback has already returned and indicated
whether or not the tick should be stopped, so in fact the value
returned by tick_nohz_get_next_hrtimer() always is the next hrtimer
expiration time for the given CPU, possibly including the tick (if
it hasn't been stopped).
Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
Co-developed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
[ rjw: Subject & changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since commit 45f1ff59e2 ("cpuidle: Return nohz hint from
cpuidle_select()") Exynos CPUidle driver stopped entering C1 (AFTR) mode
on Exynos4412-based Trats2 board.
Further analysis revealed that the CPUidle framework changed the way
it handles predicted timer ticks and reported target residency for the
given idle states. As a result, the C1 (AFTR) state was not chosen
anymore on completely idle device. The main issue was to high target
residency value. The similar C1 (AFTR) state for 'coupled' CPUidle
version used 10 times lower value for the target residency, despite
the fact that it is the same state from the hardware perspective.
The 100000us value for standard C1 (AFTR) mode is there from the begining
of the support for this idle state, added by the commit 67173ca492
("ARM: EXYNOS: Add support AFTR mode on EXYNOS4210"). That commit doesn't
give any reason for it, instead it looks like it was blindly copied from
the WFI/IDLE state of the same driver that time. That time, that value
was probably not really used by the framework for any critical decision,
so it didn't matter that much.
Now it turned out to be an issue, so unify the target residency with the
'coupled' version, as it seems to better match the real use case values
and restores the operation of the Exynos CPUidle driver on the idle
device.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After commit 61cb5758d3 ("cpuidle: Add cpuidle.governor= command
line parameter") new cpuidle governors are not added to the list
of available governors, so governor selection via sysfs doesn't
work as expected (even though it is rarely used anyway).
Fix that by making cpuidle_register_governor() add new governors to
cpuidle_governors again.
Fixes: 61cb5758d3 ("cpuidle: Add cpuidle.governor= command line parameter")
Reported-by: Kees Cook <keescook@chromium.org>
Cc: 5.0+ <stable@vger.kernel.org> # 5.0+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The variance computation in get_typical_interval() may overflow if
the square of the value of diff exceeds the maximum for the int64_t
data type value which basically is the case when it is of the order
of UINT_MAX.
However, data points so far in the future don't matter for idle
state selection anyway, so change the initial threshold value in
get_typical_interval() to INT_MAX which will cause more "outlying"
data points to be discarded without affecting the selection result.
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, the DT of the idle states will be parsed first whether it's
compatible or not. This could cause a warning message that comes from if
the CPU doesn't support identical idle states. E.g. Tegra186 can run
with 2 Cortex-A57 and 2 Denver cores with different idle states on
different types of these cores.
So fix it by checking the match node earlier, then it can make sure it
only goes through the idle states that the CPU supported.
Signed-off-by: Joseph Lo <josephl@nvidia.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The default time is declared in units of microsecnds,
but is used as nanoseconds, resulting in significant
accounting errors for idle state 0 time when all idle
states deeper than 0 are disabled.
Under these unusual conditions, we don't really care
about the poll time limit anyhow.
Fixes: 800fb34a99 ("cpuidle: poll_state: Disregard disable idle states")
Signed-off-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The venerable menu governor does some things that are quite
questionable in my view.
First, it includes timer wakeups in the pattern detection data and
mixes them up with wakeups from other sources which in some cases
causes it to expect what essentially would be a timer wakeup in a
time frame in which no timer wakeups are possible (because it knows
the time until the next timer event and that is later than the
expected wakeup time).
Second, it uses the extra exit latency limit based on the predicted
idle duration and depending on the number of tasks waiting on I/O,
even though those tasks may run on a different CPU when they are
woken up. Moreover, the time ranges used by it for the sleep length
correction factors depend on whether or not there are tasks waiting
on I/O, which again doesn't imply anything in particular, and they
are not correlated to the list of available idle states in any way
whatever.
Also, the pattern detection code in menu may end up considering
values that are too large to matter at all, in which cases running
it is a waste of time.
A major rework of the menu governor would be required to address
these issues and the performance of at least some workloads (tuned
specifically to the current behavior of the menu governor) is likely
to suffer from that. It is thus better to introduce an entirely new
governor without them and let everybody use the governor that works
better with their actual workloads.
The new governor introduced here, the timer events oriented (TEO)
governor, uses the same basic strategy as menu: it always tries to
find the deepest idle state that can be used in the given conditions.
However, it applies a different approach to that problem.
First, it doesn't use "correction factors" for the time till the
closest timer, but instead it tries to correlate the measured idle
duration values with the available idle states and use that
information to pick up the idle state that is most likely to "match"
the upcoming CPU idle interval.
Second, it doesn't take the number of "I/O waiters" into account at
all and the pattern detection code in it avoids taking timer wakeups
into account. It also only uses idle duration values less than the
current time till the closest timer (with the tick excluded) for that
purpose.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Notable changes:
- Mitigations for Spectre v2 on some Freescale (NXP) CPUs.
- A large series adding support for pass-through of Nvidia V100 GPUs to guests
on Power9.
- Another large series to enable hardware assistance for TLB table walk on
MPC8xx CPUs.
- Some preparatory changes to our DMA code, to make way for further cleanups
from Christoph.
- Several fixes for our Transactional Memory handling discovered by fuzzing the
signal return path.
- Support for generating our system call table(s) from a text file like other
architectures.
- A fix to our page fault handler so that instead of generating a WARN_ON_ONCE,
user accesses of kernel addresses instead print a ratelimited and
appropriately scary warning.
- A cosmetic change to make our unhandled page fault messages more similar to
other arches and also more compact and informative.
- Freescale updates from Scott:
"Highlights include elimination of legacy clock bindings use from dts
files, an 83xx watchdog handler, fixes to old dts interrupt errors, and
some minor cleanup."
And many clean-ups, reworks and minor fixes etc.
Thanks to:
Alexandre Belloni, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V,
Arnd Bergmann, Benjamin Herrenschmidt, Breno Leitao, Christian Lamparter,
Christophe Leroy, Christoph Hellwig, Daniel Axtens, Darren Stevens, David
Gibson, Diana Craciun, Dmitry V. Levin, Firoz Khan, Geert Uytterhoeven, Greg
Kurz, Gustavo Romero, Hari Bathini, Joel Stanley, Kees Cook, Madhavan
Srinivasan, Mahesh Salgaonkar, Markus Elfring, Mathieu Malaterre, Michal
Suchánek, Naveen N. Rao, Nick Desaulniers, Oliver O'Halloran, Paul Mackerras,
Ram Pai, Ravi Bangoria, Rob Herring, Russell Currey, Sabyasachi Gupta, Sam
Bobroff, Satheesh Rajendran, Scott Wood, Segher Boessenkool, Stephen Rothwell,
Tang Yuantian, Thiago Jung Bauermann, Yangtao Li, Yuantian Tang, Yue Haibing.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJcJLwZAAoJEFHr6jzI4aWAAv4P/jMvP52lA90i2E8G72LOVSF1
33DbE/Okib3VfmmMcXZpgpEfwIcEmJcIj86WWcLWzBfXLunehkgwh+AOfBLwqWch
D08+RR9EZb7ppvGe91hvSgn4/28CWVKAxuDviSuoE1OK8lOTncu889r2+AxVFZiY
f6Al9UPlB3FTJonNx8iO4r/GwrPigukjbzp1vkmJJg59LvNUrMQ1Fgf9D3cdlslH
z4Ff9zS26RJy7cwZYQZI4sZXJZmeQ1DxOZ+6z6FL/nZp/O4WLgpw6C6o1+vxo1kE
9ZnO/3+zIRhoWiXd6OcOQXBv3NNCjJZlXh9HHAiL8m5ZqbmxrStQWGyKW/jjEZuK
wVHxfUT19x9Qy1p+BH3XcUNMlxchYgcCbEi5yPX2p9ZDXD6ogNG7sT1+NO+FBTww
ueCT5PCCB/xWOccQlBErFTMkFXFLtyPDNFK7BkV7uxbH0PQ+9guCvjWfBZti6wjD
/6NK4mk7FpmCiK13Y1xjwC5OqabxLUYwtVuHYOMr5TOPh8URUPS4+0pIOdoYDM6z
Ensrq1CC843h59MWADgFHSlZ78FRtZlG37JAXunjLbqGupLOvL7phC9lnwkylHga
2hWUWFeOV8HFQBP4gidZkLk64pkT9LzqHgdgIB4wUwrhc8r2mMZGdQTq5H7kOn3Q
n9I48PWANvEC0PBCJ/KL
=cr6s
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.21-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Notable changes:
- Mitigations for Spectre v2 on some Freescale (NXP) CPUs.
- A large series adding support for pass-through of Nvidia V100 GPUs
to guests on Power9.
- Another large series to enable hardware assistance for TLB table
walk on MPC8xx CPUs.
- Some preparatory changes to our DMA code, to make way for further
cleanups from Christoph.
- Several fixes for our Transactional Memory handling discovered by
fuzzing the signal return path.
- Support for generating our system call table(s) from a text file
like other architectures.
- A fix to our page fault handler so that instead of generating a
WARN_ON_ONCE, user accesses of kernel addresses instead print a
ratelimited and appropriately scary warning.
- A cosmetic change to make our unhandled page fault messages more
similar to other arches and also more compact and informative.
- Freescale updates from Scott:
"Highlights include elimination of legacy clock bindings use from
dts files, an 83xx watchdog handler, fixes to old dts interrupt
errors, and some minor cleanup."
And many clean-ups, reworks and minor fixes etc.
Thanks to: Alexandre Belloni, Alexey Kardashevskiy, Andrew Donnellan,
Aneesh Kumar K.V, Arnd Bergmann, Benjamin Herrenschmidt, Breno Leitao,
Christian Lamparter, Christophe Leroy, Christoph Hellwig, Daniel
Axtens, Darren Stevens, David Gibson, Diana Craciun, Dmitry V. Levin,
Firoz Khan, Geert Uytterhoeven, Greg Kurz, Gustavo Romero, Hari
Bathini, Joel Stanley, Kees Cook, Madhavan Srinivasan, Mahesh
Salgaonkar, Markus Elfring, Mathieu Malaterre, Michal Suchánek, Naveen
N. Rao, Nick Desaulniers, Oliver O'Halloran, Paul Mackerras, Ram Pai,
Ravi Bangoria, Rob Herring, Russell Currey, Sabyasachi Gupta, Sam
Bobroff, Satheesh Rajendran, Scott Wood, Segher Boessenkool, Stephen
Rothwell, Tang Yuantian, Thiago Jung Bauermann, Yangtao Li, Yuantian
Tang, Yue Haibing"
* tag 'powerpc-4.21-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (201 commits)
Revert "powerpc/fsl_pci: simplify fsl_pci_dma_set_mask"
powerpc/zImage: Also check for stdout-path
powerpc: Fix HMIs on big-endian with CONFIG_RELOCATABLE=y
macintosh: Use of_node_name_{eq, prefix} for node name comparisons
ide: Use of_node_name_eq for node name comparisons
powerpc: Use of_node_name_eq for node name comparisons
powerpc/pseries/pmem: Convert to %pOFn instead of device_node.name
powerpc/mm: Remove very old comment in hash-4k.h
powerpc/pseries: Fix node leak in update_lmb_associativity_index()
powerpc/configs/85xx: Enable CONFIG_DEBUG_KERNEL
powerpc/dts/fsl: Fix dtc-flagged interrupt errors
clk: qoriq: add more compatibles strings
powerpc/fsl: Use new clockgen binding
powerpc/83xx: handle machine check caused by watchdog timer
powerpc/fsl-rio: fix spelling mistake "reserverd" -> "reserved"
powerpc/fsl_pci: simplify fsl_pci_dma_set_mask
arch/powerpc/fsl_rmu: Use dma_zalloc_coherent
vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver
vfio_pci: Allow regions to add own capabilities
vfio_pci: Allow mapping extra regions
...
Add two new metrics for CPU idle states, "above" and "below", to count
the number of times the given state had been asked for (or entered
from the kernel's perspective), but the observed idle duration turned
out to be too short or too long for it (respectively).
These metrics help to estimate the quality of the CPU idle governor
in use.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
of_find_node_by_path() acquires a reference to the node
returned by it and that reference needs to be dropped by its caller.
bl_idle_init() doesn't do that, so fix it.
Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add cpuidle.governor= command line parameter to allow the default
cpuidle governor to be replaced.
That is useful, for example, if someone running a tickful kernel
wants to use the menu governor on it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When computing the limit of time to spend in the loop in poll_idle(),
use the target residency of the first enabled idle state deeper than
state 0 instead of always using the target residency of state 1.
This helps when state 1 is disabled for diagnostics, for instance.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When booting a pseries kernel with PREEMPT enabled, it dumps the
following warning:
BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
caller is pseries_processor_idle_init+0x5c/0x22c
CPU: 13 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc3-00090-g12201a0128bc-dirty #828
Call Trace:
[c000000429437ab0] [c0000000009c8878] dump_stack+0xec/0x164 (unreliable)
[c000000429437b00] [c0000000005f2f24] check_preemption_disabled+0x154/0x160
[c000000429437b90] [c000000000cab8e8] pseries_processor_idle_init+0x5c/0x22c
[c000000429437c10] [c000000000010ed4] do_one_initcall+0x64/0x300
[c000000429437ce0] [c000000000c54500] kernel_init_freeable+0x3f0/0x500
[c000000429437db0] [c0000000000112dc] kernel_init+0x2c/0x160
[c000000429437e20] [c00000000000c1d0] ret_from_kernel_thread+0x5c/0x6c
This happens because the code calls get_lppaca() which calls
get_paca() and it checks if preemption is disabled through
check_preemption_disabled().
Preemption should be disabled because the per CPU variable may make no
sense if there is a preemption (and a CPU switch) after it reads the
per CPU data and when it is used.
In this device driver specifically, it is not a problem, because this
code just needs to have access to one lppaca struct, and it does not
matter if it is the current per CPU lppaca struct or not (i.e. when
there is a preemption and a CPU migration).
That said, the most appropriate fix seems to be related to avoiding
the debug_smp_processor_id() call at get_paca(), instead of calling
preempt_disable() before get_paca().
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
The only reason that remains, to why the ARM cpuidle driver calls
cpuidle_register_driver(), is to avoid printing an error message in case
another driver already have been registered for the CPU. This seems a bit
silly, but more importantly, if that is a common scenario, perhaps we
should change cpuidle_register() accordingly instead.
In either case, let's consolidate the code, by converting to use
cpuidle_register|unregister(), which also avoids the unnecessary allocation
of the struct cpuidle_device.
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>