WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Rafael J. Wysocki	a802ea9645	cpuidle: Check the sign of index in cpuidle_reflect() Avoid calling the governor's ->reflect method if the state index passed to cpuidle_reflect() is negative. This allows the analogous check to be dropped from menu_reflect(), so do that too, and ensures that arbitrary error codes can be passed to cpuidle_reflect() as the index with no adverse consequences. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>	2015-05-04 22:53:28 +02:00
Rafael J. Wysocki	df8d9eeadd	cpuidle: Run tick_broadcast_exit() with disabled interrupts Commit `335f49196f` (sched/idle: Use explicit broadcast oneshot control function) replaced clockevents_notify() invocations in cpuidle_idle_call() with direct calls to tick_broadcast_enter() and tick_broadcast_exit(), but it overlooked the fact that interrupts were already enabled before calling the latter which led to functional breakage on systems using idle states with the CPUIDLE_FLAG_TIMER_STOP flag set. Fix that by moving the invocations of tick_broadcast_enter() and tick_broadcast_exit() down into cpuidle_enter_state() where interrupts are still disabled when tick_broadcast_exit() is called. Also ensure that interrupts will be disabled before running tick_broadcast_exit() even if they have been enabled by the idle state's ->enter callback. Trigger a WARN_ON_ONCE() in that case, as we generally don't want that to happen for states with CPUIDLE_FLAG_TIMER_STOP set. Fixes: `335f49196f` (sched/idle: Use explicit broadcast oneshot control function) Reported-and-tested-by: Linus Walleij <linus.walleij@linaro.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reported-and-tested-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-04-29 15:19:21 +02:00
Bartlomiej Zolnierkiewicz	d75e4af14e	cpuidle: remove state_count field from struct cpuidle_device Thomas Schlichter reports the following issue on his Samsung NC20: "The C-states C1 and C2 to the OS when connected to AC, and additionally provides the C3 C-state when disconnected from AC. However, the number of C-states shown in sysfs is fixed to the number of C-states present at boot. If I boot with AC connected, I always only see the C-states up to C2 even if I disconnect AC. The reason is commit `130a5f6924` (ACPI / cpuidle: remove dev->state_count setting). It removes the update of dev->state_count, but sysfs uses exactly this variable to show the C-states. The fix is to use drv->state_count in sysfs. As this is currently the last user of dev->state_count, this variable can be completely removed." Remove dev->state_count as per the above. Reported-by: Thomas Schlichter <thomas.schlichter@web.de> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: 3.14+ <stable@vger.kernel.org> # 3.14+ [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-04-03 13:15:50 +02:00
Rafael J. Wysocki	ef2b22ac54	cpuidle / sleep: Use broadcast timer for states that stop local timer Commit `3810631332` (PM / sleep: Re-implement suspend-to-idle handling) overlooked the fact that entering some sufficiently deep idle states by CPUs may cause their local timers to stop and in those cases it is necessary to switch over to a broadcast timer prior to entering the idle state. If the cpuidle driver in use does not provide the new ->enter_freeze callback for any of the idle states, that problem affects suspend-to-idle too, but it is not taken into account after the changes made by commit `3810631332`. Fix that by changing the definition of cpuidle_enter_freeze() and re-arranging of the code in cpuidle_idle_call(), so the former does not call cpuidle_enter() any more and the fallback case is handled by cpuidle_idle_call() directly. Fixes: `3810631332` (PM / sleep: Re-implement suspend-to-idle handling) Reported-and-tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>	2015-03-05 23:13:19 +01:00
Rafael J. Wysocki	31a3409065	cpuidle / sleep: Do sanity checks in cpuidle_enter_freeze() too Modify cpuidle_enter_freeze() to do the sanity checks done by cpuidle_select() to avoid crashing the suspend-to-idle code path in case something is missing. Fixes: `3810631332` (PM / sleep: Re-implement suspend-to-idle handling) Original-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>	2015-02-28 23:46:39 +01:00
Rafael J. Wysocki	01e04f466e	idle / sleep: Avoid excessive disabling and enabling interrupts Disabling interrupts at the end of cpuidle_enter_freeze() is not useful, because its caller, cpuidle_idle_call(), re-enables them right away after invoking it. To avoid that unnecessary back and forth dance with interrupts, make cpuidle_enter_freeze() enable interrupts after calling enter_freeze_proper() and drop the local_irq_disable() at its end, so that all of the code paths in it end up with interrupts enabled. Then, cpuidle_idle_call() will not need to re-enable interrupts after calling cpuidle_enter_freeze() any more, because the latter will return with interrupts enabled, in analogy with cpuidle_enter(). Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>	2015-02-28 23:46:24 +01:00
Rafael J. Wysocki	124cf9117c	PM / sleep: Make it possible to quiesce timers during suspend-to-idle The efficiency of suspend-to-idle depends on being able to keep CPUs in the deepest available idle states for as much time as possible. Ideally, they should only be brought out of idle by system wakeup interrupts. However, timer interrupts occurring periodically prevent that from happening and it is not practical to chase all of the "misbehaving" timers in a whack-a-mole fashion. A much more effective approach is to suspend the local ticks for all CPUs and the entire timekeeping along the lines of what is done during full suspend, which also helps to keep suspend-to-idle and full suspend reasonably similar. The idea is to suspend the local tick on each CPU executing cpuidle_enter_freeze() and to make the last of them suspend the entire timekeeping. That should prevent timer interrupts from triggering until an IO interrupt wakes up one of the CPUs. It needs to be done with interrupts disabled on all of the CPUs, though, because otherwise the suspended clocksource might be accessed by an interrupt handler which might lead to fatal consequences. Unfortunately, the existing ->enter callbacks provided by cpuidle drivers generally cannot be used for implementing that, because some of them re-enable interrupts temporarily and some idle entry methods cause interrupts to be re-enabled automatically on exit. Also some of these callbacks manipulate local clock event devices of the CPUs which really shouldn't be done after suspending their ticks. To overcome that difficulty, introduce a new cpuidle state callback, ->enter_freeze, that will be guaranteed (1) to keep interrupts disabled all the time (and return with interrupts disabled) and (2) not to touch the CPU timer devices. Modify cpuidle_enter_freeze() to look for the deepest available idle state with ->enter_freeze present and to make the CPU execute that callback with suspended tick (and the last of the online CPUs to execute it with suspended timekeeping). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>	2015-02-15 19:40:09 +01:00
Rafael J. Wysocki	3810631332	PM / sleep: Re-implement suspend-to-idle handling In preparation for adding support for quiescing timers in the final stage of suspend-to-idle transitions, rework the freeze_enter() function making the system wait on a wakeup event, the freeze_wake() function terminating the suspend-to-idle loop and the mechanism by which deep idle states are entered during suspend-to-idle. First of all, introduce a simple state machine for suspend-to-idle and make the code in question use it. Second, prevent freeze_enter() from losing wakeup events due to race conditions and ensure that the number of online CPUs won't change while it is being executed. In addition to that, make it force all of the CPUs re-enter the idle loop in case they are in idle states already (so they can enter deeper idle states if possible). Next, drop cpuidle_use_deepest_state() and replace use_deepest_state checks in cpuidle_select() and cpuidle_reflect() with a single suspend-to-idle state check in cpuidle_idle_call(). Finally, introduce cpuidle_enter_freeze() that will simply find the deepest idle state available to the given CPU and enter it using cpuidle_enter(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>	2015-02-13 23:49:36 +01:00
Daniel Lezcano	442bf3aaf5	sched: Let the scheduler see CPU idle states When the cpu enters idle, it stores the cpuidle state pointer in its struct rq instance which in turn could be used to make a better decision when balancing tasks. As soon as the cpu exits its idle state, the struct rq reference is cleared. There are a couple of situations where the idle state pointer could be changed while it is being consulted: 1. For x86/acpi with dynamic c-states, when a laptop switches from battery to AC that could result on removing the deeper idle state. The acpi driver triggers: 'acpi_processor_cst_has_changed' 'cpuidle_pause_and_lock' 'cpuidle_uninstall_idle_handler' 'kick_all_cpus_sync'. All cpus will exit their idle state and the pointed object will be set to NULL. 2. The cpuidle driver is unloaded. Logically that could happen but not in practice because the drivers are always compiled in and 95% of them are not coded to unregister themselves. In any case, the unloading code must call 'cpuidle_unregister_device', that calls 'cpuidle_pause_and_lock' leading to 'kick_all_cpus_sync' as mentioned above. A race can happen if we use the pointer and then one of these two scenarios occurs at the same moment. In order to be safe, the idle state pointer stored in the rq must be used inside a rcu_read_lock section where we are protected with the 'rcu_barrier' in the 'cpuidle_uninstall_idle_handler' function. The idle_get_state() and idle_put_state() accessors should be used to that effect. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: linux-pm@vger.kernel.org Cc: linaro-kernel@lists.linaro.org Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/n/tip-@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-09-24 14:46:58 +02:00
Chuansheng Liu	2ed903c548	cpuidle: Use wake_up_all_idle_cpus() to wake up all idle cpus Currently kick_all_cpus_sync() or smp_call_function() can not break the polling idle cpu immediately. Instead using wake_up_all_idle_cpus() which can wake up the polling idle cpu quickly is much more helpful for power. Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-pm@vger.kernel.org Cc: changcheng.liu@intel.com Cc: xiaoming.wang@intel.com Cc: souvik.k.chakravarty@intel.com Cc: luto@amacapital.net Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: linux-pm@vger.kernel.org Link: http://lkml.kernel.org/r/1409815075-4180-3-git-send-email-chuansheng.liu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-09-19 12:35:16 +02:00
Sandeep Tripathy	30fe688402	cpuidle: move idle traces to cpuidle_enter_state() idle_exit event is the first event after a core exits idle state. So this should be traced before local irq is ebabled. Likewise idle_entry is the last event before a core enters idle state. This will ease visualising the cpu idle state from kernel traces. Signed-off-by: Sandeep Tripathy <sandeep.tripathy@linaro.org> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> [rjw: Subject, rebase] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-07-09 15:45:23 +02:00
Rafael J. Wysocki	a6220fc19a	PM / suspend: Always use deepest C-state in the "freeze" sleep state If freeze_enter() is called, we want to bypass the current cpuidle governor and always use the deepest available (that is, not disabled) C-state, because we want to save as much energy as reasonably possible then and runtime latency constraints don't matter at that point, since the system is in a sleep state anyway. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Aubrey Li <aubrey.li@linux.intel.com>	2014-05-07 01:49:28 +02:00
Rafael J. Wysocki	52c324f8a8	cpuidle: Combine cpuidle_enabled() with cpuidle_select() Since both cpuidle_enabled() and cpuidle_select() are only called by cpuidle_idle_call(), it is not really useful to keep them separate and combining them will help to avoid complicating cpuidle_idle_call() even further if governors are changed to return error codes sometimes. This code modification shouldn't lead to any functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-05-01 00:13:47 +02:00
Linus Torvalds	05bf58ca4b	Merge branch 'sched-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull sched/idle changes from Ingo Molnar: "More idle code reorganization, to prepare for more integration. (Sent separately because it depended on pending timer work, which is now upstream)" * 'sched-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/idle: Add more comments to the code sched/idle: Move idle conditions in cpuidle_idle main function sched/idle: Reorganize the idle loop cpuidle/idle: Move the cpuidle_idle_call function to idle.c idle/cpuidle: Split cpuidle_idle_call main function into smaller functions	2014-04-02 16:22:27 -07:00
Linus Torvalds	4dedde7c7a	ACPI and power management updates for 3.15-rc1 - Device PM QoS support for latency tolerance constraints on systems with hardware interfaces allowing such constraints to be specified. That is necessary to prevent hardware-driven power management from becoming overly aggressive on some systems and to prevent power management features leading to excessive latencies from being used in some cases. - Consolidation of the handling of ACPI hotplug notifications for device objects. This causes all device hotplug notifications to go through the root notify handler (that was executed for all of them anyway before) that propagates them to individual subsystems, if necessary, by executing callbacks provided by those subsystems (those callbacks are associated with struct acpi_device objects during device enumeration). As a result, the code in question becomes both smaller in size and more straightforward and all of those changes should not affect users. - ACPICA update, including fixes related to the handling of _PRT in cases when it is broken and the addition of "Windows 2013" to the list of supported "features" for _OSI (which is necessary to support systems that work incorrectly or don't even boot without it). Changes from Bob Moore and Lv Zheng. - Consolidation of ACPI _OST handling from Jiang Liu. - ACPI battery and AC fixes allowing unusual system configurations to be handled by that code from Alexander Mezin. - New device IDs for the ACPI LPSS driver from Chiau Ee Chew. - ACPI fan and thermal optimizations related to system suspend and resume from Aaron Lu. - Cleanups related to ACPI video from Jean Delvare. - Assorted ACPI fixes and cleanups from Al Stone, Hanjun Guo, Lan Tianyu, Paul Bolle, Tomasz Nowicki. - Intel RAPL (Running Average Power Limits) driver cleanups from Jacob Pan. - intel_pstate fixes and cleanups from Dirk Brandewie. - cpufreq fixes related to system suspend/resume handling from Viresh Kumar. - cpufreq core fixes and cleanups from Viresh Kumar, Stratos Karafotis, Saravana Kannan, Rashika Kheria, Joe Perches. - cpufreq drivers updates from Viresh Kumar, Zhuoyu Zhang, Rob Herring. - cpuidle fixes related to the menu governor from Tuukka Tikkanen. - cpuidle fix related to coupled CPUs handling from Paul Burton. - Asynchronous execution of all device suspend and resume callbacks, except for ->prepare and ->complete, during system suspend and resume from Chuansheng Liu. - Delayed resuming of runtime-suspended devices during system suspend for the PCI bus type and ACPI PM domain. - New set of PM helper routines to allow device runtime PM callbacks to be used during system suspend and resume more easily from Ulf Hansson. - Assorted fixes and cleanups in the PM core from Geert Uytterhoeven, Prabhakar Lad, Philipp Zabel, Rashika Kheria, Sebastian Capella. - devfreq fix from Saravana Kannan. / -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABCAAGBQJTLgB1AAoJEILEb/54YlRxfs4P/35fIu9h8ClNWUPXqi3nlGIt yMyumKvF1VdsOKLbjTtFq6B3UOlhqDijYTCQd7Xt7X8ONTk/ND9ec2t/5xGkSdUI q46fa0qZXeqUn0Kt2t+kl6tgVQOkDj94aNlEh+7Ya3Uu6WYDDfmZtOBOFAMk6D8l ND4rHJpX+eUsRLBrcxaUxxdD8AW5guGcPKyeyzsXv1bY1BZnpLFrZ3PhuI5dn2CL L/zmk3A+wG6+ZlQxnwDdrKa3E6uhRSIDeF0vI4Byspa1wi5zXknJG2J7MoQ9JEE9 VQpBXlqach5wgXqJ8PAqAeaB6Ie26/F7PYG8r446zKw/5UUtdNUx+0dkjQ7Mz8Tu ajuVxfwrrPhZeQqmVBxlH5Gg7Ez2KBKEfDxTdRnzI7FoA7PE5XDcg3kO64bhj8LJ yugnV/ToU9wMztZnPC7CoGPwUgxMJvr9LwmxS4aeKcVUBES05eg0vS3lwdZMgqkV iO0QkWTmhZ952qZCqZxbh0JqaaX8Wgx2kpX2tf1G2GJqLMZco289bLh6njNT+8CH EzdQKYYyn6G6+Qg2M0f/6So3qU17x9XtE4ZBWQdGDpqYOGZhjZAOs/VnB1Ysw/K3 cDBzswlJd0CyyUps9B+qbf49OpbWVwl5kKeuHUuPxugEVryhpSp9AuG+tNil74Sj JuGTGR4fyFjDBX5cvAPm =ywR6 -----END PGP SIGNATURE----- Merge tag 'pm+acpi-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI and power management updates from Rafael Wysocki: "The majority of this material spent some time in linux-next, some of it even several weeks. There are a few relatively fresh commits in it, but they are mostly fixes and simple cleanups. ACPI took the lead this time, both in terms of the number of commits and the number of modified lines of code, cpufreq follows and there are a few changes in the PM core and in cpuidle too. A new feature that already got some LWN.net's attention is the device PM QoS extension allowing latency tolerance requirements to be propagated from leaf devices to their ancestors with hardware interfaces for specifying latency tolerance. That should help systems with hardware-driven power management to avoid going too far with it in cases when there are latency tolerance constraints. There also are some significant changes in the ACPI core related to the way in which hotplug notifications are handled. They affect PCI hotplug (ACPIPHP) and the ACPI dock station code too. The bottom line is that all those notification now go through the root notify handler and are propagated to the interested subsystems by means of callbacks instead of having to install a notify handler for each device object that we can potentially get hotplug notifications for. In addition to that ACPICA will now advertise "Windows 2013" compatibility for _OSI, because some systems out there don't work correctly if that is not done (some of them don't even boot). On the system suspend side of things, all of the device suspend and resume callbacks, except for ->prepare() and ->complete(), are now going to be executed asynchronously as that turns out to speed up system suspend and resume on some platforms quite significantly and we have a few more optimizations in that area. Apart from that, there are some new device IDs and fixes and cleanups all over. In particular, the system suspend and resume handling by cpufreq should be improved and the cpuidle menu governor should be a bit more robust now. Specifics: - Device PM QoS support for latency tolerance constraints on systems with hardware interfaces allowing such constraints to be specified. That is necessary to prevent hardware-driven power management from becoming overly aggressive on some systems and to prevent power management features leading to excessive latencies from being used in some cases. - Consolidation of the handling of ACPI hotplug notifications for device objects. This causes all device hotplug notifications to go through the root notify handler (that was executed for all of them anyway before) that propagates them to individual subsystems, if necessary, by executing callbacks provided by those subsystems (those callbacks are associated with struct acpi_device objects during device enumeration). As a result, the code in question becomes both smaller in size and more straightforward and all of those changes should not affect users. - ACPICA update, including fixes related to the handling of _PRT in cases when it is broken and the addition of "Windows 2013" to the list of supported "features" for _OSI (which is necessary to support systems that work incorrectly or don't even boot without it). Changes from Bob Moore and Lv Zheng. - Consolidation of ACPI _OST handling from Jiang Liu. - ACPI battery and AC fixes allowing unusual system configurations to be handled by that code from Alexander Mezin. - New device IDs for the ACPI LPSS driver from Chiau Ee Chew. - ACPI fan and thermal optimizations related to system suspend and resume from Aaron Lu. - Cleanups related to ACPI video from Jean Delvare. - Assorted ACPI fixes and cleanups from Al Stone, Hanjun Guo, Lan Tianyu, Paul Bolle, Tomasz Nowicki. - Intel RAPL (Running Average Power Limits) driver cleanups from Jacob Pan. - intel_pstate fixes and cleanups from Dirk Brandewie. - cpufreq fixes related to system suspend/resume handling from Viresh Kumar. - cpufreq core fixes and cleanups from Viresh Kumar, Stratos Karafotis, Saravana Kannan, Rashika Kheria, Joe Perches. - cpufreq drivers updates from Viresh Kumar, Zhuoyu Zhang, Rob Herring. - cpuidle fixes related to the menu governor from Tuukka Tikkanen. - cpuidle fix related to coupled CPUs handling from Paul Burton. - Asynchronous execution of all device suspend and resume callbacks, except for ->prepare and ->complete, during system suspend and resume from Chuansheng Liu. - Delayed resuming of runtime-suspended devices during system suspend for the PCI bus type and ACPI PM domain. - New set of PM helper routines to allow device runtime PM callbacks to be used during system suspend and resume more easily from Ulf Hansson. - Assorted fixes and cleanups in the PM core from Geert Uytterhoeven, Prabhakar Lad, Philipp Zabel, Rashika Kheria, Sebastian Capella. - devfreq fix from Saravana Kannan" * tag 'pm+acpi-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (162 commits) PM / devfreq: Rewrite devfreq_update_status() to fix multiple bugs PM / sleep: Correct whitespace errors in <linux/pm.h> intel_pstate: Set core to min P state during core offline cpufreq: Add stop CPU callback to cpufreq_driver interface cpufreq: Remove unnecessary braces cpufreq: Fix checkpatch errors and warnings cpufreq: powerpc: add cpufreq transition latency for FSL e500mc SoCs MAINTAINERS: Reorder maintainer addresses for PM and ACPI PM / Runtime: Update runtime_idle() documentation for return value meaning video / output: Drop display output class support fujitsu-laptop: Drop unneeded include acer-wmi: Stop selecting VIDEO_OUTPUT_CONTROL ACPI / gpu / drm: Stop selecting VIDEO_OUTPUT_CONTROL ACPI / video: fix ACPI_VIDEO dependencies cpufreq: remove unused notifier: CPUFREQ_{SUSPENDCHANGE\|RESUMECHANGE} cpufreq: Do not allow ->setpolicy drivers to provide ->target cpufreq: arm_big_little: set 'physical_cluster' for each CPU cpufreq: arm_big_little: make vexpress driver depend on bL core driver ACPI / button: Add ACPI Button event via netlink routine ACPI: Remove duplicate definitions of PREFIX ...	2014-04-01 12:48:54 -07:00
Paul Burton	0b89e9aa28	cpuidle: delay enabling interrupts until all coupled CPUs leave idle As described by a comment at the end of cpuidle_enter_state_coupled it can be inefficient for coupled idle states to return with IRQs enabled since they may proceed to service an interrupt instead of clearing the coupled idle state. Until they have finished & cleared the idle state all CPUs coupled with them will spin rather than being able to enter a safe idle state. Commits `e1689795a7` "cpuidle: Add common time keeping and irq enabling" and `554c06ba3e` "cpuidle: remove en_core_tk_irqen flag" led to the cpuidle_enter_state enabling interrupts for all idle states, including coupled ones, making this inefficiency unavoidable by drivers & the local_irq_enable near the end of cpuidle_enter_state_coupled redundant. This patch avoids enabling interrupts in cpuidle_enter_state after a coupled state has been entered, allowing them to remain disabled until all coupled CPUs have exited the idle state and cpuidle_enter_state_coupled re-enables them. Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Paul Burton <paul.burton@imgtec.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2014-03-12 00:24:21 +01:00
Daniel Lezcano	30cdd69e2a	cpuidle/idle: Move the cpuidle_idle_call function to idle.c The cpuidle_idle_call does nothing more than calling the three individuals function and is no longer used by any arch specific code but only in the cpuidle framework code. We can move this function into the idle task code to ensure better proximity to the scheduler code. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: rjw@rjwysocki.net Cc: preeti@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1393832934-11625-2-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-03-11 11:52:45 +01:00
Daniel Lezcano	907e30f1bb	idle/cpuidle: Split cpuidle_idle_call main function into smaller functions In order to allow better integration between the cpuidle framework and the scheduler, reducing the distance between these two sub-components will facilitate this integration by moving part of the cpuidle code in the idle task file and, because idle.c is in the sched directory, we have access to the scheduler's private structures. This patch splits the cpuidle_idle_call main entry function into 3 calls to a newly added API: 1. select the idle state 2. enter the idle state 3. reflect the idle state The cpuidle_idle_call calls these three functions to implement the main idle entry function. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: rjw@rjwysocki.net Cc: preeti@linux.vnet.ibm.com Link: http://lkml.kernel.org/r/1393832934-11625-1-git-send-email-daniel.lezcano@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-03-11 11:52:45 +01:00
Preeti U Murthy	ba8f20c2eb	cpuidle: Handle clockevents_notify(BROADCAST_ENTER) failure Some archs set the CPUIDLE_FLAG_TIMER_STOP flag for idle states in which the local timers stop. The cpuidle_idle_call() currently handles such idle states by calling into the broadcast framework so as to wakeup CPUs at their next wakeup event. With the hrtimer mode of broadcast, the BROADCAST_ENTER call into the broadcast frameowork can fail for archs that do not have an external clock device to handle wakeups and the CPU in question has thus to be made the stand by CPU. This patch handles such cases by failing the call into cpuidle so that the arch can take some default action. The arch will certainly not enter a similar idle state because a failed cpuidle call will also implicitly indicate that the broadcast framework has not registered this CPU to be woken up. Hence we are safe if we fail the cpuidle call. In the process move the functions that trace idle statistics just before and after the entry and exit into idle states respectively. In other scenarios where the call to cpuidle fails, we end up not tracing idle entry and exit since a decision on an idle state could not be taken. Similarly when the call to broadcast framework fails, we skip tracing idle statistics because we are in no further position to take a decision on an alternative idle state to enter into. Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: deepthi@linux.vnet.ibm.com Cc: paulmck@linux.vnet.ibm.com Cc: fweisbec@gmail.com Cc: paulus@samba.org Cc: srivatsa.bhat@linux.vnet.ibm.com Cc: svaidy@linux.vnet.ibm.com Cc: peterz@infradead.org Cc: benh@kernel.crashing.org Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/20140207080652.17187.66344.stgit@preeti.in.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2014-02-07 15:34:29 +01:00
Konrad Rzeszutek Wilk	813e8e3d6a	cpuidle: Check for dev before deregistering it. If not, we could end up in the unfortunate situation where we dereference a NULL pointer b/c we have cpuidle disabled. This is the case when booting under Xen (which uses the ACPI P/C states but disables the CPU idle driver) - and can be easily reproduced when booting with cpuidle.off=1. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8156db4a>] cpuidle_unregister_device+0x2a/0x90 .. snip.. Call Trace: [<ffffffff813b15b4>] acpi_processor_power_exit+0x3c/0x5c [<ffffffff813af0a9>] acpi_processor_stop+0x61/0xb6 [<ffffffff814215bf>] __device_release_driver+0fffff81421653>] device_release_driver+0x23/0x30 [<ffffffff81420ed8>] bus_remove_device+0x108/0x180 [<ffffffff8141d9d9>] device_del+0x129/0x1c0 [<ffffffff813cb4b0>] ? unregister_xenbus_watch+0x1f0/0x1f0 [<ffffffff8141da8e>] device_unregister+0x1e/0x60 [<ffffffff814243e9>] unregister_cpu+0x39/0x60 [<ffffffff81019e03>] arch_unregister_cpu+0x23/0x30 [<ffffffff813c3c51>] handle_vcpu_hotplug_event+0xc1/0xe0 [<ffffffff813cb4f5>] xenwatch_thread+0x45/0x120 [<ffffffff810af010>] ? abort_exclusive_wait+0xb0/0xb0 [<ffffffff8108ec42>] kthread+0xd2/0xf0 [<ffffffff8108eb70>] ? kthread_create_on_node+0x180/0x180 [<ffffffff816ce17c>] ret_from_fork+0x7c/0xb0 [<ffffffff8108eb70>] ? kthread_create_on_node+0x180/0x180 This problem also appears in 3.12 and could be a candidate for backport. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-12-03 22:05:22 +01:00
Viresh Kumar	d7c7f10326	cpuidle: don't call poll_idle_init() for every cpu poll_idle_init() just initializes drv->states[0] and so that is required to be done only once for each driver. Currently, it is called from cpuidle_enable_device() which is called for every CPU that the driver supports. That is not required, so move it to a better place and call it from __cpuidle_register_driver() so that the initialization is carried out only once. Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-10-30 01:21:23 +01:00
Viresh Kumar	6d281e97a1	cpuidle: replace multiline statements with single line in cpuidle_idle_call() Few statements in cpuidle_idle_call() are broken into multiple lines, although that isn't really necessary. Convert those to single line. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-10-30 01:21:23 +01:00
Viresh Kumar	fb11c9c63f	cpuidle: reduce code duplication inside cpuidle_idle_call() We are doing this twice in cpuidle_idle_call() routine: drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP Would be better if we actually store this in a local variable and use that. That reduces code duplication and likely makes this piece of code run faster (in case the compiler wasn't able to optimize it earlier) [rjw: Cast the result of bitwise AND to bool explicitly using !!] Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-10-30 01:21:22 +01:00
Viresh Kumar	9b29a86f04	cpuidle: merge two if() statements for checking error cases Two checks cpuidle_idle_call() cause the same error code to be returned if they fail, so merge them for clarity. [rjw: Changelog] Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-10-30 01:21:22 +01:00
Viresh Kumar	47182668ca	cpuidle: rearrange __cpuidle_register_device() to keep minimal exit points This patch rearranges __cpuidle_register_device() a bit in order to reduce the number of exit points in that function. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-10-30 01:21:22 +01:00
Viresh Kumar	267d4bf8ee	cpuidle: make __cpuidle_device_init() return void The only value returned by __cpuidle_device_init() is 0, so it very well may be a void function. Make that happen. [rjw: Changelog] Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-10-30 01:21:21 +01:00
Viresh Kumar	caf4a36e81	cpuidle: Fix comments in cpuidle core Some comments in cpuidle core files contain trivial mistakes. This patch fixes them. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-10-30 01:21:21 +01:00
Daniel Lezcano	c878a52d3c	cpuidle: Check if device is already registered Make __cpuidle_register_device() check whether or not the device has been registered already and return -EBUSY immediately if that's the case. [rjw: Changelog] Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-07-15 02:09:48 +02:00
Daniel Lezcano	5df0aa7341	cpuidle: Introduce __cpuidle_device_init() Add __cpuidle_device_init() for initializing the cpuidle_device structure. [rjw: Changelog] Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-07-15 02:09:47 +02:00
Daniel Lezcano	f6bb51a53a	cpuidle: Introduce __cpuidle_unregister_device() To reduce code duplication related to the unregistration of cpuidle devices, introduce __cpuidle_unregister_device() and move all of the unregistration code to that function. [rjw: Changelog] Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-07-15 02:09:47 +02:00
Daniel Lezcano	10b9d3f8a4	cpuidle: Check cpuidle_enable_device() return value We previously changed the ordering of the cpuidle framework initialization so that the governors are registered before the drivers which can register their devices right from the start. Now, we can safely remove the __cpuidle_register_device() call hack in cpuidle_enable_device() and check if the driver has been registered before enabling it. Then, cpuidle_register_device() can consistently check the cpuidle_enable_device() return value when enabling the device. [rjw: Changelog] Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-07-15 02:09:47 +02:00
Daniel Lezcano	82467a5a88	cpuidle: simplify multiple driver support Commit `bf4d1b5` (cpuidle: support multiple drivers) introduced support for using multiple cpuidle drivers at the same time. It added a couple of new APIs to register the driver per CPU, but that led to some unnecessary code complexity related to the kernel config options deciding whether or not the multiple driver support is enabled. The code has to work as it did before when the multiple driver support is not enabled and the multiple driver support has to be compatible with the previously existing API. Remove the new API, not used by any driver in the tree yet (but needed for the HMP cpuidle drivers that will be submitted soon), and add a new cpumask pointer to the cpuidle driver structure that will point to the mask of CPUs handled by the given driver. That will allow the cpuidle_[un]register_driver() API to be used for the multiple driver support along with the cpuidle_[un]register() functions added recently. [rjw: Changelog] Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-06-11 13:42:38 +02:00
Daniel Lezcano	1c192d047a	cpuidle: fix comment format Fix comment format for the kernel doc script. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-24 00:54:51 +02:00
Daniel Lezcano	4c637b2175	cpuidle: make a single register function for all The usual scheme to initialize a cpuidle driver on a SMP is: cpuidle_register_driver(drv); for_each_possible_cpu(cpu) { device = &per_cpu(cpuidle_dev, cpu); cpuidle_register_device(device); } This code is duplicated in each cpuidle driver. On UP systems, it is done this way: cpuidle_register_driver(drv); device = &per_cpu(cpuidle_dev, cpu); cpuidle_register_device(device); On UP, the macro 'for_each_cpu' does one iteration: #define for_each_cpu(cpu, mask) \ for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask) Hence, the initialization loop is the same for UP than SMP. Beside, we saw different bugs / mis-initialization / return code unchecked in the different drivers, the code is duplicated including bugs. After fixing all these ones, it appears the initialization pattern is the same for everyone. Please note, some drivers are doing dev->state_count = drv->state_count. This is not necessary because it is done by the cpuidle_enable_device function in the cpuidle framework. This is true, until you have the same states for all your devices. Otherwise, the 'low level' API should be used instead with the specific initialization for the driver. Let's add a wrapper function doing this initialization with a cpumask parameter for the coupled idle states and use it for all the drivers. That will save a lot of LOC, consolidate the code, and the modifications in the future could be done in a single place. Another benefit is the consolidation of the cpuidle_device variable which is now in the cpuidle framework and no longer spread accross the different arch specific drivers. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-23 13:45:22 +02:00
Daniel Lezcano	554c06ba3e	cpuidle: remove en_core_tk_irqen flag The en_core_tk_irqen flag is set in all the cpuidle driver which means it is not necessary to specify this flag. Remove the flag and the code related to it. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Kevin Hilman <khilman@linaro.org> # for mach-omap2/* Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-23 13:45:22 +02:00
Daniel Lezcano	b60e6a0eb0	cpuidle : handle clockevent notify from the cpuidle framework When a cpu enters a deep idle state, the local timers are stopped and the time framework falls back to the timer device used as a broadcast timer. The different cpuidle drivers are calling clockevents_notify ENTER/EXIT when the idle state stops the local timer. Add a new flag CPUIDLE_FLAG_TIMER_STOP which can be set by the cpuidle drivers. If the flag is set, the cpuidle core code takes care of the notification on behalf of the driver to avoid pointless code duplication. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-04-01 01:10:27 +02:00
Paul Gortmaker	43720bd601	PM / tracing: remove deprecated power trace API The text in Documentation said it would be removed in 2.6.41; the text in the Kconfig said removal in the 3.1 release. Either way you look at it, we are well past both, so push it off a cliff. Note that the POWER_CSTATE and the POWER_PSTATE are part of the legacy tracing API. Remove all tracepoints which use these flags. As can be seen from context, most already have a trace entry via trace_cpu_idle anyways. Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as compared to the CSTATE ones which all have a clear start/stop. As part of this, the trace_power_frequency also becomes orphaned, so it too is deleted. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-01-26 00:39:12 +01:00
Daniel Lezcano	8aef33a7cf	cpuidle: remove the power_specified field in the driver We realized that the power usage field is never filled and when it is filled for tegra, the power_specified flag is not set causing all of these values to be reset when the driver is initialized with set_power_state(). However, the power_specified flag can be simply removed under the assumption that the states are always backward sorted, which is the case with the current code. This change allows the menu governor select function and the cpuidle_play_dead() to be simplified. Moreover, the set_power_states() function can removed as it does not make sense any more. Drop the power_specified flag from struct cpuidle_driver and make the related changes as described above. As a consequence, this also fixes the bug where on the dynamic C-states system, the power fields are not initialized. [rjw: Changelog] References: https://bugzilla.kernel.org/show_bug.cgi?id=42870 References: https://bugzilla.kernel.org/show_bug.cgi?id=43349 References: https://lkml.org/lkml/2012/10/16/518 Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-01-15 14:18:04 +01:00
Sivaram Nair	0e5537b30d	cpuidle: Fix finding state with min power_usage Since cpuidle_state.power_usage is a signed value, use INT_MAX (instead of -1) to init the local copies so that functions that tries to find cpuidle states with minimum power usage works correctly even if they use non-negative values. Signed-off-by: Sivaram Nair <sivaramn@nvidia.com> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-01-03 13:11:05 +01:00
Julius Werner	a474a51549	cpuidle: Measure idle state durations with monotonic clock Many cpuidle drivers measure their time spent in an idle state by reading the wallclock time before and after idling and calculating the difference. This leads to erroneous results when the wallclock time gets updated by another processor in the meantime, adding that clock adjustment to the idle state's time counter. If the clock adjustment was negative, the result is even worse due to an erroneous cast from int to unsigned long long of the last_residency variable. The negative 32 bit integer will zero-extend and result in a forward time jump of roughly four billion milliseconds or 1.3 hours on the idle state residency counter. This patch changes all affected cpuidle drivers to either use the monotonic clock for their measurements or make use of the generic time measurement wrapper in cpuidle.c, which was already working correctly. Some superfluous CLIs/STIs in the ACPI code are removed (interrupts should always already be disabled before entering the idle function, and not get reenabled until the generic wrapper has performed its second measurement). It also removes the erroneous cast, making sure that negative residency values are applied correctly even though they should not appear anymore. Signed-off-by: Julius Werner <jwerner@chromium.org> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2012-11-27 14:17:58 +01:00
Daniel Lezcano	bf4d1b5ddb	cpuidle: support multiple drivers With the tegra3 and the big.LITTLE [1] new architectures, several cpus with different characteristics (latencies and states) can co-exists on the system. The cpuidle framework has the limitation of handling only identical cpus. This patch removes this limitation by introducing the multiple driver support for cpuidle. This option is configurable at compile time and should be enabled for the architectures mentioned above. So there is no impact for the other platforms if the option is disabled. The option defaults to 'n'. Note the multiple drivers support is also compatible with the existing drivers, even if just one driver is needed, all the cpu will be tied to this driver using an extra small chunk of processor memory. The multiple driver support use a per-cpu driver pointer instead of a global variable and the accessor to this variable are done from a cpu context. In order to keep the compatibility with the existing drivers, the function 'cpuidle_register_driver' and 'cpuidle_unregister_driver' will register the specified driver for all the cpus. The semantic for the output of /sys/devices/system/cpu/cpuidle/current_driver remains the same except the driver name will be related to the current cpu. The /sys/devices/system/cpu/cpu[0-9]/cpuidle/driver/name files are added allowing to read the per cpu driver name. [1] http://lwn.net/Articles/481055/ Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2012-11-15 00:34:23 +01:00
Youquan Song	d73d68dc49	cpuidle: Set residency to 0 if target Cstate not enter When cpuidle governor choose a C-state to enter for idle CPU, but it notice that there is tasks request to be executed. So the idle CPU will not really enter the target C-state and go to run task. In this situation, it will use the residency of previous really entered target C-states. Obviously, it is not reasonable. So, this patch fix it by set the target C-state residency to 0. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2012-11-15 00:34:20 +01:00
Daniel Lezcano	e45a00d679	cpuidle / sysfs: move kobj initialization in the syfs file Move the kobj initialization and completion in the sysfs.c and encapsulate the code more. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2012-11-15 00:34:19 +01:00
Daniel Lezcano	1aef40e288	cpuidle / sysfs: change function parameter The function needs the cpuidle_device which is initially passed to the caller. The current code gets the struct device from the struct cpuidle_device, pass it the cpuidle_add_sysfs function. This function calls per_cpu(cpuidle_devices, cpu) to get the cpuidle_device. This patch pass the cpuidle_device instead and simplify the code. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2012-11-15 00:34:19 +01:00
Srivatsa S. Bhat	cf31cd1a0c	ACPI idle, CPU hotplug: Fix NULL pointer dereference during hotplug On a KVM guest, when a CPU is taken offline and brought back online, we hit the following NULL pointer dereference: [ 45.400843] Unregister pv shared memory for cpu 1 [ 45.412331] smpboot: CPU 1 is now offline [ 45.529894] SMP alternatives: lockdep: fixing up alternatives [ 45.533472] smpboot: Booting Node 0 Processor 1 APIC 0x1 [ 45.411526] kvm-clock: cpu 1, msr 0:7d14601, secondary cpu clock [ 45.571370] KVM setup async PF for cpu 1 [ 45.572331] kvm-stealtime: cpu 1, msr 7d0e040 [ 45.575031] BUG: unable to handle kernel NULL pointer dereference at (null) [ 45.576017] IP: [<ffffffff81519f98>] cpuidle_disable_device+0x18/0x80 [ 45.576017] PGD 5dfb067 PUD 5da8067 PMD 0 [ 45.576017] Oops: 0000 [#1] SMP [ 45.576017] Modules linked in: [ 45.576017] CPU 0 [ 45.576017] Pid: 607, comm: stress_cpu_hotp Not tainted 3.6.0-padata-tp-debug #3 Bochs Bochs [ 45.576017] RIP: 0010:[<ffffffff81519f98>] [<ffffffff81519f98>] cpuidle_disable_device+0x18/0x80 [ 45.576017] RSP: 0018:ffff880005d93ce8 EFLAGS: 00010286 [ 45.576017] RAX: ffff880005d93fd8 RBX: 0000000000000000 RCX: 0000000000000006 [ 45.576017] RDX: 0000000000000006 RSI: 2222222222222222 RDI: 0000000000000000 [ 45.576017] RBP: ffff880005d93cf8 R08: 2222222222222222 R09: 2222222222222222 [ 45.576017] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 45.576017] R13: 0000000000000000 R14: ffffffff81c8cca0 R15: 0000000000000001 [ 45.576017] FS: 00007f91936ae700(0000) GS:ffff880007c00000(0000) knlGS:0000000000000000 [ 45.576017] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 45.576017] CR2: 0000000000000000 CR3: 0000000005db3000 CR4: 00000000000006f0 [ 45.576017] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 45.576017] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 45.576017] Process stress_cpu_hotp (pid: 607, threadinfo ffff880005d92000, task ffff8800066bbf40) [ 45.576017] Stack: [ 45.576017] ffff880007a96400 0000000000000000 ffff880005d93d28 ffffffff813ac689 [ 45.576017] ffff880007a96400 ffff880007a96400 0000000000000002 ffffffff81cd8d01 [ 45.576017] ffff880005d93d58 ffffffff813aa498 0000000000000001 00000000ffffffdd [ 45.576017] Call Trace: [ 45.576017] [<ffffffff813ac689>] acpi_processor_hotplug+0x55/0x97 [ 45.576017] [<ffffffff813aa498>] acpi_cpu_soft_notify+0x93/0xce [ 45.576017] [<ffffffff816ae47d>] notifier_call_chain+0x5d/0x110 [ 45.576017] [<ffffffff8109730e>] __raw_notifier_call_chain+0xe/0x10 [ 45.576017] [<ffffffff81069050>] __cpu_notify+0x20/0x40 [ 45.576017] [<ffffffff81069085>] cpu_notify+0x15/0x20 [ 45.576017] [<ffffffff816978f1>] _cpu_up+0xee/0x137 [ 45.576017] [<ffffffff81697983>] cpu_up+0x49/0x59 [ 45.576017] [<ffffffff8168758d>] store_online+0x9d/0xe0 [ 45.576017] [<ffffffff8140a9f8>] dev_attr_store+0x18/0x30 [ 45.576017] [<ffffffff812322c0>] sysfs_write_file+0xe0/0x150 [ 45.576017] [<ffffffff811b389c>] vfs_write+0xac/0x180 [ 45.576017] [<ffffffff811b3be2>] sys_write+0x52/0xa0 [ 45.576017] [<ffffffff816b31e9>] system_call_fastpath+0x16/0x1b [ 45.576017] Code: 48 c7 c7 40 e5 ca 81 e8 07 d0 18 00 5d c3 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 10 48 89 5d f0 4c 89 65 f8 48 89 fb <f6> 07 02 75 13 48 8b 5d f0 4c 8b 65 f8 c9 c3 66 0f 1f 84 00 00 [ 45.576017] RIP [<ffffffff81519f98>] cpuidle_disable_device+0x18/0x80 [ 45.576017] RSP <ffff880005d93ce8> [ 45.576017] CR2: 0000000000000000 [ 45.656079] ---[ end trace 433d6c9ac0b02cef ]--- Analysis: Commit `3d339dc` (cpuidle / ACPI : move cpuidle_device field out of the acpi_processor_power structure()) made the allocation of the dev structure (struct cpuidle) of a CPU dynamic, whereas previously it was statically allocated. And this dynamic allocation occurs in acpi_processor_power_init() if pr->flags.power evaluates to non-zero. On KVM guests, pr->flags.power evaluates to zero, hence dev is never allocated. This causes the NULL pointer (dev) dereference in cpuidle_disable_device() during a subsequent CPU online operation. Fix this by ensuring that dev is non-NULL before dereferencing. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Len Brown <len.brown@intel.com>	2012-10-08 22:52:54 -04:00
Linus Torvalds	476525004a	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull ACPI & power management update from Len Brown: "Re-write of the turbostat tool. lower overhead was necessary for measuring very large system when they are very idle. IVB support in intel_idle It's what I run on my IVB, others should be able to also:-) ACPICA core update We have found some bugs due to divergence between Linux and the upstream ACPICA base. Most of these patches are to reduce that divergence to reduce the risk of future bugs. Some cpuidle updates, mostly for non-Intel More will be coming, as they depend on this part. Some thermal management changes needed by non-ACPI systems. Some _OST (OS Status Indication) updates for hot ACPI hot-plug." * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (51 commits) Thermal: Documentation update Thermal: Add Hysteresis attributes Thermal: Make Thermal trip points writeable ACPI/AC: prevent OOPS on some boxes due to missing check power_supply_register() return value check tools/power: turbostat: fix large c1% issue tools/power: turbostat v2 - re-write for efficiency ACPICA: Update to version 20120711 ACPICA: AcpiSrc: Fix some translation issues for Linux conversion ACPICA: Update header files copyrights to 2012 ACPICA: Add new ACPI table load/unload external interfaces ACPICA: Split file: tbxface.c -> tbxfload.c ACPICA: Add PCC address space to space ID decode function ACPICA: Fix some comment fields ACPICA: Table manager: deploy new firmware error/warning interfaces ACPICA: Add new interfaces for BIOS(firmware) errors and warnings ACPICA: Split exception code utilities to a new file, utexcep.c ACPI: acpi_pad: tune round_robin_time ACPICA: Update to version 20120620 ACPICA: Add support for implicit notify on multiple devices ACPICA: Update comments; no functional change ...	2012-07-26 14:28:55 -07:00
Len Brown	ec033d0a02	Merge branches 'acpi_pad', 'acpica', 'apei-bugzilla-43282', 'battery', 'cpuidle-coupled', 'cpuidle-tweaks', 'intel_idle-ivb', 'ost', 'red-hat-bz-772730', 'thermal', 'thermal-spear' and 'turbostat-v2' into release	2012-07-26 00:03:58 -04:00
Rafael J. Wysocki	7791bd230c	Merge branch 'pm-domains' * pm-domains: PM / Domains: Fix build warning for CONFIG_PM_RUNTIME unset PM / Domains: Replace plain integer with NULL pointer in domain.c file PM / Domains: Add missing static storage class specifier in domain.c file PM / Domains: Allow device callbacks to be added at any time PM / Domains: Add device domain data reference counter PM / Domains: Add preliminary support for cpuidle, v2 PM / Domains: Do not stop devices after restoring their states PM / Domains: Use subsystem runtime suspend/resume callbacks by default	2012-07-19 00:03:17 +02:00
Preeti U Murthy	8651f97bd9	PM / cpuidle: System resume hang fix with cpuidle On certain bios, resume hangs if cpus are allowed to enter idle states during suspend [1]. This was fixed in apci idle driver [2].But intel_idle driver does not have this fix. Thus instead of replicating the fix in both the idle drivers, or in more platform specific idle drivers if needed, the more general cpuidle infrastructure could handle this. A suspend callback in cpuidle_driver could handle this fix. But a cpuidle_driver provides only basic functionalities like platform idle state detection capability and mechanisms to support entry and exit into CPU idle states. All other cpuidle functions are found in the cpuidle generic infrastructure for good reason that all cpuidle drivers, irrepective of their platforms will support these functions. One option therefore would be to register a suspend callback in cpuidle which handles this fix. This could be called through a PM_SUSPEND_PREPARE notifier. But this is too generic a notfier for a driver to handle. Also, ideally the job of cpuidle is not to handle side effects of suspend. It should expose the interfaces which "handle cpuidle 'during' suspend" or any other operation, which the subsystems call during that respective operation. The fix demands that during suspend, no cpus should be allowed to enter deep C-states. The interface cpuidle_uninstall_idle_handler() in cpuidle ensures that. Not just that it also kicks all the cpus which are already in idle out of their idle states which was being done during cpu hotplug through a CPU_DYING_FROZEN callbacks. Now the question arises about when during suspend should cpuidle_uninstall_idle_handler() be called. Since we are dealing with drivers it seems best to call this function during dpm_suspend(). Delaying the call till dpm_suspend_noirq() does no harm, as long as it is before cpu_hotplug_begin() to avoid race conditions with cpu hotpulg operations. In dpm_suspend_noirq(), it would be wise to place this call before suspend_device_irqs() to avoid ugly interactions with the same. Ananlogously, during resume. References: [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/674075. [2] http://marc.info/?l=linux-pm&m=133958534231884&w=2 Reported-and-tested-by: Dave Hansen <dave@linux.vnet.ibm.com> Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>	2012-07-10 21:34:49 +02:00
Rafael J. Wysocki	cbc9ef0287	PM / Domains: Add preliminary support for cpuidle, v2 On some systems there are CPU cores located in the same power domains as I/O devices. Then, power can only be removed from the domain if all I/O devices in it are not in use and the CPU core is idle. Add preliminary support for that to the generic PM domains framework. First, the platform is expected to provide a cpuidle driver with one extra state designated for use with the generic PM domains code. This state should be initially disabled and its exit_latency value should be set to whatever time is needed to bring up the CPU core itself after restoring power to it, not including the domain's power on latency. Its .enter() callback should point to a procedure that will remove power from the domain containing the CPU core at the end of the CPU power transition. The remaining characteristics of the extra cpuidle state, referred to as the "domain" cpuidle state below, (e.g. power usage, target residency) should be populated in accordance with the properties of the hardware. Next, the platform should execute genpd_attach_cpuidle() on the PM domain containing the CPU core. That will cause the generic PM domains framework to treat that domain in a special way such that: * When all devices in the domain have been suspended and it is about to be turned off, the states of the devices will be saved, but power will not be removed from the domain. Instead, the "domain" cpuidle state will be enabled so that power can be removed from the domain when the CPU core is idle and the state has been chosen as the target by the cpuidle governor. * When the first I/O device in the domain is resumed and __pm_genpd_poweron(() is called for the first time after power has been removed from the domain, the "domain" cpuidle state will be disabled to avoid subsequent surprise power removals via cpuidle. The effective exit_latency value of the "domain" cpuidle state depends on the time needed to bring up the CPU core itself after restoring power to it as well as on the power on latency of the domain containing the CPU core. Thus the "domain" cpuidle state's exit_latency has to be recomputed every time the domain's power on latency is updated, which may happen every time power is restored to the domain, if the measured power on latency is greater than the latency stored in the corresponding generic_pm_domain structure. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reviewed-by: Kevin Hilman <khilman@ti.com>	2012-07-03 19:07:42 +02:00

1 2 3

109 Коммитов