Commit a938da06 introduced a useful little log message to tell
users/debuggers which wakeup source aborted a suspend. However,
this message is only printed if the abort happens during the
in-kernel suspend path (after writing /sys/power/state).
The full specification of the /sys/power/wakeup_count facility
allows user-space power managers to double-check if wakeups have
already happened before it actually tries to suspend (e.g. while it
was running user-space pre-suspend hooks), by writing the last known
wakeup_count value to /sys/power/wakeup_count. This patch changes
the sysfs handler for that node to also print said log message if
that write fails, so that we can figure out the offending wakeup
source for both kinds of suspend aborts.
Signed-off-by: Julius Werner <jwerner@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This adds in a new message to the wakeup code which adds an
indication to the log that suspend was cancelled due to a wake event
occouring during the suspend sequence. It also adjusts the message
printed in suspend.c to reflect the potential that a suspend was
aborted, as opposed to a device failing to suspend.
Without these message adjustments one can end up with a kernel log
that says that a device failed to suspend with no actual device
suspend failures, which can be confusing to the log examiner.
Signed-off-by: Bernie Thompson <bhthompson@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
PM_SUSPEND_FREEZE state is a general state that
does not need any platform specific support, it equals
frozen processes + suspended devices + idle processors.
Compared with PM_SUSPEND_MEMORY,
PM_SUSPEND_FREEZE saves less power
because the system is still in a running state.
PM_SUSPEND_FREEZE has less resume latency because it does not
touch BIOS, and the processors are in idle state.
Compared with RTPM/idle,
PM_SUSPEND_FREEZE saves more power as
1. the processor has longer sleep time because processes are frozen.
The deeper c-state the processor supports, more power saving we can get.
2. PM_SUSPEND_FREEZE uses system suspend code path, thus we can get
more power saving from the devices that does not have good RTPM support.
This state is useful for
1) platforms that do not have STR, or have a broken STR.
2) platforms that have an extremely low power idle state,
which can be used to replace STR.
The following describes how PM_SUSPEND_FREEZE state works.
1. echo freeze > /sys/power/state
2. the processes are frozen.
3. all the devices are suspended.
4. all the processors are blocked by a wait queue
5. all the processors idles and enters (Deep) c-state.
6. an interrupt fires.
7. a processor is woken up and handles the irq.
8. if it is a general event,
a) the irq handler runs and quites.
b) goto step 4.
9. if it is a real wake event, say, power button pressing, keyboard touch, mouse moving,
a) the irq handler runs and activate the wakeup source
b) wakeup_source_activate() notifies the wait queue.
c) system starts resuming from PM_SUSPEND_FREEZE
10. all the devices are resumed.
11. all the processes are unfrozen.
12. system is back to working.
Known Issue:
The wakeup of this new PM_SUSPEND_FREEZE state may behave differently
from the previous suspend state.
Take ACPI platform for example, there are some GPEs that only enabled
when the system is in sleep state, to wake the system backk from S3/S4.
But we are not touching these GPEs during transition to PM_SUSPEND_FREEZE.
This means we may lose some wake event.
But on the other hand, as we do not disable all the Interrupts during
PM_SUSPEND_FREEZE, we may get some extra "wakeup" Interrupts, that are
not available for S3/S4.
The patches has been tested on an old Sony laptop, and here are the results:
Average Power:
1. RPTM/idle for half an hour:
14.8W, 12.6W, 14.1W, 12.5W, 14.4W, 13.2W, 12.9W
2. Freeze for half an hour:
11W, 10.4W, 9.4W, 11.3W 10.5W
3. RTPM/idle for three hours:
11.6W
4. Freeze for three hours:
10W
5. Suspend to Memory:
0.5~0.9W
Average Resume Latency:
1. RTPM/idle with a black screen: (From pressing keyboard to screen back)
Less than 0.2s
2. Freeze: (From pressing power button to screen back)
2.50s
3. Suspend to Memory: (From pressing power button to screen back)
4.33s
>From the results, we can see that all the platforms should benefit from
this patch, even if it does not have Low Power S0.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Jon Medhurst (Tixy) recently noticed a problem with the
events_lock usage. One of the Android patches that uses
wakeup_sources calls wakeup_source_add() with irqs disabled.
However, the event_lock usage in wakeup_source_add() uses
spin_lock_irq()/spin_unlock_irq(), which reenables interrupts.
This results in lockdep warnings.
The fix is to use spin_lock_irqsave()/spin_lock_irqrestore()
instead for the events_lock.
References: https://bugs.launchpad.net/linaro-landing-team-arm/+bug/1037565
Reported-and-debugged-by: Jon Medhurst (Tixy) <tixy@linaro.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
A driver or app may repeatedly request a wakeup source while the system
is attempting to enter suspend, which may indicate a bug or at least
point out a highly active system component that is responsible for
decreased battery life on a mobile device. Even when the incidence
of suspend abort is not severe, identifying wakeup sources that
frequently abort suspend can be a useful clue for power management
analysis.
In some cases the existing stats can point out the offender where there is
an unexpectedly high activation count that stands out from the others, but
in other cases the wakeup source frequently taken just after the rest of
the system thinks its time to suspend might not stand out in the overall
stats.
It is also often useful to have information about what's been happening
recently, rather than totals of all activity for the system boot.
It's suggested to dump a line about which wakeup source
aborted suspend to aid analysis of these situations.
Signed-off-by: Todd Poynor <toddpoynor@google.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Android allows user space to manipulate wakelocks using two
sysfs file located in /sys/power/, wake_lock and wake_unlock.
Writing a wakelock name and optionally a timeout to the wake_lock
file causes the wakelock whose name was written to be acquired (it
is created before is necessary), optionally with the given timeout.
Writing the name of a wakelock to wake_unlock causes that wakelock
to be released.
Implement an analogous interface for user space using wakeup sources.
Add the /sys/power/wake_lock and /sys/power/wake_unlock files
allowing user space to create, activate and deactivate wakeup
sources, such that writing a name and optionally a timeout to
wake_lock causes the wakeup source of that name to be activated,
optionally with the given timeout. If that wakeup source doesn't
exist, it will be created and then activated. Writing a name to
wake_unlock causes the wakeup source of that name, if there is one,
to be deactivated. Wakeup sources created with the help of
wake_lock that haven't been used for more than 5 minutes are garbage
collected and destroyed. Moreover, there can be only WL_NUMBER_LIMIT
wakeup sources created with the help of wake_lock present at a time.
The data type used to track wakeup sources created by user space is
called "struct wakelock" to indicate the origins of this feature.
This version of the patch includes an rbtree manipulation fix from John Stultz.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Android uses one wakelock statistics that is only necessary for
opportunistic sleep. Namely, the prevent_suspend_time field
accumulates the total time the given wakelock has been locked
while "automatic suspend" was enabled. Add an analogous field,
prevent_sleep_time, to wakeup sources and make it behave in a similar
way.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Introduce a mechanism by which the kernel can trigger global
transitions to a sleep state chosen by user space if there are no
active wakeup sources.
It consists of a new sysfs attribute, /sys/power/autosleep, that
can be written one of the strings returned by reads from
/sys/power/state, an ordered workqueue and a work item carrying out
the "suspend" operations. If a string representing the system's
sleep state is written to /sys/power/autosleep, the work item
triggering transitions to that state is queued up and it requeues
itself after every execution until user space writes "off" to
/sys/power/autosleep.
That work item enables the detection of wakeup events using the
functions already defined in drivers/base/power/wakeup.c (with one
small modification) and calls either pm_suspend(), or hibernate() to
put the system into a sleep state. If a wakeup event is reported
while the transition is in progress, it will abort the transition and
the "system suspend" work item will be queued up again.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Add tracepoints to wakeup_source_activate and wakeup_source_deactivate.
Useful for checking that specific wakeup sources overlap as expected.
Signed-off-by: Arve Hjønnevåg <arve@android.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Wakeup statistics used by Android are slightly different from what we
have in wakeup sources at the moment and there aren't any known
users of those statistics other than Android, so modify them to make
it easier for Android to switch to wakeup sources.
This removes the struct wakeup_source's hit_cout field, which is very
rough and therefore not very useful, and adds two new fields,
wakeup_count and expire_count. The first one tracks how many times
the wakeup source is activated with events_check_enabled set (which
roughly corresponds to the situations when a system power transition
to a sleep state is in progress and would be aborted by this wakeup
source if it were the only active one at that time) and the second
one is the number of times the wakeup source has been activated with
a timeout that expired.
Additionally, the last_time field is now updated when the wakeup
source is deactivated too (previously it was only updated during
the wakeup source's activation), which seems to be what Android does
with the analogous counter for wakelocks.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The current wakeup source deactivation code doesn't do anything when
the counter of wakeup events in progress goes down to zero, which
requires pm_get_wakeup_count() to poll that counter periodically.
Although this reduces the average time it takes to deactivate a
wakeup source, it also may lead to a substantial amount of unnecessary
polling if there are extended periods of wakeup activity. Thus it
seems reasonable to use a wait queue for signaling the "no wakeup
events in progress" condition and remove the polling.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: mark gross <markgross@thegnar.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The existing wakeup source initialization routines are not
particularly useful for wakeup sources that aren't created by
wakeup_source_create(), because their users have to open code
filling the objects with zeros and setting their names. For this
reason, introduce routines that can be used for initializing, for
example, static wakeup source objects.
Requested-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
If __pm_stay_awake() is called after __pm_wakeup_event() for the same
wakep source object before its timer expires, it won't cancel the
timer, so the wakeup source will be deactivated from the timer
function as scheduled by __pm_wakeup_event(). In that case
__pm_stay_awake() doesn't have any effect beyond incrementing
the wakeup source's event_count field, although it should cancel
the timer and make the wakeup source stay active until __pm_relax()
is called for it.
To fix this problem make __pm_stay_awake() delete the wakeup source's
timer and ensure that it won't be deactivated from the timer funtion
afterwards by clearing its timer_expires field.
Reported-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
If __pm_wakeup_event() has been used (with a nonzero timeout) to
report a wakeup event and then __pm_relax() immediately followed by
__pm_stay_awake() is called or __pm_wakeup_event() is called once
again for the same wakeup source object before its timer expires, the
timer function pm_wakeup_timer_fn() may still be run as a result of
the previous __pm_wakeup_event() call. In either of those cases it
may mistakenly deactivate the wakeup source that has just been
activated.
To prevent that from happening, make wakeup_source_deactivate()
clear the wakeup source's timer_expires field and make
pm_wakeup_timer_fn() check if timer_expires is different from zero
and if it's not in future before calling wakeup_source_deactivate()
(if timer_expires is 0, it means that the timer has just been
deleted and if timer_expires is in future, it means that the timer
has just been rescheduled to a different time).
Reported-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
If wakeup_source_destroy() is called for an active wakeup source that
is never deactivated, it will spin forever. To prevent that from
happening, make wakeup_source_destroy() call __pm_relax() for the
wakeup source object it is about to free instead of waiting until
it will be deactivated by someone else. However, for this to work
it also needs to make sure that the timer function will not be
executed after the final __pm_relax(), so make it run
del_timer_sync() on the wakeup source's timer beforehand.
Additionally, update the kerneldoc comment to document the
requirement that __pm_stay_awake() and __pm_wakeup_event() must not
be run in parallel with wakeup_source_destroy().
Reported-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Initialize wakeup source locks in wakeup_source_add() instead of
wakeup_source_create(), because otherwise the locks of the wakeup
sources that haven't been allocated with wakeup_source_create()
aren't initialized and handled properly.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Most of these files were implicitly getting EXPORT_SYMBOL via
device.h which was including module.h, but that path will be broken
soon.
[ with input from Stephen Rothwell <sfr@canb.auug.org.au> ]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
This patch (as1485) documents a change to the kernel's default wakeup
policy. Devices that forward wakeup requests between buses should be
enabled for wakeup by default.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
wakeup_source_add() adds an item into wakeup_sources list.
There is no need to call synchronize_rcu() at this point.
Its only needed in wakeup_source_remove()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
It turns out that some PCI devices are only found to be
wakeup-capable during registration, in which case, when
device_set_wakeup_capable() is called, device_is_registered() already
returns 'true' for the given device, but dpm_sysfs_add() hasn't been
called for it yet. This leads to situations in which the device's
power.can_wakeup flag is not set as requested because of failing
wakeup_sysfs_add() and its wakeup-related sysfs files are not
created, although they should be present. This is a post-2.6.38
regression introduced by commit cb8f51bdad
(PM: Do not create wakeup sysfs files for devices that cannot wake
up).
To work around this problem initialize the device's power.entry
field to an empty list head and make device_set_wakeup_capable()
check if it is still empty before attempting to add the devices
wakeup-related sysfs files with wakeup_sysfs_add(). Namely, if
power.entry is still empty at this point, device_pm_add() hasn't been
called yet for the device and its wakeup-related files will be
created later, so device_set_wakeup_capable() doesn't have to create
them.
Reported-and-tested-by: Tino Keitel <tino.keitel@tikei.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Currently, wakeup sysfs attributes are created for all devices,
regardless of whether or not they are wakeup-capable. This is
excessive and complicates wakeup device identification from user
space (i.e. to identify wakeup-capable devices user space has to read
/sys/devices/.../power/wakeup for all devices and see if they are not
empty).
Fix this issue by avoiding to create wakeup sysfs files for devices
that cannot wake up the system from sleep states (i.e. whose
power.can_wakeup flags are unset during registration) and modify
device_set_wakeup_capable() so that it adds (or removes) the relevant
sysfs attributes if a device's wakeup capability status is changed.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Since pm_save_wakeup_count() has just been changed to clear
events_check_enabled unconditionally before checking if there are
any new wakeup events registered since the last read from
/sys/power/wakeup_count, the detection of wakeup events during
suspend may be disabled, after it's been enabled, by writing a
"wrong" value back to /sys/power/wakeup_count. For this reason,
it is not necessary to update events_check_enabled in
pm_get_wakeup_count() any more.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
According to Documentation/ABI/testing/sysfs-power, the
/sys/power/wakeup_count interface should only make the kernel react
to wakeup events during suspend if the last write to it has been
successful. However, if /sys/power/wakeup_count is written to two
times in a row, where the first write is successful and the second
is not, the kernel will still react to wakeup events during suspend
due to a bug in pm_save_wakeup_count().
Fix the bug by making pm_save_wakeup_count() clear
events_check_enabled unconditionally before checking if there are
any new wakeup events registered since the previous read from
/sys/power/wakeup_count.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
The memory barrier in wakeup_source_deactivate() is supposed to
prevent the callers of pm_wakeup_pending() and pm_get_wakeup_count()
from seeing the new value of events_in_progress (0, in particular)
and the old value of event_count at the same time. However, if
wakeup_source_deactivate() is executed by CPU0 and, for instance,
pm_wakeup_pending() is executed by CPU1, where both processors can
reorder operations, the memory barrier in wakeup_source_deactivate()
doesn't affect CPU1 which can reorder reads. In that case CPU1 may
very well decide to fetch event_count before it's modified and
events_in_progress after it's been updated, so pm_wakeup_pending()
may fail to detect a wakeup event. This issue can be addressed by
using a single atomic variable to store both events_in_progress
and event_count, so that they can be updated together in a single
atomic operation.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
To avoid confusion with the meaning and return value of
pm_check_wakeup_events() replace it with pm_wakeup_pending() that
will work the other way around (ie. return true when system-wide
power transition should be aborted).
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
There may be wakeup sources that aren't associated with any devices
and their statistics information won't be available from sysfs. Also,
for debugging purposes it is convenient to have all of the wakeup
sources statistics available from one place. For these reasons,
introduce new file "wakeup_sources" in debugfs containing those
statistics.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Introduce struct wakeup_source for representing system wakeup sources
within the kernel and for collecting statistics related to them.
Make the recently introduced helper functions pm_wakeup_event(),
pm_stay_awake() and pm_relax() use struct wakeup_source objects
internally, so that wakeup statistics associated with wakeup devices
can be collected and reported in a consistent way (the definition of
pm_relax() is changed, which is harmless, because this function is
not called directly by anyone yet). Introduce new wakeup-related
sysfs device attributes in /sys/devices/.../power for reporting the
device wakeup statistics.
Change the global wakeup events counters event_count and
events_in_progress into atomic variables, so that it is not necessary
to acquire a global spinlock in pm_wakeup_event(), pm_stay_awake()
and pm_relax(), which should allow us to avoid lock contention in
these functions on SMP systems with many wakeup devices.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Originally, pm_wakeup_event() uses struct delayed_work objects,
allocated with GFP_ATOMIC, to schedule the execution of pm_relax()
in future. However, as noted by Alan Stern, it is not necessary to
do that, because all pm_wakeup_event() calls can use one static timer
that will always be set to expire at the latest time passed to
pm_wakeup_event().
The modifications are based on the example code posted by Alan.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
One of the arguments during the suspend blockers discussion was that
the mainline kernel didn't contain any mechanisms making it possible
to avoid races between wakeup and system suspend.
Generally, there are two problems in that area. First, if a wakeup
event occurs exactly when /sys/power/state is being written to, it
may be delivered to user space right before the freezer kicks in, so
the user space consumer of the event may not be able to process it
before the system is suspended. Second, if a wakeup event occurs
after user space has been frozen, it is not generally guaranteed that
the ongoing transition of the system into a sleep state will be
aborted.
To address these issues introduce a new global sysfs attribute,
/sys/power/wakeup_count, associated with a running counter of wakeup
events and three helper functions, pm_stay_awake(), pm_relax(), and
pm_wakeup_event(), that may be used by kernel subsystems to control
the behavior of this attribute and to request the PM core to abort
system transitions into a sleep state already in progress.
The /sys/power/wakeup_count file may be read from or written to by
user space. Reads will always succeed (unless interrupted by a
signal) and return the current value of the wakeup events counter.
Writes, however, will only succeed if the written number is equal to
the current value of the wakeup events counter. If a write is
successful, it will cause the kernel to save the current value of the
wakeup events counter and to abort the subsequent system transition
into a sleep state if any wakeup events are reported after the write
has returned.
[The assumption is that before writing to /sys/power/state user space
will first read from /sys/power/wakeup_count. Next, user space
consumers of wakeup events will have a chance to acknowledge or
veto the upcoming system transition to a sleep state. Finally, if
the transition is allowed to proceed, /sys/power/wakeup_count will
be written to and if that succeeds, /sys/power/state will be written
to as well. Still, if any wakeup events are reported to the PM core
by kernel subsystems after that point, the transition will be
aborted.]
Additionally, put a wakeup events counter into struct dev_pm_info and
make these per-device wakeup event counters available via sysfs,
so that it's possible to check the activity of various wakeup event
sources within the kernel.
To illustrate how subsystems can use pm_wakeup_event(), make the
low-level PCI runtime PM wakeup-handling code use it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: markgross <markgross@thegnar.org>
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>