On the Lenovo ThinkPad X1 Carbon (5th Generation), enabling an earlier
EC event freezing timing causes acpitz-virtual-0 to report a stuck
48C temperature. Also, with EC firmware revision 1.14, the fan still
spins up abnormally after a system resume unless the old EC event
freezing timing is restored.
This reverts the culprit change so that the regression can be fixed
without upgrading the EC firmware.
Fixes: d30283057e (ACPI / EC: Enable event freeze mode to improve event handling)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=191181#c168
Tested-by: Damjan Georgievski <gdamjan@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: 4.9+ <stable@vger.kernel.org> # 4.9+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
According to bug reports, although the busy polling mode can make
noirq stages execute faster, it causes the fan to spin up abnormally
after system resume (see the first link below for a video
demonstration) on the Lenovo ThinkPad X1 Carbon (5th Generation). The
problem can be fixed by upgrading the EC firmware on that machine.
However, many reporters confirm that the problem can be fixed by
stopping busy polling during suspend/resume and for some of them
upgrading the EC firmware is not an option.
For this reason, drop the noirq stage hooks from the EC driver
to fix the regression.
Fixes: c3a696b6e8 (ACPI / EC: Use busy polling mode when GPE is not enabled)
Link: https://youtu.be/9NQ9x-Jm99Q
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196129
Reported-by: Andreas Lindhe <andreas@lindhe.io>
Tested-by: Gjorgji Jankovski <j.gjorgji@gmail.com>
Tested-by: Damjan Georgievski <gdamjan@gmail.com>
Tested-by: Fernando Chaves <nanochaves@gmail.com>
Tested-by: Tomislav Ivek <tomislav.ivek@gmail.com>
Tested-by: Denis P. <theoriginal.skullburner@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Update the ACPICA code in the kernel to upstream revision
20170531 (which covers all of the new material from ACPI 6.2)
including:
* Support for the PinFunction(), PinConfig(), PinGroup(),
PinGroupFunction(), and PinGroupConfig() resource descriptors
(Mika Westerberg).
* Support for new subtables in HEST and SRAT, new notify value
for HEST, header support for TPM2 table changes, and BGRT
Status field update (Bob Moore).
* Support for new PCCT subtables (David Box).
* Support for _LSI, _LSR, _LSW, and _HMA as predefined methods
(Erik Schmauss).
* Support for the new WSMT, HMAT, and PPTT tables (Lv Zheng).
* New UUID values for Processor Properties (Bob Moore).
* New notify values for memory attributes and graceful shutdown
(Bob Moore).
* Fix related to the PCAT_COMPAT MADT flag (Janosch Hildebrand).
* Resource to AML conversion fix for resources containing GPIOs
(Mika Westerberg).
* Disassembler-related updates (Bob Moore, David Box, Erik
Schmauss).
* Assorted fixes and cleanups (Bob Moore, Erik Schmauss, Lv Zheng,
Cao Jin).
- Modify ACPICA to always use designated initializers for function
pointer structures to make the structure layout randomization GCC
plugin work with it (Kees Cook).
- Update the tables configfs interface to unload SSDTs on configfs
entry removal (Jan Kiszka).
- Add support for the GPI1 regulator to the xpower PMIC Operation
Region handler (Hans de Goede).
- Fix ACPI EC issues related to conflicting EC definitions in the
ECDT and in the ACPI namespace (Lv Zheng, Carlo Caione, Chris
Chiu).
- Fix an interrupt storm issue in the EC driver and make its debug
output work with dynamic debug as expected (Lv Zheng).
- Add ACPI backlight quirk for Dell Precision 7510 (Shih-Yuan Lee).
- Fix whitespace in pr_fmt() to align log entries properly in some
places in the ACPI subsystem (Vincent Legoll).
Merge tag 'acpi-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These mostly update the ACPICA code in the kernel to upstream revision
20170531 which covers all of the new material from ACPI 6.2, including
new tables (WSMT, HMAT, PPTT), new subtables and definition changes
for some existing tables (BGRT, HEST, SRAT, TPM2, PCCT), new resource
descriptor macros for pin control, support for new predefined methods
(_LSI, _LSR, _LSW, _HMA), fixes and cleanups.
On top of that, an additional ACPICA change from Kees (which also is
upstream already) switches all of the definitions of function pointer
structures in ACPICA to use designated initializers so as to make the
structure layout randomization GCC plugin work with it.
The rest is a few fixes and cleanups in the EC driver, an xpower PMIC
driver update, a new backlight blacklist entry, an update of the
tables configfs interface and a messages formatting cleanup.
Specifics:
- Update the ACPICA code in the kernel to upstream revision 20170531
(which covers all of the new material from ACPI 6.2) including:
* Support for the PinFunction(), PinConfig(), PinGroup(),
PinGroupFunction(), and PinGroupConfig() resource descriptors
(Mika Westerberg).
* Support for new subtables in HEST and SRAT, new notify value for
HEST, header support for TPM2 table changes, and BGRT Status
field update (Bob Moore).
* Support for new PCCT subtables (David Box).
* Support for _LSI, _LSR, _LSW, and _HMA as predefined methods
(Erik Schmauss).
* Support for the new WSMT, HMAT, and PPTT tables (Lv Zheng).
* New UUID values for Processor Properties (Bob Moore).
* New notify values for memory attributes and graceful shutdown
(Bob Moore).
* Fix related to the PCAT_COMPAT MADT flag (Janosch Hildebrand).
* Resource to AML conversion fix for resources containing GPIOs
(Mika Westerberg).
* Disassembler-related updates (Bob Moore, David Box, Erik
Schmauss).
* Assorted fixes and cleanups (Bob Moore, Erik Schmauss, Lv Zheng,
Cao Jin).
- Modify ACPICA to always use designated initializers for function
pointer structures to make the structure layout randomization GCC
plugin work with it (Kees Cook).
- Update the tables configfs interface to unload SSDTs on configfs
entry removal (Jan Kiszka).
- Add support for the GPI1 regulator to the xpower PMIC Operation
Region handler (Hans de Goede).
- Fix ACPI EC issues related to conflicting EC definitions in the
ECDT and in the ACPI namespace (Lv Zheng, Carlo Caione, Chris
Chiu).
- Fix an interrupt storm issue in the EC driver and make its debug
output work with dynamic debug as expected (Lv Zheng).
- Add ACPI backlight quirk for Dell Precision 7510 (Shih-Yuan Lee).
- Fix whitespace in pr_fmt() to align log entries properly in some
places in the ACPI subsystem (Vincent Legoll)"
* tag 'acpi-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (63 commits)
ACPI / EC: Add quirk for GL720VMK
ACPI / EC: Fix media keys not working problem on some Asus laptops
ACPI / EC: Add support to skip boot stage DSDT probe
ACPI / EC: Enhance boot EC sanity check
ACPI / video: Add quirks for the Dell Precision 7510
ACPI: EC: Fix EC command visibility for dynamic debug
ACPI: EC: Fix an EC event IRQ storming issue
ACPICA: Use designated initializers
ACPICA: Update version to 20170531
ACPICA: Update a couple of debug output messages
ACPICA: acpiexec: enhance local signal handler
ACPICA: Simplify output for the ACPI Debug Object
ACPICA: Unix application OSL: Correctly handle control-c (EINTR)
ACPICA: Improvements for debug output only
ACPICA: Disassembler: allow conflicting external declarations to be emitted.
ACPICA: Disassembler: add external op to namespace on first pass
ACPICA: Disassembler: prevent external op's from opening a new scope
ACPICA: Changed Gbl_disasm_flag to acpi_gbl_disasm_flag
ACPICA: Changing External to a named object
ACPICA: Update two error messages to emit control method name
...
ASUS GL720VMK is also affected by the EC GPE preference issue.
Signed-off-by: Carlo Caione <carlo@caione.org>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some Asus laptops (verified on X550VXK/FX502VD/FX502VE) get no
interrupts when media keys are pressed, so the corresponding functions
are not invoked. This happens because the _GPE object defined in the
DSDT for the EC returns a different value from the GPE number in the
ECDT. Asus confirmed that the value in the ECDT is the correct one.
This commit uses DMI quirks to prevent calling _GPE in
ec_parse_device() and to keep the ECDT GPE number setting for the EC
device.
The previous commit ensures that an ECDT, if present, is always kept
as boot_ec, so this patch can implement a quirk on top of the
determined ECDT boot_ec.
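A minimal sketch of the quirk decision, assuming a plain string match
stands in for the DMI match. The table and the names trust_ecdt_gpe()
and ec_pick_gpe() are hypothetical and are not the driver's API; only
the model names come from this changelog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Models named in the changelog; the table itself is a hypothetical
 * stand-in for the driver's DMI quirk list. */
static const char *const ecdt_gpe_models[] = {
	"X550VXK", "FX502VD", "FX502VE",
};

/* Return true if the product name matches a quirked model. */
static bool trust_ecdt_gpe(const char *product)
{
	size_t n = sizeof(ecdt_gpe_models) / sizeof(ecdt_gpe_models[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(product, ecdt_gpe_models[i]) == 0)
			return true;
	return false;
}

/* On quirked models keep the ECDT GPE number and ignore what _GPE
 * returns; elsewhere use the value returned by _GPE. */
static unsigned int ec_pick_gpe(const char *product,
				unsigned int ecdt_gpe,
				unsigned int dsdt_gpe)
{
	return trust_ecdt_gpe(product) ? ecdt_gpe : dsdt_gpe;
}
```

The GPE numbers passed in are arbitrary; the point is only that the
quirk selects the ECDT value without evaluating _GPE.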
Link: https://phabricator.endlessm.com/T16033
Link: https://phabricator.endlessm.com/T16722
Link: https://bugzilla.kernel.org/show_bug.cgi?id=195651
Tested-by: Daniel Drake <drake@endlessm.com>
Signed-off-by: Chris Chiu <chiu@endlessm.com>
Signed-off-by: Carlo Caione <carlo@caione.org>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
To probe Windows behavior with qemu, we prepared _INI/_STA methods for
\_SB, \_SB.PCI0, \_SB.LID0 and \_SB.EC, as well as
_HID(PNP0C09)/_CRS/_GPE for \_SB.EC, and got the following execution
sequence:
\_SB._INI
\_SB.PCI0._STA
\_SB.LID0._STA
\_SB.EC._STA
\_SB.PCI0._INI
\_SB.LID0._INI
\_SB.EC._INI
There is no extra DSDT EC device enumeration process occurring before
the main ACPI device enumeration process. That means acpi_ec_dsdt_probe()
is not Windows-compatible.
Tracking back, it was added by the following commit:
Commit: c5279dee26
Subject: ACPI: EC: Add some basic check for ECDT data
but that commit was misguided.
Why shouldn't we enumerate the DSDT EC before the main ACPI device
enumeration?
The only way to know if the DSDT EC is valid would be to evaluate its
_STA control method, but it's not safe to evaluate this control method
that early and out of the ACPI enumeration process, because _STA may
refer to entities (such as resources or ACPI device objects) that may
not have been initialized before OSPM starts to enumerate them via
the main ACPI device enumeration.
But after we had reverted to the expected behavior, a regression
was reported. On that platform, there is no ECDT, but the platform
control methods access the EC operation region earlier than Linux
expects, causing some ACPI method execution errors. For this reason,
we went back to the old behavior of probing the DSDT EC as the boot EC.
However, that turns out to lead to yet another functional breakage
and in order to work around all of the problems, we skip boot stage
DSDT probe when the ECDT exists so that a later quirk can always use
correct ECDT GPE setting.
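The resulting boot EC selection can be sketched as follows. This is an
illustration of the rule stated above only; the enum and pick_boot_ec()
are hypothetical names, not driver code.

```c
#include <assert.h>
#include <stdbool.h>

/* Where the boot EC comes from (hypothetical names). */
enum boot_ec_source { BOOT_EC_NONE, BOOT_EC_ECDT, BOOT_EC_DSDT };

static enum boot_ec_source pick_boot_ec(bool have_ecdt, bool dsdt_ec_found)
{
	if (have_ecdt)
		return BOOT_EC_ECDT;	/* boot stage DSDT probe is skipped */
	if (dsdt_ec_found)
		return BOOT_EC_DSDT;	/* no ECDT: still probe the DSDT EC */
	return BOOT_EC_NONE;
}
```

With an ECDT present, the DSDT EC is never considered at the boot
stage, so a later quirk always sees the ECDT GPE setting.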
Link: http://bugzilla.kernel.org/show_bug.cgi?id=11880
Link: http://bugzilla.kernel.org/show_bug.cgi?id=119261
Link: http://bugzilla.kernel.org/show_bug.cgi?id=195651
Tested-by: Daniel Drake <drake@endlessm.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
[ rjw: Changelog & comments massage ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It's reported that some buggy BIOS tables can contain 2 DSDT ECs, one
of which is invalid, but acpi_ec_dsdt_probe() fails to pick the valid
one. This patch simply enhances the sanity checks in ec_parse_device()
as a workaround to skip probing wrong namespace ECs.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=195651
Tested-by: Daniel Drake <drake@endlessm.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
acpi_ec_cmd_string() is currently only enabled for the "DEBUG" macro,
but users tend to use CONFIG_DYNAMIC_DEBUG and enable ec.c pr_debug()
print-outs with "dyndbg='file ec.c +p'". In this use case, all command
names are turned into UNDEF and the log is confusing. This affects
bugzilla triage work.
This patch fixes this issue by enabling acpi_ec_cmd_string() for
CONFIG_DYNAMIC_DEBUG.
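A minimal user-space sketch of the preprocessor guard, assuming the
helper is compiled in when either macro is defined. ec_cmd_string() is
a hypothetical stand-in for acpi_ec_cmd_string(), the "UNKNOWN"
fallback is an assumption, and CONFIG_DYNAMIC_DEBUG is defined by hand
here only to exercise the enabled branch. The command bytes are the EC
command values from the ACPI specification.

```c
#include <assert.h>
#include <string.h>

/* Assumption for this sketch: pretend dynamic debug is configured. */
#define CONFIG_DYNAMIC_DEBUG 1

#if defined(DEBUG) || defined(CONFIG_DYNAMIC_DEBUG)
/* Translate an EC command byte into a readable name for the log. */
static const char *ec_cmd_string(unsigned char cmd)
{
	switch (cmd) {
	case 0x80: return "RD_EC";	/* read */
	case 0x81: return "WR_EC";	/* write */
	case 0x82: return "BE_EC";	/* burst enable */
	case 0x83: return "BD_EC";	/* burst disable */
	case 0x84: return "QR_EC";	/* query */
	}
	return "UNKNOWN";
}
#else
/* Without either macro every command name collapses to UNDEF. */
#define ec_cmd_string(cmd) "UNDEF"
#endif
```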
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: Feng Chenzhou <chenzhoux.feng@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The EC event IRQ (SCI_EVT) can only be handled by submitting QR_EC. As the
EC driver handles SCI_EVT in a workqueue, there is a period, after SCI_EVT
is flagged and before QR_EC is submitted, that risks IRQ storming. The EC
IRQ must be masked for this period, but the Linux EC driver never does so.
No end user notices the IRQ storming and no developer fixes this known
issue because:
1. The EC IRQ is always edge triggered GPE, and
2. The kernel can execute no-op EC IRQ handler very fast.
For edge-triggered EC GPE platforms, only post-resume EC event loss issues
have been reported and there is no IRQ storming. For level-triggered EC
GPE platforms, fortunately the kernel is always fast enough to execute such
a no-op EC IRQ handler, so the IRQ handlers do not accumulate and starve
the task contexts, which would cause a real IRQ storm.
But the IRQ storming actually can still happen when:
1. The EC IRQ behaves like a level-triggered GPE, and
2. The kernel EC debugging log is turned on and the console is slow enough.
There are more and more platforms using EC GPE as wake GPE where the EC GPE
is likely designed as level triggered. Then when EC debugging log is
enabled, the EC IRQ handler is no longer a no-op but dumps IRQ status to
the consoles. If the consoles are slow enough, the EC IRQs can arrive much
faster than executing the handler. Finally the accumulated EC event IRQ
handlers starve the task contexts, causing the IRQ storming to occur, and
the kernel hangs can be observed during boot/resume.
This patch fixes this issue by masking EC IRQ for this period:
1. Begins when there is an SCI_EVT IRQ pending, and
2. Ends when there is a QR_EC completed (SCI_EVT acknowledged).
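The masking window can be illustrated with a small model. This is a
sketch under simplifying assumptions (a level-triggered line that
re-fires a fixed number of times before the workqueue gets to run
QR_EC); struct ec_model and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct ec_model {
	bool mask_on_event;	/* true: apply the masking described above */
	bool irq_masked;
	bool query_pending;
	unsigned int handler_runs;
};

/* IRQ handler: runs once per delivered interrupt. */
static void ec_irq(struct ec_model *ec)
{
	if (ec->irq_masked)
		return;			/* masked: nothing is delivered */
	ec->handler_runs++;
	if (!ec->query_pending)
		ec->query_pending = true;	/* QR_EC deferred to workqueue */
	if (ec->mask_on_event)
		ec->irq_masked = true;	/* window begins: SCI_EVT pending */
}

/* Workqueue side: QR_EC completed, SCI_EVT acknowledged. */
static void ec_query_complete(struct ec_model *ec)
{
	ec->query_pending = false;
	ec->irq_masked = false;		/* window ends */
}

/* Count handler invocations while the line re-fires before QR_EC. */
static unsigned int ec_storm_handler_runs(bool mask_on_event,
					  unsigned int refires)
{
	struct ec_model ec = { .mask_on_event = mask_on_event };

	for (unsigned int i = 0; i < refires; i++)
		ec_irq(&ec);
	ec_query_complete(&ec);
	return ec.handler_runs;
}
```

With masking, the handler runs once per SCI_EVT no matter how often the
line re-fires; without it, every re-fire costs one handler run, which
is what starves task contexts when the handler is slow.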
Tested-by: Wang Wendy <wendy.wang@intel.com>
Tested-by: Feng Chenzhou <chenzhoux.feng@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some recent Dell laptops, including the XPS13 model numbers 9360 and
9365, cannot be woken up from suspend-to-idle by pressing the power
button which is unexpected and makes that feature less usable on
those systems. Moreover, on the 9365 ACPI S3 (suspend-to-RAM) is
not expected to be used at all (the OS these systems ship with never
exercises the ACPI S3 path in the firmware) and suspend-to-idle is
the only viable system suspend mechanism there.
The reason why the power button wakeup from suspend-to-idle doesn't
work on those systems is because their power button events are
signaled by the EC (Embedded Controller), whose GPE (General Purpose
Event) line is disabled during suspend-to-idle transitions in Linux.
That is done on purpose, because in general the EC tends to be noisy
for various reasons (battery and thermal updates and similar, for
example) and all events signaled by it would kick the CPUs out of
deep idle states while in suspend-to-idle, which effectively might
defeat its purpose.
Of course, on the Dell systems in question the EC GPE must be enabled
during suspend-to-idle transitions for the button press events to
be signaled while suspended at all, but fortunately there is a way
out of this puzzle.
First of all, those systems have the ACPI_FADT_LOW_POWER_S0 flag set
in their ACPI tables, which means that the OS is expected to prefer
the "low power S0 idle" system state over ACPI S3 on them. That
causes the most recent versions of other OSes to simply ignore ACPI
S3 on those systems, so it is reasonable to expect that it should not
be necessary to block GPEs during suspend-to-idle on them.
Second, in addition to that, the systems in question provide a special
firmware interface that can be used to indicate to the platform that
the OS is transitioning into a system-wide low-power state in which
certain types of activity are not desirable or that it is leaving
such a state and that (in principle) should allow the platform to
adjust its operation mode accordingly.
That interface is a special _DSM object under a System Power
Management Controller device (PNP0D80). The expected way to use it
is to invoke function 0 from it on system initialization, functions
3 and 5 during suspend transitions and functions 4 and 6 during
resume transitions (to reverse the actions carried out by the
former). In particular, function 5 from the "Low-Power S0" device
_DSM is expected to cause the platform to put itself into a low-power
operation mode which should include making the EC less verbose (so to
speak). Next, on resume, function 6 switches the platform back to
the "working-state" operation mode.
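The call sequence described above can be sketched as a lookup. The
function numbers are taken from the text; the enum, the helper name,
and the order within the resume phase (6 before 4, mirroring suspend)
are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum lps0_phase { LPS0_INIT, LPS0_SUSPEND, LPS0_RESUME };

/* Fill out[] with the _DSM function numbers to invoke for a phase and
 * return how many entries were written. */
static size_t lps0_dsm_sequence(enum lps0_phase phase, unsigned int out[2])
{
	switch (phase) {
	case LPS0_INIT:
		out[0] = 0;
		return 1;
	case LPS0_SUSPEND:
		out[0] = 3;
		out[1] = 5;	/* platform enters low-power operation mode */
		return 2;
	case LPS0_RESUME:
		out[0] = 6;	/* back to "working-state" operation mode */
		out[1] = 4;	/* assumed order: undo function 3 last */
		return 2;
	}
	return 0;
}
```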
In accordance with the above, modify the ACPI suspend-to-idle code
to look for the "Low-Power S0" _DSM interface on platforms with the
ACPI_FADT_LOW_POWER_S0 flag set in the ACPI tables. If it's there,
use it during suspend-to-idle transitions as prescribed and avoid
changing the GPE configuration in that case. [That should reflect
what the most recent versions of other OSes do.]
Also modify the ACPI EC driver to make it handle events during
suspend-to-idle in the usual way if the "Low-Power S0" _DSM interface
is going to be used to make the power button events work while
suspended on the Dell machines mentioned above.
Link: http://www.uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When GPE is not enabled, it is not efficient to use the wait polling mode
as it introduces an unexpected scheduler delay.
So before the GPE handler is installed, this patch uses busy polling mode
for all EC(s) and the logic can be applied to non boot EC(s) during the
suspend/resume process.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=191561
Tested-by: Jakobus Schurz <jakobus.schurz@gmail.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
IRQ polling logic has been implemented to drain the post-boot/resume
EC events:
1. Triggered by the following code, invoked from acpi_ec_enable_event():
     if (!test_bit(EC_FLAGS_QUERY_PENDING, &ec->flags))
         advance_transaction(ec);
2. Drained by the following code, invoked after acpi_ec_complete_query():
     if (status & ACPI_EC_FLAG_SCI)
         acpi_ec_submit_query(ec);
This facility is safer than the old CLEAR_ON_RESUME quirk as the
CLEAR_ON_RESUME quirk sends EC query commands unconditionally. The
behavior is apparently not suitable for firmware that requires
QUERY_HANDSHAKE quirk. Though the QUERY_HANDSHAKE quirk isn't used
now because of the improvement done in the EC transaction state
machine (ec_event_clearing=QUERY), it is the proof that we cannot
send EC query command unconditionally.
So it's time to delete the outdated CLEAR_ON_RESUME quirk and let
users try the newer approach.
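The trigger/drain logic can be modelled in user space as follows. This
is a sketch, assuming each completed query consumes exactly one pending
event; struct ec_sim and the helpers are hypothetical simplifications
of the driver functions named above.

```c
#include <assert.h>
#include <stdbool.h>

struct ec_sim {
	unsigned int pending_events;	/* events the firmware still holds */
	bool query_pending;
	unsigned int queries_sent;
};

/* SCI_EVT is set while the firmware has events queued. */
static bool sci_evt(const struct ec_sim *ec)
{
	return ec->pending_events > 0;
}

static void submit_query(struct ec_sim *ec)
{
	ec->query_pending = true;
	ec->queries_sent++;
}

/* Step 1: on enabling events, poll once for a pending SCI_EVT. */
static void enable_event(struct ec_sim *ec)
{
	if (!ec->query_pending && sci_evt(ec))
		submit_query(ec);
}

/* Step 2: after a query completes, re-check SCI_EVT and continue. */
static void complete_query(struct ec_sim *ec)
{
	ec->pending_events--;
	ec->query_pending = false;
	if (sci_evt(ec))
		submit_query(ec);
}

/* Drain all pending events and report how many queries were sent. */
static unsigned int drain_events(unsigned int pending)
{
	struct ec_sim ec = { .pending_events = pending };

	enable_event(&ec);
	while (ec.query_pending)
		complete_query(&ec);
	return ec.queries_sent;
}
```

No query is sent when SCI_EVT is not set, which is the difference from
the unconditional queries sent by the CLEAR_ON_RESUME quirk.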
Link: https://bugzilla.kernel.org/show_bug.cgi?id=191211
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There are issues related to the boot_ec:
1. If acpi_ec_remove() is invoked, boot_ec will also be freed. This is not
expected, as the boot_ec could be enumerated via ECDT.
2. Address space handler installation/uninstallation leads to unexpected
_REG evaluations.
This patch adds acpi_is_boot_ec() check to be used to fix the above issues.
However, since acpi_ec_remove() actually won't be invoked, this patch
doesn't handle the reference counting of "struct acpi_ec", it only ensures
the correctness of the boot_ec destruction during the boot.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=153511
Reported-and-tested-by: Jonh Henderson <jw.hendy@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is possible to register _Qxx from namespace and use the ECDT EC to
perform event handling. The reported bug reveals that Windows is using ECDT
in this way in case the namespace EC is not present. This patch facilitates
Linux to support ECDT in this way.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=115021
Reported-and-tested-by: Luya Tshimbalanga <luya@fedoraproject.org>
Tested-by: Jonh Henderson <jw.hendy@gmail.com>
Reviewed-by: Peter Wu <peter@lekensteyn.nl>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When the handler installation failed, there was no code to free the
allocated EC device. This patch fixes this memory leakage issue.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=115021
Reported-and-tested-by: Luya Tshimbalanga <luya@fedoraproject.org>
Tested-by: Jonh Henderson <jw.hendy@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In order to support full ECDT (driving the ECDT EC after probing the
namespace EC), we need to change our EC device alloc/free algorithm, ensure
not to free old boot EC before qualifying new boot EC.
This patch achieves this by cleaning up first_ec/boot_ec logic:
1. first_ec: used to perform transactions, so it is assigned in new
acpi_ec_setup() function.
2. boot_ec: used to track early EC device, so it is assigned in new
acpi_config_boot_ec() function which explicitly tells the driver to save
the EC device as early EC device.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=115021
Reported-and-tested-by: Luya Tshimbalanga <luya@fedoraproject.org>
Tested-by: Jonh Henderson <jw.hendy@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch enables the event freeze mode, flushing the EC event handling in
the .suspend() callback. This feature is experimental. If a bisection shows
it to be the cause of real issues, please report them to the kernel
bugzilla for further root causing and improvement.
This mode eliminates useless _Qxx handling during power saving
operations, and can thus make those operations faster. Tests show that
this mode can efficiently block _Qxx flooding during the suspend process
and speed up suspend.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Tested-by: Todd E Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In the original EC driver, though the event handling is not explicitly
stopped, the EC driver is actually not able to handle events during the
noirq stage as the EC driver is not prepared to handle the EC events in the
polling mode. So if there is no advance_transaction() triggered, the EC
driver couldn't notice the EC events.
However, do we actually need to handle EC events during the suspend/resume
stage? EC events are mostly useless for the suspend/resume period (key
strokes, battery/thermal updates, etc.), and the useful ones (lid
close, power/sleep button press) should have already been delivered to the
OSPM to trigger the power saving operations.
Thus this patch implements acpi_ec_disable_event() to be a reverse call of
acpi_ec_enable_event(), with which, the EC driver is able to stop handling
the EC events in a position before entering the noirq stage.
There are actually 2 choices for us:
1. implement event handling in polling mode;
2. stop event handling before entering noirq stage.
This patch only implements the second choice, using the .suspend()
callback, so it is experimental (is the first choice better? or is a
different hook position better?). This patch finally keeps the old
behavior by default and prepares a boot parameter to enable this feature.
The differences of the event handling availability between the old behavior
(this patch is not applied) and the new behavior (this patch is applied)
are as follows:
                        !FreezeEvents     FreezeEvents
before suspend          Y                 Y
suspend before EC       Y                 Y
suspend after EC        Y                 N
suspend_late            Y                 N
suspend_noirq           Y (actually N)    N
resume_noirq            Y (actually N)    N
resume_late             Y (actually N)    N
resume before EC        Y (actually N)    N
resume after EC         Y                 Y
after resume            Y                 Y
Where "actually N" means that if there are no EC transactions, the EC
driver is actually not able to notice the pending events.
We can see that FreezeEvents is the only approach that can now actually
flush the EC event handling with both query commands and _Qxx evaluations
flushed. Other modes can only flush the query commands, and _Qxx
evaluations that occur after the EC driver is stopped may fail because
the EC transactions carried out in the _Qxx control methods fail.
We also can see that this feature should be able to trigger some platform
notifications later than resuming other drivers.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Tested-by: Todd E Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch makes 2 changes:
1. Restore old behavior
Originally, EC driver stops handling both events and transactions in
acpi_ec_block_transactions(), and restarts to handle transactions in
acpi_ec_unblock_transactions_early(), restarts to handle both events and
transactions in acpi_ec_unblock_transactions().
While currently, EC driver still stops handling both events and
transactions in acpi_ec_block_transactions(), but restarts to handle both
events and transactions in acpi_ec_unblock_transactions_early().
This patch tries to restore the old behavior by dropping
__acpi_ec_enable_event() from acpi_ec_unblock_transactions_early().
2. Improve old behavior
However this still cannot fix the real issue as both of the
acpi_ec_unblock_xxx() functions are invoked in the noirq stage. Since the
EC driver actually doesn't implement the event handling in the polling
mode, re-enabling the event handling too early in the noirq stage could
result in the problem that if there is no triggering source causing
advance_transaction() to be invoked, pending SCI_EVT cannot be detected by
the EC driver and _Qxx cannot be triggered.
It actually makes sense to restart the event handling in any point during
resuming after the noirq stage. Just like the boot stage where the event
handling is enabled in .add(), this patch further moves
acpi_ec_enable_event() to .resume(). After doing that, the following 2
functions can be combined:
acpi_ec_unblock_transactions_early()/acpi_ec_unblock_transactions().
The differences of the event handling availability between the old behavior
(this patch isn't applied) and the new behavior (this patch is applied) are
as follows:
                        !Applied          Applied
before suspend          Y                 Y
suspend before EC       Y                 Y
suspend after EC        Y                 Y
suspend_late            Y                 Y
suspend_noirq           Y (actually N)    Y (actually N)
resume_noirq            Y (actually N)    Y (actually N)
resume_late             Y (actually N)    Y (actually N)
resume before EC        Y (actually N)    Y (actually N)
resume after EC         Y (actually N)    Y
after resume            Y (actually N)    Y
Where "actually N" means if there is no triggering source, the EC driver
is actually not able to notice the pending SCI_EVT occurred in the noirq
stage. So we can clearly see that this patch has improved the situation.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Tested-by: Todd E Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After the EC event handling is enabled, Linux is still in the noirq stage.
If there is no triggering source (EC transaction, GPE STS status),
advance_transaction() will not be invoked and SCI_EVT cannot be detected.
This patch adds one more triggering source after enabling the EC event
handling to poll the pending SCI_EVT.
Known issues:
1. Still no SCI_EVT triggering source
There could still be no SCI_EVT triggering source after handling the
first SCI_EVT (polled by this patch if any). Because after handling the
first SCI_EVT, Linux could still be in noirq stage and there could still
be no further triggering source in this stage. Then the second SCI_EVT
indicated during this stage still cannot be detected by the EC driver.
With this improvement applied, it is then possible to move
acpi_ec_enable_event() out of the noirq stage to fix this issue (if the
first SCI_EVT is handled out of the noirq stage, the follow-up SCI_EVTs
should be able to trigger IRQs).
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Tested-by: Todd E Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is a hidden logic in the EC driver:
1. During boot, EC_FLAGS_QUERY_PENDING is responsible for blocking event
handling;
2. During suspend, EC_FLAGS_STARTED is responsible for blocking event
handling.
This patch uses a new EC_FLAGS_QUERY_ENABLED flag to make this hidden
logic explicit and have code cleaned up. No functional change.
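One way to picture an explicit gate is sketched below. The flag names
are taken from this changelog, but the bit positions and the two helper
functions are assumptions made for illustration and do not reproduce
the driver's actual checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions are hypothetical; only the names come from the text. */
enum {
	EC_FLAGS_QUERY_ENABLED = 1u << 0,	/* events may be handled */
	EC_FLAGS_QUERY_PENDING = 1u << 1,	/* a query is in flight */
	EC_FLAGS_STARTED       = 1u << 2,	/* driver is running */
};

/* One explicit gate instead of two implicit ones. */
static bool ec_event_enabled(unsigned int flags)
{
	if (!(flags & EC_FLAGS_QUERY_ENABLED))
		return false;
	return (flags & EC_FLAGS_STARTED) != 0;
}

/* A new query is submitted only for an enabled, idle event path. */
static bool ec_may_submit_query(unsigned int flags, bool sci_evt)
{
	return sci_evt && ec_event_enabled(flags) &&
	       !(flags & EC_FLAGS_QUERY_PENDING);
}
```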
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Tested-by: Todd E Brandt <todd.e.brandt@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is reported that on some platforms, resume speed is not fast. The cause
is: in noirq stage, EC driver is working in polling mode, and each state
machine advancement requires a context switch.
The context switch is not necessary to the EC driver's polling mode. This
patch implements PM hooks to automatically switch the driver to/from the
busy polling mode to eliminate the overhead caused by the context switch.
This finally contributes to the tuning result: acpi_pm_finish() execution
time is improved from 192ms to 6ms.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reported-and-tested-by: Todd E Brandt <todd.e.brandt@linux.intel.com>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
A regression is caused by the following commit:
Commit: 02b771b64b
Subject: ACPI / EC: Fix an issue caused by the serialized _Qxx evaluations
In this commit, using the system workqueue allows the maximum number of
parallel _Qxx executions to exceed 255. This violates the method
reentrancy limit in ACPICA and generates the following error log:
ACPI Error: Method reached maximum reentrancy limit (255) (20150818/dsmethod-341)
This patch creates a separate workqueue and limits the number of parallel
_Qxx evaluations to a configurable value (which can be tuned against the
number of online CPUs).
Since EC events are handled after driver probe, we can create the workqueue
in acpi_ec_init().
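The effect of a bounded queue can be sketched with a simple counter
model. This is not the kernel workqueue API; struct query_pool and its
helpers are hypothetical and only show that excess _Qxx work items wait
instead of running in parallel.

```c
#include <assert.h>

struct query_pool {
	unsigned int max_active;	/* configurable parallelism limit */
	unsigned int active;		/* _Qxx evaluations running now */
	unsigned int queued;		/* work items waiting for a slot */
	unsigned int peak;		/* highest parallelism observed */
};

/* Start the evaluation if a slot is free, otherwise queue it. */
static void query_submit(struct query_pool *p)
{
	if (p->active < p->max_active) {
		p->active++;
		if (p->active > p->peak)
			p->peak = p->active;
	} else {
		p->queued++;
	}
}

/* One evaluation finished; a queued one takes over its slot. */
static void query_finish(struct query_pool *p)
{
	p->active--;
	if (p->queued) {
		p->queued--;
		p->active++;
	}
}

/* Submit a burst of events and return the peak parallelism. */
static unsigned int peak_parallel_queries(unsigned int max_active,
					  unsigned int events)
{
	struct query_pool p = { .max_active = max_active };

	for (unsigned int i = 0; i < events; i++)
		query_submit(&p);
	while (p.active)
		query_finish(&p);
	return p.peak;
}
```

A burst of 300 events with a limit of 8 never runs more than 8
evaluations at once, so the reentrancy limit of 255 cannot be reached.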
Fixes: 02b771b64b (ACPI / EC: Fix an issue caused by the serialized _Qxx evaluations)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=135691
Cc: 4.3+ <stable@vger.kernel.org> # 4.3+
Reported-and-tested-by: Helen Buus <ubuntu@hbuus.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is an ordering issue in ec_remove_handlers(): acpi_ec_stop()
is called before removing the operation region handler. That is
incorrect, because the operation region handler removal triggers
_REG(DISCONNECT), which may cause new EC transactions to be carried
out.
That existing issue has been triggered by the following commit:
Commit: dcf15cbded
Subject: ACPI / EC: Fix a boot EC regresion by restoring boot EC
which changed the driver to call ec_remove_handlers() after invoking
_REG(CONNECT), so the issue has become visible.
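The ordering problem can be reproduced in a small model, assuming that
_REG(DISCONNECT) performs exactly one EC transaction and that a
transaction fails once the EC has been stopped. struct ec_dev and the
helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct ec_dev {
	bool started;
	bool opregion_installed;
	unsigned int failed_transactions;
};

/* A transaction can only succeed while the EC is started. */
static void ec_transaction(struct ec_dev *ec)
{
	if (!ec->started)
		ec->failed_transactions++;
}

/* Removing the handler runs _REG(DISCONNECT), which may access the EC. */
static void remove_opregion_handler(struct ec_dev *ec)
{
	if (ec->opregion_installed)
		ec_transaction(ec);
	ec->opregion_installed = false;
}

static void ec_stop(struct ec_dev *ec)
{
	ec->started = false;
}

/* Tear down in either order and report failed transactions. */
static unsigned int remove_handlers(bool stop_first)
{
	struct ec_dev ec = { .started = true, .opregion_installed = true };

	if (stop_first) {
		ec_stop(&ec);			/* old, incorrect order */
		remove_opregion_handler(&ec);
	} else {
		remove_opregion_handler(&ec);	/* corrected order */
		ec_stop(&ec);
	}
	return ec.failed_transactions;
}
```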
Fixes: dcf15cbded (ACPI / EC: Fix a boot EC regresion by restoring boot EC)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=102421
Reported-and-tested-by: Wolfram Sang <wsa@the-dreams.de>
Reported-by: Nicholas <nkudriavtsev@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Our Windows probe result shows that EC._REG is evaluated after evaluating
all _INI/_STA control methods.
With boot EC always switched in acpi_ec_dsdt_probe(), we can see that as
long as there is no EC opregion accesses in the MLC (module level code, AML
code out of any control methods) and in _INI/_STA, there is no need to make
sure that ECDT must be correct.
Bugs 9399 and 12461 were reported against an ordering issue in which
BAT0/1._STA evaluations contain EC accesses while the ECDT setting is wrong.
From the acpidump output posted on bug 9399, we can see that it is actually
a different issue. In this table, if EC._REG is not executed, EC accesses
will be done in a platform specific manner. As we've already ensured not to
execute EC._REG during the early stage, we can remove the quirks for bug
9399.
From the acpidump output posted on bug 12461, we can see that it still
needs the quirk. In this table, EC._REG flags a named object whose default
value is One, so BAT1._STA performs EC accesses whether or not we
invoke EC._REG. We have to keep the quirk for it until we can root
cause the issue.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Failure handling in the boot EC code is not tidy. This patch cleans
it up with acpi_ec_alloc().
This patch also changes acpi_ec_dsdt_probe() so that it always switches
the boot EC from the ECDT one to the DSDT one.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
According to the Windows probing result, during the table loading, the EC
device described in the ECDT should be used. And the ECDT EC is also
effective during the period the namespace objects are initialized (we can
see a separate process executing _STA/_INI on Windows before executing
other device specific control methods, for example, EC._REG). During the
device enumeration, the EC device described in the DSDT should be used. But
there are differences between Linux and Windows around the device probing
order. Thus in Linux, we should enable the DSDT EC as early as possible
before enumerating devices in order not to trigger issues related to the
device enumeration order differences.
This patch thus converts acpi_boot_ec_enable() into acpi_ec_dsdt_probe() to
fix the gap. This also fixes a user reported regression triggered after we
switched the "table loading"/"ECDT support" to be ACPI spec 2.0 compliant.
Fixes: 59f0aa9480 (ACPI 2.0 / ECDT: Remove early namespace reference from EC)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=119261
Reported-and-tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
All operation region accesses are allowed by the AML interpreter when AML is
executed, so BIOSen are responsible for avoiding operation region
accesses in AML before OSPM has prepared an operation region driver. This
is done via the _REG control method: AML code normally sets a global named
object REGC to 1 when _REG(3, 1) is evaluated.
Then what is ECDT? Quoting from ACPI spec 6.0, 5.2.15 Embedded Controller
Boot Resources Table (ECDT):
"The presence of this table allows OSPM to provide Embedded Controller
operation region space access before the namespace has been evaluated."
The spec also suggests a compatible means of indicating the early EC access
availability:
  Device (EC)
  {
      Name (REGC, Ones)
      Method (_REG, 2)
      {
          If (LEqual (Arg0, 3))
          {
              Store (Arg1, REGC)
          }
      }
      Method (ECAV)
      {
          If (LEqual (REGC, Ones))
          {
              If (LGreaterEqual (_REV, 2))
              {
                  Return (One)
              }
              Else
              {
                  Return (Zero)
              }
          }
          Else
          {
              Return (REGC)
          }
      }
  }
In this way, it allows EC accesses to happen before EC._REG(3, 1) is
invoked.
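The decision made by ECAV can be restated in C for readers less familiar
with ASL. This is only a model of the ASL sample above: ec_reg() and
ecav() are hypothetical helpers, and ASL integers are assumed to be
64 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* ASL "Ones": all bits set (64-bit ASL integers are assumed here). */
#define ASL_ONES UINT64_MAX

/* Value of REGC after _REG(space_id, connect) has been evaluated. */
uint64_t ec_reg(uint64_t regc, uint64_t space_id, uint64_t connect)
{
    /* Arg1 of _REG is 0 (disconnect) or 1 (connect). */
    assert(connect == 0 || connect == 1);
    /* Space ID 3 is the EmbeddedControl operation region. */
    return space_id == 3 ? connect : regc;
}

/* Mirrors Method (ECAV): nonzero means EC accesses are allowed. */
uint64_t ecav(uint64_t regc, uint64_t rev)
{
    if (regc == ASL_ONES)           /* _REG has not been evaluated yet */
        return rev >= 2 ? 1 : 0;    /* ACPI 2.0+ OSPM honors the ECDT */
    return regc;                    /* otherwise follow what _REG stored */
}
```

Before _REG runs, ecav(ASL_ONES, 2) returns 1, which is what permits the
early EC accesses; once _REG(3, 0) has been evaluated, the result follows
REGC and becomes 0.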
But ECAV is not the only way BIOSen use in practice to indicate the early
EC access availability. The known variations include:
1. Setting REGC to One in \_SB._INI when _REV >= 2. Since \_SB._INI is the
first control method evaluated by OSPM during the enumeration, this
allows EC accesses to happen for the entire enumeration process before
the namespace EC is enumerated.
2. Initializing REGC to One by default. This even allows EC accesses to
happen during the table loading.
ECDT support in Linux has been broken over the course of long-term bug
fixing because many wrong ECDT bug fixes were merged (see details below).
Linux currently uses the namespace EC's settings instead of the ECDT
settings when an ECDT is detected. This results in a namespace walk and
_CRS/_GPE/_REG evaluations. Such things can only happen after the namespace
is ready, while the ECDT is meant to be used before the namespace is ready.
The wrong bug fixing story is:
1. Link 1:
At Linux ACPI early stages, "no _Lxx/_Exx/_Qxx evaluation can happen
before the namespace is ready" was not ensured by the ACPICA core and Linux.
This is currently ensured by deferred enabling of the GPE and deferred
registering of EC query methods (acpi_ec_register_query_methods).
2. Link 2:
Reporters reported buggy ECDTs, expecting quirks for the platform.
Originally, the quirk is simple, only doing things with ECDT.
Bugs 9399 and 12461 are platforms (Asus L4R, Asus M6R, MSI MS-171F)
reported to have wrong ECDT IO port addresses: the port addresses are
reversed.
Bug 11880 is a platform (Asus X50GL) reported to have 0 valued port
addresses. We can see that all EC accesses are protected by ECAV on
this platform, so actually no early EC access is required by this
platform.
3. Link 3:
But when the bug fixing developer was requested to provide a handy and
non-quirk bug fix, he tried to use correct EC settings from the namespace
and broke the spec purpose. We can even see that the developer
suffered from many regressions. One interesting one is 14086, where the
actual root cause obviously should be: _REG is evaluated too early. But
unfortunately, the bug was fixed in a totally wrong way.
So everything goes wrong from these commits:
Commit: c6cb0e8784
Subject: ACPI: EC: Don't trust ECDT tables from ASUS
Commit: a5032bfdd9
Subject: ACPI: EC: Always parse EC device
This patch reverts Linux behavior to simple ECDT quirk support in order to
stop early _CRS/_GPE/_REG evaluations.
For Bug 9399, 12461, since it is reported that the platforms require early
EC accesses, this patch restores the simple ECDT quirks for them.
For Bug 11880, since it is not reported that the platform requires early EC
accesses and its ACPI tables contain correct ECAV, we choose an ECDT
enumeration failure for this platform.
Link 1: https://bugzilla.kernel.org/show_bug.cgi?id=9916
        http://bugzilla.kernel.org/show_bug.cgi?id=10100
        https://lkml.org/lkml/2008/2/25/282
Link 2: https://bugzilla.kernel.org/show_bug.cgi?id=9399
        https://bugzilla.kernel.org/show_bug.cgi?id=12461
        https://bugzilla.kernel.org/show_bug.cgi?id=11880
Link 3: https://bugzilla.kernel.org/show_bug.cgi?id=11884
        https://bugzilla.kernel.org/show_bug.cgi?id=14081
        https://bugzilla.kernel.org/show_bug.cgi?id=14086
        https://bugzilla.kernel.org/show_bug.cgi?id=14446
Link 4: https://bugzilla.kernel.org/show_bug.cgi?id=112911
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch splits EC_FLAGS_HANDLERS_INSTALLED so that address space handler
can be installed when it is not possible to install GPE handler during
early stage.
This patch also tunes address space handler installation so that it
happens earlier than GPE handler installation, for the same purpose.
Since acpi_ec_start()/acpi_ec_stop() will be entered multiple times after
applying this change, it is also required to protect acpi_enable_gpe()/
acpi_disable_gpe() invocations.
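The protection can be pictured as a flag that makes start and stop
idempotent. The sketch below is a stand-alone model with hypothetical
names; the counter merely stands in for the reference that the
enable/disable pair is assumed to take and drop on the GPE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the driver state; not struct acpi_ec. */
struct ec_gpe_model {
    bool gpe_enabled;   /* set once this instance has enabled the GPE */
    int gpe_refcount;   /* stands in for the reference held on the GPE */
};

/* Safe to call repeatedly: only the first call takes a GPE reference. */
void ec_gpe_model_start(struct ec_gpe_model *ec)
{
    if (!ec->gpe_enabled) {
        ec->gpe_enabled = true;
        ec->gpe_refcount++;
    }
}

/* Safe to call repeatedly: only drops the reference that start took. */
void ec_gpe_model_stop(struct ec_gpe_model *ec)
{
    if (ec->gpe_enabled) {
        assert(ec->gpe_refcount > 0);
        ec->gpe_enabled = false;
        ec->gpe_refcount--;
    }
}
```

Without the flag, entering start twice would leave an unbalanced
reference behind after a single stop.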
Link: https://bugzilla.kernel.org/show_bug.cgi?id=112911
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The acpi_ec_delete_query() function tests whether its argument is NULL
and then returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In acpi_ec_guard_event(), EC transaction state machine variables should be
checked with the EC spinlock locked.
The bug doesn't trigger any real issue now because it can only occur
when the ec_event_clearing=event mode is applied, and there is currently
no user of this mode.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
1. acpi_ec_remove_query_handlers()
This patch refines the query handler removal logic implemented in
acpi_ec_remove_query_handler(), making it invoke the new
acpi_ec_remove_query_handlers() API, and ensuring that all other removal
code paths invoke the new API to honor the reference count of the query
handlers.
2. acpi_ec_get_query_handler_by_value()
This patch also refines the query handler search logic originally
implemented in acpi_ec_query(), collecting it into
acpi_ec_get_query_handler_by_value(). And since schedule_work() can ensure
the serialization of acpi_ec_event_handler(), we needn't put the
mutex_lock() around schedule_work().
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When the query handler is not found, "result" is actually still 0, and
"struct acpi_ec_query" is not NULL, so the deletion code of
"struct acpi_ec_query" at the end of the function is never invoked.
As a consequence, a memory leak can be observed.
The issue is introduced by this commit:
Commit: 02b771b64b
Subject: ACPI / EC: Fix an issue caused by the serialized _Qxx
This patch fixes such memory leakage.
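The shape of the leak can be reproduced in a few lines of stand-alone C.
Everything below is a hypothetical model (the names, the single handler
for event 0x10, the live_queries counter), and the fixed variant shows
only one possible way to close the leak, which may differ from what the
patch actually does.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's query object. */
struct query_model {
    unsigned char value;
};

int live_queries;   /* allocations not yet freed, for illustration */

struct query_model *query_model_create(unsigned char value)
{
    struct query_model *q = malloc(sizeof(*q));

    if (q) {
        q->value = value;
        live_queries++;
    }
    return q;
}

void query_model_delete(struct query_model *q)
{
    if (!q)
        return;
    assert(live_queries > 0);
    free(q);
    live_queries--;
}

/* Assumption: only event 0x10 has a registered handler. */
int query_model_has_handler(unsigned char value)
{
    return value == 0x10;
}

/* Shape of the bug: "not found" leaves result at 0, so cleanup is skipped. */
int query_model_submit_leaky(unsigned char value)
{
    struct query_model *q = query_model_create(value);
    int result = 0;

    if (!q)
        return -1;
    if (query_model_has_handler(value)) {
        query_model_delete(q);  /* stands in for the handler consuming q */
        return 0;
    }
    if (result)                 /* never true on the "not found" path */
        query_model_delete(q);
    return result;
}

/* One possible fix: treat "not found" as an error so that cleanup runs. */
int query_model_submit_fixed(unsigned char value)
{
    struct query_model *q = query_model_create(value);

    if (!q)
        return -1;
    if (query_model_has_handler(value)) {
        query_model_delete(q);  /* stands in for the handler consuming q */
        return 0;
    }
    query_model_delete(q);
    return -1;
}
```

Submitting an event without a handler through the leaky variant leaves
live_queries at 1, while the fixed variant returns an error and frees the
object.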
Fixes: 02b771b64b (ACPI / EC: Fix an issue caused by the serialized _Qxx evaluations)
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is proven that Windows evaluates _Qxx handlers in a parallel way. This
patch follows this fact, splits _Qxx evaluations from the NOTIFY queue to
form a separate queue, so that _Qxx evaluations can be queued up on
different CPUs rather than being queued up on a CPU0 bound queue.
Event handling related callbacks are also renamed and sorted in this patch.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=94411
Reported-and-tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is no need to carry potentially outdated Free Software Foundation
mailing address in file headers since the COPYING file includes it.
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When the QR_EC transaction fails, the EC_FLAGS_QUERY_PENDING flag prevents
the event handling work queue from being scheduled again.
Though there shouldn't be failed QR_EC transactions, and this gap was
efficiently used for catching and learning the SCI_EVT clearing timing
compliance issues, we need to fix this as we are not fully compatible
with all platforms/Windows to handle SCI_EVT clearing timing correctly.
Fixing this gives the EC driver the chances to recover from a state machine
failure.
So this patch fixes this issue. When nr_pending_queries drops to 0, it
clears EC_FLAGS_QUERY_PENDING at the proper position for different modes in
order to ensure that the SCI_EVT handling can proceed.
In order to be clearer for future ec_event_clearing modes, all checks in
this patch are written in the inclusive style, not the exclusive style.
Cc: 3.16+ <stable@vger.kernel.org> # 3.16+
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is reported that on several platforms, EC firmware will not respond to
an unexpected QR_EC (see EC_FLAGS_QUERY_HANDSHAKE, only write QR_EC when
SCI_EVT is set).
Unfortunately, ACPI specification doesn't define when the SCI_EVT should be
cleared by the firmware, thus the original implementation queued up second
QR_EC right after writing QR_EC command and before reading the returned
event value as at that time the SCI_EVT is ensured not cleared. This
behavior is also based on the assumption that the firmware should be able
to return 0x00 to indicate "no outstanding event". This behavior did fix
issues on Samsung platforms where the spurious query value of 0x00 is
supported and didn't break platforms in my test queue.
But recently, specific Acer, Asus, Lenovo platforms keep on blaming this
change.
This patch changes the behavior to re-check the SCI_EVT a bit later and
removes EC_FLAGS_QUERY_HANDSHAKE quirks, hoping this is the Windows
compliant EC driver behavior.
In order to be robust to the possible regressions, instead of removing the
quirk directly, this patch keeps the quirk code, removes the quirk users
and keeps old behavior for Samsung platforms.
Cc: 3.16+ <stable@vger.kernel.org> # 3.16+
Link: https://bugzilla.kernel.org/show_bug.cgi?id=94411
Link: https://bugzilla.kernel.org/show_bug.cgi?id=97381
Link: https://bugzilla.kernel.org/show_bug.cgi?id=98111
Reported-and-tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Reported-and-tested-by: Tigran Gabrielyan <tigrangab@gmail.com>
Reported-and-tested-by: Adrien D <ghbdtn@openmailbox.org>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We've been suffering from the uncertainty of the SCI_EVT clearing timing.
This patch implements 3 of 4 possible modes to handle SCI_EVT clearing
variations. The old behavior is kept in this patch.
Status: QR_EC is re-checked as early as possible after checking previous
SCI_EVT. This always leads to 2 QR_EC transactions per SCI_EVT
indication and the target may implement event queue which returns
0x00 indicating "no outstanding event".
This is proven to be a conflict against Windows behavior, but is
still kept in this patch to make the EC driver robust to the
possible regressions that may occur on Samsung platforms.
Query: QR_EC is re-checked after the target has handled the QR_EC query
request command pushed by the host.
Event: QR_EC is re-checked after the target has noticed the query event
response data pulled by the host.
This timing is not determined by any IRQs, so we may need to use a
guard period in this mode, which may explain the existence of the
ec_guard() code used by the old EC driver where the re-check timing
is implemented in the similar way as this mode.
Method: QR_EC is re-checked as late as possible after completing the _Qxx
evaluation. The target may implement SCI_EVT like a level triggered
interrupt.
It is proven on kernel bugzilla 94411 that Windows has all
_Qxx evaluations parallelized. Thus unless required by further
evidence, we needn't implement this mode as it conflicts with
the _Qxx parallelism requirement.
Note that, according to the reports, there are platforms that cannot be
handled using the "Status" mode without enabling the
EC_FLAGS_QUERY_HANDSHAKE quirk. But they can be handled with the other
modes according to the tests (kernel bugzilla 97381).
The following log entry can be used to confirm the differences of the 3
modes as it should appear at the different positions for the 3 modes:
Command(QR_EC) unblocked
Status: appearing after
EC_SC(W) = 0x84
Query: appearing after
EC_DATA(R) = 0xXX
where XX is the event number used to determine _QXX
Event: appearing after first
EC_SC(R) = 0xX0 SCI_EVT=x BURST=0 CMD=0 IBF=0 OBF=0
that is next to the following log entry:
Command(QR_EC) completed by hardware
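The three log positions above map to three points in a QR_EC transaction,
which can be summarized by the following model. The enum and function
names are hypothetical; only the ordering of the points is taken from the
description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three implemented modes. */
enum evt_clearing_mode {
    EVT_MODE_STATUS,
    EVT_MODE_QUERY,
    EVT_MODE_EVENT,
};

/* Points in a QR_EC transaction, in the order they occur. */
enum qr_ec_point {
    QR_EC_CMD_WRITTEN,      /* after EC_SC(W) = 0x84 */
    QR_EC_DATA_READ,        /* after EC_DATA(R) = 0xXX */
    QR_EC_STATUS_REREAD,    /* after the first EC_SC(R) past completion */
};

/* True once the next QR_EC may be issued for the given mode. */
bool qr_ec_unblocked(enum evt_clearing_mode mode, enum qr_ec_point point)
{
    enum qr_ec_point unblock_at;

    switch (mode) {
    case EVT_MODE_STATUS:
        unblock_at = QR_EC_CMD_WRITTEN;
        break;
    case EVT_MODE_QUERY:
        unblock_at = QR_EC_DATA_READ;
        break;
    case EVT_MODE_EVENT:
        unblock_at = QR_EC_STATUS_REREAD;
        break;
    default:
        assert(!"unknown mode");
        return false;
    }
    return point >= unblock_at;
}
```

For example, in the "Query" mode the next QR_EC stays blocked after the
command byte is written and becomes possible once the event value has been
read.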
Link: https://bugzilla.kernel.org/show_bug.cgi?id=94411
Link: https://bugzilla.kernel.org/show_bug.cgi?id=97381
Link: https://bugzilla.kernel.org/show_bug.cgi?id=98111
Reported-and-tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Reported-and-tested-by: Tigran Gabrielyan <tigrangab@gmail.com>
Reported-and-tested-by: Adrien D <ghbdtn@openmailbox.org>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
During the period that a work queue is scheduled (queued up for run) but
hasn't been run, second schedule_work() could fail. This may not lead to
the loss of queries because QR_EC is always ensured to be submitted after
the work queue has been in the running state.
The event handling work queue can be changed into the loop style to allow
us to control the code in a more flexible way:
1. Makes it possible to add event=0x00 termination condition in the loop.
2. Increases the throughput of the QR_EC transactions as the 2nd+ QR_EC
transactions may be handled in the same work item used for the 1st QR_EC
transaction, thus the delay caused by the 2nd+ work item scheduling can
be eliminated.
Except for the logging message changes and the throughput improvement, this
patch is just a functional no-op.
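A loop-style work item can be sketched as follows. This is a stand-alone
model with hypothetical names (struct fake_ec, fake_ec_query,
event_work_model), and it uses the event=0x00 termination condition that
the loop style merely makes possible; it is not claimed to be the loop
condition used by the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical EC that hands out queued events, then 0x00. */
struct fake_ec {
    const unsigned char *events;
    size_t count;
    size_t next;
};

/* Models one QR_EC transaction: next event, or 0x00 if none is left. */
unsigned char fake_ec_query(struct fake_ec *ec)
{
    if (ec->next < ec->count)
        return ec->events[ec->next++];
    return 0x00;
}

/*
 * Models one run of the work item: keep issuing QR_EC in the same work
 * item until the EC reports "no outstanding event" or the buffer is full.
 * Returns the number of events recorded in handled[].
 */
size_t event_work_model(struct fake_ec *ec, unsigned char *handled,
                        size_t cap)
{
    size_t n = 0;
    unsigned char event;

    assert(handled != NULL);
    while (n < cap && (event = fake_ec_query(ec)) != 0x00)
        handled[n++] = event;
    return n;
}
```

Two queued events are handled by a single call here, which corresponds to
the 2nd+ QR_EC transactions being served without scheduling another work
item.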
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Tested-by: Tigran Gabrielyan <tigrangab@gmail.com>
Tested-by: Adrien D <ghbdtn@openmailbox.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch collects transaction state transition code into one function. We
then could have a single function to maintain transaction transition
related behaviors. No functional changes.
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Tested-by: Gabriele Mazzotta <gabriele.mzt@gmail.com>
Tested-by: Tigran Gabrielyan <tigrangab@gmail.com>
Tested-by: Adrien D <ghbdtn@openmailbox.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
{ Update to correct 1 patch subject in the description }
We have fixed a lot of race issues in the EC driver recently.
The following commits introduce the MSI udelay()/msleep() quirk for MSI
laptops to make the EC firmware work for bug 12011 without root causing
any EC driver race issues:
Commit: 5423a0cb3f
Subject: ACPI: EC: Add delay for slow MSI controller
Commit: 34ff4dbccc
Subject: ACPI: EC: Separate delays for MSI hardware
The following commit extends the ECDT validation quirk to MSI laptops so
that the EC driver locates the EC registers properly for bug 12461:
Commit: a5032bfdd9
Subject: ACPI: EC: Always parse EC device
This is a different quirk than the MSI udelay()/msleep() quirk. This patch
keeps validating ECDT for only "Micro-Star MS-171F" as reported.
The following commit extends the MSI udelay()/msleep() quirk to Quanta
laptops to make the EC firmware work for bug 20242. There is no
requirement to validate ECDT for Quanta laptops:
Commit: 534bc4e3d2
Subject: ACPI EC: enable MSI workaround for Quanta laptops
The following commit extends the MSI udelay()/msleep() quirk to Clevo
laptops to make the EC firmware work for bug 77431. There is no
requirement to validate ECDT for Clevo laptops:
Commit: 777cb38295
Subject: ACPI / EC: Add msi quirk for Clevo W350etq
All udelay()/msleep() quirks for MSI/Quanta/Clevo seem to be the wrong
fixes generated without fixing the EC driver race issues.
And even if it is not wrong, the guarding can be covered by the following
commits in wait polling mode:
Commit: 9e295ac14d
Subject: ACPI / EC: Reduce ec_poll() by referencing the last register access timestamp.
Commit: commit in the same series
Subject: ACPI / EC: Fix and clean up register access guarding logics.
The only case that is not covered is the inter-transaction guarding. And
there is no evidence that we need the inter-transaction guarding upon
reading the noted bug entries.
So it is time to remove the quirks and let users try again. If there
is a regression, the only thing we need to do is to restore the
inter-transaction guarding for the reported platforms.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=12011
Link: https://bugzilla.kernel.org/show_bug.cgi?id=12461
Link: https://bugzilla.kernel.org/show_bug.cgi?id=20242
Link: https://bugzilla.kernel.org/show_bug.cgi?id=77431
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We have 2 polling modes in the EC driver:
1. busy polling: originally used for the MSI quirks. udelay() is used to
perform register access guarding.
2. wait polling: normal code path uses wait_event_timeout() and it can be
woken up as soon as the transaction is completed in the interrupt mode.
It also contains the register access guarding logic in case the interrupt
doesn't arrive and the EC driver is about to advance the transaction in
task context (the polling mode).
The wait polling is useful for interrupt mode to allow other tasks to use
the CPU during the wait.
But in the polling mode, busy polling takes less time than wait
polling, because if no interrupt arrives, wait polling has to wait for at
least the minimal HZ interval.
We have a new use case for the busy polling mode. Some GPIO drivers
initialize the PIN configuration in a way that causes a GPIO multiplexed
EC GPE to be disabled outside of the GPE register's control. Busy polling
mode is useful here as it takes less time than wait polling. But the
guarding logic prevents it from responding even faster. We should spin on
the EC status rather than spin on nop execution for a fixed
period.
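The cost of the guard period in busy polling can be estimated with a toy
model. The function below is hypothetical and only counts modelled
microseconds; the 1us cost per status read is an arbitrary assumption
chosen for illustration and is not a measured value.

```c
#include <assert.h>

/* Assumption for illustration only: one status read costs 1us. */
#define MODEL_READ_COST_US 1UL

/*
 * Models busy polling: read the status, and if the transaction is not
 * complete yet, burn guard_us before reading again. Returns the modelled
 * elapsed time in microseconds, or -1 if timeout_us is exceeded.
 */
long busy_poll_model(int reads_until_done, unsigned long guard_us,
                     unsigned long timeout_us)
{
    unsigned long elapsed = 0;
    int reads = 0;

    assert(reads_until_done > 0);
    for (;;) {
        elapsed += MODEL_READ_COST_US;
        if (++reads >= reads_until_done)
            return (long)elapsed;
        if (elapsed >= timeout_us)
            return -1;
        elapsed += guard_us;    /* the guard period spent doing nothing */
    }
}
```

If the transaction completes on the 4th status read, a 550us guard makes
the model spend 1654us, while a guard of 0 makes it spend 4us. The
absolute numbers are meaningless; the point is that the guard period
dominates when the EC responds quickly.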
This patch introduces 2 module params for the polling mode switch and the
guard time, so that users can use the busy polling mode without the
guarding in case the guarding is not necessary. This is an example to use
the 2 module params for this purpose:
acpi.ec_busy_polling acpi.ec_polling_guard=0
We've tested the patch on a test platform. The platform suffers from such
kind of the GPIO PIN issue. The GPIO driver resets all PIN configuration
and after that, EC interrupt cannot arrive because of the multiplexing.
Then the platform suffers from a long delay carried out by the
wait_event_timeout() as all further EC transactions will run in the polling
mode. We switched the EC driver to use the busy polling mechanism instead
of the wait timeout polling mechanism and the delay is still high:
[ 44.283005] calling PNP0C0B:00+ @ 1305, parent: platform
[ 44.417548] call PNP0C0B:00+ returned 0 after 131323 usecs
And this patch can significantly reduce the delay:
[ 44.502625] calling PNP0C0B:00+ @ 1308, parent: platform
[ 44.503760] call PNP0C0B:00+ returned 0 after 1103 usecs
Tested-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In the polling mode, the EC driver shouldn't access the EC registers too
frequently. Though this statement is concluded from the non-root caused
bugs (see links below), we've maintained the register access guarding
logic in the current EC driver. The guarding logic is scattered across
the driver, which makes it hard to root cause real timing issues. This
patch collects the guarding logic into one single function so that all
hidden logic related to this can be seen clearly.
The current guarding related code also has several issues:
1. Per-transaction timestamp prevents inter-transaction guarding from being
implemented in the same place. We have an inter-transaction udelay() in
acpi_ec_transaction_unlocked(), this logic can be merged into ec_poll()
if we can use per-device timestamp. This patch completes such merge to
form a new ec_guard() function and collects all guarding related hidden
logics in it.
One hidden logic is: there is no inter-transaction guarding performed
for non MSI quirk (wait polling mode), this patch skips
inter-transaction guarding before wait_event_timeout() for the wait
polling mode to reveal the hidden logic.
The other hidden logic is: there is msleep() inter-transaction guarding
performed when GPE storming is observed. After merging this
commit:
Commit: e1d4d90fc0
Subject: ACPI / EC: Refine command storm prevention support
EC_FLAGS_COMMAND_STORM is ensured to be cleared after invoking
acpi_ec_transaction_unlocked(), so the msleep() guard logic can never
run now. Since no one has complained about that change, this logic was
likely added in the old times when the EC race issues were not fixed and
the bugs were wrongly root-caused to timing issues. This patch simply
removes the outdated logic. We can restore it by no longer skipping
inter-transaction guarding for wait polling mode.
Two different delay values are defined for msleep() and udelay() while
they are merged in this patch to 550us.
2. time_after() causes additional delay in the polling mode (can only be
observed in noirq suspend/resume processes where polling mode is always
used) before advance_transaction() is invoked ("wait polling" log is
added before wait_event_timeout()). We can see 2 wait_event_timeout()
invocations. This is because time_after() ensures a ">" validation while
we only need a ">=" validation here:
[ 86.739909] ACPI: Waking up from system sleep state S3
[ 86.742857] ACPI : EC: 2: Increase command
[ 86.742859] ACPI : EC: ***** Command(RD_EC) started *****
[ 86.742861] ACPI : EC: ===== TASK (0) =====
[ 86.742871] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[ 86.742873] ACPI : EC: EC_SC(W) = 0x80
[ 86.742876] ACPI : EC: ***** Event started *****
[ 86.742880] ACPI : EC: ~~~~~ wait polling ~~~~~
[ 86.743972] ACPI : EC: ~~~~~ wait polling ~~~~~
[ 86.747966] ACPI : EC: ===== TASK (0) =====
[ 86.747977] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[ 86.747978] ACPI : EC: EC_DATA(W) = 0x06
[ 86.747981] ACPI : EC: ~~~~~ wait polling ~~~~~
[ 86.751971] ACPI : EC: ~~~~~ wait polling ~~~~~
[ 86.755969] ACPI : EC: ===== TASK (0) =====
[ 86.755991] ACPI : EC: EC_SC(R) = 0x21 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=1
[ 86.755993] ACPI : EC: EC_DATA(R) = 0x03
[ 86.755994] ACPI : EC: ~~~~~ wait polling ~~~~~
[ 86.755995] ACPI : EC: ***** Command(RD_EC) stopped *****
[ 86.755996] ACPI : EC: 1: Decrease command
This patch corrects this by using time_before() instead in ec_guard():
[ 54.283146] ACPI: Waking up from system sleep state S3
[ 54.285414] ACPI : EC: 2: Increase command
[ 54.285415] ACPI : EC: ***** Command(RD_EC) started *****
[ 54.285416] ACPI : EC: ~~~~~ wait polling ~~~~~
[ 54.285417] ACPI : EC: ===== TASK (0) =====
[ 54.285424] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[ 54.285425] ACPI : EC: EC_SC(W) = 0x80
[ 54.285427] ACPI : EC: ***** Event started *****
[ 54.285429] ACPI : EC: ~~~~~ wait polling ~~~~~
[ 54.287209] ACPI : EC: ===== TASK (0) =====
[ 54.287218] ACPI : EC: EC_SC(R) = 0x20 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=0
[ 54.287219] ACPI : EC: EC_DATA(W) = 0x06
[ 54.287222] ACPI : EC: ~~~~~ wait polling ~~~~~
[ 54.291190] ACPI : EC: ===== TASK (0) =====
[ 54.291210] ACPI : EC: EC_SC(R) = 0x21 SCI_EVT=1 BURST=0 CMD=0 IBF=0 OBF=1
[ 54.291213] ACPI : EC: EC_DATA(R) = 0x03
[ 54.291214] ACPI : EC: ~~~~~ wait polling ~~~~~
[ 54.291215] ACPI : EC: ***** Command(RD_EC) stopped *****
[ 54.291216] ACPI : EC: 1: Decrease command
After cleaning up all guarding logics, we have one single function
ec_guard() collecting all old, non-root-caused, hidden logics. Then we can
easily tune the logics in one place to respond to the bug reports.
Except the time_before() change, all other changes do not change the
behavior of the EC driver.
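The difference between the ">" and ">=" validations can be checked with a
stand-alone model. The macros below are local re-implementations written
in the style of wrap-safe tick comparisons; they and the helper names are
assumptions made for this sketch and are not copied from the driver.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Local wrap-safe comparisons on a free-running tick counter. */
#define model_time_after(a, b)  ((long)((b) - (a)) < 0)
#define model_time_before(a, b) model_time_after(b, a)

/* Deadline for the guard period; unsigned arithmetic wraps by design. */
unsigned long guard_deadline(unsigned long timestamp, unsigned long guard)
{
    /* Wrap-safe comparison only works for spans below half the range. */
    assert(guard <= (unsigned long)LONG_MAX);
    return timestamp + guard;
}

/* ">" validation: one extra tick is spent when now == deadline. */
bool may_advance_strict(unsigned long now, unsigned long deadline)
{
    return model_time_after(now, deadline);
}

/* ">=" validation: advancing is allowed as soon as now == deadline. */
bool may_advance_inclusive(unsigned long now, unsigned long deadline)
{
    return !model_time_before(now, deadline);
}
```

At now == deadline the strict check still refuses to advance, which
corresponds to the second "wait polling" line in the first log, while the
inclusive check advances immediately.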
Link: https://bugzilla.kernel.org/show_bug.cgi?id=12011
Link: https://bugzilla.kernel.org/show_bug.cgi?id=20242
Link: https://bugzilla.kernel.org/show_bug.cgi?id=77431
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>