When a PCIe card is hot-removed, the Presence Detect State and Data Link
Layer Link Active bits often do not clear simultaneously. I've seen delays
of up to 244 msec between the two events with Thunderbolt.
After pciehp has brought down the slot in response to the first event, the
other bit may still be set. It's not discernible whether it's set because
a new card is already in the slot or if it will soon clear. So pciehp
tries to bring up the slot and in the latter case fails with a bunch of
messages, some of them at KERN_ERR severity. If the slot is no longer
occupied, the messages are false positives and annoy users.
Stuart Hayes reports the following splat on hot removal:
KERN_INFO pcieport 0000:3c:06.0: pciehp: Slot(180): Link Up
KERN_INFO pcieport 0000:3c:06.0: pciehp: Timeout waiting for Presence Detect
KERN_ERR pcieport 0000:3c:06.0: pciehp: link training error: status 0x0001
KERN_ERR pcieport 0000:3c:06.0: pciehp: Failed to check link status
Dongdong Liu complains about a similar splat:
KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): Link Down
KERN_INFO iommu: Removing device 0000:87:00.0 from group 12
KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): Card present
KERN_INFO pcieport 0000:80:10.0: Data Link Layer Link Active not set in 1000 msec
KERN_ERR pciehp 0000:80:10.0:pcie004: Failed to check link status
Users are particularly irritated to see a bringup attempt even though the
slot was explicitly brought down via sysfs. In a perfect world, we could
avoid this by setting Link Disable on slot bringdown and re-enabling it
upon a Presence Detect State change. In reality however, there are broken
hotplug ports which hardwire Presence Detect to zero, see 80696f9914
("PCI: pciehp: Tolerate Presence Detect hardwired to zero"). Conversely,
PCIe r1.0 hotplug ports hardwire Link Active to zero because Link Active
Reporting wasn't specified before PCIe r1.1. On unplug, some ports first
clear Presence then Link (see Stuart Hayes' splat) whereas others use the
inverse order (see Dongdong Liu's splat). To top it off, there are hotplug
ports which flap the Presence and Link bits on slot bringup, see
6c35a1ac3d ("PCI: pciehp: Tolerate initially unstable link").
pciehp is designed to work with all of these variants. Surplus attempts at
slot bringup are a lesser evil than not being able to bring up slots at
all. Although we could try to perfect the behavior for specific hotplug
controllers, we'd risk breaking others or increasing code complexity.
But we can certainly minimize annoyance by emitting only a single message
with KERN_INFO severity if bringup is unsuccessful:
* Drop the "Timeout waiting for Presence Detect" message in
pcie_wait_for_presence(). The sole caller of that function,
pciehp_check_link_status(), ignores the timeout and carries on. It emits
error messages of its own and I don't think this particular message adds
much value.
* There's a single error condition in pciehp_check_link_status() which
does not emit a message. Adding one allows dropping the "Failed to check
link status" message emitted by board_added() if
pciehp_check_link_status() returns a non-zero integer.
* Tone down all messages in pciehp_check_link_status() to KERN_INFO
severity and rephrase them to look as innocuous as possible. To this
end, move the message emitted by pcie_wait_for_link_delay() to its
callers.
As a result, Stuart Hayes' splat becomes:
KERN_INFO pcieport 0000:3c:06.0: pciehp: Slot(180): Link Up
KERN_INFO pcieport 0000:3c:06.0: pciehp: Slot(180): Cannot train link: status 0x0001
Dongdong Liu's splat becomes:
KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): Card present
KERN_INFO pciehp 0000:80:10.0:pcie004: Slot(36): No link
The messages now merely serve as information that presence or link bits
were set a little longer than expected. Bringup failures which are not
false positives are still reported, albeit no longer at KERN_ERR severity.
Link: https://lore.kernel.org/linux-pci/20200310182100.102987-1-stuart.w.hayes@gmail.com/
Link: https://lore.kernel.org/linux-pci/1547649064-19019-1-git-send-email-liudongdong3@huawei.com/
Link: https://lore.kernel.org/r/b45e46fd8a6aa6930aaac9d7718c2e4b787a4e5e.1595935071.git.lukas@wunner.de
Reported-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Reported-by: Dongdong Liu <liudongdong3@huawei.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
This addresses deadlocks in these common cases in hierarchies containing
two switches:
- All involved ports are runtime suspended and they are unplugged. This
can happen easily if the drivers involved automatically enable runtime
PM (xHCI for example does that).
- System is suspended (e.g., closing the lid on a laptop) with a dock +
something else connected, and the dock is unplugged while suspended.
These cases lead to the following deadlock:
INFO: task irq/126-pciehp:198 blocked for more than 120 seconds.
irq/126-pciehp D 0 198 2 0x80000000
Call Trace:
schedule+0x2c/0x80
schedule_timeout+0x246/0x350
wait_for_completion+0xb7/0x140
kthread_stop+0x49/0x110
free_irq+0x32/0x70
pcie_shutdown_notification+0x2f/0x50
pciehp_remove+0x27/0x50
pcie_port_remove_service+0x36/0x50
device_release_driver+0x12/0x20
bus_remove_device+0xec/0x160
device_del+0x13b/0x350
device_unregister+0x1a/0x60
remove_iter+0x1e/0x30
device_for_each_child+0x56/0x90
pcie_port_device_remove+0x22/0x40
pcie_portdrv_remove+0x20/0x60
pci_device_remove+0x3e/0xc0
device_release_driver_internal+0x18c/0x250
device_release_driver+0x12/0x20
pci_stop_bus_device+0x6f/0x90
pci_stop_bus_device+0x31/0x90
pci_stop_and_remove_bus_device+0x12/0x20
pciehp_unconfigure_device+0x88/0x140
pciehp_disable_slot+0x6a/0x110
pciehp_handle_presence_or_link_change+0x263/0x400
pciehp_ist+0x1c9/0x1d0
irq_thread_fn+0x24/0x60
irq_thread+0xeb/0x190
kthread+0x120/0x140
INFO: task irq/190-pciehp:2288 blocked for more than 120 seconds.
irq/190-pciehp D 0 2288 2 0x80000000
Call Trace:
__schedule+0x2a2/0x880
schedule+0x2c/0x80
schedule_preempt_disabled+0xe/0x10
mutex_lock+0x2c/0x30
pci_lock_rescan_remove+0x15/0x20
pciehp_unconfigure_device+0x4d/0x140
pciehp_disable_slot+0x6a/0x110
pciehp_handle_presence_or_link_change+0x263/0x400
pciehp_ist+0x1c9/0x1d0
irq_thread_fn+0x24/0x60
irq_thread+0xeb/0x190
kthread+0x120/0x140
What happens here is that the whole hierarchy is runtime resumed and the
parent PCIe downstream port, which got the hot-remove event, starts
removing devices below it, taking pci_lock_rescan_remove() lock. When the
child PCIe port is runtime resumed it calls pciehp_check_presence() which
ends up calling pciehp_card_present() and pciehp_check_link_active(). Both
of these use pcie_capability_read_word(), which notices that the underlying
device is already gone and returns PCIBIOS_DEVICE_NOT_FOUND with the
capability value set to 0. When pciehp gets this value it thinks that its
child device is also hot-removed and schedules its IRQ thread to handle the
event.
The deadlock happens when the child's IRQ thread runs and tries to acquire
pci_lock_rescan_remove() which is already taken by the parent and the
parent waits for the child's IRQ thread to finish.
Prevent this from happening by checking the return value of
pcie_capability_read_word() and if it is PCIBIOS_DEVICE_NOT_FOUND stop
performing any hot-removal activities.
[bhelgaas: add common scenarios to commit log]
Link: https://lore.kernel.org/r/20191029170022.57528-2-mika.westerberg@linux.intel.com
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
A sysfs request to enable or disable a PCIe hotplug slot should not
return before it has been carried out. That is sought to be achieved by
waiting until the controller's "pending_events" have been cleared.
However the IRQ thread pciehp_ist() clears the "pending_events" before
it acts on them. If pciehp_sysfs_enable_slot() / _disable_slot() happen
to check the "pending_events" after they have been cleared but while
pciehp_ist() is still running, the functions may return prematurely
with an incorrect return value.
Fix by introducing an "ist_running" flag which must be false before a sysfs
request is allowed to return.
Fixes: 32a8cef274 ("PCI: pciehp: Enable/disable exclusively from IRQ thread")
Link: https://lore.kernel.org/linux-pci/1562226638-54134-1-git-send-email-wangxiongfeng2@huawei.com
Link: https://lore.kernel.org/r/4174210466e27eb7e2243dd1d801d5f75baaffd8.1565345211.git.lukas@wunner.de
Reported-and-tested-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: stable@vger.kernel.org # v4.19+
The PCIe spec doesn't mention "green LEDs" or "amber LEDs". Replace those
terms with "Power Indicator" and "Attention Indicator" so the comments
match the spec language. Comment changes only.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Remove pciehp_green_led_{on,off,blink}() and use pciehp_set_indicators()
instead, since the code is mostly the same.
[bhelgaas: drop set_power_indicator() wrapper to reduce the number of
interfaces]
Link: https://lore.kernel.org/r/20190903111021.1559-5-efremov@linux.com
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Combine adjacent updates of power and attention indicators into a single
pciehp_set_indicators() call. This sends one command to the hotplug
controller instead of two.
Link: https://lore.kernel.org/r/20190903111021.1559-3-efremov@linux.com
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Log messages with pci_dev, not pcie_device. Factor out common message
prefixes with dev_fmt().
Example output change:
- pciehp 0000:00:06.0:pcie004: Slot(0) Powering on due to button press
+ pcieport 0000:00:06.0: pciehp: Slot(0) Powering on due to button press
Link: https://lore.kernel.org/lkml/20190509141456.223614-8-helgaas@kernel.org
Signed-off-by: Frederick Lawler <fred@fredlawl.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
During a safe hot remove, the OS powers off the slot, which may cause a
Data Link Layer State Changed event. The slot has already been set to
OFF_STATE, so that event results in re-enabling the device, making it
impossible to safely remove it.
Clear out the Presence Detect Changed and Data Link Layer State Changed
events when the disabled slot has settled down.
It is still possible to re-enable the device if it remains in the slot
after pressing the Attention Button by pressing it again.
Fixes the problem that Micah reported below: an NVMe drive power button may
not actually turn off the drive.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203237
Reported-by: Micah Parrish <micah.parrish@hpe.com>
Tested-by: Micah Parrish <micah.parrish@hpe.com>
Signed-off-by: Sergey Miroshnichenko <s.miroshnichenko@yadro.com>
[bhelgaas: changelog, add bugzilla URL]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v4.19+
When the PCI hotplug core and its first user, cpqphp, were introduced in
February 2002 with historic commit a8a2069f432c, cpqphp allocated a slot
struct for its internal use plus a hotplug_slot struct to be registered
with the hotplug core and linked the two with pointers:
https://git.kernel.org/tglx/history/c/a8a2069f432c
Nowadays, the predominant pattern in the tree is to embed ("subclass")
such structures in one another and cast to the containing struct with
container_of(). But it wasn't until July 2002 that container_of() was
introduced with historic commit ec4f214232cf:
https://git.kernel.org/tglx/history/c/ec4f214232cf
pnv_php, introduced in 2016, did the right thing and embedded struct
hotplug_slot in its internal struct pnv_php_slot, but all other drivers
cargo-culted cpqphp's design and linked separate structs with pointers.
Embedding structs is preferrable to linking them with pointers because
it requires fewer allocations, thereby reducing overhead and simplifying
error paths. Casting an embedded struct to the containing struct
becomes a cheap subtraction rather than a dereference. And having fewer
pointers reduces the risk of them pointing nowhere either accidentally
or due to an attack.
Convert all drivers to embed struct hotplug_slot in their internal slot
struct. The "private" pointer in struct hotplug_slot thereby becomes
unused, so drop it.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> # drivers/pci/hotplug/rpa*
Acked-by: Sebastian Ott <sebott@linux.ibm.com> # drivers/pci/hotplug/s390*
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com> # drivers/platform/x86
Cc: Len Brown <lenb@kernel.org>
Cc: Scott Murray <scott@spiteful.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Oliver OHalloran <oliveroh@au1.ibm.com>
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Corentin Chary <corentin.chary@gmail.com>
Cc: Darren Hart <dvhart@infradead.org>
Of the members which were just moved from pciehp's slot struct to the
controller struct, rename "lock" to "state_lock" and rename "work" to
"button_work" for clarity. Perform the rename separately to the
unification of the two structs per Sinan's request.
No functional change intended.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sinan Kaya <okaya@kernel.org>
pciehp was originally introduced together with shpchp in a single
commit, c16b4b14d980 ("PCI Hotplug: Add SHPC and PCI Express hot-plug
drivers"):
https://git.kernel.org/tglx/history/c/c16b4b14d980
shpchp supports up to 31 slots per controller, hence uses separate slot
and controller structs. pciehp has a 1:1 relationship between slot and
controller and therefore never required this separation. Nevertheless,
because much of the code had been copy-pasted between the two drivers,
pciehp likewise uses separate structs to this very day.
The artificial separation of data structures adds unnecessary complexity
and bloat to pciehp and requires constantly chasing pointers at runtime.
Simplify the driver by merging struct slot into struct controller.
Merge the slot constructor pcie_init_slot() and the destructor
pcie_cleanup_slot() into the controller counterparts.
No functional change intended.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The WiGig Bus Extension (WBE) specification allows tunneling PCIe over
IEEE 802.11. A product implementing this spec is the wil6210 from
Wilocity (now part of Qualcomm Atheros). It integrates a PCIe switch
with a wireless network adapter:
00.0-+ [1ae9:0101] Upstream Port
+-00.0-+ [1ae9:0200] Downstream Port
| +-00.0 [168c:0034] Atheros AR9462 Wireless Network Adapter
+-02.0 [1ae9:0201] Downstream Port
+-03.0 [1ae9:0201] Downstream Port
Wirelessly attached devices presumably appear below the hotplug ports
with device ID [1ae9:0201]. Oddly, the Downstream Port [1ae9:0200]
leading to the wireless network adapter is likewise Hotplug Capable,
but has its Presence Detect State bit hardwired to zero. Even if the
Link Active bit is set, Presence Detect is zero, so this cannot be
caused by in-band presence detection but only by broken hardware.
pciehp assumes an empty slot if Presence Detect State is zero,
regardless of Link Active being one. Consequently, up until v4.18 it
removes the wireless network adapter in pciehp_resume(). From v4.19 it
already does so in pciehp_probe().
Be lenient towards broken hardware and assume the slot is occupied if
Link Active is set: Introduce pciehp_card_present_or_link_active()
and use it in lieu of pciehp_get_adapter_status() everywhere, except
in pciehp_handle_presence_or_link_change() whose log messages depend
on which of Presence Detect State or Link Active is set.
Remove the Presence Detect State check from __pciehp_enable_slot()
because it is only called if either of Presence Detect State or Link
Active is set.
Caution: There is a possibility that broken hardware exists which has
working Presence Detect but hardwires Link Active to one. On such
hardware the slot will now incorrectly be considered always occupied.
If such hardware is discovered, this commit can be rolled back and a
quirk can be added which sets is_hotplug_bridge = 0 for [1ae9:0200].
Link: https://bugzilla.kernel.org/show_bug.cgi?id=200839
Reported-and-tested-by: David Yang <mmyangfl@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rajat Jain <rajatja@google.com>
Cc: Ashok Raj <ashok.raj@intel.com>
pciehp's ->enable_slot, ->disable_slot, ->get_attention_status and
->reset_slot callbacks are currently implemented by wrapper functions
that do nothing else but call down to a backend function. The backends
are not called from anywhere else, so drop the wrappers and use the
backends directly as callbacks, thereby shaving off a few lines of
unnecessary code.
No functional change intended.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Drop the following includes from pciehp source files which no longer use
any of the included symbols:
* <linux/sched/signal.h> in pciehp.h
<linux/signal.h> in pciehp_hpc.c
Added by commit de25968cc8 ("fix more missing includes") to
accommodate for a call to signal_pending().
The call was removed by commit 262303fe32 ("pciehp: fix wait command
completion").
* <linux/interrupt.h> in pciehp_core.c
Added by historic commit f308a2dfbe63 ("PCI: add PCI Express Port Bus
Driver subsystem") to accommodate for a call to free_irq():
https://git.kernel.org/tglx/history/c/f308a2dfbe63
The call was removed by commit 407f452b05 ("pciehp: remove
unnecessary free_irq").
* <linux/time.h> in pciehp_core.c and pciehp_hpc.c
Added by commit 34d03419f0 ("PCIEHP: Add Electro Mechanical
Interlock (EMI) support to the PCIE hotplug driver."),
which was reverted by commit bd3d99c170 ("PCI: Remove untested
Electromechanical Interlock (EMI) support in pciehp.").
* <linux/module.h> in pciehp_ctrl.c, pciehp_hpc.c and pciehp_pci.c
Added by historic commit c16b4b14d980 ("PCI Hotplug: Add SHPC and PCI
Express hot-plug drivers"):
https://git.kernel.org/tglx/history/c/c16b4b14d980
Module-related symbols were neither used back then in those files,
nor are they used today.
* <linux/slab.h> in pciehp_ctrl.c
Added by commit 5a0e3ad6af ("include cleanup: Update gfp.h and
slab.h includes to prepare for breaking implicit slab.h inclusion from
percpu.h") to accommodate for calls to kmalloc().
The calls were removed by commit 0e94916e60 ("PCI: pciehp: Handle
events synchronously").
* "../pci.h" in pciehp_ctrl.c
Added by historic commit 67f4660b72f2 ("PCI: ASPM patch for") to
accommodate for usage of the global variable pcie_mch_quirk:
https://git.kernel.org/tglx/history/c/67f4660b72f2
The global variable was removed by commit 0ba379ec0f ("PCI: Simplify
hotplug mch quirk").
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When removing PCI devices below a hotplug bridge, pciehp marks them as
disconnected if the card is no longer present in the slot or it quiesces
them if the card is still present (by disabling INTx interrupts, bus
mastering and SERR# reporting).
To detect whether the card is still present, pciehp checks the Presence
Detect State bit in the Slot Status register. The problem with this
approach is that even if the card is present, the link to it may be
down, and it that case it would be better to mark the devices as
disconnected instead of trying to quiesce them. Moreover, if the card
in the slot was quickly replaced by another one, the Presence Detect
State bit would be set, yet trying to quiesce the new card's devices
would be wrong and the correct thing to do is to mark the previous
card's devices as disconnected.
Instead of looking at the Presence Detect State bit, it is better to
differentiate whether the card was surprise removed versus safely
removed (via sysfs or an Attention Button press). On surprise removal,
the devices should be marked as disconnected, whereas on safe removal it
is correct to quiesce the devices.
The knowledge whether a surprise removal or a safe removal is at hand
does exist further up in the call stack: A surprise removal is
initiated by pciehp_handle_presence_or_link_change(), a safe removal by
pciehp_handle_disable_request().
Pass that information down to pciehp_unconfigure_device() and use it in
lieu of the Presence Detect State bit. While there, add kernel-doc to
pciehp_unconfigure_device() and pciehp_configure_device().
Tested-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Keith Busch <keith.busch@intel.com>
Per Mika's request, add an explicit break to the last case of switch
statements everywhere in pciehp to be more defensive towards future
amendments.
Per Gustavo's request, mark all non-empty implicit fallthroughs with a
comment to silence warnings triggered by -Wimplicit-fallthrough=2.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
pciehp's IRQ thread ensures accessibility of the port by runtime resuming
its parent to D0. However when the slot is enabled/disabled, the port
itself needs to be in D0 because its secondary bus is accessed in:
pciehp_check_link_status(),
pciehp_configure_device() (both called from board_added())
and
pciehp_unconfigure_device() (called from remove_board()).
Thus, acquire a runtime PM ref on enable/disablement of the slot.
Yinghai Lu additionally discovered that some SkyLake servers feature a
Power Controller for their PCIe hotplug ports (PCIe r3.1, sec 6.7.1.8)
which requires the port to be in D0 when invoking
pciehp_power_on_slot() (likewise called from board_added()).
If slot power is turned on while in D3hot, link training later fails:
https://lkml.kernel.org/r/20170205073454.GA253@wunner.de
The spec is silent about such a requirement, but it seems prudent to
assume that any hotplug port with a Power Controller may need this.
The present commit holds a runtime PM ref whenever slot power is turned
on and off, but it doesn't keep the port in D0 as long as slot power is
on. If vendors determine that's necessary, they need to amend pciehp to
acquire a runtime PM ref in pciehp_power_on_slot() and release one in
pciehp_power_off_slot().
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
A hotplug port's Slot Status register does not count how often each type
of event occurred, it only records the fact *that* an event has occurred.
Previously pciehp queued a work item for each event. But if it missed
an event, e.g. removal of a card in-between two back-to-back insertions,
it queued up the wrong work item or no work item at all. Commit
fad214b0aa ("PCI: pciehp: Process all hotplug events before looking
for new ones") sought to improve the situation by shrinking the window
during which events may be missed.
But Stefan Roese reports unbalanced Card present and Link Up events,
suggesting that we're still missing events if they occur very rapidly.
Bjorn Helgaas responds that he considers pciehp's event handling
"baroque" and calls for its simplification and rationalization:
https://lkml.kernel.org/r/20180202192045.GA53759@bhelgaas-glaptop.roam.corp.google.com
It gets worse once a hotplug port is runtime suspended: The port can
signal an interrupt while it and its parents are in D3hot, i.e. while
it is inaccessible. By the time we've runtime resumed all parents to D0
and read the port's Slot Status register, we may have missed an arbitrary
number of events. Event handling therefore needs to be reworked to
become resilient to missed events.
Assume that a Presence Detect Changed event has occurred.
Consider the following truth table:
- Slot is in OFF_STATE and is currently empty. => Do nothing.
(The event is trailing a Link Down or we've
missed an insertion and subsequent removal.)
- Slot is in OFF_STATE and is currently occupied. => Turn the slot on.
- Slot is in ON_STATE and is currently empty. => Turn the slot off.
- Slot is in ON_STATE and is currently occupied. => Turn the slot off,
(Be cautious and assume the card in then back on.
the slot isn't the same as before.)
This leads to the following simple algorithm:
1 If the slot is in ON_STATE, turn it off unconditionally.
2 If the slot is currently occupied, turn it on.
Because those actions are now carried out synchronously, rather than by
scheduled work items, pciehp reacts to the *current* situation and
missed events no longer matter.
Data Link Layer State Changed events can be handled identically to
Presence Detect Changed events. Note that in the above truth table,
a Link Up trailing a Card present event didn't have to be accounted for:
It is filtered out by pciehp_check_link_status().
As for Attention Button Pressed events, PCIe r4.0, sec 6.7.1.5 says:
"Once the Power Indicator begins blinking, a 5-second abort interval
exists during which a second depression of the Attention Button cancels
the operation." In other words, the user can only expect the system to
react to a button press after it starts blinking. Missed button presses
that occur in-between are irrelevant.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Stefan Roese <sr@denx.de>
Cc: Mayurkumar Patel <mayurkumar.patel@intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
No callers of pciehp_enable/disable_slot() outside of pciehp_ctrl.c
remain, so declare the functions static. For now this requires forward
declarations. Those can be eliminated by reshuffling functions once the
ongoing effort to refactor the driver has settled.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Previously slot enablement and disablement could happen concurrently.
But now it's under the exclusive control of the IRQ thread, rendering
the locking obsolete. Drop it.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Besides the IRQ thread, there are several other places in the driver
which enable or disable the slot:
- pciehp_probe() enables the slot if it's occupied and the pciehp_force
module parameter is used.
- pciehp_resume() enables or disables the slot after system sleep.
- pciehp_queue_pushbutton_work() enables or disables the slot after the
5 second delay following an Attention Button press.
- pciehp_sysfs_enable_slot() and pciehp_sysfs_disable_slot() enable or
disable the slot on sysfs write.
This requires locking and complicates pciehp's state machine.
A simplification can be achieved by enabling and disabling the slot
exclusively from the IRQ thread.
Amend the functions listed above to request slot enable/disablement from
the IRQ thread by either synthesizing a Presence Detect Changed event or,
in the case of a disable user request (via sysfs or an Attention Button
press), submitting a newly introduced force disable request. The latter
is needed because the slot shall be forced off despite being occupied.
For this force disable request, avoid colliding with Slot Status register
bits by using a bit number greater than 16.
For synchronous execution of requests (on sysfs write), wait for the
request to finish and retrieve the result. There can only ever be one
sysfs write in flight due to the locking in kernfs_fop_write(), hence
there is no risk of returning the result of a different sysfs request to
user space.
The POWERON_STATE and POWEROFF_STATE is now no longer entered by the
above-listed functions, but solely by the IRQ thread when it begins a
power transition. Afterwards, it moves to STATIC_STATE. The same
applies to canceling the Attention Button work, it likewise becomes an
IRQ thread only operation.
An immediate consequence is that the POWERON_STATE and POWEROFF_STATE is
never observed by the IRQ thread itself, only by functions called in a
different context, such as pciehp_sysfs_enable_slot(). So remove
handling of these states from pciehp_handle_button_press() and
pciehp_handle_link_change() which are exclusively called from the IRQ
thread.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
handle_button_press_event() currently determines whether the slot has
been turned on or off by looking at the Power Controller Control bit in
the Slot Control register. This assumes that an attention button
implies presence of a power controller even though that's not mandated
by the spec. Moreover the Power Controller Control bit is unreliable
when a power fault occurs (PCIe r4.0, sec 6.7.1.8). This issue has
existed since the driver was introduced in 2004.
Fix by replacing STATIC_STATE with ON_STATE and OFF_STATE and tracking
whether the slot has been turned on or off. This is also a required
ingredient to make pciehp resilient to missed events, which is the
object of an upcoming commit.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Previously the slot workqueue was used to handle events and enable or
disable the slot. That's no longer the case as those tasks are done
synchronously in the IRQ thread. The slot workqueue is thus merely used
to handle a button press after the 5 second delay and only one such work
item may be in flight at any given time. A separate workqueue isn't
necessary for this simple task, so use the system workqueue instead.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Up until now, pciehp's IRQ handler schedules a work item for each event,
which in turn schedules a work item to enable or disable the slot. This
double indirection was necessary because sleeping wasn't allowed in the
IRQ handler.
However it is now that pciehp has been converted to threaded IRQ handling
and polling, so handle events synchronously in pciehp_ist() and remove
the work item infrastructure (with the exception of work items to handle
a button press after the 5 second delay).
For link or presence change events, move the register read to determine
the current link or presence state behind acquisition of the slot lock
to prevent it from becoming stale while the lock is contended.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
If the attention button is pressed to power on the slot AND the user
powers on the slot via sysfs before 5 seconds have elapsed AND powering
on the slot fails because either the slot is unoccupied OR the latch is
open, we neglect turning off the green LED so it keeps on blinking.
That's because the error path of pciehp_sysfs_enable_slot() doesn't call
pciehp_green_led_off(), unlike pciehp_power_thread() which does.
The bug has been present since 2004 when the driver was introduced.
Fix by deduplicating common code in pciehp_sysfs_enable_slot() and
pciehp_power_thread() into a wrapper function pciehp_enable_slot() and
renaming the existing function to __pciehp_enable_slot(). Same for
pciehp_disable_slot(). This will also simplify the upcoming rework of
pciehp's event handling.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Since commit 0f4bd8014d ("PCI: hotplug: Drop checking of PCI_BRIDGE_
CONTROL in *_unconfigure_device()"), pciehp_unconfigure_device() can no
longer fail, so declare it and its sole caller remove_board() void, in
keeping with the usual kernel pattern that enablement can fail, but
disablement cannot. No functional change intended.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
pciehp_disable_slot() checks if the ctrl attribute of the slot is NULL
and bails out if so. However the function is not called prior to the
attribute being set in pcie_init_slot(), and pcie_init_slot() is not
called if ctrl is NULL. So the check is unnecessary. Drop it.
It has been present ever since the driver was introduced in 2004, but it
was already unnecessary back then:
https://git.kernel.org/tglx/history/c/c16b4b14d980
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Add SPDX GPL-2.0+ to all PCI files that specified the GPL and allowed
either GPL version 2 or any later version.
Remove the boilerplate GPL version 2 or later language, relying on the
assertion in b24413180f ("License cleanup: add SPDX GPL-2.0 license
identifier to files with no license") that the SPDX identifier may be used
instead of the full boilerplate text.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A surprise link down may retrain very quickly causing the same slot
generate a link up event before handling the link down event completes.
Since the link is active, the power off work queued from the first link
down will cause a second down event when power is disabled. However, the
link up event sets the slot state to POWERON_STATE before the event to
handle this is enqueued, making the second down event believe it needs to
do something.
This creates constant link up and down event cycle.
To prevent this it is better to handle each event at the time in order it
occurred, so change the driver to use ordered workqueue instead.
A normal device hotplug triggers two events (presense detect and link up)
that are already handled properly in the driver but we currently log an
error if we find an existing device in the slot. Since this is not an error
change the log level to be debug instead to avoid scaring users.
This is based on the original work by Ashok Raj.
Link: https://patchwork.kernel.org/patch/9469023
Suggested-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* pci/pm:
x86/platform/intel-mid: Constify mid_pci_platform_pm
PCI: pciehp: Add runtime PM support for PCIe hotplug ports
ACPI / hotplug / PCI: Make device_is_managed_by_native_pciehp() public
ACPI / hotplug / PCI: Use cached copy of PCI_EXP_SLTCAP_HPC bit
PCI: Unfold conditions to block runtime PM on PCIe ports
PCI: Consolidate conditions to allow runtime PM on PCIe ports
PCI: Activate runtime PM on a PCIe port only if it can suspend
PCI: Speed up algorithm in pci_bridge_d3_update()
PCI: Autosense device removal in pci_bridge_d3_update()
PCI: Don't acquire ref on parent in pci_bridge_d3_update()
USB: UHCI: report non-PME wakeup signalling for Intel hardware
PCI: Check for PME in targeted sleep state
If an error occurs when enabling a slot, pciehp_power_thread() turns off
the power indicator. But if the only error is that the slot was already
enabled, we should leave the power indicator on.
Return success if called to enable an already-enabled slot.
This is in the same spirit of the special handling for EEXISTS when
pciehp_configure_device() determines the slot devices already exist.
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Linux 4.8 added support for runtime suspending PCIe ports to D3hot with
commit 006d44e49a ("PCI: Add runtime PM support for PCIe ports"), but
excluded hotplug ports. Those are now afforded runtime PM by the present
commit.
Hotplug ports require a few extra considerations:
- The configuration space of the port remains accessible in D3hot, so all
the functions to read or modify the Slot Status and Slot Control
registers need not be modified. Even turning on slot power doesn't seem
to require the port to be in D0, at least the PCIe spec doesn't say so
and I confirmed that by testing with a Thunderbolt controller.
- However D0 is required to access devices on the secondary bus. This
happens in pciehp_check_link_status() and pciehp_configure_device() (both
called from board_added()) and in pciehp_unconfigure_device() (called
from remove_board()), so acquire a runtime PM ref for their invocation.
- The hotplug port stays active as long as it has active children. If all
hotplugged devices below the port runtime suspend, the port is allowed to
runtime suspend as well. Plug and unplug detection continues to work in
D3hot.
- Hotplug interrupts are delivered in-band, so while the hotplug port
itself is allowed to go to D3hot, its parent ports must stay in D0 for
interrupts to come through. Add a corresponding restriction to
pci_dev_check_d3cold().
- Runtime PM may only be allowed if the hotplug port is handled natively by
the OS. On ACPI systems, the port may alternatively be handled by the
firmware and things break if the OS puts the port into D3 behind the
firmware's back: E.g. Thunderbolt hotplug ports on non-Macs are handled
by Intel's firmware in System Management Mode and the firmware is known
to access devices on the port's secondary bus without checking first if
the port is in D0: https://bugzilla.kernel.org/show_bug.cgi?id=53811
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CC: Mika Westerberg <mika.westerberg@linux.intel.com>
Long ago, we updated a "switch_save" field based on the latch status. But
switch_save was unused, and ed6cbcf2ac ("[PATCH] pciehp: miscellaneous
cleanups") removed it.
We no longer use the latch status, so remove calls to
pciehp_get_latch_status(). No functional change intended.
Tested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Print slot name consistently as "Slot(%s)". I don't know whether that's
ideal, but we can at least do it the same way all the time. No functional
change intended.
Tested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Previously we read Slot Status when handling a surprise event. But Slot
Status might have changed since we identified the event, and the event_type
already tells us whether to enable or disable the slot, so there's no need
to read it again.
Remove handle_surprise_event() and queue the power work directly.
[bhelgaas: changelog]
Tested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Mayurkumar Patel <mayurkumar.patel@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Rajat Jain <rajatxjain@gmail.com>
Clear the LED attention status after a successful device add. It is
possible the attention LED was on from a previous power fault or link
failure, and a subsequent successful device insert insertion should clear
it.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When called from pciehp_sysfs_disable_slot(), the call to
pciehp_disable_slot() was not protected by the hotplug mutex.
Hold slot->hotplug_lock while calling pciehp_disable_slot().
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rajat Jain <rajatxjain@gmail.com>
Up to now, work items to be queued to be handled by pciehp_power_thread()
are allocated using kmalloc() in three different locations. If not needed,
kfree() is called to free the allocated data.
Introduce a separate function to allocate the work item and queue it, and
call it only if needed. This reduces code duplication and avoids having to
free memory if the work item does not need to get executed.
[bhelgaas: tweak "no memory" message, make pciehp_queue_power_work() static]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
The pciehp_handle_*() functions (pciehp_handle_attention_button(), etc.)
only contain a line or two of useful code, so it's clumsy to put
them in separate functions. All they so is add an event to a work queue,
and it's clearer to see that directly in the ISR.
Inline them directly into pcie_isr(). No functional change.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rajat Jain <rajatja@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Rename queue_interrupt_event() to pciehp_queue_interrupt_event() so we can
make it extern and call it from pcie_isr().
No functional change.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rajat Jain <rajatja@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Nobody looks at the return value from queue_interrupt_event(), so errors
were silently ignored. Convert it to a "void" function and note the error
in the dmesg log.
No functional change except the new message.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rajat Jain <rajatja@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
The pciehp debug logging is overly verbose and often redundant. Almost all
of the information printed by dbg_ctrl() is also printed by the normal PCI
core enumeration code and by pcie_init().
Remove the redundant debug info.
When claiming a pciehp bridge, we print the slot characteristics, e.g.,
Slot #6 AttnBtn- AttnInd- PwrInd- PwrCtrl- MRL- Interlock- NoCompl+ LLActRep+
Add the Hot-Plug Capable and Hot-Plug Surprise bits to this information,
and print it all in the same order as lspci does.
No functional change except the message text changes.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Rajat Jain <rajatja@google.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
The PCIe spec (r3.0, sec 7.8.9) says Hot-Plug Surprise indicates support
for surprise *removal*, but pciehp checked this to determine if it should
handle presence detect interrupts for device *addition*.
Allow surprise device addition even if the slot doesn't advertise support
for surprise removal.
Keith has a platform with slots for front-loading SFF devices. The slots
do not have attention buttons and do not support surprise removal, but they
do have presence detect. In that case, we still want to use presence
detect for device addition.
Keith's original patch handled surprise insertions only if Hot-Plug Capable
is set. I think that test is superfluous because pciehp only claims slots
that advertise Hot-Plug Capable (see get_port_device_capability()).
Link: http://lkml.kernel.org/r/1419275223-14602-1-git-send-email-keith.busch@intel.com
Based-on-patch-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rajat Jain <rajatxjain@gmail.com>
Merge quoted strings that are broken across lines into a single entity.
The compiler merges them anyway, but checkpatch complains about it, and
merging them makes it easier to grep for strings.
No functional change.
[bhelgaas: changelog, do the same for everything under drivers/pci]
Signed-off-by: Ryan Desfosses <ryan@desfo.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Fix various whitespace errors.
No functional change.
[bhelgaas: fix other similar problems]
Signed-off-by: Ryan Desfosses <ryan@desfo.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>