Some IRQ chips may be located in a power domain outside of the CPU
subsystem and hence will require device specific runtime power
management. In order to support such IRQ chips, add a pointer for a
device structure to the irq_chip structure, and if this pointer is
populated by the IRQ chip driver and CONFIG_PM is selected in the kernel
configuration, then the pm_runtime_get/put APIs for this chip will be
called when an IRQ is requested/freed, respectively.
Reviewed-by: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
As we now do for non-percpu interrupt, perform a lookup of the
interrupt trigger if the user doesn't supply one. The difference
here is that we can only do it at enable time (trigger configuration
can be per-cpu as well).
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
For some devices the IRQ trigger type for a device is read from
firmware, such as device-tree. The IRQ trigger type is typically read
when the mapping for IRQ is created, which is before the IRQ is
requested. Hence, the IRQ trigger type is programmed when mapping the
IRQ and not when requesting the IRQ.
Although this works for most cases, in order to support IRQ chips which
require runtime power management, which may not be accessible prior
to requesting the IRQ, it is desirable to look-up the IRQ trigger type
when it is requested. Therefore, if the IRQ trigger type is not
specified when __setup_irq() is called, look-up the saved IRQ trigger
type. This will allow us to defer the programming of the trigger type
from when the IRQ is mapped to when it is actually requested.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
In the function, setup_irq(), we don't check that the descriptor
returned from irq_to_desc() is valid before we start using it. For
example chip_bus_lock() called from setup_irq(), assumes that the
descriptor pointer is valid and doesn't check before dereferencing it.
In many other functions including setup/free_percpu_irq() we do check
that the descriptor returned is not NULL and therefore add the same test
to setup_irq() to ensure the descriptor returned is valid.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Use the more common logging method with the eventual goal of removing
pr_warning altogether.
Miscellanea:
- Realign arguments
- Coalesce formats
- Add missing space between a few coalesced formats
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [kernel/power/suspend.c]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Redesign of cpufreq governors and the intel_pstate driver to
make them use callbacks invoked by the scheduler to trigger CPU
frequency evaluation instead of using per-CPU deferrable timers
for that purpose (Rafael Wysocki).
- Reorganization and cleanup of cpufreq governor code to make it
more straightforward and fix some concurrency problems in it
(Rafael Wysocki, Viresh Kumar).
- Cleanup and improvements of locking in the cpufreq core (Viresh
Kumar).
- Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh
Kumar, Eric Biggers).
- intel_pstate driver updates including fixes, optimizations and a
modification to make it enable enable hardware-coordinated P-state
selection (HWP) by default if supported by the processor (Philippe
Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe
Franciosi).
- Operating Performance Points (OPP) framework updates to improve
its handling of voltage regulators and device clocks and updates
of the cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter).
- Updates of the powernv cpufreq driver to fix initialization
and cleanup problems in it and correct its worker thread handling
with respect to CPU offline, new powernv_throttle tracepoint
(Shilpasri Bhat).
- ACPI cpufreq driver optimization and cleanup (Rafael Wysocki).
- ACPICA updates including one fix for a regression introduced
by previos changes in the ACPICA code (Bob Moore, Lv Zheng,
David Box, Colin Ian King).
- Support for installing ACPI tables from initrd (Lv Zheng).
- Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin
Chaugule).
- Support for _HID(ACPI0010) devices (ACPI processor containers)
and ACPI processor driver cleanups (Sudeep Holla).
- Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory,
Aleksey Makarov).
- Modification of the ACPI PCI IRQ management code to make it treat
255 in the Interrupt Line register as "not connected" on x86 (as
per the specification) and avoid attempts to use that value as
a valid interrupt vector (Chen Fan).
- ACPI APEI fixes related to resource leaks (Josh Hunt).
- Removal of modularity from a few ACPI drivers (BGRT, GHES,
intel_pmic_crc) that cannot be built as modules in practice (Paul
Gortmaker).
- PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS
as a valid resource type (Harb Abdulhamid).
- New device ID (future AMD I2C controller) in the ACPI driver for
AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu).
- Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin).
- cpuidle menu governor optimization to avoid a square root
computation in it (Rasmus Villemoes).
- Fix for potential use-after-free in the generic device properties
framework (Heikki Krogerus).
- Updates of the generic power domains (genpd) framework including
support for multiple power states of a domain, fixes and debugfs
output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart,
Geert Uytterhoeven).
- Intel RAPL power capping driver updates to reduce IPI overhead in
it (Jacob Pan).
- System suspend/hibernation code cleanups (Eric Biggers, Saurabh
Sengar).
- Year 2038 fix for the process freezer (Abhilash Jindal).
- turbostat utility updates including new features (decoding of more
registers and CPUID fields, sub-second intervals support, GFX MHz
and RC6 printout, --out command line option), fixes (syscall jitter
detection and workaround, reductioin of the number of syscalls made,
fixes related to Xeon x200 processors, compiler warning fixes) and
cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJW50NXAAoJEILEb/54YlRxvr8QAIktC9+ft0y5AmU46hDcBWcK
QutyWJL9X9BS6DWBJZA2qclDYFmhMfi5Fza1se0gQ9TnLB/KrBwHWLsiYoTsb1k+
nPKf214aPk+qAhkVuyB4leNWML9Qz9n9jwku/EYxWWpgtbSRf3+0ioIKZeWWc/8V
JvuaOu4O+g/tkmL7QTrnGWBwhIIssAAV85QPsHkx+g68MrCj4UMMzm7z9G21SPXX
bmP8yIHsczX/XnRsY0W2NSno7Vdk6ImHpDJ26IAZg28WRNPWICHgGYHvB0TTWMvb
tts+yqfF7/7QLRjT/M8k9CzDBDE/DnVqoZ0fNJ+aYr7hNKF32mtAN+jH9ZB9dl/P
fEFapJkPxnWyzAoVoB9Dz0rkcZkYMlbxlLWzUGpaPq0JflUUTzLk0ApSjmMn4HRO
UddwCDdyHTaYThp3gn6GbOb0pIP0SdOVbI1M2QV2x/4PLcT2Ft8Np1+1RFWOeinZ
Bdl9AE890big0808mqbBzw/buETwr9FjHtCdDPXpP0vJpkBLu3nIYRNb0LCt39es
mWMp6dFhGgvGj3D3ahTuV3GI8hdpDkh9SObexa11RCjkTKrXcwEmFxHxLeFXwKYq
alG278bo6cSChRMziS1lis+W/3tsJRN4TXUSv1PPzJHrFgptQVFRStU9ngBKP+pN
WB+itPc4Fw0YHOrAFsrx
=cfty
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI updates from Rafael Wysocki:
"This time the majority of changes go into cpufreq and they are
significant.
First off, the way CPU frequency updates are triggered is different
now. Instead of having to set up and manage a deferrable timer for
each CPU in the system to evaluate and possibly change its frequency
periodically, cpufreq governors set up callbacks to be invoked by the
scheduler on a regular basis (basically on utilization updates). The
"old" governors, "ondemand" and "conservative", still do all of their
work in process context (although that is triggered by the scheduler
now), but intel_pstate does it all in the callback invoked by the
scheduler with no need for any additional asynchronous processing.
Of course, this eliminates the overhead related to the management of
all those timers, but also it allows the cpufreq governor code to be
simplified quite a bit. On top of that, the common code and data
structures used by the "ondemand" and "conservative" governors are
cleaned up and made more straightforward and some long-standing and
quite annoying problems are addressed. In particular, the handling of
governor sysfs attributes is modified and the related locking becomes
more fine grained which allows some concurrency problems to be avoided
(particularly deadlocks with the core cpufreq code).
In principle, the new mechanism for triggering frequency updates
allows utilization information to be passed from the scheduler to
cpufreq. Although the current code doesn't make use of it, in the
works is a new cpufreq governor that will make decisions based on the
scheduler's utilization data. That should allow the scheduler and
cpufreq to work more closely together in the long run.
In addition to the core and governor changes, cpufreq drivers are
updated too. Fixes and optimizations go into intel_pstate, the
cpufreq-dt driver is updated on top of some modification in the
Operating Performance Points (OPP) framework and there are fixes and
other updates in the powernv cpufreq driver.
Apart from the cpufreq updates there is some new ACPICA material,
including a fix for a problem introduced by previous ACPICA updates,
and some less significant changes in the ACPI code, like CPPC code
optimizations, ACPI processor driver cleanups and support for loading
ACPI tables from initrd.
Also updated are the generic power domains framework, the Intel RAPL
power capping driver and the turbostat utility and we have a bunch of
traditional assorted fixes and cleanups.
Specifics:
- Redesign of cpufreq governors and the intel_pstate driver to make
them use callbacks invoked by the scheduler to trigger CPU
frequency evaluation instead of using per-CPU deferrable timers for
that purpose (Rafael Wysocki).
- Reorganization and cleanup of cpufreq governor code to make it more
straightforward and fix some concurrency problems in it (Rafael
Wysocki, Viresh Kumar).
- Cleanup and improvements of locking in the cpufreq core (Viresh
Kumar).
- Assorted cleanups in the cpufreq core (Rafael Wysocki, Viresh
Kumar, Eric Biggers).
- intel_pstate driver updates including fixes, optimizations and a
modification to make it enable enable hardware-coordinated P-state
selection (HWP) by default if supported by the processor (Philippe
Longepe, Srinivas Pandruvada, Rafael Wysocki, Viresh Kumar, Felipe
Franciosi).
- Operating Performance Points (OPP) framework updates to improve its
handling of voltage regulators and device clocks and updates of the
cpufreq-dt driver on top of that (Viresh Kumar, Jon Hunter).
- Updates of the powernv cpufreq driver to fix initialization and
cleanup problems in it and correct its worker thread handling with
respect to CPU offline, new powernv_throttle tracepoint (Shilpasri
Bhat).
- ACPI cpufreq driver optimization and cleanup (Rafael Wysocki).
- ACPICA updates including one fix for a regression introduced by
previos changes in the ACPICA code (Bob Moore, Lv Zheng, David Box,
Colin Ian King).
- Support for installing ACPI tables from initrd (Lv Zheng).
- Optimizations of the ACPI CPPC code (Prashanth Prakash, Ashwin
Chaugule).
- Support for _HID(ACPI0010) devices (ACPI processor containers) and
ACPI processor driver cleanups (Sudeep Holla).
- Support for ACPI-based enumeration of the AMBA bus (Graeme Gregory,
Aleksey Makarov).
- Modification of the ACPI PCI IRQ management code to make it treat
255 in the Interrupt Line register as "not connected" on x86 (as
per the specification) and avoid attempts to use that value as a
valid interrupt vector (Chen Fan).
- ACPI APEI fixes related to resource leaks (Josh Hunt).
- Removal of modularity from a few ACPI drivers (BGRT, GHES,
intel_pmic_crc) that cannot be built as modules in practice (Paul
Gortmaker).
- PNP framework update to make it treat ACPI_RESOURCE_TYPE_SERIAL_BUS
as a valid resource type (Harb Abdulhamid).
- New device ID (future AMD I2C controller) in the ACPI driver for
AMD SoCs (APD) and in the designware I2C driver (Xiangliang Yu).
- Assorted ACPI cleanups (Colin Ian King, Kaiyen Chang, Oleg Drokin).
- cpuidle menu governor optimization to avoid a square root
computation in it (Rasmus Villemoes).
- Fix for potential use-after-free in the generic device properties
framework (Heikki Krogerus).
- Updates of the generic power domains (genpd) framework including
support for multiple power states of a domain, fixes and debugfs
output improvements (Axel Haslam, Jon Hunter, Laurent Pinchart,
Geert Uytterhoeven).
- Intel RAPL power capping driver updates to reduce IPI overhead in
it (Jacob Pan).
- System suspend/hibernation code cleanups (Eric Biggers, Saurabh
Sengar).
- Year 2038 fix for the process freezer (Abhilash Jindal).
- turbostat utility updates including new features (decoding of more
registers and CPUID fields, sub-second intervals support, GFX MHz
and RC6 printout, --out command line option), fixes (syscall jitter
detection and workaround, reductioin of the number of syscalls
made, fixes related to Xeon x200 processors, compiler warning
fixes) and cleanups (Len Brown, Hubert Chrzaniuk, Chen Yu)"
* tag 'pm+acpi-4.6-rc1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (182 commits)
tools/power turbostat: bugfix: TDP MSRs print bits fixing
tools/power turbostat: correct output for MSR_NHM_SNB_PKG_CST_CFG_CTL dump
tools/power turbostat: call __cpuid() instead of __get_cpuid()
tools/power turbostat: indicate SMX and SGX support
tools/power turbostat: detect and work around syscall jitter
tools/power turbostat: show GFX%rc6
tools/power turbostat: show GFXMHz
tools/power turbostat: show IRQs per CPU
tools/power turbostat: make fewer systems calls
tools/power turbostat: fix compiler warnings
tools/power turbostat: add --out option for saving output in a file
tools/power turbostat: re-name "%Busy" field to "Busy%"
tools/power turbostat: Intel Xeon x200: fix turbo-ratio decoding
tools/power turbostat: Intel Xeon x200: fix erroneous bclk value
tools/power turbostat: allow sub-sec intervals
ACPI / APEI: ERST: Fixed leaked resources in erst_init
ACPI / APEI: Fix leaked resources
intel_pstate: Do not skip samples partially
intel_pstate: Remove freq calculation from intel_pstate_calc_busy()
intel_pstate: Move intel_pstate_calc_busy() into get_target_pstate_use_performance()
...
Per the x86-specific footnote to PCI spec r3.0, sec 6.2.4, the value 255 in
the Interrupt Line register means "unknown" or "no connection."
Previously, when we couldn't derive an IRQ from the _PRT, we fell back to
using the value from Interrupt Line as an IRQ. It's questionable whether
we should do that at all, but the spec clearly suggests we shouldn't do it
for the value 255 on x86.
Calling request_irq() with IRQ 255 may succeed, but the driver won't
receive any interrupts. Or, if IRQ 255 is shared with another device, it
may succeed, and the driver's ISR will be called at random times when the
*other* device interrupts. Or it may fail if another device is using IRQ
255 with incompatible flags. What we *want* is for request_irq() to fail
predictably so the driver can fall back to polling.
On x86, assume 255 in the Interrupt Line means the INTx line is not
connected. In that case, set dev->irq to IRQ_NOTCONNECTED so request_irq()
will fail gracefully with -ENOTCONN.
We found this problem on a system where Secure Boot firmware assigned
Interrupt Line 255 to an i801_smbus device and another device was already
using MSI-X IRQ 255. This was in v3.10, where i801_probe() fails if
request_irq() fails:
i801_smbus 0000:00:1f.3: enabling device (0140 -> 0143)
i801_smbus 0000:00:1f.3: can't derive routing for PCI INT C
i801_smbus 0000:00:1f.3: PCI INT C: no GSI
genirq: Flags mismatch irq 255. 00000080 (i801_smbus) vs. 00000000 (megasa)
CPU: 0 PID: 2487 Comm: kworker/0:1 Not tainted 3.10.0-229.el7.x86_64 #1
Hardware name: FUJITSU PRIMEQUEST 2800E2/D3736, BIOS PRIMEQUEST 2000 Serie5
Call Trace:
dump_stack+0x19/0x1b
__setup_irq+0x54a/0x570
request_threaded_irq+0xcc/0x170
i801_probe+0x32f/0x508 [i2c_i801]
local_pci_probe+0x45/0xa0
i801_smbus 0000:00:1f.3: Failed to allocate irq 255: -16
i801_smbus: probe of 0000:00:1f.3 failed with error -16
After aeb8a3d16a ("i2c: i801: Check if interrupts are disabled"),
i801_probe() will fall back to polling if request_irq() fails. But we
still need this patch because request_irq() may succeed or fail depending
on other devices in the system. If request_irq() fails, i801_smbus will
work by falling back to polling, but if it succeeds, i801_smbus won't work
because it expects interrupts that it may not receive.
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The irq code browses the list of actions differently to inspect the element
one by one. Even if it is not a problem, for the sake of consistent code,
provide a macro similar to for_each_irq_desc in order to have the same loop to
go through the actions list and use it in the code.
[ tglx: Renamed the macro ]
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: http://lkml.kernel.org/r/1452765253-31148-1-git-send-email-daniel.lezcano@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull irq updates from Thomas Gleixner:
"The irq department provides:
- Support for MSI to wire bridges and a first user of it
- More ACPI support for ARM/GIC
- A new TS-4800 interrupt controller driver
- RCU based free of interrupt descriptors to support the upcoming
Intel VMD technology without introducing a locking nightmare
- The usual pile of fixes and updates to drivers and core code"
* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
irqchip/omap-intc: Add support for spurious irq handling
irqchip/zevio: Use irq_data_get_chip_type() helper
irqchip/omap-intc: Remove duplicate setup for IRQ chip type handler
irqchip/ts4800: Add TS-4800 interrupt controller
irqchip/ts4800: Add documentation for TS-4800 interrupt controller
irq/platform-MSI: Increase the maximum MSIs the MSI framework can support
irqchip/gicv2m: Miscellaneous fixes for v2m resources and SPI ranges
irqchip/bcm2836: Make code more readable
irqchip/bcm2836: Tolerate IRQs while no flag is set in ISR
irqchip/bcm2836: Add SMP support for the 2836
irqchip/bcm2836: Fix initialization of the LOCAL_IRQ_CNT timers
irqchip/gic-v2m: acpi: Introducing GICv2m ACPI support
irqchip/gic-v2m: Refactor to prepare for ACPI support
irqdomain: Introduce is_fwnode_irqchip helper
acpi: pci: Setup MSI domain for ACPI based pci devices
genirq/msi: Export functions to allow MSI domains in modules
irqchip/mbigen: Implement the mbigen irq chip operation functions
irqchip/mbigen: Create irq domain for each mbigen device
irqchip/mgigen: Add platform device driver for mbigen device
dt-bindings: Documents the mbigen bindings
...
If a interrupt chip utilizes chip->buslock then free_irq() can
deadlock in the following way:
CPU0 CPU1
interrupt(X) (Shared or spurious)
free_irq(X) interrupt_thread(X)
chip_bus_lock(X)
irq_finalize_oneshot(X)
chip_bus_lock(X)
synchronize_irq(X)
synchronize_irq() waits for the interrupt thread to complete,
i.e. forever.
Solution is simple: Drop chip_bus_lock() before calling
synchronize_irq() as we do with the irq_desc lock. There is nothing to
be protected after the point where irq_desc lock has been released.
This adds chip_bus_lock/unlock() to the remove_irq() code path, but
that's actually correct in the case where remove_irq() is called on
such an interrupt. The current users of remove_irq() are not affected
as none of those interrupts is on a chip which requires buslock.
Reported-by: Fredrik Markström <fredrik.markstrom@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Certain interrupt controller drivers have a register set that does not
make it easy to save/restore the mask of enabled/disabled interrupts
at suspend/resume time. At resume time, such drivers rely on the core
kernel irq subsystem to tell whether such or such interrupt is enabled
or not, in order to restore the proper state in the interrupt
controller register.
While the irqd_irq_disabled() provides the relevant information for
global interrupts, there is no similar function to query the
enabled/disabled state of a per-CPU interrupt.
Therefore, this commit complements the percpu_irq API with an
irq_percpu_is_enabled() function.
[ tglx: Simplified the implementation and added kerneldoc ]
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Tawfik Bayouk <tawfik@marvell.com>
Cc: Nadav Haklai <nadavh@marvell.com>
Cc: Lior Amsalem <alior@marvell.com>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Cc: Gregory Clement <gregory.clement@free-electrons.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Link: http://lkml.kernel.org/r/1445347435-2333-2-git-send-email-thomas.petazzoni@free-electrons.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull networking updates from David Miller:
Changes of note:
1) Allow to schedule ICMP packets in IPVS, from Alex Gartrell.
2) Provide FIB table ID in ipv4 route dumps just as ipv6 does, from
David Ahern.
3) Allow the user to ask for the statistics to be filtered out of
ipv4/ipv6 address netlink dumps. From Sowmini Varadhan.
4) More work to pass the network namespace context around deep into
various packet path APIs, starting with the netfilter hooks. From
Eric W Biederman.
5) Add layer 2 TX/RX checksum offloading to qeth driver, from Thomas
Richter.
6) Use usec resolution for SYN/ACK RTTs in TCP, from Yuchung Cheng.
7) Support Very High Throughput in wireless MESH code, from Bob
Copeland.
8) Allow setting the ageing_time in switchdev/rocker. From Scott
Feldman.
9) Properly autoload L2TP type modules, from Stephen Hemminger.
10) Fix and enable offload features by default in 8139cp driver, from
David Woodhouse.
11) Support both ipv4 and ipv6 sockets in a single vxlan device, from
Jiri Benc.
12) Fix CWND limiting of thin streams in TCP, from Bendik Rønning
Opstad.
13) Fix IPSEC flowcache overflows on large systems, from Steffen
Klassert.
14) Convert bridging to track VLANs using rhashtable entries rather than
a bitmap. From Nikolay Aleksandrov.
15) Make TCP listener handling completely lockless, this is a major
accomplishment. Incoming request sockets now live in the
established hash table just like any other socket too.
From Eric Dumazet.
15) Provide more bridging attributes to netlink, from Nikolay
Aleksandrov.
16) Use hash based algorithm for ipv4 multipath routing, this was very
long overdue. From Peter Nørlund.
17) Several y2038 cures, mostly avoiding timespec. From Arnd Bergmann.
18) Allow non-root execution of EBPF programs, from Alexei Starovoitov.
19) Support SO_INCOMING_CPU as setsockopt, from Eric Dumazet. This
influences the port binding selection logic used by SO_REUSEPORT.
20) Add ipv6 support to VRF, from David Ahern.
21) Add support for Mellanox Spectrum switch ASIC, from Jiri Pirko.
22) Add rtl8xxxu Realtek wireless driver, from Jes Sorensen.
23) Implement RACK loss recovery in TCP, from Yuchung Cheng.
24) Support multipath routes in MPLS, from Roopa Prabhu.
25) Fix POLLOUT notification for listening sockets in AF_UNIX, from Eric
Dumazet.
26) Add new QED Qlogic river, from Yuval Mintz, Manish Chopra, and
Sudarsana Kalluru.
27) Don't fetch timestamps on AF_UNIX sockets, from Hannes Frederic
Sowa.
28) Support ipv6 geneve tunnels, from John W Linville.
29) Add flood control support to switchdev layer, from Ido Schimmel.
30) Fix CHECKSUM_PARTIAL handling of potentially fragmented frames, from
Hannes Frederic Sowa.
31) Support persistent maps and progs in bpf, from Daniel Borkmann.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1790 commits)
sh_eth: use DMA barriers
switchdev: respect SKIP_EOPNOTSUPP flag in case there is no recursion
net: sched: kill dead code in sch_choke.c
irda: Delete an unnecessary check before the function call "irlmp_unregister_service"
net: dsa: mv88e6xxx: include DSA ports in VLANs
net: dsa: mv88e6xxx: disable SA learning for DSA and CPU ports
net/core: fix for_each_netdev_feature
vlan: Invoke driver vlan hooks only if device is present
arcnet/com20020: add LEDS_CLASS dependency
bpf, verifier: annotate verbose printer with __printf
dp83640: Only wait for timestamps for packets with timestamping enabled.
ptp: Change ptp_class to a proper bitmask
dp83640: Prune rx timestamp list before reading from it
dp83640: Delay scheduled work.
dp83640: Include hash in timestamp/packet matching
ipv6: fix tunnel error handling
net/mlx5e: Fix LSO vlan insertion
net/mlx5e: Re-eanble client vlan TX acceleration
net/mlx5e: Return error in case mlx5e_set_features() fails
net/mlx5e: Don't allow more than max supported channels
...
If an irq chip does not implement the irq_disable callback, then we
use a lazy approach for disabling the interrupt. That means that the
interrupt is marked disabled, but the interrupt line is not
immediately masked in the interrupt chip. It only becomes masked if
the interrupt is raised while it's marked disabled. We use this to avoid
possibly expensive mask/unmask operations for common case operations.
Unfortunately there are devices which do not allow the interrupt to be
disabled easily at the device level. They are forced to use
disable_irq_nosync(). This can result in taking each interrupt twice.
Instead of enforcing the non lazy mode on all interrupts of a irq
chip, provide a settings flag, which can be set by the driver for that
particular interrupt line.
Reported-and-tested-by: Duc Dang <dhdang@apm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1510092348370.6097@nanos
Some drivers might use the per-cpu interrupts and still might be built as a
module. Export request_percpu_irq an free_percpu_irq to these user, which
also make it consistent with enable/disable_percpu_irq that were exported.
Reported-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The documentation of request_percpu_irq is confusing and suggest that the
interrupt is not enabled at all, while it is actually enabled on the local
CPU.
Clarify that.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Force threading of interrupts does not really deal with interrupts
which are requested with a primary and a threaded handler. The current
policy is to leave them alone and let the primary handler run in
interrupt context, but we set the ONESHOT flag for those interrupts as
well.
Kohji Okuno debugged a problem with the SDHCI driver where the
interrupt thread waits for a hardware interrupt to trigger, which can't
work well because the hardware interrupt is masked due to the ONESHOT
flag being set. He proposed to set the ONESHOT flag only if the
interrupt does not provide a thread handler.
Though that does not work either because these interrupts can be
shared. So the other interrupt would rightfully get the ONESHOT flag
set and therefor the same situation would happen again.
To deal with this proper, we need to force thread the primary handler
of such interrupts as well. That means that the primary interrupt
handler is treated as any other primary interrupt handler which is not
marked IRQF_NO_THREAD. The threaded handler becomes a separate thread
so the SDHCI flow logic can be handled gracefully.
The same issue was reported against 4.1-rt.
Reported-and-tested-by: Kohji Okuno <okuno.kohji@jp.panasonic.com>
Reported-By: Michal Smucr <msmucr@gmail.com>
Reported-and-tested-by: Nathan Sullivan <nathan.sullivan@ni.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1509211058080.5606@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Irq affinity mask is per-irq instead of per irqchip, so move it into
struct irq_common_data.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Kevin Cernekee <cernekee@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/1433303281-27688-1-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Export these functions to be able to build the Qualcomm family A PMIC
gpio and mpp drivers as modules.
[ tglx: Made them GPL exports ]
Signed-off-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: <kernel-build-reports@lists.linaro.org>
Cc: <linaro-kernel@lists.linaro.org>
Cc: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Link: http://lkml.kernel.org/r/1437594184-22966-1-git-send-email-bjorn.andersson@sonymobile.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Unused except for the alpha wrapper, which can retrieve if from the
irq descriptor.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1433391238-19471-21-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Provide a irq_desc based variant of irq_can_set_affinity() to avoid a
redundant lookup for the core code users.
[ tglx: Split out from combo patch ]
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Solely used for debug output. Can be retrieved from irq descriptor if
necessary.
[ tglx: Split out from combo patch ]
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It's only required for debug output and can be retrieved from the irq
descriptor if necessary.
[ tglx: Split out from combo patch ]
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It's only used in the software resend case and can be retrieved from
irq_desc if necessary.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1433391238-19471-18-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Introduce helper function irq_data_get_node() and variants thereof to
hide struct irq_data implementation details.
Convert the core code to use them.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Kevin Cernekee <cernekee@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/1433145945-789-5-git-send-email-jiang.liu@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
With Posted-Interrupts support in Intel CPU and IOMMU, an external
interrupt from assigned-devices could be directly delivered to a
virtual CPU in a virtual machine. Instead of hacking KVM and Intel
IOMMU drivers, we propose a platform independent interface to target
an interrupt to a specific virtual CPU in a virtual machine, or set
virtual CPU affinity for an interrupt.
By adopting this new interface and the hierarchy irqdomain, we could
easily support posted-interrupts on Intel platforms, and also provide
flexible enough interfaces for other platforms to support similar
features.
Here is the usage scenario for this interface:
Guest update MSI/MSI-X interrupt configuration
-->QEMU and KVM handle this
-->KVM call this interface (passing posted interrupts descriptor
and guest vector)
-->irq core will transfer the control to IOMMU
-->IOMMU will do the real work of updating IRTE (IRTE has new
format for VT-d Posted-Interrupts)
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Signed-off-by: Feng Wu <feng.wu@intel.com>
Link: http://lkml.kernel.org/r/1432026437-16560-2-git-send-email-feng.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
There is a number of cases where a kernel subsystem may want to
introspect the state of an interrupt at the irqchip level:
- When a peripheral is shared between virtual machines,
its interrupt state becomes part of the guest's state,
and must be switched accordingly. KVM on arm/arm64 requires
this for its guest-visible timer
- Some GPIO controllers seem to require peeking into the
interrupt controller they are connected to to report
their internal state
This seem to be a pattern that is common enough for the core code
to try and support this without too many horrible hacks. Introduce
a pair of accessors (irq_get_irqchip_state/irq_set_irqchip_state)
to retrieve the bits that can be of interest to another subsystem:
pending, active, and masked.
- irq_get_irqchip_state returns the state of the interrupt according
to a parameter set to IRQCHIP_STATE_PENDING, IRQCHIP_STATE_ACTIVE,
IRQCHIP_STATE_MASKED or IRQCHIP_STATE_LINE_LEVEL.
- irq_set_irqchip_state similarly sets the state of the interrupt.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Tested-by: Bjorn Andersson <bjorn.andersson@sonymobile.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Phong Vo <pvo@apm.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Tin Huynh <tnhuynh@apm.com>
Cc: Y Vo <yvo@apm.com>
Cc: Toan Le <toanle@apm.com>
Cc: Bjorn Andersson <bjorn@kryo.se>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/1426676484-21812-2-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It currently is required that all users of NO_SUSPEND interrupt
lines pass the IRQF_NO_SUSPEND flag when requesting the IRQ or the
WARN_ON_ONCE() in irq_pm_install_action() will trigger. That is
done to warn about situations in which unprepared interrupt handlers
may be run unnecessarily for suspended devices and may attempt to
access those devices by mistake. However, it may cause drivers
that have no technical reasons for using IRQF_NO_SUSPEND to set
that flag just because they happen to share the interrupt line
with something like a timer.
Moreover, the generic handling of wakeup interrupts introduced by
commit 9ce7a25849 (genirq: Simplify wakeup mechanism) only works
for IRQs without any NO_SUSPEND users, so the drivers of wakeup
devices needing to use shared NO_SUSPEND interrupt lines for
signaling system wakeup generally have to detect wakeup in their
interrupt handlers. Thus if they happen to share an interrupt line
with a NO_SUSPEND user, they also need to request that their
interrupt handlers be run after suspend_device_irqs().
In both cases the reason for using IRQF_NO_SUSPEND is not because
the driver in question has a genuine need to run its interrupt
handler after suspend_device_irqs(), but because it happens to
share the line with some other NO_SUSPEND user. Otherwise, the
driver would do without IRQF_NO_SUSPEND just fine.
To make it possible to specify that condition explicitly, introduce
a new IRQ action handler flag for shared IRQs, IRQF_COND_SUSPEND,
that, when set, will indicate to the IRQ core that the interrupt
user is generally fine with suspending the IRQ, but it also can
tolerate handler invocations after suspend_device_irqs() and, in
particular, it is capable of detecting system wakeup and triggering
it as appropriate from its interrupt handler.
That will allow us to work around a problem with a shared timer
interrupt line on at91 platforms.
Link: http://marc.info/?l=linux-kernel&m=142252777602084&w=2
Link: http://marc.info/?t=142252775300011&r=1&w=2
Link: https://lkml.org/lkml/2014/12/15/552
Reported-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
For things like netpoll there is a need to disable an interrupt from
atomic context. Currently netpoll uses disable_irq() which will
sleep-wait on threaded handlers and thus forced_irqthreads breaks
things.
Provide disable_hardirq(), which uses synchronize_hardirq() to only wait
for active hardirq handlers; also change synchronize_hardirq() to
return the status of threaded handlers.
This will allow one to try-disable an interrupt from atomic context, or
in case of request_threaded_irq() to only wait for the hardirq part.
Suggested-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Miller <davem@davemloft.net>
Cc: Eyal Perry <eyalpe@mellanox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Quentin Lambert <lambert.quentin@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Link: http://lkml.kernel.org/r/20150205130623.GH5029@twins.programming.kicks-ass.net
[ Fixed typos and such. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The recent set_affinity commit by me introduced some null
pointer dereferences on driver unload, because some drivers
call this function with a NULL argument. This fixes the issue
by just checking for null before setting the affinity mask.
Fixes: e2e64a9325 ("genirq: Set initial affinity in irq_set_affinity_hint()")
Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20150128185739.9689.84588.stgit@jbrandeb-cp2.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Problem:
The default behavior of the kernel is somewhat undesirable as all
requested interrupts end up on CPU0 after registration. A user can
run irqbalance daemon, or can manually configure smp_affinity via the
proc filesystem, but the default affinity of the interrupts for all
devices is always CPU zero, this can cause performance problems or
very heavy cpu use of only one core if not noticed and fixed by the
user.
Solution:
Enable the setting of the initial affinity directly when the driver
sets a hint.
This enabling means that kernel drivers can include an initial
affinity setting for the interrupt, instead of all interrupts starting
out life on CPU0. Of course if irqbalance is still running then the
interrupts will get moved as before.
This function is currently called by drivers in block, crypto,
infiniband, ethernet and scsi trees, but only a handful, so these will
be the devices affected by this change.
Tested on i40e, and default interrupts were spread across the CPUs
according to the hint.
drivers/block/mtip32xx/mtip32xx.c:3
drivers/block/nvme-core.c:2
drivers/crypto/qat/qat_dh895xcc/adf_isr.c:3
drivers/infiniband/hw/qib/qib_iba7322.c:2
drivers/net/ethernet/intel/i40e/i40e_main.c:3
drivers/net/ethernet/intel/i40evf/i40evf_main.c:3
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:3
drivers/net/ethernet/mellanox/mlx4/en_cq.c:2
drivers/scsi/hpsa.c:3
drivers/scsi/lpfc/lpfc_init.c:3
drivers/scsi/megaraid/megaraid_sas_base.c:8
drivers/soc/ti/knav_qmss_acc.c:1
drivers/soc/ti/knav_qmss_queue.c:2
drivers/virtio/virtio_pci_common.c:2
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: netdev@vger.kernel.org
Link: http://lkml.kernel.org/r/20141219012206.4220.27491.stgit@jbrandeb-cp2.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Add IRQ_SET_MASK_OK_DONE in addition to IRQ_SET_MASK_OK and
IRQ_SET_MASK_OK_NOCOPY to support stacked irqchip. IRQ_SET_MASK_OK_DONE
is the same as IRQ_SET_MASK_OK to irq core. To stacked irqchip, it means
that ascendant irqchips have done all the work and no more handling
needed in descendant irqchips.
Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Yingjoe Chen <yingjoe.chen@mediatek.com>
Cc: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Account the IRQF_NO_SUSPEND and IRQF_RESUME_EARLY actions on shared
interrupt lines and yell loudly if there is a mismatch.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
No functional change. Preparatory patch for cleaning up the suspend
abort functionality. Update the comments while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Till reported that the spurious interrupt detection of threaded
interrupts is broken in two ways:
- note_interrupt() is called for each action thread of a shared
interrupt line. That's wrong as we are only interested whether none
of the device drivers felt responsible for the interrupt, but by
calling multiple times for a single interrupt line we account
IRQ_NONE even if one of the drivers felt responsible.
- note_interrupt() when called from the thread handler is not
serialized. That leaves the members of irq_desc which are used for
the spurious detection unprotected.
To solve this we need to defer the spurious detection of a threaded
interrupt to the next hardware interrupt context where we have
implicit serialization.
If note_interrupt is called with action_ret == IRQ_WAKE_THREAD, we
check whether the previous interrupt requested a deferred check. If
not, we request a deferred check for the next hardware interrupt and
return.
If set, we check whether one of the interrupt threads signaled
success. Depending on this information we feed the result into the
spurious detector.
If one primary handler of a shared interrupt returns IRQ_HANDLED we
disable the deferred check of irq threads on the same line, as we have
found at least one device driver who cared.
Reported-by: Till Straumann <strauman@slac.stanford.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Austin Schuh <austin@peloton-tech.com>
Cc: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: Wolfgang Grandegger <wg@grandegger.com>
Cc: Pavel Pisa <pisa@cmp.felk.cvut.cz>
Cc: Marc Kleine-Budde <mkl@pengutronix.de>
Cc: linux-can@vger.kernel.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1303071450130.22263@ionos
The current implementation of irq_set_affinity() refuses rightfully to
route an interrupt to an offline cpu.
But there is a special case, where this is actually desired. Some of
the ARM SoCs have per cpu timers which require setting the affinity
during cpu startup where the cpu is not yet in the online mask.
If we can't do that, then the local timer interrupt for the about to
become online cpu is routed to some random online cpu.
The developers of the affected machines tried to work around that
issue, but that results in a massive mess in that timer code.
We have a yet unused argument in the set_affinity callbacks of the irq
chips, which I added back then for a similar reason. It was never
required so it got not used. But I'm happy that I never removed it.
That allows us to implement a sane handling of the above scenario. So
the affected SoC drivers can add the required force handling to their
interrupt chip, switch the timer code to irq_force_affinity() and
things just work.
This does not affect any existing user of irq_set_affinity().
Tagged for stable to allow a simple fix of the affected SoC clock
event drivers.
Reported-and-tested-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Tomasz Figa <t.figa@samsung.com>,
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>,
Cc: Kukjin Kim <kgene.kim@samsung.com>
Cc: linux-arm-kernel@lists.infradead.org,
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20140416143315.717251504@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The flag is necessary for interrupt chips which require an ACK/EOI
after the handler has run. In case of threaded handlers this needs to
happen after the threaded handler has completed before the unmask of
the interrupt.
The flag is only unseful in combination with the handle_fasteoi_irq
flow control handler.
It can be combined with the flag IRQCHIP_EOI_IF_HANDLED, so the EOI is
not issued when the interrupt is disabled or in progress.
Tested-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-sunxi@googlegroups.com
Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
Link: http://lkml.kernel.org/r/1394733834-26839-2-git-send-email-hdegoede@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Merge the request/release callbacks which are in a separate branch for
consumption by the gpio folks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
For certain irq types, e.g. gpios, it's necessary to request resources
before starting up the irq.
This might fail so we cannot use the irq_startup() callback because we
might call the irq_set_type() callback before that which does not make
sense when the resource is not available. Calling irq_startup() before
irq_set_type() can lead to spurious interrupts which is not desired
either.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jean-Jacques Hiblot <jjhiblot@traphandler.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1403080857160.18573@ionos.tec.linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We hit one rare case below:
T1 calling disable_irq(), but hanging at synchronize_irq()
always;
The corresponding irq thread is in sleeping state;
And all CPUs are in idle state;
After analysis, we found there is one possible scenerio which
causes T1 is waiting there forever:
CPU0 CPU1
synchronize_irq()
wait_event()
spin_lock()
atomic_dec_and_test(&threads_active)
insert the __wait into queue
spin_unlock()
if(waitqueue_active)
atomic_read(&threads_active)
wake_up()
Here after inserted the __wait into queue on CPU0, and before
test if queue is empty on CPU1, there is no barrier, it maybe
cause it is not visible for CPU1 immediately, although CPU0 has
updated the queue list.
It is similar for CPU0 atomic_read() threads_active also.
So we'd need one smp_mb() before waitqueue_active.that, but removing
the waitqueue_active() check solves it as wel l and it makes
things simple and clear.
Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
Cc: Xiaoming Wang <xiaoming.wang@intel.com>
Link: http://lkml.kernel.org/r/1393212590-32543-1-git-send-email-chuansheng.liu@intel.com
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In course of the sdhci/sdio discussion with Russell about killing the
sdio kthread hackery we discovered the need to be able to wake an
interrupt thread from software.
The rationale for this is, that sdio hardware can lack proper
interrupt support for certain features. So the driver needs to poll
the status registers, but at the same time it needs to be woken up by
an hardware interrupt.
To be able to get rid of the home brewn kthread construct of sdio we
need a way to wake an irq thread independent of an actual hardware
interrupt.
Provide an irq_wake_thread() function which wakes up the thread which
is associated to a given dev_id. This allows sdio to invoke the irq
thread from the hardware irq handler via the IRQ_WAKE_THREAD return
value and provides a possibility to wake it via a timer for the
polling scenarios. That allows to simplify the sdio logic
significantly.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Chris Ball <chris@printf.net>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140215003823.772565780@linutronix.de
synchronize_irq() waits for hard irq and threaded handlers to complete
before returning. For some special cases we only need to make sure
that the hard interrupt part of the irq line is not in progress when
we disabled the - possibly shared - interrupt at the device level.
A proper use case for this was provided by Russell. The sdhci driver
requires some irq triggered functions to be run in thread context. The
current implementation of the thread context is a sdio private kthread
construct, which has quite some shortcomings. These can be avoided
when the thread is directly associated to the device interrupt via the
generic threaded irq infrastructure.
Though there is a corner case related to run time power management
where one side disables the device interrupts at the device level and
needs to make sure, that an already running hard interrupt handler has
completed before proceeding further. Though that hard interrupt
handler might wake the associated thread, which in turn can request
the runtime PM to reenable the device. Using synchronize_irq() leads
to an immediate deadlock of the irq thread waiting for the PM lock and
the synchronize_irq() waiting for the irq thread to complete.
Due to the fact that it is sufficient for this case to ensure that no
hard irq handler is executing a new function which avoids the check
for the thread is required.
Add a function, which just monitors the hard irq parts and ignores the
threaded handlers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Russell King <linux@arm.linux.org.uk>
Cc: Chris Ball <chris@printf.net>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140215003823.653236081@linutronix.de
Pull trivial tree updates from Jiri Kosina:
"Usual earth-shaking, news-breaking, rocket science pile from
trivial.git"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
doc: add missing files to timers/00-INDEX
timekeeping: Fix some trivial typos in comments
mm: Fix some trivial typos in comments
irq: Fix some trivial typos in comments
NUMA: fix typos in Kconfig help text
mm: update 00-INDEX
doc: Documentation/DMA-attributes.txt fix typo
DRM: comment: `halve' -> `half'
Docs: Kconfig: `devlopers' -> `developers'
doc: typo on word accounting in kprobes.c in mutliple architectures
treewide: fix "usefull" typo
treewide: fix "distingush" typo
mm/Kconfig: Grammar s/an/a/
kexec: Typo s/the/then/
Documentation/kvm: Update cpuid documentation for steal time and pv eoi
treewide: Fix common typo in "identify"
__page_to_pfn: Fix typo in comment
Correct some typos for word frequency
clk: fixed-factor: Fix a trivial typo
...
In commit ee23871389 ("genirq: Set irq thread to RT priority on
creation") we moved the assigment of the thread's priority from the
thread's function into __setup_irq(). That function may run in user
context for instance if the user opens an UART node and then driver
calls requests in the ->open() callback. That user may not have
CAP_SYS_NICE and so the irq thread won't run with the SCHED_OTHER
policy.
This patch uses sched_setscheduler_nocheck() so we omit the CAP_SYS_NICE
check which is otherwise required for the SCHED_OTHER policy.
[bigeasy: Rewrite the changelog]
Signed-off-by: Thomas Pfaff <tpfaff@pcs.com>
Cc: Ivo Sieben <meltedpianoman@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1381489240-29626-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Commit 02725e7471 ('genirq: Use irq_get/put functions'),
inadvertently changed can_request_irq() to return 0 for IRQs that have
no action. This causes pcibios_lookup_irq() to select only IRQs that
already have an action with IRQF_SHARED set, or to fail if there are
none. Change can_request_irq() to return 1 for IRQs that have no
action (if the first two conditions are met).
Reported-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is>
Tested-by: Bjarni Ingi Gislason <bjarniig@rhi.hi.is> (against 3.2)
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: 709647@bugs.debian.org
Cc: stable@vger.kernel.org # 2.6.39+
Link: http://bugs.debian.org/709647
Link: http://lkml.kernel.org/r/1372383630.23847.40.camel@deadeye.wl.decadent.org.uk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When a threaded irq handler is installed the irq thread is initially
created on normal scheduling priority. Only after the irq thread is
woken up it sets its priority to RT_FIFO MAX_USER_RT_PRIO/2 itself.
This means that interrupts that occur directly after the irq handler
is installed will be handled on a normal scheduling priority instead
of the realtime priority that one would expect.
Fix this by setting the RT priority on creation of the irq_thread.
Signed-off-by: Ivo Sieben <meltedpianoman@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1370254322-17240-1-git-send-email-meltedpianoman@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull scheduler changes from Ingo Molnar:
"Main changes:
- scheduler side full-dynticks (user-space execution is undisturbed
and receives no timer IRQs) preparation changes that convert the
cputime accounting code to be full-dynticks ready, from Frederic
Weisbecker.
- Initial sched.h split-up changes, by Clark Williams
- select_idle_sibling() performance improvement by Mike Galbraith:
" 1 tbench pair (worst case) in a 10 core + SMT package:
pre 15.22 MB/sec 1 procs
post 252.01 MB/sec 1 procs "
- sched_rr_get_interval() ABI fix/change. We think this detail is not
used by apps (so it's not an ABI in practice), but lets keep it
under observation.
- misc RT scheduling cleanups, optimizations"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
sched/rt: Add <linux/sched/rt.h> header to <linux/init_task.h>
cputime: Remove irqsave from seqlock readers
sched, powerpc: Fix sched.h split-up build failure
cputime: Restore CPU_ACCOUNTING config defaults for PPC64
sched/rt: Move rt specific bits into new header file
sched/rt: Add a tuning knob to allow changing SCHED_RR timeslice
sched: Move sched.h sysctl bits into separate header
sched: Fix signedness bug in yield_to()
sched: Fix select_idle_sibling() bouncing cow syndrome
sched/rt: Further simplify pick_rt_task()
sched/rt: Do not account zero delta_exec in update_curr_rt()
cputime: Safely read cputime of full dynticks CPUs
kvm: Prepare to add generic guest entry/exit callbacks
cputime: Use accessors to read task cputime stats
cputime: Allow dynamic switch between tick/virtual based cputime accounting
cputime: Generic on-demand virtual cputime accounting
cputime: Move default nsecs_to_cputime() to jiffies based cputime file
cputime: Librarize per nsecs resolution cputime definitions
cputime: Avoid multiplication overflow on utime scaling
context_tracking: Export context state for generic vtime
...
Fix up conflict in kernel/context_tracking.c due to comment additions.
Move rt scheduler definitions out of include/linux/sched.h into
new file include/linux/sched/rt.h
Signed-off-by: Clark Williams <williams@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20130207094707.7b9f825f@riff.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The array check is useless so remove it.
[akpm@linux-foundation.org: remove comment, per David]
Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sankara reported that the genirq core code fails to adjust the
affinity of an interrupt thread in several cases:
1) On request/setup_irq() the call to setup_affinity() happens before
the new action is registered, so the new thread is not notified.
2) For secondary shared interrupts nothing notifies the new thread to
change its affinity.
3) Interrupts which have the IRQ_NO_BALANCE flag set are not moving
the thread either.
Fix this by setting the thread affinity flag right on thread creation
time. This ensures that under all circumstances the thread moves to
the right place. Requires a check in irq_thread_check_affinity for an
existing affinity mask (CONFIG_CPU_MASK_OFFSTACK=y)
Reported-and-tested-by: Sankara Muthukrishnan <sankara.m@gmail.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1209041738200.2754@ionos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
As irq_thread_check_affinity is called ONLY inside the while loop in
the irq thread, the core affinity is set only when an interrupt
occurs. This patch sets the core affinity right after the irq thread
is created and before it waits for interrupts. In real-tiime targets
that do not typically change the core affinity of irqs during
run-time, this patch will save additional latency of an irq thread in
setting the core affinity during the first interrupt occurrence for
that irq.
Signed-off-by: Sankara S Muthukrishnan <sankara.m@ni.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/CAFQPvXeVZ858WFYimEU5uvLNxLDd6bJMmqWihFmbCf3ntokz0A@mail.gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Attempts to retrigger nested threaded IRQs currently fail because they
have no primary handler. In order to support retrigger of nested
IRQs, the parent IRQ needs to be retriggered.
To fix, when an IRQ needs to be resent, if the interrupt has a parent
IRQ and runs in the context of the parent IRQ, then resend the parent.
Also, handle_nested_irq() needs to clear the replay flag like the
other handlers, otherwise check_irq_resend() will set it and it will
never be cleared. Without clearing, it results in the first resend
working fine, but check_irq_resend() returning early on subsequent
resends because the replay flag is still set.
Problem discovered on ARM/OMAP platforms where a nested IRQ that's
also a wakeup IRQ happens late in suspend and needed to be retriggered
during the resume process.
[khilman@ti.com: changelog edits, clear IRQS_REPLAY in handle_nested_irq()]
Reported-by: Kevin Hilman <khilman@ti.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1350425269-11489-1-git-send-email-khilman@deeprootsystems.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull irq fix from Ingo Molnar.
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Allow irq chips to mark themself oneshot safe
from interrupts for /dev/random and /dev/urandom. The goal is to
addresses weaknesses discussed in the paper "Mining your Ps and Qs:
Detection of Widespread Weak Keys in Network Devices", by Nadia
Heninger, Zakir Durumeric, Eric Wustrow, J. Alex Halderman, which will
be published in the Proceedings of the 21st Usenix Security Symposium,
August 2012. (See https://factorable.net for more information and an
extended version of the paper.)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJQF/0DAAoJENNvdpvBGATwIowQAOep9QKtLrBvb2lwIRVmeiy8
lRf7V/tYZnz4FePbR0W92JQfKYkCV8yyOO0bmeRzWL3v4m+lRwDTSyA1DDyQMoH+
LOMzvDKSLJMSXTXdSOIr1WYACphViCR/9CrbMBCKSkYfZLJ1MdaEDxT3rcpTGD0T
6iknUweiSkHHhkerU5yQL7FKzD5kYUe0hsF47w7QVlHRHJsW2fsZqkFoh+RpnhNw
03u+djxNGBo9qV81vZ9D1b0vA9uRlEjoWOOEG2XE4M2iq6TUySueA72dQnCwunfi
3kG/u1Swv2dgq6aRrP3H7zdwhYSourGxziu3jNhEKwKEohrxYY7xjNX3RVeTqP67
AzlKsOTWpRLIDrzjSLlb8VxRQiZewu8Unex3e1G+eo20sbcIObHGrxNp7K00zZvd
QZiMHhOwItwFTe4lBO+XbqH2JKbL9/uJmwh5EipMpQTraKO9E6N3CJiUHjzBLo2K
iGDZxRMKf4gVJRwDxbbP6D70JPVu8ZJ09XVIpsXQ3Z1xNqaMF0QdCmP3ty56q1o0
NvkSXxPKrijZs8Sk0rVDqnJ3ll8PuDnXMv5eDtL42VT818I5WxESn9djjwEanGv0
TYxbFub/NRxmPEE5B2Js5FBpqsLf5f282OSMeS/5WLBbnHJR1OoPoAhGVpHvxntC
bi5FC1OolqhvzVIdsqgt
=u7KM
-----END PGP SIGNATURE-----
Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull random subsystem patches from Ted Ts'o:
"This patch series contains a major revamp of how we collect entropy
from interrupts for /dev/random and /dev/urandom.
The goal is to addresses weaknesses discussed in the paper "Mining
your Ps and Qs: Detection of Widespread Weak Keys in Network Devices",
by Nadia Heninger, Zakir Durumeric, Eric Wustrow, J. Alex Halderman,
which will be published in the Proceedings of the 21st Usenix Security
Symposium, August 2012. (See https://factorable.net for more
information and an extended version of the paper.)"
Fix up trivial conflicts due to nearby changes in
drivers/{mfd/ab3100-core.c, usb/gadget/omap_udc.c}
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: (33 commits)
random: mix in architectural randomness in extract_buf()
dmi: Feed DMI table to /dev/random driver
random: Add comment to random_initialize()
random: final removal of IRQF_SAMPLE_RANDOM
um: remove IRQF_SAMPLE_RANDOM which is now a no-op
sparc/ldc: remove IRQF_SAMPLE_RANDOM which is now a no-op
[ARM] pxa: remove IRQF_SAMPLE_RANDOM which is now a no-op
board-palmz71: remove IRQF_SAMPLE_RANDOM which is now a no-op
isp1301_omap: remove IRQF_SAMPLE_RANDOM which is now a no-op
pxa25x_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
omap_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
goku_udc: remove IRQF_SAMPLE_RANDOM which was commented out
uartlite: remove IRQF_SAMPLE_RANDOM which is now a no-op
drivers: hv: remove IRQF_SAMPLE_RANDOM which is now a no-op
xen-blkfront: remove IRQF_SAMPLE_RANDOM which is now a no-op
n2_crypto: remove IRQF_SAMPLE_RANDOM which is now a no-op
pda_power: remove IRQF_SAMPLE_RANDOM which is now a no-op
i2c-pmcmsp: remove IRQF_SAMPLE_RANDOM which is now a no-op
input/serio/hp_sdc.c: remove IRQF_SAMPLE_RANDOM which is now a no-op
mfd: remove IRQF_SAMPLE_RANDOM which is now a no-op
...
Some interrupt chips like MSI are oneshot safe by implementation. For
those interrupts we can avoid the mask/unmask sequence for threaded
interrupt handlers.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1207132056540.32033@ionos
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Jan Kiszka <jan.kiszka@web.de>
task_work and rcu_head are identical now; merge them (calling the result
struct callback_head, rcu_head #define'd to it), kill separate allocation
in security/keys since we can just use cred->rcu now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
get rid of the only user of ->data; this is _not_ the final variant - in the
end we'll have task_work and rcu_head identical and just use cred->rcu,
at which point the separate allocation will be gone completely.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
With the new interrupt sampling system, we are no longer using the
timer_rand_state structure in the irq descriptor, so we can stop
initializing it now.
[ Merged in fixes from Sedat to find some last missing references to
rand_initialize_irq() ]
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Pull irq and smpboot updates from Thomas Gleixner:
"Just cleanup patches with no functional change and a fix for suspend
issues."
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Introduce irq_do_set_affinity() to reduce duplicated code
genirq: Add IRQS_PENDING for nested and simple irq
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smpboot, idle: Fix comment mismatch over idle_threads_init()
smpboot, idle: Optimize calls to smp_processor_id() in idle_threads_init()
Pull second pile of signal handling patches from Al Viro:
"This one is just task_work_add() series + remaining prereqs for it.
There probably will be another pull request from that tree this
cycle - at least for helpers, to get them out of the way for per-arch
fixes remaining in the tree."
Fix trivial conflict in kernel/irq/manage.c: the merge of Andrew's pile
had brought in commit 97fd75b7b8 ("kernel/irq/manage.c: use the
pr_foo() infrastructure to prefix printks") which changed one of the
pr_err() calls that this merge moves around.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
keys: kill task_struct->replacement_session_keyring
keys: kill the dummy key_replace_session_keyring()
keys: change keyctl_session_to_parent() to use task_work_add()
genirq: reimplement exit_irq_thread() hook via task_work_add()
task_work_add: generic process-context callbacks
avr32: missed _TIF_NOTIFY_RESUME on one of do_notify_resume callers
parisc: need to check NOTIFY_RESUME when exiting from syscall
move key_repace_session_keyring() into tracehook_notify_resume()
TIF_NOTIFY_RESUME is defined on all targets now
Use the module-wide pr_fmt() mechanism rather than open-coding "genirq: "
everywhere.
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All invocations of chip->irq_set_affinity() are doing the same return
value checks. Let them all use a common function.
[ tglx: removed the silly likely while at it ]
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Keping Chen <chenkeping@huawei.com>
Link: http://lkml.kernel.org/r/1333120296-13563-3-git-send-email-jiang.liu@huawei.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
exit_irq_thread() and task->irq_thread are needed to handle the unexpected
(and unlikely) exit of irq-thread.
We can use task_work instead and make this all private to
kernel/irq/manage.c, cleanup plus micro-optimization.
1. rename exit_irq_thread() to irq_thread_dtor(), make it
static, and move it up before irq_thread().
2. change irq_thread() to do task_work_add(irq_thread_dtor)
at the start and task_work_cancel() before return.
tracehook_notify_resume() can never play with kthreads,
only do_exit()->exit_task_work() can call the callback
and this is what we want.
3. remove task_struct->irq_thread and the special hook
in do_exit().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Alexander Gordeev <agordeev@redhat.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Smith <dsmith@redhat.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull UML updates from Richard Weinberger:
"Most changes are bug fixes and cleanups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: missing checks of __put_user()/__get_user() return values
um: stub_rt_sigsuspend isn't needed these days anymore
um/x86: merge (and trim) 32- and 64-bit variants of ptrace.h
irq: Remove irq_chip->release()
um: Remove CONFIG_IRQ_RELEASE_METHOD
um: Remove usage of irq_chip->release()
um: Implement um_free_irq()
um: Fix __swp_type()
um: Implement a custom pte_same() function
um: Add BUG() to do_ops()'s error path
um: Remove unused variables
um: bury unused _TIF_RESTORE_SIGMASK
um: wrong sigmask saved in case of multiple sigframes
um: add TIF_NOTIFY_RESUME
um: ->restart_block.fn needs to be reset on sigreturn
As it's only user (UML) does no longer need it we can get
rid of it.
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
We require that shared interrupts agree on a few flag settings. Right
now we silently return with an error code without giving any hint why
we reject it.
Make the printout unconditionally and actually useful by printing the
flags of the new and the already registered action.
Convert all printks to pr_* and use a proper prefix while at it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Requesting a threaded interrupt without a primary handler and without
IRQF_ONESHOT set is dangerous.
The core will use the default primary handler for it, which merily
wakes the thread. For a level type interrupt this results in an
interrupt storm, because the interrupt line is reenabled after the
primary handler runs. The device has still the line asserted, which
brings us back into the primary handler.
While this works for edge type interrupts, we play it safe and reject
unconditionally because we can't say for sure which type this
interrupt really has. The type flags are unreliable as the underlying
chip implementation can override them. And we cannot assume that
developers using that interface know what they are doing.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We respect node affinity of devices already in the irq descriptor
allocation, but we ignore it for the initial interrupt affinity
setup, so the interrupt might be routed to a different node.
Restrict the default affinity mask to the node on which the irq
descriptor is allocated.
[ tglx: Massaged changelog ]
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/1332788538-17425-1-git-send-email-prarit@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The only place irq_finalize_oneshot() is called with force parameter set
is the threaded handler error exit path. But IRQTF_RUNTHREAD is dropped
at this point and irq_wake_thread() is not going to set it again,
since PF_EXITING is set for this thread already. So irq_finalize_oneshot()
will drop the threads bit in threads_oneshot anyway and hence the force
parameter is superfluous.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/20120321162234.GP24806@dhcp-26-207.brq.redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Alexander pointed out that the warnons in the regular exit path are
bogus and the thread_mask one actually could be triggered when
__setup_irq() hands out that thread_mask again after __free_irq()
dropped irq_desc->lock.
Thinking more about it, neither IRQTF_RUNTHREAD nor the bit in
thread_mask can be set as this is the regular exit path. We come here
due to:
__free_irq()
remove action from desc
synchronize_irq()
kthread_stop()
So synchronize_irq() makes sure that the thread finished running and
cleaned up both the thread_active count and thread_mask. After that
point nothing can set IRQTF_RUNTHREAD on this action. So the warnons
and the cleanups are pointless.
Reported-by: Alexander Gordeev <agordeev@redhat.com>
Cc: Ido Yariv <ido@wizery.com>
Link: http://lkml.kernel.org/r/20120315190755.GA6732@dhcp-26-207.brq.redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The current implementation does not always flush the threaded handler
when disabling the irq. In case the irq handler was called, but the
threaded handler hasn't started running yet, the interrupt will be
flagged as pending, and the handler will not run. This implementation
has some issues:
First, if the interrupt is a wake source and flagged as pending, the
system will not be able to suspend.
Second, when quickly disabling and re-enabling the irq, the threaded
handler might continue to run after the irq is re-enabled without the
irq handler being called first. This might be an unexpected behavior.
In addition, it might be counter-intuitive that the threaded handler
will not be called even though the irq handler was called and returned
IRQ_WAKE_THREAD.
Fix this by always waiting for the threaded handler to complete in
synchronize_irq().
[ tglx: Massaged comments, added WARN_ONs and the missing
IRQTF_RUNTHREAD check in exit_irq_thread() ]
Signed-off-by: Ido Yariv <ido@wizery.com>
Link: http://lkml.kernel.org/r/1322843052-7166-1-git-send-email-ido@wizery.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Currently IRQTF_DIED flag is set when a IRQ thread handler calls do_exit()
But also PF_EXITING per process flag gets set when a thread exits. This
fix eliminates the duplicate by using PF_EXITING flag.
Also, there is a race condition in exit_irq_thread(). In case a thread's
bit is cleared in desc->threads_oneshot (and the IRQ line gets unmasked),
but before IRQTF_DIED flag is set, a new interrupt might come in and set
just cleared bit again, this time forever. This fix throws IRQTF_DIED flag
away, eliminating the race as a result.
[ tglx: Test THREAD_EXITING first as suggested by Oleg ]
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/20120309135958.GD2114@dhcp-26-207.brq.redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Since 63706172f3 kthread_stop() is not
afraid of dead kernel threads. So no need to check if a thread is
alive before stopping it. These checks still were racy.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/20120309135939.GC2114@dhcp-26-207.brq.redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
When a new thread handler is created, an irqaction is passed to it as
data. Not only that irqaction is stored in task_struct by the handler
for later use, but also a structure associated with the kernel thread
keeps this value as long as the thread exists.
This fix kicks irqaction out off task_struct. Yes, I introduce new bit
field. But it allows not only to eliminate the duplicate, but also
shortens size of task_struct.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/20120309135925.GB2114@dhcp-26-207.brq.redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Xommit ac5637611(genirq: Unmask oneshot irqs when thread was not woken)
fails to unmask when a !IRQ_ONESHOT threaded handler is handled by
handle_level_irq.
This happens because thread_mask is or'ed unconditionally in
irq_wake_thread(), but for !IRQ_ONESHOT interrupts never cleared. So
the check for !desc->thread_active fails and keeps the interrupt
disabled.
Keep the thread_mask zero for !IRQ_ONESHOT interrupts.
Document the thread_mask magic while at it.
Reported-and-tested-by: Sven Joachim <svenjoac@gmx.de>
Reported-and-tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
An interrupt might be pending when irq_startup() is called, but the
startup code does not invoke the resend logic. In some cases this
prevents the device from issuing another interrupt which renders the
device non functional.
Call the resend function in irq_startup() to keep things going.
Reported-and-tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
Kconfig: acpi: Fix typo in comment.
misc latin1 to utf8 conversions
devres: Fix a typo in devm_kfree comment
btrfs: free-space-cache.c: remove extra semicolon.
fat: Spelling s/obsolate/obsolete/g
SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
tools/power turbostat: update fields in manpage
mac80211: drop spelling fix
types.h: fix comment spelling for 'architectures'
typo fixes: aera -> area, exntension -> extension
devices.txt: Fix typo of 'VMware'.
sis900: Fix enum typo 'sis900_rx_bufer_status'
decompress_bunzip2: remove invalid vi modeline
treewide: Fix comment and string typo 'bufer'
hyper-v: Update MAINTAINERS
treewide: Fix typos in various parts of the kernel, and fix some comments.
clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
leds: Kconfig: Fix typo 'D2NET_V2'
sound: Kconfig: drop unknown symbol ARCH_CLPS7500
...
Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
kconfig additions, close to removed commented-out old ones)
In irq_wait_for_interrupt(), the should_stop member is verified before
setting the task's state to TASK_INTERRUPTIBLE and calling schedule().
In case kthread_stop sets should_stop and wakes up the process after
should_stop is checked by the irq thread but before the task's state
is changed, the irq thread might never exit:
kthread_stop irq_wait_for_interrupt
------------ ----------------------
...
... while (!kthread_should_stop()) {
kthread->should_stop = 1;
wake_up_process(k);
wait_for_completion(&kthread->exited);
...
set_current_state(TASK_INTERRUPTIBLE);
...
schedule();
}
Fix this by checking if the thread should stop after modifying the
task's state.
[ tglx: Simplified it a bit ]
Signed-off-by: Ido Yariv <ido@wizery.com>
Link: http://lkml.kernel.org/r/1322740508-22640-1-git-send-email-ido@wizery.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@kernel.org
The power management functions related to interrupts do not know
(yet) about per-cpu interrupts and end up calling the wrong
low-level methods to enable/disable interrupts.
This leads to all kind of interesting issues (action taken on one
CPU only, updating a refcount which is not used otherwise...).
The workaround for the time being is simply to flag these interrupts
with IRQF_NO_SUSPEND. At least on ARM, these interrupts are actually
dealt with at the architecture level.
Reported-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1321446459-31409-1-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
As request_percpu_irq() doesn't allow for a percpu interrupt to have
its type configured (it is generally impossible to configure it on all
CPUs at once), add a 'type' argument to enable_percpu_irq().
This allows some low-level, board specific init code to be switched to
a generic API.
[ tglx: Added WARN_ON argument ]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The ARM GIC interrupt controller offers per CPU interrupts (PPIs),
which are usually used to connect local timers to each core. Each CPU
has its own private interface to the GIC, and only sees the PPIs that
are directly connect to it.
While these timers are separate devices and have a separate interrupt
line to a core, they all use the same IRQ number.
For these devices, request_irq() is not the right API as it assumes
that an IRQ number is visible by a number of CPUs (through the
affinity setting), but makes it very awkward to express that an IRQ
number can be handled by all CPUs, and yet be a different interrupt
line on each CPU, requiring a different dev_id cookie to be passed
back to the handler.
The *_percpu_irq() functions is designed to overcome these
limitations, by providing a per-cpu dev_id vector:
int request_percpu_irq(unsigned int irq, irq_handler_t handler,
const char *devname, void __percpu *percpu_dev_id);
void free_percpu_irq(unsigned int, void __percpu *);
int setup_percpu_irq(unsigned int irq, struct irqaction *new);
void remove_percpu_irq(unsigned int irq, struct irqaction *act);
void enable_percpu_irq(unsigned int irq);
void disable_percpu_irq(unsigned int irq);
The API has a number of limitations:
- no interrupt sharing
- no threading
- common handler across all the CPUs
Once the interrupt is requested using setup_percpu_irq() or
request_percpu_irq(), it must be enabled by each core that wishes its
local interrupt to be delivered.
Based on an initial patch by Thomas Gleixner.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/1316793788-14500-2-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Some irq chips need the irq_set_wake() functionality, but do not
require a irq_set_wake() callback. Instead of forcing an empty
callback to be implemented add a flag which notes this fact. Check for
the flag in set_irq_wake_real() and return success when set.
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
This reverts commit f3637a5f2e.
It turns out that this breaks several drivers, one example being OMAP
boards which use the on-board OMAP UARTs and the omap-serial driver that
will not boot to userspace after the commit.
Paul Walmsley reports that enabling CONFIG_DEBUG_SHIRQ reveals 'IRQ
handler type mismatch' errors:
IRQ handler type mismatch for IRQ 74
current handler: serial idle
...
and the reason is that setting IRQF_ONESHOT will now result in those
interrupt handlers having different IRQF flags, and thus being
unsharable. So the commit log in the reverted commit:
"Since it is required for those users and
there is no difference for others it makes sense to add this flag
unconditionally."
is simply not true: there may not be any difference from a "actions at
irq time", but there is a *big* difference wrt this flag testing irq
management (see __setup_irq() in kernel/irq/manage.c).
One solution may be to stop verifying IRQF_ONESHOT in __setup_irq(), but
right now the safe course of action is to revert the change. Let's
revisit this in a later merge window.
Reported-by: Paul Walmsley <paul@pwsan.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Requested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Interrupt descriptors can be allocated from modules. The interrupts
are used by other modules, but we have no refcount on the module which
provides the interrupts and there is no way to establish one on the
device level as the interrupt using module is agnostic to the fact
that the interrupt is provided by a module rather than by some builtin
interrupt controller.
To prevent removal of the interrupt providing module, we can track the
owner of the interrupt descriptor, which also provides the relevant
irq chip functions in the irq descriptor.
request/setup_irq() can now acquire a refcount on the owner module to
prevent unloading. free_irq() drops the refcount.
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Link: http://lkml.kernel.org/r/20110711101731.GA13804@Chamillionaire.breakpoint.cc
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If no primary handler is specified then a default one is assigned
which always returns IRQ_WAKE_THREAD. This handler requires the
IRQF_ONESHOT flag on LEVEL / EIO typed irqs because the source of
interrupt is not disabled. Since it is required for those users and
there is no difference for others it makes sense to add this flag
unconditionally.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/1310070737-18514-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In kernel/irq/manage.c::irq_set_irq_wake() we call
irq_get_desc_buslock() which may return NULL, but the code
dereferences the result unconditionally.
irq_set_irq_wake() has lots of callers - I checked a few and I couldn't
find anything that guarantees that they won't call it with some input that
will cause irq_get_desc_buslock() to return NULL, so I think it's a good
thing to test and -EINVAL was the most sane error code in this situation
that I could think of.
Not all callers test the return value of irq_set_irq_wake(), but those
that do take != 0 to mean error as far as I can see, so they should be
fine. I guess those that don't test actually should, but that's a
different issue.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1106092300360.17868@swampdragon.chaosbits.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The detection of spurios interrupts is currently limited to first level
handler. In force-threaded mode we never notice if the threaded irq does
not feel responsible.
This patch catches the return value of the threaded handler and forwards
it to the spurious detector. If the primary handler returns only
IRQ_WAKE_THREAD then the spourious detector ignores it because it gets
called again from the threaded handler.
[ tglx: Report the erroneous return value early and bail out ]
Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Link: http://lkml.kernel.org/r/1306824972-27067-2-git-send-email-sebastian@breakpoint.cc
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This adds support for disabling threading on a per-IRQ basis via the IRQ
status instead of the IRQ flow, which is necessary for interrupts that
don't follow the natural IRQ flow channels, such as those that are
virtually created.
The new APIs added are simply:
irq_set_thread()
irq_set_nothread()
which follow the rest of the IRQ status routines.
Chained handlers also have IRQ_NOTHREAD set on them automatically, making
the lack of threading explicit rather than implicit. Subsequently, the
nothread flag can be viewed through the standard genirq debugging
facilities.
[ tglx: Fixed cleanup fallout ]
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Link: http://lkml.kernel.org/r/%3C20110406210135.GF18426%40linux-sh.org%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The allocated cpumask should be freed in __setup_irq().
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
LKML-Reference: <1301744375-6812-1-git-send-email-dfeng@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>