f6656208f0
Some modern systems have very tight thermal tolerances. Because of this they may cross thermal thresholds when running normal workloads (even during boot). The CPU hardware will react by limiting power/frequency and using duty cycles to bring the temperature back into normal range. Thus users may see a "critical" message about the "temperature above threshold" which is soon followed by "temperature/speed normal". These messages are rate-limited, but still may repeat every few minutes. This issue became worse starting with the Ivy Bridge generation of CPUs because they include a TCC activation offset in the MSR IA32_TEMPERATURE_TARGET. OEMs use this to provide alerts long before critical temperatures are reached. A test run on a laptop with Intel 8th Gen i5 core for two hours with a workload resulted in 20K+ thermal interrupts per CPU for core level and another 20K+ interrupts at package level. The kernel logs were full of throttling messages. The real value of these threshold interrupts, is to debug problems with the external cooling solutions and performance issues due to excessive throttling. So the solution here is the following: - In the current thermal_throttle folder, show: - the maximum time for one throttling event and, - the total amount of time the system was in throttling state. - Do not log short excursions. - Log only when, in spite of thermal throttling, the temperature is rising. On the high threshold interrupt trigger a delayed workqueue that monitors the threshold violation log bit (THERM_STATUS_PROCHOT_LOG). When the log bit is set, this workqueue callback calculates three point moving average and logs a warning message when the temperature trend is rising. When this log bit is clear and temperature is below threshold temperature, then the workqueue callback logs a "Normal" message. Once a high threshold event is logged, the logging is rate-limited. With this patch on the same test laptop, no warnings are printed in the logs as the max time the processor could bring the temperature under control is only 280 ms. This implementation is done with the inputs from Alan Cox and Tony Luck. [ bp: Touchups. ] Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: bberg@redhat.com Cc: ckellner@redhat.com Cc: hdegoede@redhat.com Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/20191111214312.81365-1-srinivas.pandruvada@linux.intel.com |
||
---|---|---|
Documentation | ||
LICENSES | ||
arch | ||
block | ||
certs | ||
crypto | ||
drivers | ||
fs | ||
include | ||
init | ||
ipc | ||
kernel | ||
lib | ||
mm | ||
net | ||
samples | ||
scripts | ||
security | ||
sound | ||
tools | ||
usr | ||
virt | ||
.clang-format | ||
.cocciconfig | ||
.get_maintainer.ignore | ||
.gitattributes | ||
.gitignore | ||
.mailmap | ||
COPYING | ||
CREDITS | ||
Kbuild | ||
Kconfig | ||
MAINTAINERS | ||
Makefile | ||
README |
README
Linux kernel ============ There are several guides for kernel developers and users. These guides can be rendered in a number of formats, like HTML and PDF. Please read Documentation/admin-guide/README.rst first. In order to build the documentation, use ``make htmldocs`` or ``make pdfdocs``. The formatted documentation can also be read online at: https://www.kernel.org/doc/html/latest/ There are various text files in the Documentation/ subdirectory, several of them using the Restructured Text markup notation. Please read the Documentation/process/changes.rst file, as it contains the requirements for building and running the kernel, and information about the problems which may result by upgrading your kernel.