doc: Document CONFIG_RCU_CPU_STALL_CPUTIME=y stall information
This commit documents the additional RCU CPU stall warning output produced by kernels built with CONFIG_RCU_CPU_STALL_CPUTIME=y or booted with rcupdate.rcu_cpu_stall_cputime=1. [ paulmck: Apply wordsmithing. ] Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit is contained in:
Родитель
148750d736
Коммит
7a21ddf01a
|
@ -390,3 +390,91 @@ for example, "P3421".
|
|||
|
||||
It is entirely possible to see stall warnings from normal and from
|
||||
expedited grace periods at about the same time during the same run.
|
||||
|
||||
RCU_CPU_STALL_CPUTIME
|
||||
=====================
|
||||
|
||||
In kernels built with CONFIG_RCU_CPU_STALL_CPUTIME=y or booted with
|
||||
rcupdate.rcu_cpu_stall_cputime=1, the following additional information
|
||||
is supplied with each RCU CPU stall warning::
|
||||
|
||||
rcu: hardirqs softirqs csw/system
|
||||
rcu: number: 624 45 0
|
||||
rcu: cputime: 69 1 2425 ==> 2500(ms)
|
||||
|
||||
These statistics are collected during the sampling period. The values
|
||||
in row "number:" are the number of hard interrupts, number of soft
|
||||
interrupts, and number of context switches on the stalled CPU. The
|
||||
first three values in row "cputime:" indicate the CPU time in
|
||||
milliseconds consumed by hard interrupts, soft interrupts, and tasks
|
||||
on the stalled CPU. The last number is the measurement interval, again
|
||||
in milliseconds. Because user-mode tasks normally do not cause RCU CPU
|
||||
stalls, these tasks are typically kernel tasks, which is why only the
|
||||
system CPU time are considered.
|
||||
|
||||
The sampling period is shown as follows:
|
||||
:<------------first timeout---------->:<-----second timeout----->:
|
||||
:<--half timeout-->:<--half timeout-->: :
|
||||
: :<--first period-->: :
|
||||
: :<-----------second sampling period---------->:
|
||||
: : : :
|
||||
: snapshot time point 1st-stall 2nd-stall
|
||||
|
||||
|
||||
The following describes four typical scenarios:
|
||||
|
||||
1. A CPU looping with interrupts disabled.::
|
||||
|
||||
rcu: hardirqs softirqs csw/system
|
||||
rcu: number: 0 0 0
|
||||
rcu: cputime: 0 0 0 ==> 2500(ms)
|
||||
|
||||
Because interrupts have been disabled throughout the measurement
|
||||
interval, there are no interrupts and no context switches.
|
||||
Furthermore, because CPU time consumption was measured using interrupt
|
||||
handlers, the system CPU consumption is misleadingly measured as zero.
|
||||
This scenario will normally also have "(0 ticks this GP)" printed on
|
||||
this CPU's summary line.
|
||||
|
||||
2. A CPU looping with bottom halves disabled.
|
||||
|
||||
This is similar to the previous example, but with non-zero number of
|
||||
and CPU time consumed by hard interrupts, along with non-zero CPU
|
||||
time consumed by in-kernel execution.::
|
||||
|
||||
rcu: hardirqs softirqs csw/system
|
||||
rcu: number: 624 0 0
|
||||
rcu: cputime: 49 0 2446 ==> 2500(ms)
|
||||
|
||||
The fact that there are zero softirqs gives a hint that these were
|
||||
disabled, perhaps via local_bh_disable(). It is of course possible
|
||||
that there were no softirqs, perhaps because all events that would
|
||||
result in softirq execution are confined to other CPUs. In this case,
|
||||
the diagnosis should continue as shown in the next example.
|
||||
|
||||
3. A CPU looping with preemption disabled.
|
||||
|
||||
Here, only the number of context switches is zero.::
|
||||
|
||||
rcu: hardirqs softirqs csw/system
|
||||
rcu: number: 624 45 0
|
||||
rcu: cputime: 69 1 2425 ==> 2500(ms)
|
||||
|
||||
This situation hints that the stalled CPU was looping with preemption
|
||||
disabled.
|
||||
|
||||
4. No looping, but massive hard and soft interrupts.::
|
||||
|
||||
rcu: hardirqs softirqs csw/system
|
||||
rcu: number: xx xx 0
|
||||
rcu: cputime: xx xx 0 ==> 2500(ms)
|
||||
|
||||
Here, the number and CPU time of hard interrupts are all non-zero,
|
||||
but the number of context switches and the in-kernel CPU time consumed
|
||||
are zero. The number and cputime of soft interrupts will usually be
|
||||
non-zero, but could be zero, for example, if the CPU was spinning
|
||||
within a single hard interrupt handler.
|
||||
|
||||
If this type of RCU CPU stall warning can be reproduced, you can
|
||||
narrow it down by looking at /proc/interrupts or by writing code to
|
||||
trace each interrupt, for example, by referring to show_interrupts().
|
||||
|
|
Загрузка…
Ссылка в новой задаче