Replace the license boiler plate with a SPDX license identifier.
While in the area, update an email address.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Update .h file SPDX comment format per Joe Perches. ]
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
srcu_queue_delayed_work_on() disables preemption (and therefore CPU
hotplug in RCU's case) and then checks based on its own accounting if a
CPU is online. If the CPU is online it uses queue_delayed_work_on()
otherwise it fallbacks to queue_delayed_work().
The problem here is that queue_work() on -RT does not work with disabled
preemption.
queue_work_on() works also on an offlined CPU. queue_delayed_work_on()
has the problem that it is possible to program a timer on an offlined
CPU. This timer will fire once the CPU is online again. But until then,
the timer remains programmed and nothing will happen.
Add a local timer which will fire (as requested per delay) on the local
CPU and then enqueue the work on the specific CPU.
RCUtorture testing with SRCU-P for 24h showed no problems.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
This commit removes the "@irq" argument from the rcu_nmi_exit() docbook
header, given that this function now has no arguments.
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Although the name rcu_process_callbacks() still makes sense for Tiny
RCU, where most of what it does is invoke callbacks, it no longer makes
much sense for Tree RCU, especially given that the actually callback
invocation is relegated to rcu_do_batch(), or, for no-CBs CPUs, to the
rcuo kthreads. Especially in the latter case, rcu_process_callbacks()
has very little to do with actual callbacks. A better description of
this function is that it performs RCU's core processing.
This commit therefore changes the name of Tree RCU's rcu_process_callbacks()
function to rcu_core(), which also has the virtue of being consistent with
the existing invoke_rcu_core() function.
While in the area, the header comment is reworked.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
The name rcu_check_callbacks() arguably made sense back in the early
2000s when RCU was quite a bit simpler than it is today, but it has
become quite misleading, especially with the advent of dyntick-idle
and NO_HZ_FULL. The rcu_check_callbacks() function is RCU's hook into
the scheduling-clock interrupt, and is now but one of many ways that
callbacks get promoted to invocable state.
This commit therefore changes the name to rcu_sched_clock_irq(),
which is the same number of characters and clearly indicates this
function's relation to the rest of the Linux kernel. In addition, for
the sake of consistency, rcu_flavor_check_callbacks() is also renamed
to rcu_flavor_sched_clock_irq().
While in the area, the header comments for both functions are reworked.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Currently, __note_gp_changes() checks to see if the rcu_node structure's
->gp_seq_needed is greater than or equal to that of the rcu_data
structure, and if so, updates the rcu_data structure's ->gp_seq_needed
field. This results in a useless store in the case where the two fields
are equal.
This commit therefore carries out this store only in the case where the
rcu_node structure's ->gp_seq_needed is strictly greater than that of
the rcu_data structure.
Signed-off-by: "Zhang, Jun" <jun.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Link: https://lkml.kernel.org/r/88DC34334CA3444C85D647DBFA962C2735AD5F77@SHSMSX104.ccr.corp.intel.com
The rcu_gp_kthread_wake() function is invoked when it might be necessary
to wake the RCU grace-period kthread. Because self-wakeups are normally
a useless waste of CPU cycles, if rcu_gp_kthread_wake() is invoked from
this kthread, it naturally refuses to do the wakeup.
Unfortunately, natural though it might be, this heuristic fails when
rcu_gp_kthread_wake() is invoked from an interrupt or softirq handler
that interrupted the grace-period kthread just after the final check of
the wait-event condition but just before the schedule() call. In this
case, a wakeup is required, even though the call to rcu_gp_kthread_wake()
is within the RCU grace-period kthread's context. Failing to provide
this wakeup can result in grace periods failing to start, which in turn
results in out-of-memory conditions.
This race window is quite narrow, but it actually did happen during real
testing. It would of course need to be fixed even if it was strictly
theoretical in nature.
This patch does not Cc stable because it does not apply cleanly to
earlier kernel versions.
Fixes: 48a7639ce8 ("rcu: Make callers awaken grace-period kthread")
Reported-by: "He, Bo" <bo.he@intel.com>
Co-developed-by: "Zhang, Jun" <jun.zhang@intel.com>
Co-developed-by: "He, Bo" <bo.he@intel.com>
Co-developed-by: "xiao, jin" <jin.xiao@intel.com>
Co-developed-by: Bai, Jie A <jie.a.bai@intel.com>
Signed-off: "Zhang, Jun" <jun.zhang@intel.com>
Signed-off: "He, Bo" <bo.he@intel.com>
Signed-off: "xiao, jin" <jin.xiao@intel.com>
Signed-off: Bai, Jie A <jie.a.bai@intel.com>
Signed-off-by: "Zhang, Jun" <jun.zhang@intel.com>
[ paulmck: Switch from !in_softirq() to "!in_interrupt() &&
!in_serving_softirq() to avoid redundant wakeups and to also handle the
interrupt-handler scenario as well as the softirq-handler scenario that
actually occurred in testing. ]
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Link: https://lkml.kernel.org/r/CD6925E8781EFD4D8E11882D20FC406D52A11F61@SHSMSX104.ccr.corp.intel.com
Life is hard if RCU manages to get stuck without triggering RCU CPU
stall warnings or triggering the rcu_check_gp_start_stall() checks
for failing to start a grace period. This commit therefore adds a
boot-time-selectable sysrq key (commandeering "y") that allows manually
dumping Tree RCU state. The new rcutree.sysrq_rcu kernel boot parameter
must be set for this sysrq to be available.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
The rcu_check_gp_kthread_starvation() function can be invoked without
holding locks, so the access to the rcu_state structure's ->gp_flags
field must be protected with READ_ONCE(). This commit therefore adds
this protection.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
If a grace period fails to start (for example, because you commented
out the last two lines of rcu_accelerate_cbs_unlocked()), rcu_core()
will invoke rcu_check_gp_start_stall(), which will notice and complain.
However, this complaint is lacking crucial debugging information such
as when the last wakeup executed and what the value of ->gp_seq was at
that time. This commit therefore removes the current pr_alert() from
rcu_check_gp_start_stall(), instead invoking show_rcu_gp_kthreads(),
which has been updated to print the needed information, which is collected
by rcu_gp_kthread_wake().
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
It is perfectly fine to set the rcutree.jiffies_till_first_fqs boot
parameter to zero, in fact, this can be useful on specialty systems that
usually have at least one idle CPU and that need fast grace periods.
This is because this setting causes the RCU grace-period kthread to
scan for idle threads immediately after grace-period initialization,
as opposed to waiting several jiffies to do so.
It is also perfectly fine to set the rcutree.rcu_kick_kthreads kernel
parameter, which gives the RCU grace-period kthread an extra wakeup
if it doesn't make progress for a period of three times the setting of
the rcutree.jiffies_till_first_fqs boot parameter. This is of course
problematic when the value of this parameter is zero, as it can result
in unnecessary wakeup IPIs along with unnecessary WARN_ONCE() invocations.
This commit therefore defers kthread kicking for at least two jiffies,
regardless of the setting of rcutree.jiffies_till_first_fqs.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Back when there were multiple flavors of RCU, it was necessary to
separately count lazy and non-lazy callbacks for each CPU. These counts
were used in CONFIG_RCU_FAST_NO_HZ kernels to determine how long a newly
idle CPU should be allowed to sleep before handling its RCU callbacks.
But now that there is only one flavor, the callback counts for a given
CPU's sole rcu_data structure are the counts for that CPU.
This commit therefore removes the rcu_data structure's ->nonlazy_posted
and ->nonlazy_posted_snap fields, the rcu_idle_count_callbacks_posted()
and rcu_cpu_has_callbacks() functions, repurposes the rcu_data structure's
->all_lazy field to record the laziness state at the beginning of the
latest idle sojourn, and modifies CONFIG_RCU_FAST_NO_HZ RCU CPU stall
warnings accordingly.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Now that rcu_blocking_is_gp() makes the correct immediate-return
decision for both PREEMPT and !PREEMPT, a single implementation of
synchronize_rcu() will work correctly under both configurations.
This commit therefore eliminates a few lines of code by consolidating
the two implementations of synchronize_rcu().
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Now that the RCU flavors have been consolidated, RCU_BH_FLAVOR and
RCU_SCHED_FLAVOR are no longer used. This commit therefore saves a
few lines by removing them.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Given that rcu_force_quiescent_state() is a simple wrapper around
force_quiescent_state(), this commit saves a few lines of code by
inlining force_quiescent_state() into rcu_force_quiescent_state(),
and changing all references to force_quiescent_state() to instead
invoke rcu_force_quiescent_state().
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Given RCU flavor consolidation, the name rcu_spawn_all_nocb_kthreads()
is quite misleading. It no longer ever creates more than one kthread,
and it does so only for the specified CPU. This commit therefore changes
this name to the more descriptive rcu_spawn_cpu_nocb_kthread(), and also
fixes up a similar issue in its header comment while in the area.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
If rcutorture's forward-progress tests fail while a grace period is not
in progress, it is useful to print the time since the last grace period
ended as a way to detect failure to launch a new grace period. This
commit therefore makes this change.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
This commit prints out the non-zero per-CPU callback counts when a
forware-progress error (OOM event) occurs.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
[ paulmck: Fix a pair of uninitialized locals spotted by kbuild test robot. ]
The RCU CPU stall warnings print an estimate of the total number of
RCU callbacks queued in the system, but this estimate leaves out
the callbacks queued for nocbs CPUs. This commit therefore introduces
rcu_get_n_cbs_cpu(), which gives an accurate callback estimate for
both nocbs and normal CPUs, and uses this new function as needed.
This commit also introduces a rcu_get_n_cbs_nocb_cpu() helper function
that returns the number of callbacks for nocbs CPUs or zero otherwise,
and also uses this function in place of direct access to ->nocb_q_count
while in the area (fewer characters, you see).
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
This commit adds an OOM notifier during rcutorture forward-progress
testing. If this notifier is invoked, it dumps out some grace-period
state to help debug the forward-progress problem.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
bug.2018.11.12a: Get rid of BUG_ON() and friends
consolidate.2018.12.01a: Continued RCU flavor-consolidation cleanup
doc.2018.11.12a: Documentation updates
fixes.2018.11.12a: Miscellaneous fixes
initrd.2018.11.08b: Automate creation of rcutorture initrd
sil.2018.11.12a: Remove more spin_unlock_wait() calls
Currently, rcu_gp_cleanup() traces the end of the old grace period after
the old grace period has officially ended. This might make intuitive
sense, but it also makes for confusing event-trace output because the
"end" trace displays not the old but instead the new grace-period number.
This commit therefore traces the end of an old grace period just before
that grace period officially ends.
Reported-by: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Because RCU avoids interrupting idle CPUs, rcu_is_watching() is used to
test whether or not it is currently legal to run RCU read-side critical
sections on this CPU. However, the first sentence and last sentences
of current comment for rcu_is_watching have opposite meaning of what
is expected. This commit therefore fixes this header comment.
Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
This commit adds a printout of the number of jiffies since the last time
that the RCU grace-period kthread did any processing. This can be useful
when tracking down forward-progress issues.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
This commit adds the name of the RCU grace-period state to
the show_rcu_gp_kthreads() output in order to ease debugging.
This commit also moves gp_state_getname() up in the code so that
show_rcu_gp_kthreads() can use it.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
In order to debug forward-progress stalls, it is necessary to check
for excessively delayed grace-period starts. This is currently done
for RCU CPU stall warnings by rcu_check_gp_start_stall(), which checks
to see if the start of a requested grace period has been delayed by an
RCU CPU stall warning period. Because rcutorture will need to check
for the time consumed by an RCU forward-progress delay, this commit
promotes gpssdelay from a local variable to a formal parameter. It is
not necessary to export rcu_check_gp_start_stall() because rcutorture
will access it via a wrapper function.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
The rcu_check_gp_start_stall() function multiplies the return value
from rcu_jiffies_till_stall_check() by HZ, but the units are already
in jiffies. This commit therefore avoids the need for introduction of
a jiffies-squared unit by removing the extraneous multiplication.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
The tree.c file has a number of calls to BUG_ON(), which panics the
kernel, which is not a good strategy for devices (like embedded) that
don't have a way to capture console output. This commit therefore
converts these BUG_ON() calls to WARN_ON_ONCE() and WARN_ONCE().
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Event tracing is moving to SRCU in order to take advantage of the fact
that SRCU may be safely used from idle and even offline CPUs. However,
event tracing can invoke call_srcu() very early in the boot process,
even before workqueue_init_early() is invoked (let alone rcu_init()).
Therefore, call_srcu()'s attempts to queue work fail miserably.
This commit therefore detects this situation, and refrains from attempting
to queue work before rcu_init() time, but does everything else that it
would have done, and in addition, adds the srcu_struct to a global list.
The rcu_init() function now invokes a new srcu_init() function, which
is empty if CONFIG_SRCU=n. Otherwise, srcu_init() queues work for
each srcu_struct on the list. This all happens early enough in boot
that there is but a single CPU with interrupts disabled, which allows
synchronization to be dispensed with.
Of course, the queued work won't actually be invoked until after
workqueue_init() is invoked, which happens shortly after the scheduler
is up and running. This means that although call_srcu() may be invoked
any time after per-CPU variables have been set up, there is still a very
narrow window when synchronize_srcu() won't work, and this window
extends from the time that the scheduler starts until the time that
workqueue_init() returns. This can be fixed in a manner similar to
the fix for synchronize_rcu_expedited() and friends, but until someone
actually needs to use synchronize_srcu() during this window, this fix
is added churn for no benefit.
Finally, note that Tree SRCU's new srcu_init() function invokes
queue_work() rather than the queue_delayed_work() function that is
invoked post-boot. The reason is that queue_delayed_work() will (as you
would expect) post a timer, and timers have not yet been initialized.
So use of queue_work() avoids the complaints about use of uninitialized
spinlocks that would otherwise result. Besides, some delay is already
provide by the aforementioned fact that the queued work won't actually
be invoked until after the scheduler is up and running.
Requested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
1e64b15a4b ("rcu: Fix grace-period hangs due to race with CPU offline")
added spinlock_t ofl_lock to the rcu_state structure, then takes it with
preemption disabled during CPU offline, which gives the -rt patchset's
sleeping spinlock heartburn.
This commit therefore converts ->ofl_lock to raw_spinlock_t.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
The rcu_data structure's ->dynticks_fqs is incremented but never
accesses. Its ->cond_resched_completed field isn't used at all.
This commit therefore removes both fields.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit move ->dynticks from the rcu_dynticks structure to the
rcu_data structure, replacing the field of the same name. It also updates
the code to access ->dynticks from the rcu_data structure and to use the
rcu_data structure rather than following to now-gone ->dynticks field
to the now-gone rcu_dynticks structure. While in the area, this commit
also fixes up comments.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit removes ->dynticks_nesting and ->dynticks_nmi_nesting from
the rcu_dynticks structure and updates the code to access them from the
rcu_data structure.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit removes ->rcu_need_heavy_qs and ->rcu_urgent_qs from the
rcu_dynticks structure and updates the code to access them from the
rcu_data structure.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The resched_cpu() interface is quite handy, but it does acquire the
specified CPU's runqueue lock, which does not come for free. This
commit therefore substitutes the following when directing resched_cpu()
at the current CPU:
set_tsk_need_resched(current);
set_preempt_need_resched();
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Because nohz_full CPUs can leave the scheduler-clock interrupt disabled
even when in kernel mode, RCU cannot rely on rcu_check_callbacks() to
enlist the scheduler's aid in extracting a quiescent state from such CPUs.
This commit therefore more aggressively uses resched_cpu() on nohz_full
CPUs that fail to pass through a quiescent state in a timely manner.
By default, the resched_cpu() beating starts 300 milliseconds into the
quiescent state.
While in the neighborhood, add a ->last_fqs_resched field to the rcu_data
structure in order to rate-limit resched_cpu() calls from the RCU
grace-period kthread.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The jiffies_till_sched_qs value used to determine how old a grace period
must be before RCU enlists the help of the scheduler to force a quiescent
state on the holdout CPU. Currently, this defaults to HZ/10 regardless of
system size and may be set only at boot time. This can be a problem for
very large systems, because if the values of the jiffies_till_first_fqs
and jiffies_till_next_fqs kernel parameters are left at their defaults,
they are calculated to increase as the number of CPUs actually configured
on the system increases. Thus, on a sufficiently large system, RCU would
enlist the help of the scheduler before the grace-period kthread had a
chance to scan for idle CPUs, which wastes CPU time.
This commit therefore allows jiffies_till_sched_qs to be set, if desired,
but if left as default, computes is as jiffies_till_first_fqs plus twice
jiffies_till_next_fqs, thus allowing three force-quiescent-state scans
for idle CPUs. This scales with the number of CPUs, providing sensible
default values.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The ->rcu_qs_ctr counter was intended to allow providing a lightweight
report of a quiescent state to all RCU flavors. But now that there is
only one flavor of RCU in any one running kernel, there is no point in
having this feature. This commit therefore removes the ->rcu_qs_ctr
field from the rcu_dynticks structure and the ->rcu_qs_ctr_snap field
from the rcu_data structure. This results in the "rqc" option to the
rcu_fqs trace event no longer being used, so this commit also removes the
"rqc" description from the header comment.
While in the neighborhood, this commit also causes the forward-progress
request .rcu_need_heavy_qs be set one jiffies_till_sched_qs interval
later in the grace period than the first setting of .rcu_urgent_qs.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The patch making need_resched() respond to urgent RCU-QS needs used
is_idle_task(current) to detect an interrupt from idle, which does work
reasonably, but is (in theory at least) vulnerable to loops containing
need_resched() invoked from within RCU_NONIDLE() or its tracepoint
equivalent. This commit therefore moves rcu_is_cpu_rrupt_from_idle()
to a place from which rcu_check_callbacks() can invoke it and replaces
the is_idle_task(current) with rcu_is_cpu_rrupt_from_idle().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The per-CPU rcu_dynticks.rcu_urgent_qs variable communicates an urgent
need for an RCU quiescent state from the force-quiescent-state processing
within the grace-period kthread to context switches and to cond_resched().
Unfortunately, such urgent needs are not communicated to need_resched(),
which is sometimes used to decide when to invoke cond_resched(), for
but one example, within the KVM vcpu_run() function. As of v4.15, this
can result in synchronize_sched() being delayed by up to ten seconds,
which can be problematic, to say nothing of annoying.
This commit therefore checks rcu_dynticks.rcu_urgent_qs from within
rcu_check_callbacks(), which is invoked from the scheduling-clock
interrupt handler. If the current task is not an idle task and is
not executing in usermode, a context switch is forced, and either way,
the rcu_dynticks.rcu_urgent_qs variable is set to false. If the current
task is an idle task, then RCU's dyntick-idle code will detect the
quiescent state, so no further action is required. Similarly, if the
task is executing in usermode, other code in rcu_check_callbacks() and
its called functions will report the corresponding quiescent state.
Reported-by: Marius Hillenbrand <mhillenb@amazon.de>
Reported-by: David Woodhouse <dwmw2@infradead.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Because rcu_barrier() is a one-line wrapper function for _rcu_barrier()
and because nothing else calls _rcu_barrier(), this commit inlines
_rcu_barrier() into rcu_barrier().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Now that rcu_all_qs() is used only in !PREEMPT builds, move it to
tree_plugin.h so that it is defined only in those builds. This in
turn means that rcu_momentary_dyntick_idle() is only used in !PREEMPT
builds, but it is simply marked __maybe_unused in order to keep it
near the rest of the dyntick-idle code.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit removes rcu_sched_get_gp_seq(), rcu_bh_get_gp_seq(),
rcu_exp_batches_completed_sched(), rcu_sched_force_quiescent_state(),
and rcu_bh_force_quiescent_state(), which are no longer used because
rcutorture no longer does "rcu_bh" and "rcu_sched" torture types.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit saves a few lines by consolidating the RCU-sched function
definitions at the end of include/linux/rcupdate.h. This consolidation
also makes it easier to remove them all when the time comes.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit saves a few lines by consolidating the RCU-bh function
definitions at the end of include/linux/rcupdate.h. This consolidation
also makes it easier to remove them all when the time comes.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcu_gp_kthread() function is long and deeply indented, so this
commit pulls the loop that repeatedly invokes rcu_gp_fqs() into a new
rcu_gp_fqs_loop() function.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Consolidation of the RCU flavors into one makes increment_cpu_stall_ticks()
a trivial one-line function with only one caller. This commit therefore
inlines it.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Pointers to rcu_data structures should be named rdp, not rsp. This
commit therefore makes this change.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Now that there is only one rcu_state structure, there is less point in
maintaining a pointer to it. This commit therefore replaces rsp with
&rcu_state in rcu_cpu_starting() and rcu_init_one().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Now that there is only one rcu_state structure, there is less point
in maintaining a pointer to it. This commit therefore replaces rsp
with &rcu_state in rcu_barrier_callback(), rcu_barrier_func(), and
_rcu_barrier().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Now that there is only one rcu_state structure, there is less point in
maintaining a pointer to it. This commit therefore replaces rsp with
&rcu_state in rcu_report_qs_rnp(), force_quiescent_state(), and
rcu_check_gp_start_stall().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Now that there is only one rcu_state structure, there is less point in
maintaining a pointer to it. This commit therefore replaces rsp with
&rcu_state in rcu_do_batch(), invoke_rcu_callbacks(), and __call_rcu().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Now that there is only one rcu_state structure, there is less point
in maintaining a pointer to it. This commit therefore replaces
rsp with &rcu_state in rcu_start_this_gp(), rcu_accelerate_cbs(),
__note_gp_changes(), rcu_gp_init(), rcu_gp_fqs(), rcu_gp_cleanup(),
rcu_gp_kthread(), and rcu_report_qs_rsp().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Now that there is only one rcu_state structure, there is less point
in maintaining a pointer to it. This commit therefore replaces rsp
with &rcu_state in print_other_cpu_stall(), print_cpu_stall(), and
check_cpu_stall().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit removes the rsp and gpa local variables, repurposes the j
local variable and adds a gpk (GP kthread) local to improve readability.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit restructures rcutorture_get_gp_data() to take advantage of
the fact that there is only one flavor of RCU.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Now that there is only ever a single flavor of RCU in a given kernel
build, there isn't a whole lot of point in having a flavor-traversal
macro. This commit therefore removes it and converts calls to it to
straightline code, inlining trivial functions as appropriate.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Now that there is only one rcu_state structure, there is no need for the
rcu_data structure to indicate which it corresponds to. This commit
therefore removes the rcu_data structure's ->rsp field, replacing all
remaining uses of it with &rcu_state.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the Linux
kernel, so there is no need to pass it as a parameter to RCU's rcu_node
tree's accessor macros. This commit therefore removes the rsp parameter
from those macros in kernel/rcu/rcu.h, and removes some now-unused rsp
local variables while in the area.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to
RCU's functions. This commit therefore removes the rsp parameter
from the code in kernel/rcu/tree_exp.h, and removes all of the
rsp local variables while in the area.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to
RCU's functions. This commit therefore removes the rsp parameter
from rcu_nocb_cpu_needs_barrier(), rcu_spawn_one_nocb_kthread(),
rcu_organize_nocb_kthreads(), rcu_nocb_cpu_needs_barrier(), and
rcu_nohz_full_cpu().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
print_cpu_stall_info().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
dump_blkd_tasks() and rcu_preempt_blocked_readers_cgp().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_print_detail_task_stall().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_init_one() and rcu_dump_rcu_node_tree().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of
the Linux kernel, so there is no need to pass it as a parameter
to RCU's functions. This commit therefore removes the rsp
parameter from rcu_boot_init_percpu_data(), rcu_init_percpu_data(),
rcu_cleanup_dying_idle_cpu(), and rcu_migrate_callbacks(). While in
the neighborhood, line the last three into rcutree_prepare_cpu(),
rcu_report_dead() and rcutree_migrate_callbacks(), respectively.
This also gets rid of the for_each_rcu_flavor() calls that were in
those tree functions.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
_rcu_barrier_trace() and _rcu_barrier().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the Linux
kernel, so there is no need to pass it as a parameter to RCU's functions.
This commit therefore removes the rsp parameter from __rcu_pending(),
and also inlines it into rcu_pending(), removing the for_each_rcu_flavor()
while in the neighborhood..
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
__call_rcu_core() and __call_rcu().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
__rcu_process_callbacks(), and also inlines it into rcu_process_callbacks(),
removing the for_each_rcu_flavor() while in the neighborhood.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_check_gp_start_stall().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
force_qs_rnp() and force_quiescent_state().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_do_batch().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_cleanup_dying_cpu() and rcu_cleanup_dead_cpu(). And, as long as
we are in the neighborhood, inlines them into rcutree_dying_cpu() and
rcutree_dead_cpu(), respectively. This also eliminates a pair of
for_each_rcu_flavor() loops.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_check_quiescent_state().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_gp_init(), rcu_gp_fqs_check_wake(), rcu_gp_fqs(), rcu_gp_cleanup(),
and rcu_gp_kthread().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_gp_slow().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
note_gp_changes().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
__note_gp_changes().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_advance_cbs().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_accelerate_cbs_unlocked().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_accelerate_cbs().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_gp_kthread_wake().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_future_gp_cleanup().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
check_cpu_stall().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
print_cpu_stall().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
print_other_cpu_stall().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_stall_kick_kthreads().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_dump_cpu_stacks().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_check_gp_kthread_starvation().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
record_gp_stall_check_time().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_get_root().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_gp_in_progress().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_report_qs_rdp().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_report_unblock_qs_rnp(), which is particularly appropriate in
this case given that this parameter is no longer used.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_report_qs_rsp().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There now is only one rcu_state structure in a given build of the
Linux kernel, so there is no need to pass it as a parameter to RCU's
functions. This commit therefore removes the rsp parameter from
rcu_report_qs_rnp().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcu_data_p pointer references the default set of per-CPU rcu_data
structures, that is, those that call_rcu() uses, as opposed to
call_rcu_bh() and sometimes call_rcu_sched(). But there is now only one
set of per-CPU rcu_data structures, so that one set is by definition
the default, which means that the rcu_data_p pointer no longer serves
any useful purpose. This commit therefore removes it.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcu_state_p pointer references the default rcu_state structure,
that is, the one that call_rcu() uses, as opposed to call_rcu_bh()
and sometimes call_rcu_sched(). But there is now only one rcu_state
structure, so that one structure is by definition the default, which
means that the rcu_state_p pointer no longer serves any useful purpose.
This commit therefore removes it.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcu_state structure's ->rda field was used to find the per-CPU
rcu_data structures corresponding to that rcu_state structure. But now
there is only one rcu_state structure (creatively named "rcu_state")
and one set of per-CPU rcu_data structures (creatively named "rcu_data").
Therefore, uses of the ->rda field can always be replaced by "rcu_data,
and this commit makes that change and removes the ->rda field.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcu_state structure's ->call field references the corresponding RCU
flavor's call_rcu() function. However, now that there is only ever one
rcu_state structure in a given build of the Linux kernel, and that flavor
uses plain old call_rcu(), there is not a lot of point in continuing to
have the ->call field. This commit therefore removes it.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Now that a given build of the Linux kernel has only one set of rcu_state,
rcu_node, and rcu_data structures, there is no point in creating a macro
to declare and compile-time initialize them. This commit therefore
just does normal declaration and compile-time initialization of these
structures. While in the area, this commit also removes #ifndefs of
the no-longer-ever-defined preprocessor macro RCU_TREE_NONCORE.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Now that RCU-preempt knows about preemption disabling, its implementation
of synchronize_rcu() works for synchronize_sched(), and likewise for the
other RCU-sched update-side API members. This commit therefore confines
the RCU-sched update-side code to CONFIG_PREEMPT=n builds, and defines
RCU-sched's update-side API members in terms of those of RCU-preempt.
This means that any given build of the Linux kernel has only one
update-side flavor of RCU, namely RCU-preempt for CONFIG_PREEMPT=y builds
and RCU-sched for CONFIG_PREEMPT=n builds. This in turn means that kernels
built with CONFIG_RCU_NOCB_CPU=y have only one rcuo kthread per CPU.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
The rcu_report_exp_rdp() function is always invoked with its "wake"
argument set to "true", so this commit drops this parameter. The only
potential call site that would use "false" is in the code driving the
expedited grace period, and that code uses rcu_report_exp_cpu_mult()
instead, which therefore retains its "wake" parameter.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit updates comments and help text to account for the fact that
RCU-bh update-side functions are now simple wrappers for their RCU or
RCU-sched counterparts.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Now that the main RCU API knows about softirq disabling and softirq's
quiescent states, the RCU-bh update code can be dispensed with.
This commit therefore removes the RCU-bh update-side implementation and
defines RCU-bh's update-side API in terms of that of either RCU-preempt or
RCU-sched, depending on the setting of the CONFIG_PREEMPT Kconfig option.
In kernels built with CONFIG_RCU_NOCB_CPU=y this has the knock-on effect
of reducing by one the number of rcuo kthreads per CPU.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
One necessary step towards consolidating the three flavors of RCU is to
make sure that the resulting consolidated "one flavor to rule them all"
correctly handles networking denial-of-service attacks. One thing that
allows RCU-bh to do so is that __do_softirq() invokes rcu_bh_qs() every
so often, and so something similar has to happen for consolidated RCU.
This must be done carefully. For example, if a preemption-disabled
region of code takes an interrupt which does softirq processing before
returning, consolidated RCU must ignore the resulting rcu_bh_qs()
invocations -- preemption is still disabled, and that means an RCU
reader for the consolidated flavor.
This commit therefore creates a new rcu_softirq_qs() that is called only
from the ksoftirqd task, thus avoiding the interrupted-a-preempted-region
problem. This new rcu_softirq_qs() function invokes rcu_sched_qs(),
rcu_preempt_qs(), and rcu_preempt_deferred_qs(). The latter call handles
any deferred quiescent states.
Note that __do_softirq() still invokes rcu_bh_qs(). It will continue to
do so until a later stage of cleanup when the RCU-bh flavor is removed.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix !SMP issue located by kbuild test robot. ]
RCU's dyntick-idle code is written to tolerate half-interrupts, that it,
either an interrupt that invokes rcu_irq_enter() but never invokes the
corresponding rcu_irq_exit() on the one hand, or an interrupt that never
invokes rcu_irq_enter() but does invoke the "corresponding" rcu_irq_exit()
on the other. These things really did happen at one time, as evidenced
by this ca-2011 LKML post:
http://lkml.kernel.org/r/20111014170019.GE2428@linux.vnet.ibm.com
The reason why RCU tolerates half-interrupts is that usermode helpers
used exceptions to invoke a system call from within the kernel such that
the system call did a normal return (not a return from exception) to
the calling context. This caused rcu_irq_enter() to be invoked without
a matching rcu_irq_exit(). However, usermode helpers have since been
rewritten to make much more housebroken use of workqueues, kernel threads,
and do_execve(), and therefore should no longer produce half-interrupts.
No one knows of any other source of half-interrupts, but then again,
no one seems insane enough to go audit the entire kernel to verify that
half-interrupts really are a relic of the past.
This commit therefore adds a pair of WARN_ON_ONCE() calls that will
trigger in the presence of half interrupts, which the code will continue
to handle correctly. If neither of these WARN_ON_ONCE() trigger by
mid-2021, then perhaps RCU can stop handling half-interrupts, which
would be a considerable simplification.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Joel Fernandes <joel@joelfernandes.org>
Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
This commit defers reporting of RCU-preempt quiescent states at
rcu_read_unlock_special() time when any of interrupts, softirq, or
preemption are disabled. These deferred quiescent states are reported
at a later RCU_SOFTIRQ, context switch, idle entry, or CPU-hotplug
offline operation. Of course, if another RCU read-side critical
section has started in the meantime, the reporting of the quiescent
state will be further deferred.
This also means that disabling preemption, interrupts, and/or
softirqs will act as an RCU-preempt read-side critical section.
This is enforced by checking preempt_count() as needed.
Some special cases must be handled on an ad-hoc basis, for example,
context switch is a quiescent state even though both the scheduler and
do_exit() disable preemption. In these cases, additional calls to
rcu_preempt_deferred_qs() override the preemption disabling. Similar
logic overrides disabled interrupts in rcu_preempt_check_callbacks()
because in this case the quiescent state happened just before the
corresponding scheduling-clock interrupt.
In theory, this change lifts a long-standing restriction that required
that if interrupts were disabled across a call to rcu_read_unlock()
that the matching rcu_read_lock() also be contained within that
interrupts-disabled region of code. Because the reporting of the
corresponding RCU-preempt quiescent state is now deferred until
after interrupts have been enabled, it is no longer possible for this
situation to result in deadlocks involving the scheduler's runqueue and
priority-inheritance locks. This may allow some code simplification that
might reduce interrupt latency a bit. Unfortunately, in practice this
would also defer deboosting a low-priority task that had been subjected
to RCU priority boosting, so real-time-response considerations might
well force this restriction to remain in place.
Because RCU-preempt grace periods are now blocked not only by RCU
read-side critical sections, but also by disabling of interrupts,
preemption, and softirqs, it will be possible to eliminate RCU-bh and
RCU-sched in favor of RCU-preempt in CONFIG_PREEMPT=y kernels. This may
require some additional plumbing to provide the network denial-of-service
guarantees that have been traditionally provided by RCU-bh. Once these
are in place, CONFIG_PREEMPT=n kernels will be able to fold RCU-bh
into RCU-sched. This would mean that all kernels would have but
one flavor of RCU, which would open the door to significant code
cleanup.
Moving to a single flavor of RCU would also have the beneficial effect
of reducing the NOCB kthreads by at least a factor of two.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Apply rcu_read_unlock_special() preempt_count() feedback
from Joel Fernandes. ]
[ paulmck: Adjust rcu_eqs_enter() call to rcu_preempt_deferred_qs() in
response to bug reports from kbuild test robot. ]
[ paulmck: Fix bug located by kbuild test robot involving recursion
via rcu_preempt_deferred_qs(). ]
When entering or exiting irq or NMI handlers, the current code uses
->dynticks_nmi_nesting to detect if it is in the outermost handler,
that is, the one interrupting or returning to an RCU-idle context (the
idle loop or nohz_full usermode execution). When entering the outermost
handler via an interrupt (as opposed to NMI), it is necessary to invoke
rcu_dynticks_task_exit() just before the CPU is marked non-idle from an
RCU perspective and to invoke rcu_cleanup_after_idle() just after the
CPU is marked non-idle. Similarly, when exiting the outermost handler
via an interrupt, it is necessary to invoke rcu_prepare_for_idle() just
before marking the CPU idle and to invoke rcu_dynticks_task_enter()
just after marking the CPU idle.
The decision to execute these four functions is currently taken in
rcu_irq_enter() and rcu_irq_exit() as follows:
rcu_irq_enter()
/* A conditional branch with ->dynticks_nmi_nesting */
rcu_nmi_enter()
/* A conditional branch with ->dynticks */
/* A conditional branch with ->dynticks_nmi_nesting */
rcu_irq_exit()
/* A conditional branch with ->dynticks_nmi_nesting */
rcu_nmi_exit()
/* A conditional branch with ->dynticks_nmi_nesting */
/* A conditional branch with ->dynticks_nmi_nesting */
rcu_nmi_enter()
/* A conditional branch with ->dynticks */
rcu_nmi_exit()
/* A conditional branch with ->dynticks_nmi_nesting */
This works, but the conditional branches in rcu_irq_enter() and
rcu_irq_exit() are redundant with those in rcu_nmi_enter() and
rcu_nmi_exit(), respectively. Redundant branches are not something
we want in the to/from-idle fastpaths, so this commit refactors
rcu_{nmi,irq}_{enter,exit}() so they use a common inlined function passed
a constant argument as follows:
rcu_irq_enter() inlining rcu_nmi_enter_common(irq=true)
/* A conditional branch with ->dynticks */
rcu_irq_exit() inlining rcu_nmi_exit_common(irq=true)
/* A conditional branch with ->dynticks_nmi_nesting */
rcu_nmi_enter() inlining rcu_nmi_enter_common(irq=false)
/* A conditional branch with ->dynticks */
rcu_nmi_exit() inlining rcu_nmi_exit_common(irq=false)
/* A conditional branch with ->dynticks_nmi_nesting */
The combination of the constant function argument and the inlining allows
the compiler to discard the conditionals that previously controlled
execution of rcu_dynticks_task_exit(), rcu_cleanup_after_idle(),
rcu_prepare_for_idle(), and rcu_dynticks_task_enter(). This reduces both
the to-idle and from-idle path lengths by two conditional branches each,
and improves readability as well.
This commit also changes order of execution from this:
rcu_dynticks_task_exit();
rcu_dynticks_eqs_exit();
trace_rcu_dyntick();
rcu_cleanup_after_idle();
To this:
rcu_dynticks_task_exit();
rcu_dynticks_eqs_exit();
rcu_cleanup_after_idle();
trace_rcu_dyntick();
In other words, the calls to rcu_cleanup_after_idle() and
trace_rcu_dyntick() are reversed. This has no functional effect because
the real concern is whether a given call is before or after the call to
rcu_dynticks_eqs_exit(), and this patch does not change that. Before the
call to rcu_dynticks_eqs_exit(), RCU is not yet watching the current
CPU and after that call RCU is watching.
A similar switch in calling order happens on the idle-entry path, with
similar lack of effect for the same reasons.
Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Applied Steven Rostedt feedback. ]
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Pull scheduler updates from Thomas Gleixner:
- Cleanup and improvement of NUMA balancing
- Refactoring and improvements to the PELT (Per Entity Load Tracking)
code
- Watchdog simplification and related cleanups
- The usual pile of small incremental fixes and improvements
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
watchdog: Reduce message verbosity
stop_machine: Reflow cpu_stop_queue_two_works()
sched/numa: Move task_numa_placement() closer to numa_migrate_preferred()
sched/numa: Use group_weights to identify if migration degrades locality
sched/numa: Update the scan period without holding the numa_group lock
sched/numa: Remove numa_has_capacity()
sched/numa: Modify migrate_swap() to accept additional parameters
sched/numa: Remove unused task_capacity from 'struct numa_stats'
sched/numa: Skip nodes that are at 'hoplimit'
sched/debug: Reverse the order of printing faults
sched/numa: Use task faults only if numa_group is not yet set up
sched/numa: Set preferred_node based on best_cpu
sched/numa: Simplify load_too_imbalanced()
sched/numa: Evaluate move once per node
sched/numa: Remove redundant field
sched/debug: Show the sum wait time of a task group
sched/fair: Remove #ifdefs from scale_rt_capacity()
sched/core: Remove get_cpu() from sched_fork()
sched/cpufreq: Clarify sugov_get_util()
sched/sysctl: Remove unused sched_time_avg_ms sysctl
...
When rcutorture is built in to the kernel, an earlier patch detects
that and raises the priority of RCU's kthreads to allow rcutorture's
RCU priority boosting tests to succeed.
However, if rcutorture is built as a module, those priorities must be
raised manually via the rcutree.kthread_prio kernel boot parameter.
If this manual step is not taken, rcutorture's RCU priority boosting
tests will fail due to kthread starvation. One approach would be to
raise the default priority, but that risks breaking existing users.
Another approach would be to allow runtime adjustment of RCU's kthread
priorities, but that introduces numerous "interesting" race conditions.
This patch therefore instead detects too-low priorities, and prints a
message and disables the RCU priority boosting tests in that case.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Back when RCU had a debugfs interface, there was a test version and
sequence number that allowed associating debugfs data with a particular
test run, where the test run started with modprobe and ended with rmmod,
which was how tests were run back on the old ABAT system within IBM.
But rcutorture testing no longer runs on ABAT, and there is no longer an
RCU debugfs interface, so there is no longer any need for test versions
and sequence numbers.
This commit therefore removes the rcutorture_record_test_transition()
and rcutorture_record_progress() functions, and along with them the
rcutorture_testseq and rcutorture_vernum variables that they update.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcutorture RCU priority boosting tests fail even with CONFIG_RCU_BOOST
set because rcutorture's threads run at the same priority as the default
RCU kthreads (RT class with priority of 1).
This patch checks if RCU torture is built into the kernel and if so,
assigns RT priority 1 to the RCU threads, allowing the rcutorture boost
tests to pass.
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Currently, the range of jiffies_till_{first,next}_fqs are checked and
adjusted on and on in the loop of rcu_gp_kthread on runtime.
However, it's enough to check them only when setting them, not every
time in the loop. So make them handled on a setting time via sysfs.
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit adds any in-the-future ->gp_seq_needed fields to the
diagnostics for an rcutorture writer stall warning message.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Currently, rcu_check_gp_start_stall() waits for one second after the first
request before complaining that a grace period has not yet started. This
was desirable while testing the conversion from ->future_gp_needed[] to
->gp_seq_needed, but it is a bit on the hair-trigger side for production
use under heavy load. This commit therefore makes this wait time be
exactly that of the RCU CPU stall warning, allowing easy adjustment of
both timeouts to suit the distribution or installation at hand.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcu_cpu_has_callbacks() function is now used in all configurations,
so this commit removes the __maybe_unused.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
These functions are in kernel/rcu/tree.c, which is not an include file,
so there is no problem dropping the "inline", especially given that these
functions are nowhere near a fastpath. This commit therefore delegates
the inlining decision to the compiler by dropping the "inline".
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcu_dynticks_momentary_idle() function is invoked only from
rcu_momentary_dyntick_idle(), and neither function is particularly
large. This commit therefore saves a few lines by inlining
rcu_dynticks_momentary_idle() into rcu_momentary_dyntick_idle().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The naming and comments associated with some RCU-tasks code make
the faulty assumption that context switches due to cond_resched()
are voluntary. As several people pointed out, this is not the case.
This commit therefore updates function names and comments to better
reflect current reality.
Reported-by: Byungchul Park <byungchul.park@lge.com>
Reported-by: Joel Fernandes <joel@joelfernandes.org>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit also adjusts some whitespace while in the area.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Revert string-breaking %s as requested by Andy Shevchenko. ]
We expect a quiescent state of TASKS_RCU when cond_resched_tasks_rcu_qs()
is called, no matter whether it actually be scheduled or not. However,
it currently doesn't report the quiescent state when the task enters
into __schedule() as it's called with preempt = true. So make it report
the quiescent state unconditionally when cond_resched_tasks_rcu_qs() is
called.
And in TINY_RCU, even though the quiescent state of rcu_bh also should
be reported when the tick interrupt comes from user, it doesn't. So make
it reported.
Lastly in TREE_RCU, rcu_note_voluntary_context_switch() should be
reported when the tick interrupt comes from not only user but also idle,
as an extended quiescent state.
Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Simplify rcutiny portion given no RCU-tasks for !PREEMPT. ]
CPUs are expected to report quiescent states when coming online and
when going offline, and grace-period initialization is supposed to
handle any race conditions where a CPU's ->qsmask bit is set just after
it goes offline. This commit adds diagnostics for the case where an
offline CPU nevertheless has a grace period waiting on it.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Grace-period initialization first processes any recent CPU-hotplug
operations, and then initializes state for the new grace period. These
two phases of initialization are currently not distinguished in debug
prints, but the distinction is valuable in a number of debug situations.
This commit therefore introduces two new values for ->gp_state,
RCU_GP_ONOFF and RCU_GP_INIT, in order to make this distinction.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Interactions between CPU-hotplug operations and grace-period
initialization can result in dump_blkd_tasks(). One of the first
debugging actions in this case is to search back in dmesg to work
out which of the affected rcu_node structure's CPUs are online and to
determine the last CPU-hotplug operation affecting any of those CPUs.
This can be laborious and error-prone, especially when console output
is lost.
This commit therefore causes dump_blkd_tasks() to dump the state of
the affected rcu_node structure's CPUs and the last grace period during
which the last offline and online operation affected each of these CPUs.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Now that quiescent states for newly offlined CPUs are reported either
when that CPU goes offline or at the end of grace-period initialization,
the CPU-hotplug failsafe in the force-quiescent-state code path is no
longer needed.
This commit therefore removes this failsafe.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Now that quiescent-state reporting is fully event-driven, this commit
removes the check for a lost quiescent state from force_qs_rnp().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The main race with the early part of grace-period initialization appears
to be with CPU hotplug. To more fully open this race window, this commit
moves the rcu_gp_slow() from the beginning of the early initialization
loop to follow that loop, thus widening the race window, especially for
the rcu_node structures that are initialized last. This commit also
expands rcutree.gp_preinit_delay from 3 to 12, giving the same overall
delay in the grace period, but concentrated in the spot where it will
do the most good.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Without special fail-safe quiescent-state-propagation checks, grace-period
hangs can result from the following scenario:
1. CPU 1 goes offline.
2. Because CPU 1 is the only CPU in the system blocking the current
grace period, the grace period ends as soon as
rcu_cleanup_dying_idle_cpu()'s call to rcu_report_qs_rnp()
returns.
3. At this point, the leaf rcu_node structure's ->lock is no longer
held: rcu_report_qs_rnp() has released it, as it must in order
to awaken the RCU grace-period kthread.
4. At this point, that same leaf rcu_node structure's ->qsmaskinitnext
field still records CPU 1 as being online. This is absolutely
necessary because the scheduler uses RCU (in this case on the
wake-up path while awakening RCU's grace-period kthread), and
->qsmaskinitnext contains RCU's idea as to which CPUs are online.
Therefore, invoking rcu_report_qs_rnp() after clearing CPU 1's
bit from ->qsmaskinitnext would result in a lockdep-RCU splat
due to RCU being used from an offline CPU.
5. RCU's grace-period kthread awakens, sees that the old grace period
has completed and that a new one is needed. It therefore starts
a new grace period, but because CPU 1's leaf rcu_node structure's
->qsmaskinitnext field still shows CPU 1 as being online, this new
grace period is initialized to wait for a quiescent state from the
now-offline CPU 1.
6. Without the fail-safe force-quiescent-state checks, there would
be no quiescent state from the now-offline CPU 1, which would
eventually result in RCU CPU stall warnings and memory exhaustion.
It would be good to get rid of the special fail-safe quiescent-state
propagation checks, and thus it would be good to fix things so that
the above scenario cannot happen. This commit therefore adds a new
->ofl_lock to the rcu_state structure. This lock is held by rcu_gp_init()
across the applying of buffered online and offline operations to the
rcu_node tree, and it is also held by rcu_cleanup_dying_idle_cpu()
when buffering a new offline operation. This prevents rcu_gp_init()
from acquiring the leaf rcu_node structure's lock during the interval
between when rcu_cleanup_dying_idle_cpu() invokes rcu_report_qs_rnp(),
which releases ->lock and the re-acquisition of that same lock.
This in turn prevents the failure scenario outlined above, and will
hopefully eventually allow removal of the offline-CPU checks from the
force-quiescent-state code path.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Without special fail-safe quiescent-state-propagation checks, grace-period
hangs can result from the following scenario:
1. A task running on a given CPU is preempted in its RCU read-side
critical section.
2. That CPU goes offline, and there are now no online CPUs
corresponding to that CPU's leaf rcu_node structure.
3. The rcu_gp_init() function does the first phase of grace-period
initialization, and sets the aforementioned leaf rcu_node
structure's ->qsmaskinit field to all zeroes. Because there
is a blocked task, it does not propagate the zeroing of either
->qsmaskinit or ->qsmaskinitnext up the rcu_node tree.
4. The task resumes on some other CPU and exits its critical section.
There is no grace period in progress, so the resulting quiescent
state is not reported up the tree.
5. The rcu_gp_init() function does the second phase of grace-period
initialization, which results in the leaf rcu_node structure
being initialized to expect no further quiescent states, but
with that structure's parent expecting a quiescent-state report.
The parent will never receive a quiescent state from this leaf
rcu_node structure, so the grace period will hang, resulting in
RCU CPU stall warnings.
It would be good to get rid of the special fail-safe quiescent-state
propagation checks. This commit therefore checks the leaf rcu_node
structure's ->wait_blkd_tasks field during grace-period initialization.
If this flag is set, the rcu_report_qs_rnp() is invoked to immediately
report the possible quiescent state. While in the neighborhood, this
commit also report quiescent states for any CPUs that went offline between
the two phases of grace-period initialization, thus reducing grace-period
delays and hopefully eventually allowing removal of offline-CPU checks
from the force-quiescent-state code path.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Consider the following sequence of events in a PREEMPT=y kernel:
1. All but one of the CPUs corresponding to a given leaf rcu_node
structure go offline. Each of these CPUs clears its bit in that
structure's ->qsmaskinitnext field.
2. A new grace period starts, and rcu_gp_init() scans the leaf
rcu_node structures, applying CPU-hotplug changes since the
start of the previous grace period, including those changes in
#1 above. This copies each leaf structure's ->qsmaskinitnext
to its ->qsmask field, which represents the CPUs that this new
grace period will wait on. Each copy operation is done holding
the corresponding leaf rcu_node structure's ->lock, and at the
end of this scan, rcu_gp_init() holds no locks.
3. The last CPU corresponding to #1's leaf rcu_node structure goes
offline, clearing its bit in that structure's ->qsmaskinitnext
field, but not touching the ->qsmaskinit field. Note that
rcu_gp_init() is not currently holding any locks! This CPU does
-not- report a quiescent state because the grace period has not
yet initialized itself sufficiently to have set any bits in any
of the leaf rcu_node structures' ->qsmask fields.
4. The rcu_gp_init() function continues initializing the new grace
period, copying each leaf rcu_node structure's ->qsmaskinit
field to its ->qsmask field while holding the corresponding ->lock.
This sets the ->qsmask bit corresponding to #3's CPU.
5. Before the grace period ends, #3's CPU comes back online.
Because te grace period has not yet done any force-quiescent-state
scans (which would report a quiescent state on behalf of any
offline CPUs), this CPU's ->qsmask bit is still set.
6. A task running on the newly onlined CPU is preempted while in
an RCU read-side critical section. Because this CPU's ->qsmask
bit is net, not only does this task queue itself on the leaf
rcu_node structure's ->blkd_tasks list, it also sets that
structure's ->gp_tasks pointer to reference it.
7. The grace period started in #1 above comes to an end. This
results in rcu_gp_cleanup() being invoked, which, among other
things, checks to make sure that there are no tasks blocking the
just-ended grace period, that is, that all ->gp_tasks pointers
are NULL. The ->gp_tasks pointer corresponding to the task
preempted in #3 above is non-NULL, which results in a splat.
This splat is a false positive. The task's RCU read-side critical
section cannot have begun before the just-ended grace period because
this would mean either: (1) The CPU came online before the grace period
started, which cannot have happened because the grace period started
before that CPU went offline, or (2) The task started its RCU read-side
critical section on some other CPU, but then it would have had to have
been preempted before migrating to this CPU, which would mean that it
would have instead queued itself on that other CPU's rcu_node structure.
RCU's grace periods thus are working correctly. Or, more accurately,
that remaining bugs in RCU's grace periods are elsewhere.
This commit eliminates this false positive by adding code to the end
of rcu_cpu_starting() that reports a quiescent state to RCU, which has
the side-effect of clearing that CPU's ->qsmask bit, preventing the
above scenario. This approach has the added benefit of more promptly
reporting quiescent states corresponding to offline CPUs. Nevertheless,
this commit does -not- remove the need for the force-quiescent-state
scans to check for offline CPUs, given that a CPU might remain offline
indefinitely. And without the checks in the force-quiescent-state scans,
the grace period would also persist indefinitely, which could result in
hangs or memory exhaustion.
Note well that the call to rcu_report_qs_rnp() reporting the quiescent
state must come -after- the setting of this CPU's bit in the leaf rcu_node
structure's ->qsmaskinitnext field. Otherwise, lockdep-RCU will complain
bitterly about quiescent states coming from an offline CPU.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Consider the following sequence of events in a PREEMPT=y kernel:
1. All CPUs corresponding to a given rcu_node structure go offline.
A new grace period starts just after the CPU-hotplug code path
does its synchronize_rcu() for the last CPU, so at least this
CPU is present in that structure's ->qsmask.
2. Before the grace period ends, a CPU comes back online, and not
just any CPU, but the one corresponding to a non-zero bit in
the leaf rcu_node structure's ->qsmask.
3. A task running on the newly onlined CPU is preempted while in
an RCU read-side critical section. Because this CPU's ->qsmask
bit is net, not only does this task queue itself on the leaf
rcu_node structure's ->blkd_tasks list, it also sets that
structure's ->gp_tasks pointer to reference it.
4. The grace period started in #1 above comes to an end. This
results in rcu_gp_cleanup() being invoked, which, among other
things, checks to make sure that there are no tasks blocking the
just-ended grace period, that is, that all ->gp_tasks pointers
are NULL. The ->gp_tasks pointer corresponding to the task
preempted in #3 above is non-NULL, which results in a splat.
This splat is a false positive. The task's RCU read-side critical
section cannot have begun before the just-ended grace period because
this would mean either: (1) The CPU came online before the grace period
started, which cannot have happened because the grace period started
before that CPU was all the way offline, or (2) The task started its
RCU read-side critical section on some other CPU, but then it would
have had to have been preempted before migrating to this CPU, which
would mean that it would have instead queued itself on that other CPU's
rcu_node structure.
This commit eliminates this false positive by adding code to the end
of rcu_cleanup_dying_idle_cpu() that reports a quiescent state to RCU,
which has the side-effect of clearing that CPU's ->qsmask bit, preventing
the above scenario. This approach has the added benefit of more promptly
reporting quiescent states corresponding to offline CPUs.
Note well that the call to rcu_report_qs_rnp() reporting the quiescent
state must come -before- the clearing of this CPU's bit in the leaf
rcu_node structure's ->qsmaskinitnext field. Otherwise, lockdep-RCU
will complain bitterly about quiescent states coming from an offline CPU.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The rcu_lockdep_current_cpu_online() function currently checks only the
RCU-sched data structures to determine whether or not RCU believes that a
given CPU is offline. Unfortunately, there are multiple flavors of RCU,
which means that there is a short window of time during which the various
flavors disagree as to whether or not a given CPU is offline. This can
result in false-positive lockdep-RCU splats in which some other flavor
of RCU tries to do something based on its view that the CPU is online,
only to get hit with a lockdep-RCU splat because RCU-sched instead
believes that the CPU is offline.
This commit therefore changes rcu_lockdep_current_cpu_online() to scan
all RCU flavors and to consider a given CPU to be online if any of the
RCU flavors believe it to be online, thus preventing these false-positive
splats.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The force_qs_rnp() function checks for ->qsmask being all zero, that is,
all CPUs for the current rcu_node structure having already passed through
quiescent states. But with RCU-preempt, this is not sufficient to report
quiescent states further up the tree, so there are further checks that
can initiate RCU priority boosting and also for races with CPU-hotplug
operations. However, if neither of these further checks apply, the code
proceeds to carry out a useless scan of an all-zero ->qsmask.
This commit therefore adds code to release the current rcu_node
structure's lock and continue on to the next rcu_node structure, thereby
avoiding this useless scan.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit gets rid of the smp_wmb() in record_gp_stall_check_time()
in favor of an smp_store_release().
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
This commit fixes a typo and adds some additional debugging to the
message emitted when a task blocking the current grace period is listed
as blocking it when either that grace period ends or the next grace
period begins. This commit also reformats the console message for
readability.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
If rcu_report_unblock_qs_rnp() is invoked on something other than
preemptible RCU or if there are still preempted tasks blocking the
current grace period, something went badly wrong in the caller.
This commit therefore adds WARN_ON_ONCE() to these conditions, but
leaving the legitimate reason for early exit (rnp->qsmask != 0)
unwarned.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Currently, rcu_init_new_rnp() walks up the rcu_node combining tree,
setting bits in the ->qsmaskinit fields on the way up. It walks up
unconditionally, regardless of the initial state of these bits. This is
OK because only the corresponding RCU grace-period kthread ever tests
or sets these bits during runtime. However, it is also pointless, and
it increases both memory and lock contention (albeit only slightly), so
this commit stops the walk as soon as an already-set bit is encountered.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Back in the old days, when grace-period initialization blocked CPU
hotplug, the ->qsmaskinit mask was indeed updated at the time that
a given CPU went offline. However, with the deferral of these updates
until the beginning of the next grace period in commit 0aa04b055e
("rcu: Process offlining and onlining only at grace-period start"),
it is instead ->qsmaskinitnext that gets updated at that time.
This commit therefore updates the obsolete comment. It also fixes
punctuation while on the topic of comments mentioning ->qsmaskinit.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Commit 0aa04b055e ("rcu: Process offlining and onlining only at
grace-period start") deferred handling of CPU-hotplug events until the
start of the next grace period, but consider the following sequence
of events:
1. A task is preempted within an RCU-preempt read-side critical
section.
2. The CPU that this task was running on goes offline, along with all
other CPUs sharing the corresponding leaf rcu_node structure.
3. The task resumes execution.
4. One of those CPUs comes back online before a new grace period starts.
In step 2, the code in the next rcu_gp_init() invocation will (correctly)
defer removing the leaf rcu_node structure from the upper-level bitmasks,
and will (correctly) set that structure's ->wait_blkd_tasks field. During
the ensuing interval, RCU will (correctly) track the tasks preempted on
that structure because they must block any subsequent grace period.
In step 3, the code in rcu_read_unlock_special() will (correctly) remove
the task from the leaf rcu_node structure. From this point forward, RCU
need not pay attention to this structure, at least not until one of the
corresponding CPUs comes back online.
In step 4, the code in the next rcu_gp_init() invocation will
(incorrectly) invoke rcu_init_new_rnp(). This is incorrect because
the corresponding rcu_cleanup_dead_rnp() was never invoked. This is
nevertheless harmless because the upper-level bits are still set.
So, no harm, no foul, right?
At least, all is well until a little further into rcu_gp_init()
invocation, which will notice that there are no longer any tasks blocked
on the leaf rcu_node structure, conclude that there is no longer anything
left over from step 2's offline operation, and will therefore invoke
rcu_cleanup_dead_rnp(). But this invocation of rcu_cleanup_dead_rnp()
is for the beginning of the earlier offline interval, and the previous
invocation of rcu_init_new_rnp() is for the end of that same interval.
That is right, they are invoked out of order.
That cannot be good, can it?
It turns out that this is not a (correctness!) problem because
rcu_cleanup_dead_rnp() checks to see if any of the corresponding CPUs
are online, and refuses to do anything if so. In other words, in the
case where rcu_init_new_rnp() and rcu_cleanup_dead_rnp() execute out of
order, they both have no effect.
But this is at best an accident waiting to happen.
This commit therefore adds logic to rcu_gp_init() so that
rcu_init_new_rnp() and rcu_cleanup_dead_rnp() are always invoked in
order, and so that neither are invoked at all in cases where RCU had to
pay attention to the leaf rcu_node structure during the entire time that
all corresponding CPUs were offline.
And, while in the area, this commit reduces confusion by using formal
parameters rather than local variables that just happen to have the same
value at that particular point in the code.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
There's no need to keep checking the same starting node for whether a
grace period is in progress as we advance up the funnel lock loop. Its
sufficient if we just checked it in the start, and then subsequently
checked the internal nodes as we advanced up the combining tree. This
also makes sense because the grace-period updates propogate from the
root to the leaf, so there's a chance we may find a grace period has
started as we advance up, lets check for the same.
Reported-by: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
The funnel locking loop in rcu_start_this_gp uses rcu_root as a
temporary variable while walking the combining tree. This causes a
tiresome exercise of a code reader reminding themselves that rcu_root
may not be root. Lets just call it rnp, and rename other variables as
well to be more appropriate.
Original patch: https://patchwork.kernel.org/patch/10396577/
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Fix name in comment as well. ]