diff --git a/Documentation/RCU/Design/Requirements/Requirements.html b/Documentation/RCU/Design/Requirements/Requirements.html index 43c4e2f05f40..7efc1c1da7af 100644 --- a/Documentation/RCU/Design/Requirements/Requirements.html +++ b/Documentation/RCU/Design/Requirements/Requirements.html @@ -1381,6 +1381,7 @@ Classes of quality-of-implementation requirements are as follows:
  1. Specialization
  2. Performance and Scalability +
  3. Forward Progress
  4. Composability
  5. Corner Cases
@@ -1822,6 +1823,106 @@ so it is too early to tell whether they will stand the test of time. RCU thus provides a range of tools to allow updaters to strike the required tradeoff between latency, flexibility and CPU overhead. +

Forward Progress

+ +

+In theory, delaying grace-period completion and callback invocation +is harmless. +In practice, not only are memory sizes finite but also callbacks sometimes +do wakeups, and sufficiently deferred wakeups can be difficult +to distinguish from system hangs. +Therefore, RCU must provide a number of mechanisms to promote forward +progress. + +

+These mechanisms are not foolproof, nor can they be. +For one simple example, an infinite loop in an RCU read-side critical +section must by definition prevent later grace periods from ever completing. +For a more involved example, consider a 64-CPU system built with +CONFIG_RCU_NOCB_CPU=y and booted with rcu_nocbs=1-63, +where CPUs 1 through 63 spin in tight loops that invoke +call_rcu(). +Even if these tight loops also contain calls to cond_resched() +(thus allowing grace periods to complete), CPU 0 simply will +not be able to invoke callbacks as fast as the other 63 CPUs can +register them, at least not until the system runs out of memory. +In both of these examples, the Spiderman principle applies: With great +power comes great responsibility. +However, short of this level of abuse, RCU is required to +ensure timely completion of grace periods and timely invocation of +callbacks. + +

+RCU takes the following steps to encourage timely completion of +grace periods: + +

    +
  1. If a grace period fails to complete within 100 milliseconds, + RCU causes future invocations of cond_resched() on + the holdout CPUs to provide an RCU quiescent state. + RCU also causes those CPUs' need_resched() invocations + to return true, but only after the corresponding CPU's + next scheduling-clock. +
  2. CPUs mentioned in the nohz_full kernel boot parameter + can run indefinitely in the kernel without scheduling-clock + interrupts, which defeats the above need_resched() + strategem. + RCU will therefore invoke resched_cpu() on any + nohz_full CPUs still holding out after + 109 milliseconds. +
  3. In kernels built with CONFIG_RCU_BOOST=y, if a given + task that has been preempted within an RCU read-side critical + section is holding out for more than 500 milliseconds, + RCU will resort to priority boosting. +
  4. If a CPU is still holding out 10 seconds into the grace + period, RCU will invoke resched_cpu() on it regardless + of its nohz_full state. +
+ +

+The above values are defaults for systems running with HZ=1000. +They will vary as the value of HZ varies, and can also be +changed using the relevant Kconfig options and kernel boot parameters. +RCU currently does not do much sanity checking of these +parameters, so please use caution when changing them. +Note that these forward-progress measures are provided only for RCU, +not for +SRCU or +Tasks RCU. + +

+RCU takes the following steps in call_rcu() to encourage timely +invocation of callbacks when any given non-rcu_nocbs CPU has +10,000 callbacks, or has 10,000 more callbacks than it had the last time +encouragement was provided: + +

    +
  1. Starts a grace period, if one is not already in progress. +
  2. Forces immediate checking for quiescent states, rather than + waiting for three milliseconds to have elapsed since the + beginning of the grace period. +
  3. Immediately tags the CPU's callbacks with their grace period + completion numbers, rather than waiting for the RCU_SOFTIRQ + handler to get around to it. +
  4. Lifts callback-execution batch limits, which speeds up callback + invocation at the expense of degrading realtime response. +
+ +

+Again, these are default values when running at HZ=1000, +and can be overridden. +Again, these forward-progress measures are provided only for RCU, +not for +SRCU or +Tasks RCU. +Even for RCU, callback-invocation forward progress for rcu_nocbs +CPUs is much less well-developed, in part because workloads benefiting +from rcu_nocbs CPUs tend to invoke call_rcu() +relatively infrequently. +If workloads emerge that need both rcu_nocbs CPUs and high +call_rcu() invocation rates, then additional forward-progress +work will be required. +

Composability

@@ -2272,7 +2373,7 @@ that meets this requirement. Furthermore, NMI handlers can be interrupted by what appear to RCU to be normal interrupts. One way that this can happen is for code that directly invokes -rcu_irq_enter() and rcu_irq_exit() to be called +rcu_irq_enter() and rcu_irq_exit() to be called from an NMI handler. This astonishing fact of life prompted the current code structure, which has rcu_irq_enter() invoking rcu_nmi_enter() @@ -2294,7 +2395,7 @@ via del_timer_sync() or similar.

Unfortunately, there is no way to cancel an RCU callback; once you invoke call_rcu(), the callback function is -going to eventually be invoked, unless the system goes down first. +eventually going to be invoked, unless the system goes down first. Because it is normally considered socially irresponsible to crash the system in response to a module unload request, we need some other way to deal with in-flight RCU callbacks. @@ -3233,6 +3334,11 @@ For example, RCU callback overhead might be charged back to the originating call_rcu() instance, though probably not in production kernels. +

+Additional work may be required to provide reasonable forward-progress +guarantees under heavy load for grace periods and for callback +invocation. +

Summary