WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Jakub Kicinski	96917bb3a3	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net include/linux/net.h `a5ef058dc4` ("net: introduce and use custom sockopt socket flag") `e993ffe3da` ("net: flag sockets supporting msghdr originated zerocopy") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-10-24 13:44:11 -07:00
Paul E. McKenney	31d8aaa87f	rcu: Keep synchronize_rcu() from enabling irqs in early boot Making polled RCU grace periods account for expedited grace periods required acquiring the leaf rcu_node structure's lock during early boot, but after rcu_init() was called. This lock is irq-disabled, but the code incorrectly assumes that irqs are always disabled when invoking synchronize_rcu(). The exception is early boot before the scheduler has started, which means that upon return from synchronize_rcu(), irqs will be incorrectly enabled. This commit fixes this bug by using irqsave/irqrestore locking primitives. Fixes: `bf95b2bc3e` ("rcu: Switch polled grace-period APIs to ->gp_seq_polled") Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-10-20 15:34:49 -07:00
Paul E. McKenney	e6c86c513f	rcu-tasks: Provide rcu_trace_implies_rcu_gp() As an accident of implementation, an RCU Tasks Trace grace period also acts as an RCU grace period. However, this could change at any time. This commit therefore creates an rcu_trace_implies_rcu_gp() that currently returns true to codify this accident. Code relying on this accident must call this function to verify that this accident is still happening. Reported-by: Hou Tao <houtao@huaweicloud.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Link: https://lore.kernel.org/r/20221014113946.965131-2-houtao@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-10-18 10:27:02 -07:00
Paul E. McKenney	5c0ec49004	Merge branches 'doc.2022.08.31b', 'fixes.2022.08.31b', 'kvfree.2022.08.31b', 'nocb.2022.09.01a', 'poll.2022.08.31b', 'poll-srcu.2022.08.31b' and 'tasks.2022.08.31b' into HEAD doc.2022.08.31b: Documentation updates fixes.2022.08.31b: Miscellaneous fixes kvfree.2022.08.31b: kvfree_rcu() updates nocb.2022.09.01a: NOCB CPU updates poll.2022.08.31b: Full-oldstate RCU polling grace-period API poll-srcu.2022.08.31b: Polled SRCU grace-period updates tasks.2022.08.31b: Tasks RCU updates	2022-09-01 10:55:57 -07:00
Zqiang	48297a22a3	rcutorture: Use the barrier operation specified by cur_ops The rcutorture_oom_notify() function unconditionally invokes rcu_barrier(), which is OK when the rcutorture.torture_type value is "rcu", but unhelpful otherwise. The purpose of these barrier calls is to wait for all outstanding callback-flooding callbacks to be invoked before cleaning up their data. Using the wrong barrier function therefore risks arbitrary memory corruption. Thus, this commit changes these rcu_barrier() calls into cur_ops->cb_barrier() to make things work when torturing non-vanilla flavors of RCU. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-09-01 10:50:04 -07:00
Zqiang	528262f502	rcu-tasks: Make RCU Tasks Trace check for userspace execution Userspace execution is a valid quiescent state for RCU Tasks Trace, but the scheduling-clock interrupt does not currently report such quiescent states. Of course, the scheduling-clock interrupt is not strictly speaking userspace execution. However, the only way that this code is not in a quiescent state is if something invoked rcu_read_lock_trace(), and that would be reflected in the ->trc_reader_nesting field in the task_struct structure. Furthermore, this field is checked by rcu_tasks_trace_qs(), which is invoked by rcu_tasks_qs() which is in turn invoked by rcu_note_voluntary_context_switch() in kernels building at least one of the RCU Tasks flavors. It is therefore safe to invoke rcu_tasks_trace_qs() from the rcu_sched_clock_irq(). But rcu_tasks_qs() also invokes rcu_tasks_classic_qs() for RCU Tasks, which lacks the read-side markers provided by RCU Tasks Trace. This raises the possibility that an RCU Tasks grace period could start after the interrupt from userspace execution, but before the call to rcu_sched_clock_irq(). However, it turns out that this is safe because the RCU Tasks grace period waits for an RCU grace period, which will wait for the entire scheduling-clock interrupt handler, including any RCU Tasks read-side critical section that this handler might contain. This commit therefore updates the rcu_sched_clock_irq() function's check for usermode execution and its call to rcu_tasks_classic_qs() to instead check for both usermode execution and interrupt from idle, and to instead call rcu_note_voluntary_context_switch(). This consolidates code and provides more faster RCU Tasks Trace reporting of quiescent states in kernels that do scheduling-clock interrupts for userspace execution. [ paulmck: Consolidate checks into rcu_sched_clock_irq(). ] Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:10:55 -07:00
Paul E. McKenney	d6ad60635c	rcu-tasks: Ensure RCU Tasks Trace loops have quiescent states The RCU Tasks Trace grace-period kthread loops across all CPUs, and there can be quite a few CPUs, with some commercially available systems sporting well over a thousand of them. Some of these loops can feature IPIs, which can take some time. This commit therefore places a call to cond_resched_tasks_rcu_qs() in each such loop. Link: https://docs.google.com/document/d/1V0YnG1HTWMt9WHJjroiJL9lf-hMrud4v8Fn3fhyY0cI/edit?usp=sharing Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:10:55 -07:00
Zqiang	fcd53c8a4d	rcu-tasks: Convert RCU_LOCKDEP_WARN() to WARN_ONCE() Kernels built with CONFIG_PROVE_RCU=y and CONFIG_DEBUG_LOCK_ALLOC=y attempt to emit a warning when the synchronize_rcu_tasks_generic() function is called during early boot while the rcu_scheduler_active variable is RCU_SCHEDULER_INACTIVE. However the warnings is not actually be printed because the debug_lockdep_rcu_enabled() returns false, exactly because the rcu_scheduler_active variable is still equal to RCU_SCHEDULER_INACTIVE. This commit therefore replaces RCU_LOCKDEP_WARN() with WARN_ONCE() to force these warnings to actually be printed. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:10:54 -07:00
Paul E. McKenney	5fe89191e4	srcu: Make Tiny SRCU use full-sized grace-period counters This commit makes Tiny SRCU use full-sized grace-period counters to further avoid counter-wrap issues when using polled grace-period APIs. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:10:15 -07:00
Paul E. McKenney	de3f2671ae	srcu: Make Tiny SRCU poll_state_synchronize_srcu() more precise This commit applies the more-precise grace-period-state check used by rcu_seq_done_exact() to poll_state_synchronize_srcu(). This is important because Tiny SRCU uses a 16-bit counter, which can wrap quite quickly. If counter wrap continues to be a problem, then expanding ->srcu_idx and ->srcu_idx_max to 32 bits might be warranted. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:10:15 -07:00
Paul E. McKenney	599d97e3f2	rcutorture: Make "srcud" option also test polled grace-period API This commit brings the "srcud" (dynamically allocated) SRCU test in line with the "srcu" (statically allocated) test, so that both test the full SRCU polled grace-period API. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:10:15 -07:00
Paul E. McKenney	967c298d65	rcutorture: Limit read-side polling-API testing RCU's polled grace-period API is reasonably lightweight, but still contains heavyweight memory barriers. This commit therefore limits testing of this API from rcutorture's readers in order to avoid the false negatives that these heavyweight operations could provoke. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:09:22 -07:00
Paul E. McKenney	5d7801f201	rcutorture: Expand rcu_torture_write_types() first "if" statement This commit expands the rcu_torture_write_types() function's first "if" condition and body, placing one element per line, in order to make the compiler's error messages more helpful. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:09:22 -07:00
Paul E. McKenney	cc8faf5b65	rcutorture: Use 1-suffixed variable in rcu_torture_write_types() check This commit changes the use of gp_poll_exp to gp_poll_exp1 in the first check in rcu_torture_write_types(). No functional effect, but consistency is a good thing. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:09:22 -07:00
Paul E. McKenney	d761de8a7d	rcu: Make synchronize_rcu() fastpath update only boot-CPU counters Large systems can have hundreds of rcu_node structures, and updating counters in each of them might slow down booting. This commit therefore updates only the counters in those rcu_node structures corresponding to the boot CPU, up to and including the root rcu_node structure. The counters for the remaining rcu_node structures are updated by the rcu_scheduler_starting() function, which executes just before the first non-boot kthread is spawned. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:09:22 -07:00
Paul E. McKenney	b3cdd0a79c	rcutorture: Adjust rcu_poll_need_2gp() for rcu_gp_oldstate field removal Now that rcu_gp_oldstate can accurately track both normal and expedited grace periods regardless of system state, rcutorture's rcu_poll_need_2gp() function need only call for a second grace period for the old single-unsigned-long grace-period polling APIs This commit therefore adjusts rcu_poll_need_2gp() accordingly. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:09:21 -07:00
Paul E. McKenney	7ecef0871d	rcu: Remove ->rgos_polled field from rcu_gp_oldstate structure Because both normal and expedited grace periods increment their respective counters on their pre-scheduler early boot fastpaths, the rcu_gp_oldstate structure no longer needs its ->rgos_polled field. This commit therefore removes this field, shrinking this structure so that it is the same size as an rcu_head structure. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:09:21 -07:00
Paul E. McKenney	43ff97cc99	rcu: Make synchronize_rcu_expedited() fast path update .expedited_sequence This commit causes the early boot single-CPU synchronize_rcu_expedited() fastpath to update the rcu_state structure's ->expedited_sequence counter. This will allow the full-state polled grace-period APIs to detect all expedited grace periods without the need to track the special combined polling-only counter, which is another step towards removing the ->rgos_polled field from the rcu_gp_oldstate, thereby reducing its size by one third. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:09:21 -07:00
Paul E. McKenney	e8755d2bde	rcu: Remove expedited grace-period fast-path forward-progress helper Now that the expedited grace-period fast path can only happen during the pre-scheduler portion of early boot, this fast path can no longer block run-time RCU Trace grace periods. This commit therefore removes the conditional cond_resched() invocation. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:09:21 -07:00
Paul E. McKenney	910e12092e	rcu: Make synchronize_rcu() fast path update ->gp_seq counters This commit causes the early boot single-CPU synchronize_rcu() fastpath to update the rcu_state and rcu_node structures' ->gp_seq and ->gp_seq_needed counters. This will allow the full-state polled grace-period APIs to detect all normal grace periods without the need to track the special combined polling-only counter, which is a step towards removing the ->rgos_polled field from the rcu_gp_oldstate, thereby reducing its size by one third. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:09:21 -07:00
Paul E. McKenney	5f11bad6b7	rcu-tasks: Remove grace-period fast-path rcu-tasks helper Now that the grace-period fast path can only happen during the pre-scheduler portion of early boot, this fast path can no longer block run-time RCU Tasks and RCU Tasks Trace grace periods. This commit therefore removes the conditional cond_resched_tasks_rcu_qs() invocation. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:08:08 -07:00
Paul E. McKenney	a5d1b0b68a	rcu: Set rcu_data structures' initial ->gpwrap value to true It would be good do reduce the size of the rcu_gp_oldstate structure from three unsigned long instances to two, but this requires that the boot-time optimized grace periods update the various ->gp_seq fields. Updating these fields in the rcu_state structure and in all of the rcu_node structures is at least semi-reasonable, but updating them in all of the rcu_data structures is a bridge too far. This means that if there are too many early boot-time grace periods, the ->gp_seq field in the rcu_data structure cannot be trusted. This commit therefore sets each rcu_data structure's ->gpwrap field to provide the necessary impetus for a suitable level of distrust. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:08:08 -07:00
Paul E. McKenney	258f887aba	rcu: Disable run-time single-CPU grace-period optimization The run-time single-CPU grace-period optimization applies only to kernels built with CONFIG_SMP=y && CONFIG_PREEMPTION=y that are running on a single-CPU system. But a kernel intended for a single-CPU system should instead be built with CONFIG_SMP=n, and in any case, single-CPU systems running Linux no longer appear to be the common case. Plus this optimization results in the rcu_gp_oldstate structure being half again larger than it needs to be. This commit therefore disables the run-time single-CPU grace-period optimization, so that this optimization applies only during the pre-scheduler portion of the boot sequence. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:08:08 -07:00
Paul E. McKenney	8df13f0160	rcu: Add full-sized polling for cond_sync_exp_full() The cond_synchronize_rcu_expedited() API compresses the combined expedited and normal grace-period states into a single unsigned long, which conserves storage, but can miss grace periods in certain cases involving overlapping normal and expedited grace periods. Missing the occasional grace period is usually not a problem, but there are use cases that care about each and every grace period. This commit therefore adds yet another member of the full-state RCU grace-period polling API, which is the cond_synchronize_rcu_exp_full() function. This uses up to three times the storage (rcu_gp_oldstate structure instead of unsigned long), but is guaranteed not to miss grace periods. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:08:08 -07:00
Paul E. McKenney	b6fe4917ae	rcu: Add full-sized polling for cond_sync_full() The cond_synchronize_rcu() API compresses the combined expedited and normal grace-period states into a single unsigned long, which conserves storage, but can miss grace periods in certain cases involving overlapping normal and expedited grace periods. Missing the occasional grace period is usually not a problem, but there are use cases that care about each and every grace period. This commit therefore adds yet another member of the full-state RCU grace-period polling API, which is the cond_synchronize_rcu_full() function. This uses up to three times the storage (rcu_gp_oldstate structure instead of unsigned long), but is guaranteed not to miss grace periods. [ paulmck: Apply feedback from kernel test robot and Julia Lawall. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:08:08 -07:00
Paul E. McKenney	f21e014345	rcu: Remove blank line from poll_state_synchronize_rcu() docbook header This commit removes the blank line preceding the oldstate parameter to the docbook header for the poll_state_synchronize_rcu() function and marks uses of this parameter later in that header. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:08:08 -07:00
Paul E. McKenney	6c502b14ba	rcu: Add full-sized polling for start_poll_expedited() The start_poll_synchronize_rcu_expedited() API compresses the combined expedited and normal grace-period states into a single unsigned long, which conserves storage, but can miss grace periods in certain cases involving overlapping normal and expedited grace periods. Missing the occasional grace period is usually not a problem, but there are use cases that care about each and every grace period. This commit therefore adds yet another member of the full-state RCU grace-period polling API, which is the start_poll_synchronize_rcu_expedited_full() function. This uses up to three times the storage (rcu_gp_oldstate structure instead of unsigned long), but is guaranteed not to miss grace periods. [ paulmck: Apply feedback from kernel test robot and Julia Lawall. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:08:08 -07:00
Paul E. McKenney	76ea364161	rcu: Add full-sized polling for start_poll() The start_poll_synchronize_rcu() API compresses the combined expedited and normal grace-period states into a single unsigned long, which conserves storage, but can miss grace periods in certain cases involving overlapping normal and expedited grace periods. Missing the occasional grace period is usually not a problem, but there are use cases that care about each and every grace period. This commit therefore adds the next member of the full-state RCU grace-period polling API, namely the start_poll_synchronize_rcu_full() function. This uses up to three times the storage (rcu_gp_oldstate structure instead of unsigned long), but is guaranteed not to miss grace periods. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:08:08 -07:00
Paul E. McKenney	f4754ad292	rcutorture: Verify long-running reader prevents full polling from completing This commit adds full-state polling checks to accompany the old-style polling checks in the rcu_torture_one_read() function. If a polling cycle within an RCU reader completes, a WARN_ONCE() is triggered. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:08:07 -07:00
Paul E. McKenney	37d6ade31c	rcutorture: Remove redundant RTWS_DEF_FREE check This check does nothing because the state at this point in the code because the rcu_torture_writer_state value is guaranteed to instead be RTWS_REPLACE. This commit therefore removes this check. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:08:07 -07:00
Paul E. McKenney	d594231aa5	rcutorture: Verify RCU reader prevents full polling from completing This commit adds a test to rcu_torture_writer() that verifies that a ->get_gp_state_full() and ->poll_gp_state_full() polled grace-period sequence does not claim that a grace period elapsed within the confines of the corresponding read-side critical section. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:08:07 -07:00
Paul E. McKenney	ed7d2f1abe	rcutorture: Allow per-RCU-flavor polled double-GP check Only vanilla RCU needs a double grace period for its compressed polled grace-period old-state cookie. This commit therefore adds an rcu_torture_ops per-flavor function ->poll_need_2gp to allow this check to be adapted to the RCU flavor under test. A NULL pointer for this function says that doubled grace periods are never needed. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:08:07 -07:00
Paul E. McKenney	ccb42229fb	rcutorture: Abstract synchronous and polled API testing This commit abstracts a do_rtws_sync() function that does synchronous grace-period testing, but also testing the polled API 25% of the time each for the normal and full-state variants of the polled API. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:08:07 -07:00
Paul E. McKenney	3fdefca9b4	rcu: Add full-sized polling for get_state() The get_state_synchronize_rcu() API compresses the combined expedited and normal grace-period states into a single unsigned long, which conserves storage, but can miss grace periods in certain cases involving overlapping normal and expedited grace periods. Missing the occasional grace period is usually not a problem, but there are use cases that care about each and every grace period. This commit therefore adds the next member of the full-state RCU grace-period polling API, namely the get_state_synchronize_rcu_full() function. This uses up to three times the storage (rcu_gp_oldstate structure instead of unsigned long), but is guaranteed not to miss grace periods. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:08:07 -07:00
Paul E. McKenney	91a967fd69	rcu: Add full-sized polling for get_completed() and poll_state() The get_completed_synchronize_rcu() and poll_state_synchronize_rcu() APIs compress the combined expedited and normal grace-period states into a single unsigned long, which conserves storage, but can miss grace periods in certain cases involving overlapping normal and expedited grace periods. Missing the occasional grace period is usually not a problem, but there are use cases that care about each and every grace period. This commit therefore adds the first members of the full-state RCU grace-period polling API, namely the get_completed_synchronize_rcu_full() and poll_state_synchronize_rcu_full() functions. These use up to three times the storage (rcu_gp_oldstate structure instead of unsigned long), but which are guaranteed not to miss grace periods, at least in situations where the single-CPU grace-period optimization does not apply. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:08:07 -07:00
Paul E. McKenney	638dce227a	rcu/nocb: Add CPU number to CPU-{,de}offload failure messages Offline CPUs cannot be offloaded or deoffloaded. Any attempt to offload or deoffload an offline CPU causes a message to be printed on the console, which is good, but this message does not contain the CPU number, which is bad. Such a CPU number can be helpful when debugging, as it gives a clear indication that the CPU in question is in fact offline. This commit therefore adds the CPU number to the CPU-{,de}offload failure messages. Cc: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:07:19 -07:00
Zqiang	5334da2af2	rcu/nocb: Choose the right rcuog/rcuop kthreads to output The show_rcu_nocb_gp_state() function is supposed to dump out the rcuog kthread and the show_rcu_nocb_state() function is supposed to dump out the rcuo[ps] kthread. Currently, both do a mixture, which is not optimal for debugging, even though it does not affect functionality. This commit therefore adjusts these two functions to focus on their respective kthreads. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:07:19 -07:00
Uladzislau Rezki (Sony)	51824b780b	rcu/kvfree: Update KFREE_DRAIN_JIFFIES interval Currently the monitor work is scheduled with a fixed interval of HZ/20, which is roughly 50 milliseconds. The drawback of this approach is low utilization of the 512 page slots in scenarios with infrequence kvfree_rcu() calls. For example on an Android system: <snip> kworker/3:3-507 [003] .... 470.286305: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=6 kworker/6:1-76 [006] .... 470.416613: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000ea0d6556 nr_records=1 kworker/6:1-76 [006] .... 470.416625: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000003e025849 nr_records=9 kworker/3:3-507 [003] .... 471.390000: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000815a8713 nr_records=48 kworker/1:1-73 [001] .... 471.725785: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000fda9bf20 nr_records=3 kworker/1:1-73 [001] .... 471.725833: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000a425b67b nr_records=76 kworker/0:4-1411 [000] .... 472.085673: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007996be9d nr_records=1 kworker/0:4-1411 [000] .... 472.085728: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d0f0dde5 nr_records=5 kworker/6:1-76 [006] .... 472.260340: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000065630ee4 nr_records=102 <snip> In many cases, out of 512 slots, fewer than 10 were actually used. In order to improve batching and make utilization more efficient this commit sets a drain interval to a fixed 5-seconds interval. Floods are detected when a page fills quickly, and in that case, the reclaim work is re-scheduled for the next scheduling-clock tick (jiffy). After this change: <snip> kworker/7:1-371 [007] .... 5630.725708: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000005ab0ffb3 nr_records=121 kworker/7:1-371 [007] .... 5630.989702: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000060c84761 nr_records=47 kworker/7:1-371 [007] .... 5630.989714: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000000babf308 nr_records=510 kworker/7:1-371 [007] .... 5631.553790: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000bb7bd0ef nr_records=169 kworker/7:1-371 [007] .... 5631.553808: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x0000000044c78753 nr_records=510 kworker/5:6-9428 [005] .... 5631.746102: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000d98519aa nr_records=123 kworker/4:7-9434 [004] .... 5632.001758: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x00000000526c9d44 nr_records=322 kworker/4:7-9434 [004] .... 5632.002073: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000002c6a8afa nr_records=185 kworker/7:1-371 [007] .... 5632.277515: rcu_invoke_kfree_bulk_callback: rcu_preempt bulk=0x000000007f4a962f nr_records=510 <snip> Here, all but one of the cases, more than one hundreds slots were used, representing an order-of-magnitude improvement. Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:06:50 -07:00
Joel Fernandes (Google)	3826909635	rcu/kfree: Fix kfree_rcu_shrink_count() return value As per the comments in include/linux/shrinker.h, .count_objects callback should return the number of freeable items, but if there are no objects to free, SHRINK_EMPTY should be returned. The only time 0 is returned should be when we are unable to determine the number of objects, or the cache should be skipped for another reason. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:06:50 -07:00
Michal Hocko	093590c16b	rcu: Back off upon fill_page_cache_func() allocation failure The fill_page_cache_func() function allocates couple of pages to store kvfree_rcu_bulk_data structures. This is a lightweight (GFP_NORETRY) allocation which can fail under memory pressure. The function will, however keep retrying even when the previous attempt has failed. This retrying is in theory correct, but in practice the allocation is invoked from workqueue context, which means that if the memory reclaim gets stuck, these retries can hog the worker for quite some time. Although the workqueues subsystem automatically adjusts concurrency, such adjustment is not guaranteed to happen until the worker context sleeps. And the fill_page_cache_func() function's retry loop is not guaranteed to sleep (see the should_reclaim_retry() function). And we have seen this function cause workqueue lockups: kernel: BUG: workqueue lockup - pool cpus=93 node=1 flags=0x1 nice=0 stuck for 32s! [...] kernel: pool 74: cpus=37 node=0 flags=0x1 nice=0 hung=32s workers=2 manager: 2146 kernel: pwq 498: cpus=249 node=1 flags=0x1 nice=0 active=4/256 refcnt=5 kernel: in-flight: 1917:fill_page_cache_func kernel: pending: dbs_work_handler, free_work, kfree_rcu_monitor Originally, we thought that the root cause of this lockup was several retries with direct reclaim, but this is not yet confirmed. Furthermore, we have seen similar lockups without any heavy memory pressure. This suggests that there are other factors contributing to these lockups. However, it is not really clear that endless retries are desireable. So let's make the fill_page_cache_func() function back off after allocation failure. Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Joel Fernandes <joel@joelfernandes.org> Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:06:50 -07:00
Paul E. McKenney	7634b1eaa0	rcu: Exclude outgoing CPU when it is the last to leave The rcu_boost_kthread_setaffinity() function removes the outgoing CPU from the set_cpus_allowed() mask for the corresponding leaf rcu_node structure's rcub priority-boosting kthread. Except that if the outgoing CPU will leave that structure without any online CPUs, the mask is set to the housekeeping CPU mask from housekeeping_cpumask(). Which is fine unless the outgoing CPU happens to be a housekeeping CPU. This commit therefore removes the outgoing CPU from the housekeeping mask. This would of course be problematic if the outgoing CPU was the last online housekeeping CPU, but in that case you are in a world of hurt anyway. If someone comes up with a valid use case for a system needing all the housekeeping CPUs to be offline, further adjustments can be made. Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:06:03 -07:00
Zqiang	621189a1fe	rcu: Avoid triggering strict-GP irq-work when RCU is idle Kernels built with PREEMPT_RCU=y and RCU_STRICT_GRACE_PERIOD=y trigger irq-work from rcu_read_unlock(), and the resulting irq-work handler invokes rcu_preempt_deferred_qs_handle(). The point of this triggering is to force grace periods to end quickly in order to give tools like KASAN a better chance of detecting RCU usage bugs such as leaking RCU-protected pointers out of an RCU read-side critical section. However, this irq-work triggering is unconditional. This works, but there is no point in doing this irq-work unless the current grace period is waiting on the running CPU or task, which is not the common case. After all, in the common case there are many rcu_read_unlock() calls per CPU per grace period. This commit therefore triggers the irq-work only when the current grace period is waiting on the running CPU or task. This change was tested as follows on a four-CPU system: echo rcu_preempt_deferred_qs_handler > /sys/kernel/debug/tracing/set_ftrace_filter echo 1 > /sys/kernel/debug/tracing/function_profile_enabled insmod rcutorture.ko sleep 20 rmmod rcutorture.ko echo 0 > /sys/kernel/debug/tracing/function_profile_enabled echo > /sys/kernel/debug/tracing/set_ftrace_filter This procedure produces results in this per-CPU set of files: /sys/kernel/debug/tracing/trace_stat/function* Sample output from one of these files is as follows: Function Hit Time Avg s^2 -------- --- ---- --- --- rcu_preempt_deferred_qs_handle 838746 182650.3 us 0.217 us 0.004 us The baseline sum of the "Hit" values (the number of calls to this function) was 3,319,015. With this commit, that sum was 1,140,359, for a 2.9x reduction. The worst-case variance across the CPUs was less than 25%, so this large effect size is statistically significant. The raw data is available in the Link: URL. Link: https://lore.kernel.org/all/20220808022626.12825-1-qiang1.zhang@intel.com/ Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:06:02 -07:00
Zhen Lei	e73dfe3093	sched/debug: Try trigger_single_cpu_backtrace(cpu) in dump_cpu_task() The trigger_all_cpu_backtrace() function attempts to send an NMI to the target CPU, which usually provides much better stack traces than the dump_cpu_task() function's approach of dumping that stack from some other CPU. So much so that most calls to dump_cpu_task() only happen after a call to trigger_all_cpu_backtrace() has failed. And the exception to this rule really should attempt to use trigger_all_cpu_backtrace() first. Therefore, move the trigger_all_cpu_backtrace() invocation into dump_cpu_task(). Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ben Segall <bsegall@google.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Valentin Schneider <vschneid@redhat.com>	2022-08-31 05:03:14 -07:00
Paul E. McKenney	089254fd38	rcu: Document reason for rcu_all_qs() call to preempt_disable() Given that rcu_all_qs() is in non-preemptible kernels, why on earth should it invoke preempt_disable()? This commit adds the reason, which is to work nicely with debugging enabled in CONFIG_PREEMPT_COUNT=y kernels. Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Reported-by: Boqun Feng <boqun.feng@gmail.com> Reported-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:03:14 -07:00
Zqiang	6ca0292ccf	rcu: Make tiny RCU support leak callbacks for debug-object errors Currently, only Tree RCU leaks callbacks setting when it detects a duplicate call_rcu(). This commit causes Tiny RCU to also leak callbacks in this situation. Because this is Tiny RCU, kernel size is important: 1. CONFIG_TINY_RCU=y and CONFIG_DEBUG_OBJECTS_RCU_HEAD=n (Production kernel) Original: text data bss dec hex filename 26290663 20159823 15212544 61663030 3ace736 vmlinux With this commit: text data bss dec hex filename 26290663 20159823 15212544 61663030 3ace736 vmlinux 2. CONFIG_TINY_RCU=y and CONFIG_DEBUG_OBJECTS_RCU_HEAD=y (Debugging kernel) Original: text data bss dec hex filename 26291319 20160143 15212544 61664006 3aceb06 vmlinux With this commit: text data bss dec hex filename 26291319 20160431 15212544 61664294 3acec26 vmlinux These results show that the kernel size is unchanged for production kernels, as desired. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:03:14 -07:00
Zqiang	fcb42c9a77	rcu: Add QS check in rcu_exp_handler() for non-preemptible kernels Kernels built with CONFIG_PREEMPTION=n and CONFIG_PREEMPT_COUNT=y maintain preempt_count() state. Because such kernels map __rcu_read_lock() and __rcu_read_unlock() to preempt_disable() and preempt_enable(), respectively, this allows the expedited grace period's !CONFIG_PREEMPT_RCU version of the rcu_exp_handler() IPI handler function to use preempt_count() to detect quiescent states. This preempt_count() usage might seem to risk failures due to use of implicit RCU readers in portions of the kernel under #ifndef CONFIG_PREEMPTION, except that rcu_core() already disallows such implicit RCU readers. The moral of this story is that you must use explicit read-side markings such as rcu_read_lock() or preempt_disable() even if the code knows that this kernel does not support preemption. This commit therefore adds a preempt_count()-based check for a quiescent state in the !CONFIG_PREEMPT_RCU version of the rcu_exp_handler() function for kernels built with CONFIG_PREEMPT_COUNT=y, reporting an immediate quiescent state when the interrupted code had both preemption and softirqs enabled. This change results in about a 2% reduction in expedited grace-period latency in kernels built with both CONFIG_PREEMPT_RCU=n and CONFIG_PREEMPT_COUNT=y. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/all/20220622103549.2840087-1-qiang1.zhang@intel.com/	2022-08-31 05:03:14 -07:00
Zqiang	bca4fa8cb0	rcu: Update rcu_preempt_deferred_qs() comments for !PREEMPT kernels In non-premptible kernels, tasks never do context switches within RCU read-side critical sections. Therefore, in such kernels, each leaf rcu_node structure's ->blkd_tasks list will always be empty. The comment on the non-preemptible version of rcu_preempt_deferred_qs() confuses this point, so this commit therefore fixes it. Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:03:14 -07:00
Zqiang	6d60ea03ac	rcu: Fix rcu_read_unlock_strict() strict QS reporting Kernels built with CONFIG_PREEMPT=n and CONFIG_RCU_STRICT_GRACE_PERIOD=y report the quiescent state directly from the outermost rcu_read_unlock(). However, the current CPU's rcu_data structure's ->cpu_no_qs.b.norm might still be set, in which case rcu_report_qs_rdp() will exit early, thus failing to report quiescent state. This commit therefore causes rcu_read_unlock_strict() to clear CPU's rcu_data structure's ->cpu_no_qs.b.norm field before invoking rcu_report_qs_rdp(). Signed-off-by: Zqiang <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>	2022-08-31 05:03:14 -07:00
Linus Torvalds	6614a3c316	- The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe Lin, Yang Shi, Anshuman Khandual and Mike Rapoport - Some kmemleak fixes from Patrick Wang and Waiman Long - DAMON updates from SeongJae Park - memcg debug/visibility work from Roman Gushchin - vmalloc speedup from Uladzislau Rezki - more folio conversion work from Matthew Wilcox - enhancements for coherent device memory mapping from Alex Sierra - addition of shared pages tracking and CoW support for fsdax, from Shiyang Ruan - hugetlb optimizations from Mike Kravetz - Mel Gorman has contributed some pagealloc changes to improve latency and realtime behaviour. - mprotect soft-dirty checking has been improved by Peter Xu - Many other singleton patches all over the place -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYuravgAKCRDdBJ7gKXxA jpqSAQDrXSdII+ht9kSHlaCVYjqRFQz/rRvURQrWQV74f6aeiAD+NHHeDPwZn11/ SPktqEUrF1pxnGQxqLh1kUFUhsVZQgE= =w/UH -----END PGP SIGNATURE----- Merge tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Most of the MM queue. A few things are still pending. Liam's maple tree rework didn't make it. This has resulted in a few other minor patch series being held over for next time. Multi-gen LRU still isn't merged as we were waiting for mapletree to stabilize. The current plan is to merge MGLRU into -mm soon and to later reintroduce mapletree, with a view to hopefully getting both into 6.1-rc1. Summary: - The usual batches of cleanups from Baoquan He, Muchun Song, Miaohe Lin, Yang Shi, Anshuman Khandual and Mike Rapoport - Some kmemleak fixes from Patrick Wang and Waiman Long - DAMON updates from SeongJae Park - memcg debug/visibility work from Roman Gushchin - vmalloc speedup from Uladzislau Rezki - more folio conversion work from Matthew Wilcox - enhancements for coherent device memory mapping from Alex Sierra - addition of shared pages tracking and CoW support for fsdax, from Shiyang Ruan - hugetlb optimizations from Mike Kravetz - Mel Gorman has contributed some pagealloc changes to improve latency and realtime behaviour. - mprotect soft-dirty checking has been improved by Peter Xu - Many other singleton patches all over the place" [ XFS merge from hell as per Darrick Wong in https://lore.kernel.org/all/YshKnxb4VwXycPO8@magnolia/ ] * tag 'mm-stable-2022-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (282 commits) tools/testing/selftests/vm/hmm-tests.c: fix build mm: Kconfig: fix typo mm: memory-failure: convert to pr_fmt() mm: use is_zone_movable_page() helper hugetlbfs: fix inaccurate comment in hugetlbfs_statfs() hugetlbfs: cleanup some comments in inode.c hugetlbfs: remove unneeded header file hugetlbfs: remove unneeded hugetlbfs_ops forward declaration hugetlbfs: use helper macro SZ_1{K,M} mm: cleanup is_highmem() mm/hmm: add a test for cross device private faults selftests: add soft-dirty into run_vmtests.sh selftests: soft-dirty: add test for mprotect mm/mprotect: fix soft-dirty check in can_change_pte_writable() mm: memcontrol: fix potential oom_lock recursion deadlock mm/gup.c: fix formatting in check_and_migrate_movable_page() xfs: fail dax mount if reflink is enabled on a partition mm/memcontrol.c: remove the redundant updating of stats_flush_threshold userfaultfd: don't fail on unrecognized features hugetlb_cgroup: fix wrong hugetlb cgroup numa stat ...	2022-08-05 16:32:45 -07:00
Linus Torvalds	228dfe98a3	Char / Misc driver changes for 6.0-rc1 Here is the large set of char and misc and other driver subsystem changes for 6.0-rc1. Highlights include: - large set of IIO driver updates, additions, and cleanups - new habanalabs device support added (loads of register maps much like GPUs have) - soundwire driver updates - phy driver updates - slimbus driver updates - tiny virt driver fixes and updates - misc driver fixes and updates - interconnect driver updates - hwtracing driver updates - fpga driver updates - extcon driver updates - firmware driver updates - counter driver update - mhi driver fixes and updates - binder driver fixes and updates - speakup driver fixes Full details are in the long shortlog contents. All of these have been in linux-next for a while without any reported problems. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYup9QQ8cZ3JlZ0Brcm9h aC5jb20ACgkQMUfUDdst+ylBKQCfaSuzl9ZP9dTvAw2FPp14oRqXnpoAnicvWAoq 1vU9Vtq2c73uBVLdZm4m =AwP3 -----END PGP SIGNATURE----- Merge tag 'char-misc-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / misc driver updates from Greg KH: "Here is the large set of char and misc and other driver subsystem changes for 6.0-rc1. Highlights include: - large set of IIO driver updates, additions, and cleanups - new habanalabs device support added (loads of register maps much like GPUs have) - soundwire driver updates - phy driver updates - slimbus driver updates - tiny virt driver fixes and updates - misc driver fixes and updates - interconnect driver updates - hwtracing driver updates - fpga driver updates - extcon driver updates - firmware driver updates - counter driver update - mhi driver fixes and updates - binder driver fixes and updates - speakup driver fixes All of these have been in linux-next for a while without any reported problems" * tag 'char-misc-6.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (634 commits) drivers: lkdtm: fix clang -Wformat warning char: remove VR41XX related char driver misc: Mark MICROCODE_MINOR unused spmi: trace: fix stack-out-of-bound access in SPMI tracing functions dt-bindings: iio: adc: Add compatible for MT8188 iio: light: isl29028: Fix the warning in isl29028_remove() iio: accel: sca3300: Extend the trigger buffer from 16 to 32 bytes iio: fix iio_format_avail_range() printing for none IIO_VAL_INT iio: adc: max1027: unlock on error path in max1027_read_single_value() iio: proximity: sx9324: add empty line in front of bullet list iio: magnetometer: hmc5843: Remove duplicate 'the' iio: magn: yas530: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros iio: magnetometer: ak8974: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros iio: light: veml6030: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros iio: light: vcnl4035: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros iio: light: vcnl4000: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() macros iio: light: tsl2591: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() iio: light: tsl2583: Use DEFINE_RUNTIME_DEV_PM_OPS and pm_ptr() iio: light: isl29028: Use DEFINE_RUNTIME_DEV_PM_OPS() and pm_ptr() iio: light: gp2ap002: Switch to DEFINE_RUNTIME_DEV_PM_OPS and pm_ptr() ...	2022-08-04 11:05:48 -07:00

1 2 3 4 5 ...

2200 Коммитов