WSL2-Linux-Kernel/include/trace/events
Mel Gorman 1b4e3f26f9 mm: vmscan: Reduce throttling due to a failure to make progress
Mike Galbraith, Alexey Avramov and Darrick Wong all reported similar
problems due to reclaim throttling for excessive lengths of time.  In
Alexey's case, a memory hog that should go OOM quickly stalls for
several minutes before going OOM.  In Mike and Darrick's cases, a small
memcg environment stalled excessively even though the system had enough
memory overall.

Commit 69392a403f ("mm/vmscan: throttle reclaim when no progress is
being made") introduced the problem although commit a19594ca4a
("mm/vmscan: increase the timeout if page reclaim is not making
progress") made it worse.  Systems at or near an OOM state that cannot
be recovered must reach OOM quickly and memcg should kill tasks if a
memcg is near OOM.

To address this, only stall for the first zone in the zonelist, reduce
the timeout to 1 tick for VMSCAN_THROTTLE_NOPROGRESS and only stall if
the scan control nr_reclaimed is 0, kswapd is still active and there
were excessive pages pending for writeback.  If kswapd has stopped
reclaiming due to excessive failures, do not stall at all so that OOM
triggers relatively quickly.  Similarly, if an LRU is simply congested,
only lightly throttle similar to NOPROGRESS.
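
The rules in the paragraph above can be read as a small decision function.
What follows is only an illustrative sketch, not the code from mm/vmscan.c:
the names throttle_ticks, struct reclaim_state and its fields are
hypothetical, and applying the "kswapd has stopped" check only to the
NOPROGRESS case is an assumption about how the description is meant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical throttle reasons, loosely mirroring the VMSCAN_THROTTLE_* names. */
enum throttle_reason { THROTTLE_NOPROGRESS, THROTTLE_CONGESTED };

/* Hypothetical snapshot of the state the commit message talks about. */
struct reclaim_state {
	bool first_zone;           /* is this the first zone in the zonelist? */
	unsigned long nr_reclaimed; /* pages reclaimed so far by this scan */
	bool kswapd_gave_up;       /* kswapd stopped after excessive failures */
	bool writeback_excessive;  /* too many pages pending writeback */
};

/* Return how many ticks to stall for; 0 means "do not stall". */
static unsigned int throttle_ticks(enum throttle_reason reason,
				   const struct reclaim_state *s)
{
	assert(s != NULL);

	/* A congested LRU is only lightly throttled, like NOPROGRESS. */
	if (reason == THROTTLE_CONGESTED)
		return 1;

	/* Assumption: kswapd giving up means OOM should trigger quickly. */
	if (s->kswapd_gave_up)
		return 0;

	/* Only the first zone in the zonelist is allowed to stall. */
	if (!s->first_zone)
		return 0;

	/* Any reclaim progress at all means there is no reason to stall. */
	if (s->nr_reclaimed != 0)
		return 0;

	/* Stall only when writeback is the likely reason for no progress. */
	if (!s->writeback_excessive)
		return 0;

	/* NOPROGRESS timeout is reduced to a single tick. */
	return 1;
}
```

In this sketch every path stalls for at most one tick, which is the point
of the change: a task that cannot make progress falls through to OOM
handling quickly instead of sleeping repeatedly.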

Alexey's original case was the most straightforward:

	for i in {1..3}; do tail /dev/zero; done

On vanilla 5.16-rc1, this test stalled heavily.  After the patch, the
test completes in a few seconds, similar to 5.15.

Alexey's second test case added watching a YouTube video while tail runs
10 times.  On 5.15, playback only jitters slightly, whereas 5.16-rc1
stalls a lot, with many missing frames and numerous audio glitches.  With
this patch applied, the video plays similarly to 5.15.

[lkp@intel.com: Fix W=1 build warning]

Link: https://lore.kernel.org/r/99e779783d6c7fce96448a3402061b9dc1b3b602.camel@gmx.de
Link: https://lore.kernel.org/r/20211124011954.7cab9bb4@mail.inbox.lv
Link: https://lore.kernel.org/r/20211022144651.19914-1-mgorman@techsingularity.net
Link: https://lore.kernel.org/r/20211202150614.22440-1-mgorman@techsingularity.net
Link: https://linux-regtracking.leemhuis.info/regzbot/regression/20211124011954.7cab9bb4@mail.inbox.lv/
Reported-and-tested-by: Alexey Avramov <hakavlad@inbox.lv>
Reported-and-tested-by: Mike Galbraith <efault@gmx.de>
Reported-and-tested-by: Darrick J. Wong <djwong@kernel.org>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Hugh Dickins <hughd@google.com>
Tracked-by: Thorsten Leemhuis <regressions@leemhuis.info>
Fixes: 69392a403f ("mm/vmscan: throttle reclaim when no progress is being made")
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-12-31 11:17:07 -08:00
9p.h
afs.h netfs, 9p, afs, ceph: Use folios 2021-11-10 21:16:56 +00:00
alarmtimer.h
asoc.h
avc.h
bcache.h
block.h block: don't call blk_status_to_errno in blk_update_request 2021-10-19 05:54:57 -06:00
bpf_test_run.h
bridge.h
btrfs.h btrfs: use delalloc_bytes to determine flush amount for shrink_delalloc 2021-08-23 13:19:07 +02:00
cachefiles.h cachefiles: Fix oops with cachefiles_cull() due to NULL object 2021-10-05 11:22:06 +01:00
cgroup.h
clk.h
cma.h mm, tracing: unify PFN format strings 2021-06-29 10:53:52 -07:00
compaction.h
context_tracking.h
cpuhp.h
damon.h mm/damon: add a tracepoint 2021-09-08 11:50:24 -07:00
devfreq.h
devlink.h devlink: Reduce struct devlink exposure 2021-10-12 16:29:16 -07:00
dma_fence.h treewide: Add missing semicolons to __assign_str uses 2021-06-30 09:19:14 -04:00
erofs.h erofs: get compression algorithms directly on mapping 2021-10-18 00:15:55 +08:00
error_report.h
ext4.h
f2fs.h f2fs: multidevice: support direct IO 2021-10-26 14:04:30 -07:00
fib.h
fib6.h
filelock.h
filemap.h mm, tracing: unify PFN format strings 2021-06-29 10:53:52 -07:00
fs.h NFS: Move generic FS show macros to global header 2021-11-02 12:31:23 -04:00
fs_dax.h
fscache.h fscache: Use refcount_t for the cookie refcount instead of atomic_t 2021-08-27 13:34:03 +01:00
fsi.h
fsi_master_aspeed.h
fsi_master_ast_cf.h
fsi_master_gpio.h
gpio.h
gpu_mem.h
host1x.h
huge_memory.h
hwmon.h
i2c.h
ib_mad.h
ib_umad.h
initcall.h
intel-sst.h
intel_iommu.h iommu/vt-d: Add prq_report trace event 2021-06-10 09:06:13 +02:00
intel_ish.h
io_uring.h io_uring: dump sqe contents if issue fails 2021-10-19 05:49:52 -06:00
iocost.h
iommu.h
ipi.h
irq.h
irq_matrix.h
iscsi.h
jbd2.h jbd2,ext4: add a shrinker to release checkpointed buffers 2021-06-24 10:54:49 -04:00
kmem.h mm, tracing: unify PFN format strings 2021-06-29 10:53:52 -07:00
kvm.h
kyber.h kyber: avoid q->disk dereferences in trace points 2021-10-15 21:02:57 -06:00
libata.h
lock.h
mce.h
mctp.h mctp: Add tracepoints for tag/key handling 2021-09-29 11:00:11 +01:00
mdio.h
migrate.h mm/migrate: demote pages during reclaim 2021-09-03 09:58:16 -07:00
mlxsw.h
mmap.h
mmap_lock.h mm: mmap_lock: use DECLARE_EVENT_CLASS and DEFINE_EVENT_FN 2021-11-06 13:30:36 -07:00
mmc.h
mmflags.h Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
module.h
mptcp.h mptcp: dump csum fields in mptcp_dump_mpext 2021-06-18 11:40:11 -07:00
napi.h
nbd.h
neigh.h
net.h net: use %px to print skb address in trace_netif_receive_skb 2021-07-15 10:28:48 -07:00
net_probe_common.h
netfs.h netfs: Move cookie debug ID to struct netfs_cache_resources 2021-08-25 15:20:25 +01:00
netlink.h
nfs.h NFS: Move NFS protocol display macros to global header 2021-11-02 12:31:23 -04:00
nilfs2.h
nmi.h
objagg.h
oom.h
osnoise.h tracing: Fix spelling in osnoise tracer "interferences" -> "interference" 2021-06-28 14:12:27 -04:00
page_isolation.h
page_pool.h mm, tracing: unify PFN format strings 2021-06-29 10:53:52 -07:00
page_ref.h mm: introduce PAGEFLAGS_MASK to replace ((1UL << NR_PAGEFLAGS) - 1) 2021-09-08 11:50:24 -07:00
pagemap.h mm/lru: Convert __pagevec_lru_add_fn to take a folio 2021-10-18 07:49:40 -04:00
percpu.h
power.h
power_cpu_migrate.h
preemptirq.h
printk.h
pwc.h
pwm.h
qdisc.h qdisc: add new field for qdisc_enqueue tracepoint 2021-07-27 14:16:38 +01:00
qla.h
qrtr.h
random.h
rcu.h rcu/nocb: Unify timers 2021-05-12 12:10:23 -07:00
rdma.h
rdma_core.h
regulator.h
rpcgss.h sunrpc: fix header include guard in trace header 2021-11-17 18:27:32 -05:00
rpcrdma.h A slow cycle for nfsd: mainly cleanup, including Neil's patch dropping 2021-11-10 16:45:54 -08:00
rpm.h
rseq.h
rtc.h
rxrpc.h
sched.h sched/tracing: Remove the redundant 'success' in the sched tracepoint 2021-06-10 11:16:20 -04:00
scmi.h
scsi.h scsi: core: Kill message byte 2021-05-31 22:48:24 -04:00
sctp.h
signal.h
siox.h
skb.h
smbus.h
sock.h net: sock: add trace for socket errors 2021-06-29 11:28:21 -07:00
spi.h spi: Enable tracing of the SPI setup CS selection 2021-05-26 21:22:13 +01:00
spmi.h
sunrpc.h A slow cycle for nfsd: mainly cleanup, including Neil's patch dropping 2021-11-10 16:45:54 -08:00
sunrpc_base.h SUNRPC: Tracepoints should display tk_pid and cl_clid as a fixed-size field 2021-10-20 18:09:54 -04:00
sunvnet.h
swiotlb.h
syscalls.h
target.h
task.h
tcp.h tcp: add tracepoint for checksum errors 2021-05-14 15:26:03 -07:00
tegra_apb_dma.h
thermal.h
thermal_power_allocator.h
thp.h
timer.h
tlb.h
udp.h
ufs.h scsi: ufs: core: Enable power management for wlun 2021-05-10 22:28:20 -04:00
v4l2.h
vb2.h
vmscan.h mm: vmscan: Reduce throttling due to a failure to make progress 2021-12-31 11:17:07 -08:00
vsock_virtio_transport_common.h virtio/vsock: update trace event for SEQPACKET 2021-06-11 13:32:47 -07:00
wbt.h
workqueue.h
writeback.h Merge branch 'akpm' (patches from Andrew) 2021-11-06 14:08:17 -07:00
xdp.h xdp: Extend xdp_redirect_map with broadcast support 2021-05-26 09:46:16 +02:00
xen.h