WSL2-Linux-Kernel/mm
Yin Fengwei fe463fe6aa mm/thp: check and bail out if page in deferred queue already
commit 81e506bec9 upstream.

Kernel build regression with LLVM was reported here:
https://lore.kernel.org/all/Y1GCYXGtEVZbcv%2F5@dev-arch.thelio-3990X/ with
commit f35b5d7d67 ("mm: align larger anonymous mappings on THP
boundaries").  And the commit f35b5d7d67 was reverted.

It turned out the regression is related with madvise(MADV_DONTNEED)
was used by ld.lld. But with none PMD_SIZE aligned parameter len.
trace-bpfcc captured:
531607  531732  ld.lld          do_madvise.part.0 start: 0x7feca9000000, len: 0x7fb000, behavior: 0x4
531607  531793  ld.lld          do_madvise.part.0 start: 0x7fec86a00000, len: 0x7fb000, behavior: 0x4

If the underneath physical page is THP, the madvise(MADV_DONTNEED) can
trigger split_queue_lock contention raised significantly. perf showed
following data:
    14.85%     0.00%  ld.lld           [kernel.kallsyms]           [k]
       entry_SYSCALL_64_after_hwframe
           11.52%
                entry_SYSCALL_64_after_hwframe
                do_syscall_64
                __x64_sys_madvise
                do_madvise.part.0
                zap_page_range
                unmap_single_vma
                unmap_page_range
                page_remove_rmap
                deferred_split_huge_page
                __lock_text_start
                native_queued_spin_lock_slowpath

If THP can't be removed from rmap as whole THP, partial THP will be
removed from rmap by removing sub-pages from rmap.  Even the THP head page
is added to deferred queue already, the split_queue_lock will be acquired
and check whether the THP head page is in the queue already.  Thus, the
contention of split_queue_lock is raised.

Before acquire split_queue_lock, check and bail out early if the THP
head page is in the queue already. The checking without holding
split_queue_lock could race with deferred_split_scan, but it doesn't
impact the correctness here.

Test result of building kernel with ld.lld:
commit 7b5a0b664e (parent commit of f35b5d7d67):
time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
        6:07.99 real,   26367.77 user,  5063.35 sys

commit f35b5d7d676e:
time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
        7:22.15 real,   26235.03 user,  12504.55 sys

commit f35b5d7d67 with the fixing patch:
time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
        6:08.49 real,   26520.15 user,  5047.91 sys

Link: https://lkml.kernel.org/r/20221223135207.2275317-1-fengwei.yin@intel.com
Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-03-10 09:40:09 +01:00
..
damon mm/damon/dbgfs: check if rm_contexts input is for a real context 2022-11-16 09:58:27 +01:00
kasan panic: Consolidate open-coded panic_on_warn checks 2023-02-01 08:27:22 +01:00
kfence panic: Consolidate open-coded panic_on_warn checks 2023-02-01 08:27:22 +01:00
Kconfig
Kconfig.debug
Makefile
backing-dev.c
balloon_compaction.c
bootmem_info.c
cleancache.c
cma.c
cma.h
cma_debug.c
cma_sysfs.c
compaction.c mm, compaction: fix fast_isolate_around() to stay within boundaries 2023-01-12 11:58:47 +01:00
debug.c
debug_page_ref.c
debug_vm_pgtable.c
dmapool.c
early_ioremap.c
fadvise.c
failslab.c
filemap.c mm/filemap: fix page end in filemap_get_read_batch 2023-02-22 12:57:10 +01:00
frontswap.c
gup.c mm/migration: return errno when isolate_huge_page failed 2023-02-14 19:17:56 +01:00
gup_test.c
gup_test.h
highmem.c
hmm.c
huge_memory.c mm/thp: check and bail out if page in deferred queue already 2023-03-10 09:40:09 +01:00
hugetlb.c mm/migration: return errno when isolate_huge_page failed 2023-02-14 19:17:56 +01:00
hugetlb_cgroup.c
hugetlb_vmemmap.c
hugetlb_vmemmap.h
hwpoison-inject.c
init-mm.c
internal.h
interval_tree.c
io-mapping.c
ioremap.c
khugepaged.c mm/khugepaged: fix collapse_pte_mapped_thp() to allow anon_vma 2023-01-24 07:22:49 +01:00
kmemleak.c
ksm.c
list_lru.c
maccess.c maccess: Fix writing offset in case of fault in strncpy_from_kernel_nofault() 2022-11-26 09:24:47 +01:00
madvise.c
mapping_dirty_helpers.c
memblock.c Revert "mm: Always release pages to the buddy allocator in memblock_free_late()." 2023-02-22 12:57:07 +01:00
memcontrol.c mm: memcontrol: deprecate charge moving 2023-03-10 09:40:09 +01:00
memfd.c
memory-failure.c mm/migration: return errno when isolate_huge_page failed 2023-02-14 19:17:56 +01:00
memory.c
memory_hotplug.c mm/migration: return errno when isolate_huge_page failed 2023-02-14 19:17:56 +01:00
mempolicy.c migrate: hugetlb: check for hugetlb shared PMD in node migration 2023-02-14 19:17:56 +01:00
mempool.c
memremap.c mm/memremap.c: map FS_DAX device memory as decrypted 2022-11-16 09:58:27 +01:00
memtest.c
migrate.c mm/migration: return errno when isolate_huge_page failed 2023-02-14 19:17:56 +01:00
mincore.c
mlock.c
mm_init.c
mmap.c mm/mmap: undo ->mmap() when arch_validate_flags() fails 2022-10-26 12:34:24 +02:00
mmap_lock.c
mmu_gather.c mm/khugepaged: fix GUP-fast interaction by sending IPI 2022-12-14 11:37:17 +01:00
mmu_notifier.c
mmzone.c
mprotect.c
mremap.c
msync.c
nommu.c
oom_kill.c
page-writeback.c
page_alloc.c Fix page corruption caused by racy check in __free_pages 2023-02-14 19:18:04 +01:00
page_counter.c
page_ext.c
page_idle.c
page_io.c
page_isolation.c
page_owner.c
page_poison.c
page_reporting.c
page_reporting.h
page_vma_mapped.c
pagewalk.c
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c
pgalloc-track.h
pgtable-generic.c
process_vm_access.c
ptdump.c
readahead.c
rmap.c
rodata_test.c
secretmem.c
shmem.c mm: shmem: don't truncate page if memory failure happens 2022-11-26 09:24:28 +01:00
shuffle.c
shuffle.h
slab.c
slab.h
slab_common.c
slob.c
slub.c
sparse-vmemmap.c
sparse.c
swap.c
swap_cgroup.c
swap_slots.c
swap_state.c
swapfile.c mm/swapfile: add cond_resched() in get_swap_pages() 2023-02-09 11:26:45 +01:00
truncate.c
usercopy.c
userfaultfd.c mm: shmem: don't truncate page if memory failure happens 2022-11-26 09:24:28 +01:00
util.c
vmacache.c
vmalloc.c
vmpressure.c
vmscan.c mm: __isolate_lru_page_prepare() in isolate_migratepages_block() 2022-12-08 11:28:44 +01:00
vmstat.c
workingset.c
z3fold.c
zbud.c
zpool.c
zsmalloc.c
zswap.c