WSL2-Linux-Kernel/mm
Alistair Popple 326b80bd22 mm: take a page reference when removing device exclusive entries
commit 7c7b962938 upstream.

Device exclusive page table entries are used to prevent CPU access to a
page whilst it is being accessed from a device.  Typically this is used to
implement atomic operations when the underlying bus does not support
atomic access.  When a CPU thread encounters a device exclusive entry it
locks the page and restores the original entry after calling mmu notifiers
to signal drivers that exclusive access is no longer available.

The device exclusive entry holds a reference to the page making it safe to
access the struct page whilst the entry is present.  However the fault
handling code does not hold the PTL when taking the page lock.  This means
if there are multiple threads faulting concurrently on the device
exclusive entry one will remove the entry whilst others will wait on the
page lock without holding a reference.

This can lead to threads locking or waiting on a folio with a zero
refcount.  Whilst mmap_lock prevents the pages getting freed via munmap()
they may still be freed by a migration.  This leads to warnings such as
PAGE_FLAGS_CHECK_AT_FREE due to the page being locked when the refcount
drops to zero.

Fix this by trying to take a reference on the folio before locking it.
The code already checks the PTE under the PTL and aborts if the entry is
no longer there.  It is also possible the folio has been unmapped, freed
and re-allocated allowing a reference to be taken on an unrelated folio.
This case is also detected by the PTE check and the folio is unlocked
without further changes.

Link: https://lkml.kernel.org/r/20230330012519.804116-1-apopple@nvidia.com
Fixes: b756a3b5e7 ("mm: device exclusive memory access")
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-04-13 16:48:26 +02:00
..
damon mm/damon/dbgfs: check if rm_contexts input is for a real context 2022-11-16 09:58:27 +01:00
kasan panic: Consolidate open-coded panic_on_warn checks 2023-02-01 08:27:22 +01:00
kfence mm: kfence: fix using kfence_metadata without initialization in show_object() 2023-03-30 12:48:01 +02:00
Kconfig kmap_local: don't assume kmap PTEs are linear arrays in memory 2021-11-25 09:48:43 +01:00
Kconfig.debug
Makefile mm: introduce Data Access MONitor (DAMON) 2021-09-08 11:50:24 -07:00
backing-dev.c writeback: avoid use-after-free after removing device 2022-08-31 17:16:47 +02:00
balloon_compaction.c
bootmem_info.c bootmem: remove the vmemmap pages from kmemleak in put_page_bootmem 2022-08-31 17:16:48 +02:00
cleancache.c
cma.c Revert "mm/cma.c: remove redundant cma_mutex lock" 2022-06-09 10:23:27 +02:00
cma.h
cma_debug.c
cma_sysfs.c
compaction.c mm, compaction: fix fast_isolate_around() to stay within boundaries 2023-01-12 11:58:47 +01:00
debug.c mm/debug: sync up latest migrate_reason to migrate_reason_names 2021-09-24 16:13:35 -07:00
debug_page_ref.c
debug_vm_pgtable.c mm/debug_vm_pgtable: remove pte entry from the page table 2022-02-08 18:34:05 +01:00
dmapool.c
early_ioremap.c mm/early_ioremap.c: remove redundant early_ioremap_shutdown() 2021-09-08 11:50:24 -07:00
fadvise.c
failslab.c
filemap.c mm/filemap: fix page end in filemap_get_read_batch 2023-02-22 12:57:10 +01:00
frontswap.c
gup.c mm/migration: return errno when isolate_huge_page failed 2023-02-14 19:17:56 +01:00
gup_test.c
gup_test.h
highmem.c highmem: fix checks in __kmap_local_sched_{in,out} 2022-04-13 20:59:21 +02:00
hmm.c mm/hmm: fault non-owner device private entries 2022-08-03 12:03:54 +02:00
huge_memory.c mm/userfaultfd: propagate uffd-wp bit when PTE-mapping the huge zeropage 2023-03-22 13:31:35 +01:00
hugetlb.c mm/migration: return errno when isolate_huge_page failed 2023-02-14 19:17:56 +01:00
hugetlb_cgroup.c
hugetlb_vmemmap.c
hugetlb_vmemmap.h
hwpoison-inject.c mm/hwpoison: avoid the impact of hwpoison_filter() return value on mce handler 2022-07-12 16:35:05 +02:00
init-mm.c
internal.h mm/numa: automatically generate node migration order 2021-09-03 09:58:16 -07:00
interval_tree.c
io-mapping.c
ioremap.c mm: move ioremap_page_range to vmalloc.c 2021-09-08 11:50:24 -07:00
khugepaged.c mm/khugepaged: fix collapse_pte_mapped_thp() to allow anon_vma 2023-01-24 07:22:49 +01:00
kmemleak.c Revert "mm: kmemleak: take a full lowmem check in kmemleak_*_phys()" 2022-09-15 11:30:00 +02:00
ksm.c mm/ksm: remove old GCC 4.9+ check 2021-09-13 10:18:28 -07:00
list_lru.c
maccess.c maccess: Fix writing offset in case of fault in strncpy_from_kernel_nofault() 2022-11-26 09:24:47 +01:00
madvise.c mm: fix madivse_pageout mishandling on non-LRU page 2022-10-05 10:39:39 +02:00
mapping_dirty_helpers.c
memblock.c Revert "mm: Always release pages to the buddy allocator in memblock_free_late()." 2023-02-22 12:57:07 +01:00
memcontrol.c mm: memcontrol: deprecate charge moving 2023-03-10 09:40:09 +01:00
memfd.c memfd: fix F_SEAL_WRITE after shmem huge page allocated 2022-03-08 19:12:48 +01:00
memory-failure.c mm/migration: return errno when isolate_huge_page failed 2023-02-14 19:17:56 +01:00
memory.c mm: take a page reference when removing device exclusive entries 2023-04-13 16:48:26 +02:00
memory_hotplug.c mm/migration: return errno when isolate_huge_page failed 2023-02-14 19:17:56 +01:00
mempolicy.c migrate: hugetlb: check for hugetlb shared PMD in node migration 2023-02-14 19:17:56 +01:00
mempool.c
memremap.c mm/memremap.c: map FS_DAX device memory as decrypted 2022-11-16 09:58:27 +01:00
memtest.c
migrate.c mm/migration: return errno when isolate_huge_page failed 2023-02-14 19:17:56 +01:00
mincore.c
mlock.c mm/mlock: fix potential imbalanced rlimit ucounts adjustment 2022-05-15 20:18:53 +02:00
mm_init.c
mmap.c mm/mmap: undo ->mmap() when arch_validate_flags() fails 2022-10-26 12:34:24 +02:00
mmap_lock.c
mmu_gather.c mm/khugepaged: fix GUP-fast interaction by sending IPI 2022-12-14 11:37:17 +01:00
mmu_notifier.c mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove() 2022-04-27 14:38:58 +02:00
mmzone.c
mprotect.c mm: don't try to NUMA-migrate COW pages that have other uses 2022-02-23 12:03:03 +01:00
mremap.c mmmremap.c: avoid pointless invalidate_range_start/end on mremap(old_size=0) 2022-04-13 20:59:22 +02:00
msync.c
nommu.c Merge tag 'denywrite-for-5.15' of git://github.com/davidhildenbrand/linux 2021-09-04 11:35:47 -07:00
oom_kill.c oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup 2022-04-27 14:38:58 +02:00
page-writeback.c writeback: avoid use-after-free after removing device 2022-08-31 17:16:47 +02:00
page_alloc.c Fix page corruption caused by racy check in __free_pages 2023-02-14 19:18:04 +01:00
page_counter.c
page_ext.c mm/migrate: add CPU hotplug to demotion #ifdef 2021-10-18 20:22:02 -10:00
page_idle.c mm/idle_page_tracking: make PG_idle reusable 2021-09-08 11:50:24 -07:00
page_io.c mm: fix unexpected zeroed page mapping with zram swap 2022-04-20 09:34:18 +02:00
page_isolation.c Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
page_owner.c mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE 2021-09-08 11:50:22 -07:00
page_poison.c
page_reporting.c
page_reporting.h
page_vma_mapped.c
pagewalk.c mm: pagewalk: Fix race between unmap and page walker 2022-09-08 12:28:05 +02:00
percpu-internal.h
percpu-km.c
percpu-stats.c
percpu-vm.c
percpu.c Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
pgalloc-track.h
pgtable-generic.c
process_vm_access.c
ptdump.c mm: pagewalk: Fix race between unmap and page walker 2022-09-08 12:28:05 +02:00
readahead.c
rmap.c mm/rmap: Fix anon_vma->degree ambiguity leading to double-reuse 2022-09-05 10:30:07 +02:00
rodata_test.c
secretmem.c mm: fix dereferencing possible ERR_PTR 2022-10-05 10:39:39 +02:00
shmem.c mm: shmem: don't truncate page if memory failure happens 2022-11-26 09:24:28 +01:00
shuffle.c
shuffle.h
slab.c mm/slab: Fix undefined init_cache_node_node() for NUMA and !SMP 2023-03-30 12:47:56 +02:00
slab.h mm, kfence: support kmem_dump_obj() for KFENCE objects 2022-04-27 14:38:51 +02:00
slab_common.c mm, kfence: support kmem_dump_obj() for KFENCE objects 2022-04-27 14:38:51 +02:00
slob.c mm, kfence: support kmem_dump_obj() for KFENCE objects 2022-04-27 14:38:51 +02:00
slub.c mm: slub: fix flush_cpu_slab()/__free_slab() invocations in task context. 2022-09-28 11:11:44 +02:00
sparse-vmemmap.c
sparse.c mm: introduce memmap_alloc() to unify memory map allocation 2021-09-03 09:58:15 -07:00
swap.c mm: fs: invalidate bh_lrus for only cold path 2021-09-24 16:13:35 -07:00
swap_cgroup.c
swap_slots.c
swap_state.c mm: swap: get rid of livelock in swapin readahead 2022-03-23 09:16:41 +01:00
swapfile.c mm/swap: fix swap_info_struct race between swapoff and get_swap_pages() 2023-04-13 16:48:26 +02:00
truncate.c Merge branch 'akpm' (patches from Andrew) 2021-09-03 10:08:28 -07:00
usercopy.c mm/usercopy: return 1 from hardened_usercopy __setup() handler 2022-04-08 14:24:14 +02:00
userfaultfd.c mm: shmem: don't truncate page if memory failure happens 2022-11-26 09:24:28 +01:00
util.c mm: vmalloc: introduce array allocation functions 2022-07-12 16:35:01 +02:00
vmacache.c
vmalloc.c mm: vmalloc: avoid warn_alloc noise caused by fatal signal 2023-04-13 16:48:25 +02:00
vmpressure.c mm/vmpressure: replace vmpressure_to_css() with vmpressure_to_memcg() 2021-09-03 09:58:17 -07:00
vmscan.c mm: __isolate_lru_page_prepare() in isolate_migratepages_block() 2022-12-08 11:28:44 +01:00
vmstat.c mm/vmstat: protect per cpu variables with preempt disable on RT 2021-09-08 15:32:34 -07:00
workingset.c memcg: sync flush only if periodic flush is delayed 2022-04-27 14:38:57 +02:00
z3fold.c
zbud.c
zpool.c
zsmalloc.c zsmalloc: fix races between asynchronous zspage free and page migration 2022-06-06 08:43:39 +02:00
zswap.c