WSL2-Linux-Kernel/arch/s390/mm
Alexander Gordeev 9d28671017 s390/mm: fix 2KB pgtable release race
commit c2c224932f upstream.

Concurrent 2KB-pgtable release paths can race when both the upper
and lower halves of the containing parent page are freed, one via
page_table_free_rcu() + __tlb_remove_table() and the other via
page_table_free(). The race might lead to corruption, because the
list item is removed in page_table_free() concurrently with
__free_page() in __tlb_remove_table().

Assume that first the lower and then the upper 2KB-pgtable of a
page is freed. Since both halves of the page are allocated, the
tracking byte (bits 24-31 of the page _refcount) initially has the
value 0x03:

CPU0				CPU1
----				----

page_table_free_rcu() // lower half
{
	// _refcount[31..24] == 0x03
	...
	atomic_xor_bits(&page->_refcount,
			0x11U << (0 + 24));
	// _refcount[31..24] <= 0x12
	...
	table = table | (1U << 0);
	tlb_remove_table(tlb, table);
}
...
__tlb_remove_table()
{
	// _refcount[31..24] == 0x12
	mask = _table & 3;
	// mask <= 0x01
	...

				page_table_free() // upper half
				{
					// _refcount[31..24] == 0x12
					...
					atomic_xor_bits(
						&page->_refcount,
						1U << (1 + 24));
					// _refcount[31..24] <= 0x10
					// mask <= 0x10
					...
	atomic_xor_bits(&page->_refcount,
			mask << (4 + 24));
	// _refcount[31..24] <= 0x00
	// mask <= 0x00
	...
	if (mask != 0) // == false
		break;
	fallthrough;
	...
					if (mask & 3) // == false
						...
					else
	__free_page(page);			list_del(&page->lru);
	^^^^^^^^^^^^^^^^^^	RACE!		^^^^^^^^^^^^^^^^^^^^^
}					...
				}
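
The interleaving above can be replayed on a single thread to check the
arithmetic. This is only a user-space sketch, not kernel code: xor_bits(),
tracking_byte(), struct replay_result and replay_race() are hypothetical
names, and xor_bits() is a plain, non-atomic stand-in that is assumed to
return the new value the way the diagram uses atomic_xor_bits().

```c
#include <assert.h>

/* Hypothetical, non-atomic stand-in for atomic_xor_bits(); returns the new value. */
static unsigned int xor_bits(unsigned int *v, unsigned int bits)
{
	*v ^= bits;
	return *v;
}

/* The tracking byte lives in bits 24-31 of the refcount. */
static unsigned int tracking_byte(unsigned int refcount)
{
	return refcount >> 24;
}

struct replay_result {
	int cpu0_frees_page;     /* __tlb_remove_table() reaches __free_page() */
	int cpu1_does_list_del;  /* page_table_free() reaches list_del() */
	unsigned int final_byte; /* tracking byte after all three updates */
};

/* Replays the steps of the diagram in the order they are drawn. */
struct replay_result replay_race(void)
{
	unsigned int refcount = 0x03U << 24; /* both halves allocated */
	unsigned int mask0, mask1;
	struct replay_result r;

	/* CPU0: page_table_free_rcu(), lower half */
	xor_bits(&refcount, 0x11U << (0 + 24));
	assert(tracking_byte(refcount) == 0x12);

	/* CPU0: __tlb_remove_table() picks up the low bits of the table pointer */
	mask0 = 0x01;

	/* CPU1: page_table_free(), upper half */
	mask1 = xor_bits(&refcount, 1U << (1 + 24)) >> 24;
	assert(mask1 == 0x10);

	/* CPU0: drops its pending bit and sees zero */
	mask0 = xor_bits(&refcount, mask0 << (4 + 24)) >> 24;
	assert(mask0 == 0x00);

	r.cpu0_frees_page = (mask0 == 0);
	r.cpu1_does_list_del = ((mask1 & 3) == 0);
	r.final_byte = tracking_byte(refcount);
	return r;
}
```

With this ordering both fields of the result are set: CPU0 decides to free
the page while CPU1 still intends to touch page->lru.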

The problem is that page_table_free() unsets its bit in the lower
nibble and __tlb_remove_table() then observes zero too early, so the
page is released while the list_del() is still outstanding. With this
update, page_table_free() uses logic similar to page_table_free_rcu()
+ __tlb_remove_table() and marks the fragment as pending for removal
in the upper nibble until after the list_del().

In other words, the parent page is considered unreferenced and safe
to release only when the lower nibble is already cleared and
unsetting a bit in the upper nibble turns that nibble zero.
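
The rule can be sketched as a two-step update per fragment. This is a
user-space illustration under stated assumptions, not the patch itself:
xor_bits(), mark_pending(), drop_pending() and replay_fixed() are
hypothetical names, the helper is non-atomic, and the steps are replayed
on one thread in the same order as the racy interleaving.

```c
#include <assert.h>

/* Hypothetical, non-atomic stand-in for atomic_xor_bits(); returns the new value. */
static unsigned int xor_bits(unsigned int *v, unsigned int bits)
{
	*v ^= bits;
	return *v;
}

/*
 * Step 1: move fragment `bit` (0 = lower half, 1 = upper half) from
 * "allocated" (lower nibble) to "pending removal" (upper nibble).
 * Returns the new tracking byte.
 */
static unsigned int mark_pending(unsigned int *refcount, unsigned int bit)
{
	return xor_bits(refcount, 0x11U << (bit + 24)) >> 24;
}

/*
 * Step 2: drop the pending bit of fragment `bit`. Returns 1 only if the
 * whole tracking byte is now zero, i.e. the page may be released.
 */
static int drop_pending(unsigned int *refcount, unsigned int bit)
{
	return (xor_bits(refcount, 0x10U << (bit + 24)) >> 24) == 0;
}

/* Returns how many of the two paths decide to release the page. */
int replay_fixed(void)
{
	unsigned int refcount = 0x03U << 24; /* both halves allocated */
	unsigned int byte;
	int frees = 0;

	byte = mark_pending(&refcount, 0);   /* CPU0, lower half: 0x03 -> 0x12 */
	assert(byte == 0x12);

	byte = mark_pending(&refcount, 1);   /* CPU1, upper half: 0x12 -> 0x30 */
	assert(byte == 0x30);
	/* CPU1 sees (byte & 3) == 0 and would call list_del() around here. */

	frees += drop_pending(&refcount, 0); /* CPU0: 0x30 -> 0x20, not zero */
	frees += drop_pending(&refcount, 1); /* CPU1, after list_del(): 0x20 -> 0x00 */
	return frees;
}
```

In this replay only the path that clears the last pending bit sees zero,
and for page_table_free() that happens after its list manipulation.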

Cc: stable@vger.kernel.org
Suggested-by: Vlastimil Babka <vbabka@suse.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27 11:05:10 +01:00
Makefile s390: add ARCH_HAS_DEBUG_WX support 2020-09-14 11:38:35 +02:00
cmm.c mm: remove unneeded includes of <asm/pgalloc.h> 2020-08-07 11:33:26 -07:00
dump_pagetables.c s390: add kfence region to pagetable dumper 2021-07-30 17:09:02 +02:00
extmem.c s390/extmem: remove stale -ENOSPC comment and handling 2020-07-03 10:49:16 +02:00
fault.c Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly" 2021-09-07 11:03:45 -07:00
gmap.c s390/gmap: don't unconditionally call pte_unmap_unlock() in __gmap_zap() 2021-11-18 19:16:40 +01:00
hugetlbpage.c hugetlb: pass vma into huge_pte_alloc() and huge_pmd_share() 2021-05-05 11:27:20 -07:00
init.c Merge branch 'akpm' (patches from Andrew) 2021-09-08 12:55:35 -07:00
kasan_init.c s390/kasan: fix large PMD pages address alignment check 2021-08-25 11:03:33 +02:00
maccess.c s390: replace deprecated CPU-hotplug functions 2021-08-05 14:10:53 +02:00
mmap.c mm: remove unneeded includes of <asm/pgalloc.h> 2020-08-07 11:33:26 -07:00
page-states.c s390/mm: remove unused cmma functions 2021-08-18 10:01:28 +02:00
pageattr.c s390/mm,pageattr: fix walk_pte_level() early exit 2021-08-25 11:03:34 +02:00
pgalloc.c s390/mm: fix 2KB pgtable release race 2022-01-27 11:05:10 +01:00
pgtable.c s390/mm: fix VMA and page table handling code in storage key handling functions 2021-11-18 19:16:40 +01:00
vmem.c s390: rename dma section to amode31 2021-08-05 14:10:53 +02:00