mm: hugetlb_vmemmap: move code comments to vmemmap_dedup.rst
All the comments which explain how HVO works have been moved to
vmemmap_dedup.rst since commit 4917f55b4e
("mm/sparse-vmemmap: improve memory savings for compound devmaps"),
except some comments above page_fixed_fake_head(). This commit moves
those comments to vmemmap_dedup.rst and improves vmemmap_dedup.rst as well.
Link: https://lkml.kernel.org/r/20220628092235.91270-8-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Parent: 6213834c10
Commit: 838691a1c0

Documentation/mm/vmemmap_dedup.rst
@@ -9,23 +9,23 @@ HugeTLB
+This section is to explain how HugeTLB Vmemmap Optimization (HVO) works.
+
-The struct page structures (page structs) are used to describe a physical
-page frame. By default, there is a one-to-one mapping from a page frame to
-it's corresponding page struct.
+The ``struct page`` structures are used to describe a physical page frame. By
+default, there is a one-to-one mapping from a page frame to its corresponding
+``struct page``.
 
 HugeTLB pages consist of multiple base page size pages and are supported by many
 architectures. See Documentation/admin-guide/mm/hugetlbpage.rst for more
 details. On the x86-64 architecture, HugeTLB pages of size 2MB and 1GB are
 currently supported. Since the base page size on x86 is 4KB, a 2MB HugeTLB page
 consists of 512 base pages and a 1GB HugeTLB page consists of 4096 base pages.
-For each base page, there is a corresponding page struct.
+For each base page, there is a corresponding ``struct page``.
 
-Within the HugeTLB subsystem, only the first 4 page structs are used to
-contain unique information about a HugeTLB page. __NR_USED_SUBPAGE provides
-this upper limit. The only 'useful' information in the remaining page structs
+Within the HugeTLB subsystem, only the first 4 ``struct page`` are used to
+contain unique information about a HugeTLB page. ``__NR_USED_SUBPAGE`` provides
+this upper limit. The only 'useful' information in the remaining ``struct page``
 is the compound_head field, and this field is the same for all tail pages.
 
-By removing redundant page structs for HugeTLB pages, memory can be returned
+By removing redundant ``struct page`` for HugeTLB pages, memory can be returned
 to the buddy allocator for other uses.
 
 Different architectures support different HugeTLB pages. For example, the
@@ -46,7 +46,7 @@ page.
 |              |   64KB    |    2MB    |   512MB   |    16GB   |           |
 +--------------+-----------+-----------+-----------+-----------+-----------+
 
-When the system boot up, every HugeTLB page has more than one struct page
+When the system boots up, every HugeTLB page has more than one ``struct page``
 structs, whose size is (unit: pages)::
 
    struct_size = HugeTLB_Size / PAGE_SIZE * sizeof(struct page) / PAGE_SIZE
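
(Editorial aside, not part of the patch: a quick numeric check of the
struct_size formula above. The constants are the x86-64 values the document
uses; a 64-byte ``struct page`` is an assumption that holds on most 64-bit
configurations.)

    #include <stdio.h>

    #define PAGE_SIZE        4096UL /* 4KB base page */
    #define STRUCT_PAGE_SIZE 64UL   /* assumed sizeof(struct page) */

    /* struct_size = HugeTLB_Size / PAGE_SIZE * sizeof(struct page) / PAGE_SIZE */
    static unsigned long struct_size_pages(unsigned long hugetlb_size)
    {
            return hugetlb_size / PAGE_SIZE * STRUCT_PAGE_SIZE / PAGE_SIZE;
    }

    int main(void)
    {
            /* 2MB pmd-level page: 512 base pages -> 8 vmemmap pages */
            printf("2MB: %lu pages\n", struct_size_pages(2UL << 20));
            /* 1GB pud-level page: 262144 base pages -> 4096 vmemmap pages */
            printf("1GB: %lu pages\n", struct_size_pages(1UL << 30));
            return 0;
    }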
@@ -76,10 +76,10 @@ Where n is how many pte entries one page can contain. So the value of
 n is (PAGE_SIZE / sizeof(pte_t)).
 
 This optimization only supports 64-bit systems, so the value of sizeof(pte_t)
-is 8. And this optimization also applicable only when the size of struct page
-is a power of two. In most cases, the size of struct page is 64 bytes (e.g.
+is 8. And this optimization is also applicable only when the size of ``struct page``
+is a power of two. In most cases, the size of ``struct page`` is 64 bytes (e.g.
 x86-64 and arm64). So if we use pmd level mapping for a HugeTLB page, the
-size of struct page structs of it is 8 page frames which size depends on the
+size of its ``struct page`` structs is 8 page frames, whose size depends on the
 size of the base page.
 
 For the HugeTLB page of the pud level mapping, then::
@@ -88,7 +88,7 @@ For the HugeTLB page of the pud level mapping, then::
               = PAGE_SIZE / 8 * 8 (pages)
               = PAGE_SIZE (pages)
 
-Where the struct_size(pmd) is the size of the struct page structs of a
+Where the struct_size(pmd) is the size of the ``struct page`` structs of a
 HugeTLB page of the pmd level mapping.
 
 E.g.: A 2MB HugeTLB page on x86_64 consists of 8 page frames while a 1GB
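
(Editorial aside: the pud-level derivation above, done numerically. The
8-page struct_size(pmd) comes from the pmd case earlier; an 8-byte pte_t is
the 64-bit assumption the text states.)

    #include <stdio.h>

    int main(void)
    {
            unsigned long page_size = 4096;    /* 4KB base page */
            unsigned long n = page_size / 8;   /* PAGE_SIZE / sizeof(pte_t) = 512 */
            unsigned long struct_size_pmd = 8; /* pages, from the pmd case */

            /* 512 * 8 = 4096 pages, i.e. PAGE_SIZE (pages), as in the text */
            printf("struct_size(pud) = %lu pages\n", n * struct_size_pmd);

            /* The optimization also requires sizeof(struct page) to be a
             * power of two; the typical 64 bytes passes this check. */
            unsigned long sz = 64;
            printf("power of two: %s\n", (sz & (sz - 1)) == 0 ? "yes" : "no");
            return 0;
    }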
@@ -96,7 +96,7 @@ HugeTLB page consists of 4096.
 
 Next, we take the pmd level mapping of the HugeTLB page as an example to
 show the internal implementation of this optimization. There are 8 pages of
-struct page structs associated with a HugeTLB page which is pmd mapped.
+``struct page`` structs associated with a HugeTLB page which is pmd mapped.
 
 Here is how things look before optimization::
 
@@ -124,10 +124,10 @@ Here is how things look before optimization::
 +-----------+
 
 The value of page->compound_head is the same for all tail pages. The first
-page of page structs (page 0) associated with the HugeTLB page contains the 4
-page structs necessary to describe the HugeTLB. The only use of the remaining
-pages of page structs (page 1 to page 7) is to point to page->compound_head.
-Therefore, we can remap pages 1 to 7 to page 0. Only 1 page of page structs
+page of ``struct page`` (page 0) associated with the HugeTLB page contains the 4
+``struct page`` necessary to describe the HugeTLB. The only use of the remaining
+pages of ``struct page`` (page 1 to page 7) is to point to page->compound_head.
+Therefore, we can remap pages 1 to 7 to page 0. Only 1 page of ``struct page``
 will be used for each HugeTLB page. This will allow us to free the remaining
 7 pages to the buddy allocator.
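
(Editorial aside: the remap is safe because compound_head encodes "tail" in
bit 0 and the head pointer in the remaining bits, so all 7 tail vmemmap pages
hold identical data. A userspace toy model of that encoding, with a
hypothetical miniature struct page, illustrating the decode step that the
snippet added in the next hunk relies on:)

    #include <stdio.h>

    struct page { unsigned long compound_head; }; /* toy stand-in */

    /* Strip the tail tag bit, mirroring the kernel's compound_head(). */
    static struct page *compound_head(struct page *page)
    {
            unsigned long head = page->compound_head;
            return (head & 1) ? (struct page *)(head - 1) : page;
    }

    int main(void)
    {
            struct page pages[8] = { { 0 } };

            /* Pages 1..7 are tails: head pointer with bit 0 set. Identical
             * contents are what make remapping them to one frame safe. */
            for (int i = 1; i < 8; i++)
                    pages[i].compound_head = (unsigned long)&pages[0] | 1;

            printf("tail 5 resolves to page 0: %d\n",
                   compound_head(&pages[5]) == &pages[0]);
            return 0;
    }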
@@ -169,13 +169,37 @@ entries that can be cached in a single TLB entry.
 
 The contiguous bit is used to increase the mapping size at the pmd and pte
 (last) level. So this type of HugeTLB page can be optimized only when the
-size of the struct page structs is greater than 1 page.
+size of its ``struct page`` structs is greater than **1** page.
 
 Notice: The head vmemmap page is not freed to the buddy allocator and all
 tail vmemmap pages are mapped to the head vmemmap page frame. So we can see
-more than one struct page struct with PG_head (e.g. 8 per 2 MB HugeTLB page)
-associated with each HugeTLB page. The compound_head() can handle this
-correctly (more details refer to the comment above compound_head()).
+more than one ``struct page`` struct with ``PG_head`` (e.g. 8 per 2 MB HugeTLB
+page) associated with each HugeTLB page. The ``compound_head()`` can handle
+this correctly. There is only **one** head ``struct page``; the tail
+``struct page`` with ``PG_head`` are fake head ``struct page``. We need an
+approach to distinguish between those two different types of ``struct page`` so
+that ``compound_head()`` can return the real head ``struct page`` when the
+parameter is the tail ``struct page`` but with ``PG_head``. The following code
+snippet describes how to distinguish between real and fake head ``struct page``.
+
+.. code-block:: c
+
+   if (test_bit(PG_head, &page->flags)) {
+           unsigned long head = READ_ONCE(page[1].compound_head);
+
+           if (head & 1) {
+                   if (head == (unsigned long)page + 1)
+                           /* head struct page */
+                   else
+                           /* tail struct page */
+           } else {
+                   /* head struct page */
+           }
+   }
+
+We can safely access the field of the **page[1]** with ``PG_head`` because the
+page is a compound page composed with at least two contiguous pages.
+The implementation refers to ``page_fixed_fake_head()``.
 
 Device DAX
 ==========
@@ -189,7 +213,7 @@ PMD_SIZE (2M on x86_64) and PUD_SIZE (1G on x86_64).
 
 The differences with HugeTLB are relatively minor.
 
-It only use 3 page structs for storing all information as opposed
+It only uses 3 ``struct page`` for storing all information as opposed
 to 4 on HugeTLB pages.
 
 There's no remapping of vmemmap given that device-dax memory is not part of

include/linux/page-flags.h
@@ -208,19 +208,8 @@ enum pageflags {
 DECLARE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key);
 
 /*
- * If HVO is enabled, the head vmemmap page frame is reused and all of the tail
- * vmemmap addresses map to the head vmemmap page frame (furture details can
- * refer to the figure at the head of the mm/hugetlb_vmemmap.c). In other
- * words, there are more than one page struct with PG_head associated with each
- * HugeTLB page. We __know__ that there is only one head page struct, the tail
- * page structs with PG_head are fake head page structs. We need an approach
- * to distinguish between those two different types of page structs so that
- * compound_head() can return the real head page struct when the parameter is
- * the tail page struct but with PG_head.
- *
- * The page_fixed_fake_head() returns the real head page struct if the @page is
- * fake page head, otherwise, returns @page which can either be a true page
- * head or tail.
+ * Return the real head page struct iff the @page is a fake head page, otherwise
+ * return the @page itself. See Documentation/mm/vmemmap_dedup.rst.
  */
 static __always_inline const struct page *page_fixed_fake_head(const struct page *page)
 {
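
(Editorial note: the rendered diff is truncated at the opening brace. For
orientation, a sketch of the function body, reconstructed from the
documentation snippet above rather than copied from this commit; the in-tree
version may differ in details such as an extra page-alignment fast path.)

    static __always_inline const struct page *page_fixed_fake_head(const struct page *page)
    {
            /* HVO disabled: no fake heads can exist. */
            if (!static_branch_unlikely(&hugetlb_optimize_vmemmap_key))
                    return page;

            if (test_bit(PG_head, &page->flags)) {
                    /* Safe: a compound page spans at least two struct pages. */
                    unsigned long head = READ_ONCE(page[1].compound_head);

                    /*
                     * Bit 0 set means page[1] is a tail whose head pointer is
                     * head - 1. For a real head this already equals @page, so
                     * returning it is correct in both cases.
                     */
                    if (head & 1)
                            return (const struct page *)(head - 1);
            }
            return page;
    }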