WSL2-Linux-Kernel/Documentation
Muchun Song e7d324850b mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page
Patch series "Free the 2nd vmemmap page associated with each HugeTLB
page", v7.

This series can minimize the overhead of struct page for 2MB HugeTLB
pages significantly.  It further reduces the overhead of struct page by
12.5% for a 2MB HugeTLB compared to the previous approach, which means
2GB per 1TB HugeTLB.  It is a nice gain.  Comments and reviews are
welcome.  Thanks.

The main implementation and details can refer to the commit log of patch
1.  In this series, I have changed the following four helpers, the
following table shows the impact of the overhead of those helpers.

	+------------------+-----------------------+
	|       APIs       | head page | tail page |
	+------------------+-----------+-----------+
	|    PageHead()    |     Y     |     N     |
	+------------------+-----------+-----------+
	|    PageTail()    |     Y     |     N     |
	+------------------+-----------+-----------+
	|  PageCompound()  |     N     |     N     |
	+------------------+-----------+-----------+
	|  compound_head() |     Y     |     N     |
	+------------------+-----------+-----------+

	Y: Overhead is increased.
	N: Overhead is _NOT_ increased.

It shows that the overhead of those helpers on a tail page don't change
between "hugetlb_free_vmemmap=on" and "hugetlb_free_vmemmap=off".  But the
overhead on a head page will be increased when "hugetlb_free_vmemmap=on"
(except PageCompound()).  So I believe that Matthew Wilcox's folio series
will help with this.

The users of PageHead() and PageTail() are much less than compound_head()
and most users of PageTail() are VM_BUG_ON(), so I have done some tests
about the overhead of compound_head() on head pages.

I have tested the overhead of calling compound_head() on a head page,
which is 2.11ns (Measure the call time of 10 million times
compound_head(), and then average).

For a head page whose address is not aligned with PAGE_SIZE or a
non-compound page, the overhead of compound_head() is 2.54ns which is
increased by 20%.  For a head page whose address is aligned with
PAGE_SIZE, the overhead of compound_head() is 2.97ns which is increased by
40%.  Most pages are the former.  I do not think the overhead is
significant since the overhead of compound_head() itself is low.

This patch (of 5):

This patch minimizes the overhead of struct page for 2MB HugeTLB pages
significantly.  It further reduces the overhead of struct page by 12.5%
for a 2MB HugeTLB compared to the previous approach, which means 2GB per
1TB HugeTLB (2MB type).

After the feature of "Free sonme vmemmap pages of HugeTLB page" is
enabled, the mapping of the vmemmap addresses associated with a 2MB
HugeTLB page becomes the figure below.

     HugeTLB                    struct pages(8 pages)         page frame(8 pages)
 +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+---> PG_head
 |           |                     |     0     | -------------> |     0     |
 |           |                     +-----------+                +-----------+
 |           |                     |     1     | -------------> |     1     |
 |           |                     +-----------+                +-----------+
 |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
 |           |                     +-----------+                   | | | | |
 |           |                     |     3     | ------------------+ | | | |
 |           |                     +-----------+                     | | | |
 |           |                     |     4     | --------------------+ | | |
 |    2MB    |                     +-----------+                       | | |
 |           |                     |     5     | ----------------------+ | |
 |           |                     +-----------+                         | |
 |           |                     |     6     | ------------------------+ |
 |           |                     +-----------+                           |
 |           |                     |     7     | --------------------------+
 |           |                     +-----------+
 |           |
 |           |
 |           |
 +-----------+

As we can see, the 2nd vmemmap page frame (indexed by 1) is reused and
remaped. However, the 2nd vmemmap page frame is also can be freed to
the buddy allocator, then we can change the mapping from the figure
above to the figure below.

    HugeTLB                    struct pages(8 pages)         page frame(8 pages)
 +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+---> PG_head
 |           |                     |     0     | -------------> |     0     |
 |           |                     +-----------+                +-----------+
 |           |                     |     1     | ---------------^ ^ ^ ^ ^ ^ ^
 |           |                     +-----------+                  | | | | | |
 |           |                     |     2     | -----------------+ | | | | |
 |           |                     +-----------+                    | | | | |
 |           |                     |     3     | -------------------+ | | | |
 |           |                     +-----------+                      | | | |
 |           |                     |     4     | ---------------------+ | | |
 |    2MB    |                     +-----------+                        | | |
 |           |                     |     5     | -----------------------+ | |
 |           |                     +-----------+                          | |
 |           |                     |     6     | -------------------------+ |
 |           |                     +-----------+                            |
 |           |                     |     7     | ---------------------------+
 |           |                     +-----------+
 |           |
 |           |
 |           |
 +-----------+

After we do this, all tail vmemmap pages (1-7) are mapped to the head
vmemmap page frame (0).  In other words, there are more than one page
struct with PG_head associated with each HugeTLB page.  We __know__ that
there is only one head page struct, the tail page structs with PG_head are
fake head page structs.  We need an approach to distinguish between those
two different types of page structs so that compound_head(), PageHead()
and PageTail() can work properly if the parameter is the tail page struct
but with PG_head.

The following code snippet describes how to distinguish between real and
fake head page struct.

	if (test_bit(PG_head, &page->flags)) {
		unsigned long head = READ_ONCE(page[1].compound_head);

		if (head & 1) {
			if (head == (unsigned long)page + 1)
				==> head page struct
			else
				==> tail page struct
		} else
			==> head page struct
	}

We can safely access the field of the @page[1] with PG_head because the
@page is a compound page composed with at least two contiguous pages.

[songmuchun@bytedance.com: restore lost comment changes]

Link: https://lkml.kernel.org/r/20211101031651.75851-1-songmuchun@bytedance.com
Link: https://lkml.kernel.org/r/20211101031651.75851-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Barry Song <song.bao.hua@hisilicon.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Chen Huang <chenhuang5@huawei.com>
Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:08 -07:00
..
ABI Power Supply Fixes for 5.17 cycle 2022-02-20 11:07:46 -08:00
PCI
RCU Merge branches 'doc.2021.11.30c', 'exp.2021.12.07a', 'fastnohz.2021.11.30c', 'fixes.2021.11.30c', 'nocb.2021.12.09a', 'nolibc.2021.11.30c', 'tasks.2021.12.09a', 'torture.2021.12.07a' and 'torturescript.2021.11.30c' into HEAD 2021-12-09 11:38:09 -08:00
accounting - A bunch of fixes: forced idle time accounting, utilization values 2022-01-23 17:35:27 +02:00
admin-guide mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page 2022-03-22 15:57:08 -07:00
arc docs: ARC: Improve readability 2021-12-10 14:28:01 -07:00
arm Documentation: arm: marvell: Extend Avanta list 2022-01-27 11:22:34 -07:00
arm64 KVM: arm64: Workaround Cortex-A510's single-step and PAC trap errata 2022-02-03 09:22:30 +00:00
block docs: block: remove queue-sysfs.rst 2022-01-09 18:59:10 -07:00
bpf bpf, docs: Fully document the JMP mode modifiers 2022-01-05 13:11:26 -08:00
cdrom
core-api mm: document and polish read-ahead code 2022-03-22 15:57:00 -07:00
cpu-freq cpufreq: Reintroduce ready() callback 2022-02-09 13:18:49 +05:30
crypto
dev-tools linux-kselftest-kunit-fixes-5.17-rc4 2022-02-10 15:39:59 -08:00
devicetree ARM: SoC fixes for 5.17, part 3 2022-03-10 11:43:01 -08:00
doc-guide docs: discourage use of list tables 2022-01-07 09:33:13 -07:00
driver-api Three small documentation fixes. 2022-01-22 09:02:57 +02:00
fault-injection
fb
features ARM: 9158/1: leave it to core code to manage thread_info::cpu 2021-12-17 11:34:31 +00:00
filesystems fs: introduce alloc_inode_sb() to allocate filesystems specific inode 2022-03-22 15:57:03 -07:00
firmware-guide Device properties framework updates for 5.17-rc1 2022-01-10 20:48:19 -08:00
firmware_class
fpga
gpu Revert "fbcon: Disable accelerated scrolling" 2022-02-02 15:15:11 +01:00
hid
hwmon hwmon/pmbus: (ir38064) Add support for IR38060, IR38164 IR38263 2021-12-26 15:02:07 -08:00
i2c Docs: Fixes link to I2C specification 2021-12-31 14:39:28 +01:00
ia64
ide
iio
infiniband
input
isdn
kbuild doc: kbuild: fix default in `imply` table 2022-01-08 18:28:21 +09:00
kernel-hacking docs: fix typo in Documentation/kernel-hacking/locking.rst 2022-01-27 11:22:33 -07:00
leds
litmus-tests
livepatch Documentation: livepatch: Add livepatch API page 2021-12-23 11:35:53 +01:00
locking Documentation/locking/locktypes: Update migrate_disable() bits. 2021-11-30 15:40:31 +01:00
m68k
maintainer
mhi
mips
misc-devices
netlabel
networking This isn't a hugely busy cycle for documentation, but a few significant 2022-01-11 10:00:04 -08:00
nios2
nvdimm
openrisc
parisc
pcmcia
power Merge branches 'pm-opp', 'pm-devfreq' and 'powercap' 2022-01-10 18:00:31 +01:00
powerpc
process Kbuild updates for v5.17 2022-01-19 11:15:19 +02:00
riscv riscv: Move KASAN mapping next to the kernel mapping 2022-01-19 17:54:04 -08:00
s390
scheduler docs/scheduler: fix typo and warning in sched-bwc 2021-12-06 12:15:49 -07:00
scsi
security docs: update self-protection __ro_after_init status 2021-12-10 14:02:06 -07:00
sh
sound ALSA: hda/realtek: Add new alc285-hp-amp-init model 2021-12-14 10:44:26 +01:00
sparc
sphinx docs: automarkup.py: Fix invalid HTML link output and broken URI fragments 2022-01-07 09:32:58 -07:00
sphinx-static docs: add support for RTD dark mode 2021-12-10 14:05:55 -07:00
spi spi: pxa2xx: Get rid of unused enable_loopback member 2021-11-29 12:20:00 +00:00
staging Three small documentation fixes. 2022-01-22 09:02:57 +02:00
target
timers rcu: Remove the RCU_FAST_NO_HZ Kconfig option 2021-11-30 17:24:47 -08:00
tools Tracing fixes for 5.17: 2022-02-26 12:10:17 -08:00
trace Three small documentation fixes. 2022-01-22 09:02:57 +02:00
translations cpufreq: Reintroduce ready() callback 2022-02-09 13:18:49 +05:30
tty Documentation: add TTY chapter 2021-11-26 16:27:43 +01:00
usb docs: ABI: fixed req_number desc in UAC1 2021-12-30 12:10:44 +01:00
userspace-api xen: update missing ioctl magic numers documentation 2022-02-03 08:24:34 +01:00
virt Merge branch 'kvm-ppc-cap-210' into kvm-master 2022-02-22 09:07:16 -05:00
vm docs/vm: Fix typo in *harden* 2022-01-27 11:22:34 -07:00
w1
watchdog
x86 x86/sgx: Fix minor documentation issues 2021-11-17 06:36:09 -08:00
xtensa
.gitignore
COPYING-logo
Changes
CodingStyle
Kconfig
Makefile docs: address some text issues with css/theme support 2021-12-16 15:54:12 -07:00
SubmittingPatches
arch.rst docs: Add documentation for ARC processors 2021-11-29 14:53:11 -07:00
asm-annotations.rst
atomic_bitops.txt
atomic_t.txt
conf.py docs: add support for RTD dark mode 2021-12-10 14:05:55 -07:00
docutils.conf
dontdiff
index.rst docs: Hook the RTLA documents into the kernel docs build 2022-01-27 11:20:39 -07:00
logo.gif
memory-barriers.txt asm-generic: introduce io_stop_wc() and add implementation for ARM64 2021-12-22 10:44:53 +00:00
watch_queue.rst