WSL2-Linux-Kernel

History

Luiz Capitulino 944d9fec8d hugetlb: add support for gigantic page allocation at runtime

HugeTLB is limited to allocating hugepages whose size are less than MAX_ORDER order. This is so because HugeTLB allocates hugepages via the buddy allocator. Gigantic pages (that is, pages whose size is greater than MAX_ORDER order) have to be allocated at boottime.

However, boottime allocation has at least two serious problems. First, it doesn't support NUMA and second, gigantic pages allocated at boottime can't be freed.

This commit solves both issues by adding support for allocating gigantic pages during runtime. It works just like regular sized hugepages, meaning that the interface in sysfs is the same, it supports NUMA, and gigantic pages can be freed.

For example, on x86_64 gigantic pages are 1GB big. To allocate two 1G gigantic pages on node 1, one can do:

    # echo 2 > \
      /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

And to free them all:

    # echo 0 > \
      /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

The one problem with gigantic page allocation at runtime is that it can't be serviced by the buddy allocator. To overcome that problem, this commit scans all zones from a node looking for a large enough contiguous region. When one is found, it's allocated by using CMA, that is, we call alloc_contig_range() to do the actual allocation. For example, on x86_64 we scan all zones looking for a 1GB contiguous region. When one is found, it's allocated by alloc_contig_range().

One expected issue with that approach is that such gigantic contiguous regions tend to vanish as runtime goes by. The best way to avoid this for now is to make gigantic page allocations very early during system boot, say from a init script. Other possible optimization include using compaction, which is supported by CMA but is not explicitly used by this commit.

It's also important to note the following:

 1. Gigantic pages allocated at boottime by the hugepages= command-line option can be freed at runtime just fine

 2. This commit adds support for gigantic pages only to x86_64. The reason is that I don't have access to nor experience with other archs. The code is arch indepedent though, so it should be simple to add support to different archs

 3. I didn't add support for hugepage overcommit, that is allocating a gigantic page on demand when /proc/sys/vm/nr_overcommit_hugepages > 0. The reason is that I don't think it's reasonable to do the hard and long work required for allocating a gigantic page at fault time. But it should be simple to add this if wanted

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2014-06-04 16:53:59 -07:00
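
The zone scan described in the commit message above can be pictured with a small user-space model. This is only a sketch under stated assumptions, not the code from mm/hugetlb.c: `struct toy_zone`, `toy_find_gigantic_range()`, the per-frame `is_free` flags, and the `TOY_NO_PFN` sentinel are all hypothetical names invented for illustration. It assumes candidate ranges are aligned to the gigantic page size (a power-of-two number of page frames) and that a range qualifies only when every frame in it is free. In the kernel, the range that is found would then be handed to alloc_contig_range(); the model stops at returning the starting page frame number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a zone: one flag per page frame, true = free. */
struct toy_zone {
    size_t start_pfn;    /* first page frame number in the zone */
    size_t nr_pfns;      /* number of page frames in the zone */
    const bool *is_free; /* is_free[i] describes frame start_pfn + i */
};

/* Hypothetical sentinel meaning "no suitable range found". */
#define TOY_NO_PFN ((size_t)-1)

/* Round pfn up to the next multiple of nr_pages (nr_pages is a power of two). */
static size_t toy_align_up(size_t pfn, size_t nr_pages)
{
    return (pfn + nr_pages - 1) & ~(nr_pages - 1);
}

/* Return true if every frame in [pfn, pfn + nr_pages) is free in zone z. */
static bool toy_range_is_free(const struct toy_zone *z, size_t pfn,
                              size_t nr_pages)
{
    for (size_t i = 0; i < nr_pages; i++)
        if (!z->is_free[pfn - z->start_pfn + i])
            return false;
    return true;
}

/*
 * Scan each zone in order for an aligned, fully free run of nr_pages frames.
 * Returns the starting pfn of the first such run, or TOY_NO_PFN if none exists.
 */
static size_t toy_find_gigantic_range(const struct toy_zone *zones,
                                      size_t nr_zones, size_t nr_pages)
{
    /* The alignment trick above requires a power-of-two size. */
    assert(nr_pages > 0 && (nr_pages & (nr_pages - 1)) == 0);

    for (size_t zi = 0; zi < nr_zones; zi++) {
        const struct toy_zone *z = &zones[zi];
        size_t end = z->start_pfn + z->nr_pfns;

        /* Step through the zone one aligned candidate range at a time. */
        for (size_t pfn = toy_align_up(z->start_pfn, nr_pages);
             pfn + nr_pages <= end; pfn += nr_pages) {
            if (toy_range_is_free(z, pfn, nr_pages))
                return pfn;
        }
    }
    return TOY_NO_PFN;
}
```

With 4 KiB base pages, a 1GB gigantic page corresponds to nr_pages = 262144 frames. The model also shows why such regions "tend to vanish as runtime goes by": a single in-use frame inside an aligned candidate range disqualifies the whole range.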
..
Kconfig	hugetlb: restrict hugepage_migration_support() to x86_64	2014-06-04 16:53:51 -07:00
Kconfig.debug	mm: more intensive memory corruption debugging	2012-01-10 16:30:42 -08:00
Makefile	block: move mm/bounce.c to block/	2014-05-19 20:01:52 -06:00
backing-dev.c	arch: Mass conversion of smp_mb__*()	2014-04-18 14:20:48 +02:00
balloon_compaction.c	mm: print more details for bad_page()	2014-01-23 16:36:50 -08:00
bootmem.c	mm/bootmem.c: remove unused local `map'	2013-11-13 12:09:09 +09:00
cleancache.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
compaction.c	mm/compaction: make isolate_freepages start at pageblock boundary	2014-05-06 13:04:59 -07:00
debug-pagealloc.c	mm, x86: Remove debug_pagealloc_enabled	2011-12-06 09:24:07 +01:00
dmapool.c	mm: Fix printk typo in dmapool.c	2014-05-05 15:44:47 +02:00
early_ioremap.c	mm: create generic early_ioremap() support	2014-04-07 16:36:15 -07:00
fadvise.c	teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long	2013-03-03 22:46:22 -05:00
failslab.c	switch debugfs to umode_t	2012-01-03 22:54:56 -05:00
filemap.c	Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next	2014-06-03 12:57:53 -07:00
filemap_xip.c	seqcount: Add lockdep functionality to seqcount/seqlock structures	2013-11-06 12:40:26 +01:00
fremap.c	mm: softdirty: make freshly remapped file pages being softdirty unconditionally	2014-06-04 16:53:56 -07:00
frontswap.c	frontswap: fix incorrect zeroing and allocation size for frontswap_map	2013-06-12 16:29:46 -07:00
highmem.c	Some nice cleanups, and even a patch my wife did as a "live" demo for	2012-12-20 08:37:05 -08:00
huge_memory.c	mm/huge_memory.c: complete conversion to pr_foo()	2014-06-04 16:53:58 -07:00
hugetlb.c	hugetlb: add support for gigantic page allocation at runtime	2014-06-04 16:53:59 -07:00
hugetlb_cgroup.c	cgroup: drop const from @buffer of cftype->write_string()	2014-03-19 10:23:54 -04:00
hwpoison-inject.c	mm/hwpoison: add '#' to hwpoison_inject	2014-01-21 16:19:48 -08:00
init-mm.c	atomic: use <linux/atomic.h>	2011-07-26 16:49:47 -07:00
internal.h	mm/readahead.c: inline ra_submit	2014-04-07 16:35:58 -07:00
interval_tree.c	mm: add CONFIG_DEBUG_VM_RB build option	2012-10-09 16:22:42 +09:00
iov_iter.c	take iov_iter stuff to mm/iov_iter.c	2014-04-01 23:19:30 -04:00
kmemcheck.c	kmemcheck: Fix build errors due to missing slab.h	2010-03-30 22:02:32 +09:00
kmemleak-test.c	kmemleak: remove memset by using kzalloc	2011-01-27 18:31:51 +00:00
kmemleak.c	mm: postpone the disabling of kmemleak early logging	2014-05-11 17:55:48 +09:00
ksm.c	mm: close PageTail race	2014-03-04 07:55:47 -08:00
list_lru.c	mm: keep page cache radix tree nodes in check	2014-04-03 16:21:01 -07:00
maccess.c	mm: Map most files to use export.h instead of module.h	2011-10-31 09:20:12 -04:00
madvise.c	mm: madvise: fix MADV_WILLNEED on shmem swapouts	2014-05-23 09:37:29 -07:00
memblock.c	memblock: introduce memblock_alloc_range()	2014-06-04 16:53:57 -07:00
memcontrol.c	mm: memcontrol: remove hierarchy restrictions for swappiness and oom_control	2014-06-04 16:53:58 -07:00
memory-failure.c	mm/memory-failure.c: fix memory leak by race between poison and unpoison	2014-05-23 09:37:30 -07:00
memory.c	x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels	2014-06-04 16:53:55 -07:00
memory_hotplug.c	mm/memory_hotplug.c: move register_memory_resource out of the lock_memory_hotplug	2014-01-23 16:36:52 -08:00
mempolicy.c	mm, mempolicy: remove per-process flag	2014-04-07 16:35:54 -07:00
mempool.c	mm/mempool: warn about __GFP_ZERO usage	2014-06-04 16:53:58 -07:00
migrate.c	mm: fix swapops.h:131 bug if remap_file_pages raced migration	2014-03-20 22:09:09 -07:00
mincore.c	mm + fs: prepare for non-page entries in page cache radix trees	2014-04-03 16:21:00 -07:00
mlock.c	mm: try_to_unmap_cluster() should lock_page() before mlocking	2014-04-07 16:35:57 -07:00
mm_init.c	mm: bring back /sys/kernel/mm	2014-01-27 21:02:39 -08:00
mmap.c	mm/mmap.c: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO	2014-06-04 16:53:58 -07:00
mmu_context.c	sched/mm: call finish_arch_post_lock_switch in idle_task_exit and use_mm	2014-02-21 08:50:17 +01:00
mmu_notifier.c	mm: audit/fix non-modular users of module_init in core code	2014-01-23 16:36:52 -08:00
mmzone.c	mm: numa: Change page last {nid,pid} into {cpu,pid}	2013-10-09 14:47:45 +02:00
mprotect.c	mm: move mmu notifier call from change_protection to change_pmd_range	2014-04-07 16:35:50 -07:00
mremap.c	mm, thp: close race between mremap() and split_huge_page()	2014-05-11 17:55:48 +09:00
msync.c	sanitize vfs_fsync calling conventions	2010-05-21 18:31:21 -04:00
nobootmem.c	mm/nobootmem.c: mark function as static	2014-04-03 16:21:02 -07:00
nommu.c	mm: fix 'ERROR: do not initialise globals to 0 or NULL' and coding style	2014-04-07 16:35:55 -07:00
oom_kill.c	mm, oom: base root bonus on current usage	2014-01-30 16:56:56 -08:00
page-writeback.c	mm/page-writeback.c: fix divide by zero in pos_ratio_polynom	2014-05-06 13:04:58 -07:00
page_alloc.c	mm: get rid of __GFP_KMEMCG	2014-06-04 16:53:56 -07:00
page_cgroup.c	mm/page_cgroup.c: mark functions as static	2014-04-03 16:21:02 -07:00
page_io.c	Merge branch 'for-3.14/core' of git://git.kernel.dk/linux-block	2014-01-30 11:19:05 -08:00
page_isolation.c	mm: memory-hotplug: enable memory hotplug to handle hugepage	2013-09-11 15:57:48 -07:00
pagewalk.c	mm/pagewalk.c: fix walk_page_range() access of wrong PTEs	2013-10-30 14:27:03 -07:00
percpu-km.c	percpu: clear memory allocated with the km allocator	2010-10-02 10:28:42 +03:00
percpu-vm.c	mm: fix kernel-doc warnings	2012-06-20 14:39:36 -07:00
percpu.c	percpu: make pcpu_alloc_chunk() use pcpu_mem_free() instead of kfree()	2014-04-14 16:18:06 -04:00
pgtable-generic.c	mm: fix TLB flush race between migration, and change_protection_range	2013-12-18 19:04:51 -08:00
process_vm_access.c	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2014-04-12 14:49:50 -07:00
quicklist.c	mm: delete various needless include <linux/module.h>	2011-10-31 09:20:11 -04:00
readahead.c	mm/readahead.c: inline ra_submit	2014-04-07 16:35:58 -07:00
rmap.c	mm: softdirty: don't forget to save file map softdiry bit on unmap	2014-06-04 16:53:56 -07:00
shmem.c	mm: Initialize error in shmem_file_aio_read()	2014-04-13 14:10:26 -07:00
slab.c	sl[au]b: charge slabs to kmemcg explicitly	2014-06-04 16:53:56 -07:00
slab.h	sl[au]b: charge slabs to kmemcg explicitly	2014-06-04 16:53:56 -07:00
slab_common.c	slab: document kmalloc_order	2014-06-04 16:53:58 -07:00
slob.c	mm: slab/slub: use page->list consistently instead of page->lru	2014-04-11 10:06:06 +03:00
slub.c	mm: get rid of __GFP_KMEMCG	2014-06-04 16:53:56 -07:00
sparse-vmemmap.c	mm/sparse: use memblock apis for early memory allocations	2014-01-21 16:19:47 -08:00
sparse.c	mm: use macros from compiler.h instead of __attribute__((...))	2014-04-07 16:35:54 -07:00
swap.c	mm: thrash detection-based file cache sizing	2014-04-03 16:21:01 -07:00
swap_state.c	swap: add a simple detector for inappropriate swapin readahead	2014-02-06 13:48:51 -08:00
swapfile.c	mm/swap: fix race on swap_info reuse between swapoff and swapon	2014-02-06 13:48:51 -08:00
truncate.c	mm: filemap: update find_get_pages_tag() to deal with shadow entries	2014-05-06 13:04:59 -07:00
util.c	nick kvfree() from apparmor	2014-05-06 14:02:53 -04:00
vmacache.c	mm,vmacache: optimize overflow system-wide flushing	2014-06-04 16:53:57 -07:00
vmalloc.c	mm/vmalloc.c: enhance vm_map_ram() comment	2014-04-07 16:35:55 -07:00
vmpressure.c	arm, pm, vmpressure: add missing slab.h includes	2014-02-03 13:24:01 -05:00
vmscan.c	mm: only force scan in reclaim when none of the LRUs are big enough.	2014-06-04 16:53:56 -07:00
vmstat.c	mm,vmacache: add debug data	2014-06-04 16:53:57 -07:00
workingset.c	mm: keep page cache radix tree nodes in check	2014-04-03 16:21:01 -07:00
zbud.c	mm/zbud: fix some trivial typos in comments	2013-09-11 15:57:35 -07:00
zsmalloc.c	zsmalloc: Fix CPU hotplug callback registration	2014-03-20 13:43:45 +01:00
zswap.c	Merge branch 'akpm' (incoming from Andrew)	2014-04-07 16:38:06 -07:00