WSL2-Linux-Kernel

History

Mel Gorman 3193913ce6 mm: page_alloc: default node-ordering on 64-bit NUMA, zone-ordering on 32-bit		2014-10-09 22:25:58 -04:00

Zones are allocated by the page allocator in either node or zone order. Node ordering is preferred in terms of locality and is applied automatically in one of three cases:

  1. If a node has only low memory
  2. If DMA/DMA32 is a high percentage of memory
  3. If low memory on a single node is greater than 70% of the node size

Otherwise zone ordering is used to preserve low memory for devices that require it. Unfortunately, a consequence of this is that applications running on a machine with balanced NUMA nodes will experience different performance characteristics depending on which node they happen to start from.

The point of zone ordering is to protect lower zones for devices that require DMA/DMA32 memory. When NUMA was first introduced, this was critical as 32-bit NUMA machines existed and exhausting low memory triggered OOMs easily as so many allocations required low memory. On 64-bit machines the primary concern is devices that are 32-bit only, which is less severe than the low memory exhaustion problem on 32-bit NUMA. It seems there are really few devices that depend on it.

  AGP -- I assume this is getting more rare but even then I think the allocations happen early in boot time where lowmem pressure is less of a problem

  DRM -- If the device is 32-bit only then there may be low memory pressure. I didn't evaluate these in detail but it looks like some of these are mobile graphics cards. Not many NUMA laptops out there. DRM folk should know better though.

  Some TV cards -- Much demand for 32-bit capable TV cards on NUMA machines?

  B43 wireless card -- again not really a NUMA thing.

I cannot find a good reason to incur a performance penalty on all 64-bit NUMA machines in case someone throws a brain damaged TV or graphics card in there. This patch defaults to node-ordering on 64-bit NUMA machines. I was tempted to make it default everywhere but I understand that some embedded arches may be using 32-bit NUMA where I cannot predict the consequences.

The performance impact depends on the workload and the characteristics of the machine, and the machine I tested on had a large Normal zone on node 0, so the impact is within the noise for the majority of tests. The allocation stats show more allocation requests were from DMA32 and local node. Running SpecJBB with multiple JVMs and automatic NUMA balancing disabled, the results were:

specjbb
                     3.17.0-rc2            3.17.0-rc2
                        vanilla        nodeorder-v1r1
Min     1    29534.00 (  0.00%)    30020.00 (  1.65%)
Min    10   115717.00 (  0.00%)   134038.00 ( 15.83%)
Min    19   109718.00 (  0.00%)   114186.00 (  4.07%)
Min    28   104459.00 (  0.00%)   103639.00 ( -0.78%)
Min    37    98245.00 (  0.00%)   103756.00 (  5.61%)
Min    46    97198.00 (  0.00%)    96197.00 ( -1.03%)
Mean    1    30953.25 (  0.00%)    31917.75 (  3.12%)
Mean   10   124432.50 (  0.00%)   140904.00 ( 13.24%)
Mean   19   116033.50 (  0.00%)   119294.75 (  2.81%)
Mean   28   108365.25 (  0.00%)   106879.50 ( -1.37%)
Mean   37   102984.75 (  0.00%)   106924.25 (  3.83%)
Mean   46   100783.25 (  0.00%)   105368.50 (  4.55%)
Stddev  1     1260.38 (  0.00%)     1109.66 ( 11.96%)
Stddev 10     7434.03 (  0.00%)     5171.91 ( 30.43%)
Stddev 19     8453.84 (  0.00%)     5309.59 ( 37.19%)
Stddev 28     4184.55 (  0.00%)     2906.63 ( 30.54%)
Stddev 37     5409.49 (  0.00%)     3192.12 ( 40.99%)
Stddev 46     4521.95 (  0.00%)     7392.52 (-63.48%)
Max     1    32738.00 (  0.00%)    32719.00 ( -0.06%)
Max    10   136039.00 (  0.00%)   148614.00 (  9.24%)
Max    19   130566.00 (  0.00%)   127418.00 ( -2.41%)
Max    28   115404.00 (  0.00%)   111254.00 ( -3.60%)
Max    37   112118.00 (  0.00%)   111732.00 ( -0.34%)
Max    46   108541.00 (  0.00%)   116849.00 (  7.65%)
TPut    1   123813.00 (  0.00%)   127671.00 (  3.12%)
TPut   10   497730.00 (  0.00%)   563616.00 ( 13.24%)
TPut   19   464134.00 (  0.00%)   477179.00 (  2.81%)
TPut   28   433461.00 (  0.00%)   427518.00 ( -1.37%)
TPut   37   411939.00 (  0.00%)   427697.00 (  3.83%)
TPut   46   403133.00 (  0.00%)   421474.00 (  4.55%)

                            3.17.0-rc2      3.17.0-rc2
                               vanilla  nodeorder-v1r1
DMA allocs                           0               0
DMA32 allocs                        57         1491992
Normal allocs                 32543566        30026383
Movable allocs                       0               0
Direct pages scanned                 0               0
Kswapd pages scanned                 0               0
Kswapd pages reclaimed               0               0
Direct pages reclaimed               0               0
Kswapd efficiency                 100%            100%
Kswapd velocity                  0.000           0.000
Direct efficiency                 100%            100%
Direct velocity                  0.000           0.000
Percentage direct scans             0%              0%
Zone normal velocity             0.000           0.000
Zone dma32 velocity              0.000           0.000
Zone dma velocity                0.000           0.000
THP fault alloc                  55164           52987
THP collapse alloc                 139             147
THP splits                          26              21
NUMA alloc hit                 4169066         4250692
NUMA alloc miss                      0               0

Note that there were more DMA32 allocations with the patch applied. In this particular case there was no difference in numa_hit and numa_miss. The expectation is that DMA32 was being used at the low watermark instead of falling into the slow path. kswapd was not woken, but it is not woken for THP allocations.

On 32-bit, this patch defaults to zone-ordering as low memory depletion can be a serious problem on 32-bit large memory machines. If the default ordering was node then processes on node 0 will deplete the Normal zone due to normal activity. The problem is worse if CONFIG_HIGHPTE is not set. If combined with large amounts of dirty/writeback pages in Normal zone then there is also a high risk of OOM. The heuristics are removed as it's not clear they were ever important on 32-bit. They were only relevant for setting node-ordering on 64-bit.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
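
The difference between node ordering and zone ordering described in the commit message can be illustrated with a small userspace sketch. This is not the code from mm/page_alloc.c: the names (`zone_present`, `default_order`, `build_zonelist`, `format_zonelist`), the two-node layout, and the simplified "local node first, then the rest in index order" distance rule are all assumptions made for illustration. The sketch only shows in which order zones would be tried under each policy, and which policy the commit message says becomes the default on 64-bit and 32-bit.

```c
#include <stdio.h>
#include <string.h>

#define NR_NODES 2

/* Zones from lowest to highest; allocations prefer the highest usable zone. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, NR_ZONES };
enum zonelist_order { ORDER_NODE, ORDER_ZONE };

struct zone_ref {
    int node;
    enum zone_type zone;
};

static const char *const zone_names[NR_ZONES] = { "DMA", "DMA32", "Normal" };

/* Hypothetical machine: node 0 has all three zones, node 1 has only Normal. */
static const int zone_present[NR_NODES][NR_ZONES] = {
    { 1, 1, 1 },
    { 0, 0, 1 },
};

/* Default policy as described in the commit message (illustrative only). */
static enum zonelist_order default_order(int is_64bit)
{
    return is_64bit ? ORDER_NODE : ORDER_ZONE;
}

/* Simplified distance rule: local node first, then the others by index. */
static int nth_node(int local, int i)
{
    return (local + i) % NR_NODES;
}

/* Fill 'out' with the fallback order; returns the number of entries. */
static int build_zonelist(enum zonelist_order order, int local,
                          struct zone_ref *out)
{
    int n = 0;

    if (order == ORDER_NODE) {
        /* Exhaust every zone of the nearest node before going remote. */
        for (int i = 0; i < NR_NODES; i++) {
            int node = nth_node(local, i);
            for (int z = NR_ZONES - 1; z >= 0; z--)
                if (zone_present[node][z])
                    out[n++] = (struct zone_ref){ node, (enum zone_type)z };
        }
    } else {
        /* Exhaust the highest zone on all nodes before touching low zones. */
        for (int z = NR_ZONES - 1; z >= 0; z--) {
            for (int i = 0; i < NR_NODES; i++) {
                int node = nth_node(local, i);
                if (zone_present[node][z])
                    out[n++] = (struct zone_ref){ node, (enum zone_type)z };
            }
        }
    }
    return n;
}

/* Render the list as "node:zone node:zone ..." into buf (size must be > 0). */
static void format_zonelist(const struct zone_ref *list, int n,
                            char *buf, size_t size)
{
    size_t used = 0;

    buf[0] = '\0';
    for (int i = 0; i < n && used < size; i++) {
        int w = snprintf(buf + used, size - used, "%s%d:%s",
                         i ? " " : "", list[i].node,
                         zone_names[list[i].zone]);
        if (w < 0)
            break;
        used += (size_t)w;
    }
}
```

For a task on node 0 of this hypothetical machine, node ordering falls back to node 0's DMA32 and DMA zones before going to node 1, which keeps allocations local; zone ordering goes to node 1's Normal zone first, which protects the low zones at the cost of locality.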
..
Kconfig	mm/zpool: update zswap to use zpool	2014-08-06 18:01:23 -07:00
Kconfig.debug	mm: more intensive memory corruption debugging	2012-01-10 16:30:42 -08:00
Makefile	mm: Support compiling out madvise and fadvise	2014-08-17 19:44:24 -05:00
backing-dev.c	mm: clean up zone flags	2014-10-09 22:25:57 -04:00
balloon_compaction.c	mm: print more details for bad_page()	2014-01-23 16:36:50 -08:00
bootmem.c	mm/bootmem.c: remove unused local `map'	2013-11-13 12:09:09 +09:00
cleancache.c	mm: dump page when hitting a VM_BUG_ON using VM_BUG_ON_PAGE	2014-01-23 16:36:50 -08:00
cma.c	mm: cma: adjust address limit to avoid hitting low/high memory boundary	2014-10-09 22:25:53 -04:00
compaction.c	mm/compaction.c: fix warning of 'flags' may be used uninitialized	2014-10-09 22:25:57 -04:00
debug-pagealloc.c	mm, x86: Remove debug_pagealloc_enabled	2011-12-06 09:24:07 +01:00
dmapool.c	Fix unbalanced mutex in dma_pool_create().	2014-09-18 10:39:16 -07:00
early_ioremap.c	mm: create generic early_ioremap() support	2014-04-07 16:36:15 -07:00
fadvise.c	teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long long	2013-03-03 22:46:22 -05:00
failslab.c	switch debugfs to umode_t	2012-01-03 22:54:56 -05:00
filemap.c	NFS client updates for Linux 3.18	2014-10-08 12:49:23 -04:00
filemap_xip.c	seqcount: Add lockdep functionality to seqcount/seqlock structures	2013-11-06 12:40:26 +01:00
fremap.c	mm: mark remap_file_pages() syscall as deprecated	2014-06-06 16:08:17 -07:00
frontswap.c	swap: change swap_list_head to plist, add swap_avail_head	2014-06-04 16:54:07 -07:00
gup.c	kvm: Faults which trigger IO release the mmap_sem	2014-09-24 14:07:54 +02:00
highmem.c	mm/highmem: make kmap cache coloring aware	2014-08-06 18:01:22 -07:00
huge_memory.c	mm: convert a few VM_BUG_ON callers to VM_BUG_ON_VMA	2014-10-09 22:25:57 -04:00
hugetlb.c	mm: convert a few VM_BUG_ON callers to VM_BUG_ON_VMA	2014-10-09 22:25:57 -04:00
hugetlb_cgroup.c	hugetlb_cgroup: use lockdep_assert_held rather than spin_is_locked	2014-08-29 16:28:16 -07:00
hwpoison-inject.c	mm/hwpoison-inject.c: remove unnecessary null test before debugfs_remove_recursive	2014-08-06 18:01:19 -07:00
init-mm.c	atomic: use <linux/atomic.h>	2011-07-26 16:49:47 -07:00
internal.h	mm, compaction: pass gfp mask to compact_control	2014-10-09 22:25:55 -04:00
interval_tree.c	mm: convert a few VM_BUG_ON callers to VM_BUG_ON_VMA	2014-10-09 22:25:57 -04:00
iov_iter.c	fuse: honour max_read and max_write in direct_io mode	2014-09-26 21:16:51 -04:00
kmemcheck.c	mm/slab_common: move kmem_cache definition to internal header	2014-10-09 22:25:50 -04:00
kmemleak-test.c	mm/kmemleak-test.c: use pr_fmt for logging	2014-06-06 16:08:18 -07:00
kmemleak.c	mm: introduce kmemleak_update_trace()	2014-06-06 16:08:17 -07:00
ksm.c	sched: Remove proliferation of wait_on_bit() action functions	2014-07-16 15:10:39 +02:00
list_lru.c	mm: keep page cache radix tree nodes in check	2014-04-03 16:21:01 -07:00
maccess.c	mm: Map most files to use export.h instead of module.h	2011-10-31 09:20:12 -04:00
madvise.c	mm: update the description for madvise_remove	2014-08-06 18:01:18 -07:00
memblock.c	mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()	2014-09-10 15:42:12 -07:00
memcontrol.c	mm: memcontrol: do not iterate uninitialized memcgs	2014-10-02 16:28:44 -07:00
memory-failure.c	hwpoison: fix race with changing page during offlining	2014-08-06 18:01:19 -07:00
memory.c	mm: softdirty: keep bit when zapping file pte	2014-09-26 08:10:35 -07:00
memory_hotplug.c	memory-hotplug: add sysfs valid_zones attribute	2014-10-09 22:25:52 -04:00
mempolicy.c	mempolicy: unexport get_vma_policy() and remove its "task" arg	2014-10-09 22:25:56 -04:00
mempool.c	mm/mempool.c: update the kmemleak stack trace for mempool allocations	2014-06-06 16:08:17 -07:00
migrate.c	mm: migrate: Close race between migration completion and mprotect	2014-10-02 11:57:18 -07:00
mincore.c	mm + fs: prepare for non-page entries in page cache radix trees	2014-04-03 16:21:00 -07:00
mlock.c	mm: convert a few VM_BUG_ON callers to VM_BUG_ON_VMA	2014-10-09 22:25:57 -04:00
mm_init.c	mm: bring back /sys/kernel/mm	2014-01-27 21:02:39 -08:00
mmap.c	mm/mmap.c: clean up CONFIG_DEBUG_VM_RB checks	2014-10-09 22:25:57 -04:00
mmu_context.c	sched/mm: call finish_arch_post_lock_switch in idle_task_exit and use_mm	2014-02-21 08:50:17 +01:00
mmu_notifier.c	kvm: Fix page ageing bugs	2014-09-24 14:07:58 +02:00
mmzone.c	mm: numa: Change page last {nid,pid} into {cpu,pid}	2013-10-09 14:47:45 +02:00
mprotect.c	mm: move mmu notifier call from change_protection to change_pmd_range	2014-04-07 16:35:50 -07:00
mremap.c	mm: convert a few VM_BUG_ON callers to VM_BUG_ON_VMA	2014-10-09 22:25:57 -04:00
msync.c	msync: fix incorrect fstart calculation	2014-07-03 09:21:53 -07:00
nobootmem.c	mem-hotplug: let memblock skip the hotpluggable memory regions in __next_mem_range()	2014-09-10 15:42:12 -07:00
nommu.c	arm64,ia64,ppc,s390,sh,tile,um,x86,mm: remove default gate area	2014-08-08 15:57:27 -07:00
oom_kill.c	mm: clean up zone flags	2014-10-09 22:25:57 -04:00
page-writeback.c	mm/page-writeback.c: use min3/max3 macros to avoid shadow warnings	2014-10-09 22:25:57 -04:00
page_alloc.c	mm: page_alloc: default node-ordering on 64-bit NUMA, zone-ordering on 32-bit	2014-10-09 22:25:58 -04:00
page_cgroup.c	mm/page_cgroup.c: mark functions as static	2014-04-03 16:21:02 -07:00
page_io.c	fix __swap_writepage() compile failure on old gcc versions	2014-06-14 19:30:48 -05:00
page_isolation.c	mm: memory-hotplug: enable memory hotplug to handle hugepage	2013-09-11 15:57:48 -07:00
pagewalk.c	mm/pagewalk.c: fix walk_page_range() access of wrong PTEs	2013-10-30 14:27:03 -07:00
percpu-km.c	percpu: clear memory allocated with the km allocator	2010-10-02 10:28:42 +03:00
percpu-vm.c	percpu: perform tlb flush after pcpu_map_pages() failure	2014-08-15 16:06:10 -04:00
percpu.c	percpu: free percpu allocation info for uniprocessor system	2014-08-16 08:59:02 -04:00
pgtable-generic.c	mm: actually clear pmd_numa before invalidating	2014-08-29 16:28:15 -07:00
process_vm_access.c	start adding the tag to iov_iter	2014-05-06 17:32:49 -04:00
quicklist.c	mm: delete various needless include <linux/module.h>	2011-10-31 09:20:11 -04:00
readahead.c	mm/readahead.c: remove unused file_ra_state from count_history_pages	2014-08-06 18:01:15 -07:00
rmap.c	mm: convert a few VM_BUG_ON callers to VM_BUG_ON_VMA	2014-10-09 22:25:57 -04:00
shmem.c	include/linux/migrate.h: remove migrate_page #define	2014-10-09 22:25:56 -04:00
slab.c	mm/slab.c: use __seq_open_private() instead of seq_open()	2014-10-09 22:25:57 -04:00
slab.h	mm/slab: use percpu allocator for cpu cache	2014-10-09 22:25:51 -04:00
slab_common.c	mm/slab_common: commonize slab merge logic	2014-10-09 22:25:51 -04:00
slob.c	mm/sl[ao]b: always track caller in kmalloc_(node_)track_caller()	2014-10-09 22:25:50 -04:00
slub.c	mm/slab_common: commonize slab merge logic	2014-10-09 22:25:51 -04:00
sparse-vmemmap.c	mm/sparse: use memblock apis for early memory allocations	2014-01-21 16:19:47 -08:00
sparse.c	mm: use macros from compiler.h instead of __attribute__((...))	2014-04-07 16:35:54 -07:00
swap.c	mm: memcontrol: use page lists for uncharge batching	2014-08-08 15:57:18 -07:00
swap_state.c	include/linux/migrate.h: remove migrate_page #define	2014-10-09 22:25:56 -04:00
swapfile.c	mm: memcontrol: rewrite uncharge API	2014-08-08 15:57:17 -07:00
truncate.c	mm: memcontrol: rewrite uncharge API	2014-08-08 15:57:17 -07:00
util.c	proc/maps: make vm_is_stack() logic namespace-friendly	2014-10-09 22:25:50 -04:00
vmacache.c	mm,vmacache: optimize overflow system-wide flushing	2014-06-04 16:53:57 -07:00
vmalloc.c	mm/vmalloc.c: use seq_open_private() instead of seq_open()	2014-10-09 22:25:56 -04:00
vmpressure.c	arm, pm, vmpressure: add missing slab.h includes	2014-02-03 13:24:01 -05:00
vmscan.c	mm: clean up zone flags	2014-10-09 22:25:57 -04:00
vmstat.c	mm: vmscan: only update per-cpu thresholds for online CPU	2014-08-06 18:01:20 -07:00
workingset.c	mm: keep page cache radix tree nodes in check	2014-04-03 16:21:01 -07:00
zbud.c	mm/zpool: use prefixed module loading	2014-08-29 16:28:16 -07:00
zpool.c	mm/zpool: use prefixed module loading	2014-08-29 16:28:16 -07:00
zsmalloc.c	mm/zpool: use prefixed module loading	2014-08-29 16:28:16 -07:00
zswap.c	mm/zswap.c: add __init to zswap_entry_cache_destroy()	2014-08-08 15:57:18 -07:00