2019-05-19 15:07:45 +03:00
|
|
|
# SPDX-License-Identifier: GPL-2.0-only
|
mm/page_ext: resurrect struct page extending code for debugging
When we debug something, we'd like to insert some information to every
page. For this purpose, we sometimes modify struct page itself. But,
this has drawbacks. First, it requires re-compile. This makes us
hesitate to use the powerful debug feature so development process is
slowed down. And, second, sometimes it is impossible to rebuild the
kernel due to third party module dependency. At third, system behaviour
would be largely different after re-compile, because it changes size of
struct page greatly and this structure is accessed by every part of
kernel. Keeping this as it is would be better to reproduce errornous
situation.
This feature is intended to overcome above mentioned problems. This
feature allocates memory for extended data per page in certain place
rather than the struct page itself. This memory can be accessed by the
accessor functions provided by this code. During the boot process, it
checks whether allocation of huge chunk of memory is needed or not. If
not, it avoids allocating memory at all. With this advantage, we can
include this feature into the kernel in default and can avoid rebuild and
solve related problems.
Until now, memcg uses this technique. But, now, memcg decides to embed
their variable to struct page itself and it's code to extend struct page
has been removed. I'd like to use this code to develop debug feature, so
this patch resurrect it.
To help these things to work well, this patch introduces two callbacks for
clients. One is the need callback which is mandatory if user wants to
avoid useless memory allocation at boot-time. The other is optional, init
callback, which is used to do proper initialization after memory is
allocated. Detailed explanation about purpose of these functions is in
code comment. Please refer it.
Others are completely same with previous extension code in memcg.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 03:55:46 +03:00
|
|
|
config PAGE_EXTENSION
|
|
|
|
bool "Extend memmap on extra space for more information on page"
|
2020-06-13 19:50:22 +03:00
|
|
|
help
|
mm/page_ext: resurrect struct page extending code for debugging
When we debug something, we'd like to insert some information to every
page. For this purpose, we sometimes modify struct page itself. But,
this has drawbacks. First, it requires re-compile. This makes us
hesitate to use the powerful debug feature so development process is
slowed down. And, second, sometimes it is impossible to rebuild the
kernel due to third party module dependency. At third, system behaviour
would be largely different after re-compile, because it changes size of
struct page greatly and this structure is accessed by every part of
kernel. Keeping this as it is would be better to reproduce errornous
situation.
This feature is intended to overcome above mentioned problems. This
feature allocates memory for extended data per page in certain place
rather than the struct page itself. This memory can be accessed by the
accessor functions provided by this code. During the boot process, it
checks whether allocation of huge chunk of memory is needed or not. If
not, it avoids allocating memory at all. With this advantage, we can
include this feature into the kernel in default and can avoid rebuild and
solve related problems.
Until now, memcg uses this technique. But, now, memcg decides to embed
their variable to struct page itself and it's code to extend struct page
has been removed. I'd like to use this code to develop debug feature, so
this patch resurrect it.
To help these things to work well, this patch introduces two callbacks for
clients. One is the need callback which is mandatory if user wants to
avoid useless memory allocation at boot-time. The other is optional, init
callback, which is used to do proper initialization after memory is
allocated. Detailed explanation about purpose of these functions is in
code comment. Please refer it.
Others are completely same with previous extension code in memcg.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 03:55:46 +03:00
|
|
|
Extend memmap on extra space for more information on page. This
|
|
|
|
could be used for debugging features that need to insert extra
|
|
|
|
field for every page. This extension enables us to save memory
|
|
|
|
by not allocating this extra memory according to boottime
|
|
|
|
configuration.
|
|
|
|
|
2009-04-03 03:56:30 +04:00
|
|
|
config DEBUG_PAGEALLOC
|
|
|
|
bool "Debug page memory allocations"
|
2011-03-23 02:32:46 +03:00
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
depends on !HIBERNATION || ARCH_SUPPORTS_DEBUG_PAGEALLOC && !PPC && !SPARC
|
|
|
|
select PAGE_POISONING if !ARCH_SUPPORTS_DEBUG_PAGEALLOC
|
2020-06-13 19:50:22 +03:00
|
|
|
help
|
2009-04-03 03:56:30 +04:00
|
|
|
Unmap pages from the kernel linear mapping after free_pages().
|
2016-03-16 00:55:30 +03:00
|
|
|
Depending on runtime enablement, this results in a small or large
|
|
|
|
slowdown, but helps to find certain types of memory corruption.
|
2009-04-03 03:56:30 +04:00
|
|
|
|
mm, page_alloc: more extensive free page checking with debug_pagealloc
The page allocator checks struct pages for expected state (mapcount,
flags etc) as pages are being allocated (check_new_page()) and freed
(free_pages_check()) to provide some defense against errors in page
allocator users.
Prior commits 479f854a207c ("mm, page_alloc: defer debugging checks of
pages allocated from the PCP") and 4db7548ccbd9 ("mm, page_alloc: defer
debugging checks of freed pages until a PCP drain") this has happened
for order-0 pages as they were allocated from or freed to the per-cpu
caches (pcplists). Since those are fast paths, the checks are now
performed only when pages are moved between pcplists and global free
lists. This however lowers the chances of catching errors soon enough.
In order to increase the chances of the checks to catch errors, the
kernel has to be rebuilt with CONFIG_DEBUG_VM, which also enables
multiple other internal debug checks (VM_BUG_ON() etc), which is
suboptimal when the goal is to catch errors in mm users, not in mm code
itself.
To catch some wrong users of the page allocator we have
CONFIG_DEBUG_PAGEALLOC, which is designed to have virtually no overhead
unless enabled at boot time. Memory corruptions when writing to freed
pages have often the same underlying errors (use-after-free, double free)
as corrupting the corresponding struct pages, so this existing debugging
functionality is a good fit to extend by also perform struct page checks
at least as often as if CONFIG_DEBUG_VM was enabled.
Specifically, after this patch, when debug_pagealloc is enabled on boot,
and CONFIG_DEBUG_VM disabled, pages are checked when allocated from or
freed to the pcplists *in addition* to being moved between pcplists and
free lists. When both debug_pagealloc and CONFIG_DEBUG_VM are enabled,
pages are checked when being moved between pcplists and free lists *in
addition* to when allocated from or freed to the pcplists.
When debug_pagealloc is not enabled on boot, the overhead in fast paths
should be virtually none thanks to the use of static key.
Link: http://lkml.kernel.org/r/20190603143451.27353-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12 06:55:09 +03:00
|
|
|
Also, the state of page tracking structures is checked more often as
|
|
|
|
pages are being allocated and freed, as unexpected state changes
|
|
|
|
often happen for same reasons as memory corruption (e.g. double free,
|
mm, page_owner, debug_pagealloc: save and dump freeing stack trace
The debug_pagealloc functionality is useful to catch buggy page allocator
users that cause e.g. use after free or double free. When page
inconsistency is detected, debugging is often simpler by knowing the call
stack of process that last allocated and freed the page. When page_owner
is also enabled, we record the allocation stack trace, but not freeing.
This patch therefore adds recording of freeing process stack trace to page
owner info, if both page_owner and debug_pagealloc are configured and
enabled. With only page_owner enabled, this info is not useful for the
memory leak debugging use case. dump_page() is adjusted to print the
info. An example result of calling __free_pages() twice may look like
this (note the page last free stack trace):
BUG: Bad page state in process bash pfn:13d8f8
page:ffffc31984f63e00 refcount:-1 mapcount:0 mapping:0000000000000000 index:0x0
flags: 0x1affff800000000()
raw: 01affff800000000 dead000000000100 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000000000 ffffffffffffffff 0000000000000000
page dumped because: nonzero _refcount
page_owner tracks the page as freed
page last allocated via order 0, migratetype Unmovable, gfp_mask 0xcc0(GFP_KERNEL)
prep_new_page+0x143/0x150
get_page_from_freelist+0x289/0x380
__alloc_pages_nodemask+0x13c/0x2d0
khugepaged+0x6e/0xc10
kthread+0xf9/0x130
ret_from_fork+0x3a/0x50
page last free stack trace:
free_pcp_prepare+0x134/0x1e0
free_unref_page+0x18/0x90
khugepaged+0x7b/0xc10
kthread+0xf9/0x130
ret_from_fork+0x3a/0x50
Modules linked in:
CPU: 3 PID: 271 Comm: bash Not tainted 5.3.0-rc4-2.g07a1a73-default+ #57
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58-prebuilt.qemu.org 04/01/2014
Call Trace:
dump_stack+0x85/0xc0
bad_page.cold+0xba/0xbf
rmqueue_pcplist.isra.0+0x6c5/0x6d0
rmqueue+0x2d/0x810
get_page_from_freelist+0x191/0x380
__alloc_pages_nodemask+0x13c/0x2d0
__get_free_pages+0xd/0x30
__pud_alloc+0x2c/0x110
copy_page_range+0x4f9/0x630
dup_mmap+0x362/0x480
dup_mm+0x68/0x110
copy_process+0x19e1/0x1b40
_do_fork+0x73/0x310
__x64_sys_clone+0x75/0x80
do_syscall_64+0x6e/0x1e0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f10af854a10
...
Link: http://lkml.kernel.org/r/20190820131828.22684-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-24 01:34:42 +03:00
|
|
|
use-after-free). The error reports for these checks can be augmented
|
|
|
|
with stack traces of last allocation and freeing of the page, when
|
|
|
|
PAGE_OWNER is also selected and enabled on boot.
|
mm, page_alloc: more extensive free page checking with debug_pagealloc
The page allocator checks struct pages for expected state (mapcount,
flags etc) as pages are being allocated (check_new_page()) and freed
(free_pages_check()) to provide some defense against errors in page
allocator users.
Prior commits 479f854a207c ("mm, page_alloc: defer debugging checks of
pages allocated from the PCP") and 4db7548ccbd9 ("mm, page_alloc: defer
debugging checks of freed pages until a PCP drain") this has happened
for order-0 pages as they were allocated from or freed to the per-cpu
caches (pcplists). Since those are fast paths, the checks are now
performed only when pages are moved between pcplists and global free
lists. This however lowers the chances of catching errors soon enough.
In order to increase the chances of the checks to catch errors, the
kernel has to be rebuilt with CONFIG_DEBUG_VM, which also enables
multiple other internal debug checks (VM_BUG_ON() etc), which is
suboptimal when the goal is to catch errors in mm users, not in mm code
itself.
To catch some wrong users of the page allocator we have
CONFIG_DEBUG_PAGEALLOC, which is designed to have virtually no overhead
unless enabled at boot time. Memory corruptions when writing to freed
pages have often the same underlying errors (use-after-free, double free)
as corrupting the corresponding struct pages, so this existing debugging
functionality is a good fit to extend by also perform struct page checks
at least as often as if CONFIG_DEBUG_VM was enabled.
Specifically, after this patch, when debug_pagealloc is enabled on boot,
and CONFIG_DEBUG_VM disabled, pages are checked when allocated from or
freed to the pcplists *in addition* to being moved between pcplists and
free lists. When both debug_pagealloc and CONFIG_DEBUG_VM are enabled,
pages are checked when being moved between pcplists and free lists *in
addition* to when allocated from or freed to the pcplists.
When debug_pagealloc is not enabled on boot, the overhead in fast paths
should be virtually none thanks to the use of static key.
Link: http://lkml.kernel.org/r/20190603143451.27353-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12 06:55:09 +03:00
|
|
|
|
2011-03-23 02:32:46 +03:00
|
|
|
For architectures which don't enable ARCH_SUPPORTS_DEBUG_PAGEALLOC,
|
|
|
|
fill the pages with poison patterns after free_pages() and verify
|
mm, page_alloc: more extensive free page checking with debug_pagealloc
The page allocator checks struct pages for expected state (mapcount,
flags etc) as pages are being allocated (check_new_page()) and freed
(free_pages_check()) to provide some defense against errors in page
allocator users.
Prior commits 479f854a207c ("mm, page_alloc: defer debugging checks of
pages allocated from the PCP") and 4db7548ccbd9 ("mm, page_alloc: defer
debugging checks of freed pages until a PCP drain") this has happened
for order-0 pages as they were allocated from or freed to the per-cpu
caches (pcplists). Since those are fast paths, the checks are now
performed only when pages are moved between pcplists and global free
lists. This however lowers the chances of catching errors soon enough.
In order to increase the chances of the checks to catch errors, the
kernel has to be rebuilt with CONFIG_DEBUG_VM, which also enables
multiple other internal debug checks (VM_BUG_ON() etc), which is
suboptimal when the goal is to catch errors in mm users, not in mm code
itself.
To catch some wrong users of the page allocator we have
CONFIG_DEBUG_PAGEALLOC, which is designed to have virtually no overhead
unless enabled at boot time. Memory corruptions when writing to freed
pages have often the same underlying errors (use-after-free, double free)
as corrupting the corresponding struct pages, so this existing debugging
functionality is a good fit to extend by also perform struct page checks
at least as often as if CONFIG_DEBUG_VM was enabled.
Specifically, after this patch, when debug_pagealloc is enabled on boot,
and CONFIG_DEBUG_VM disabled, pages are checked when allocated from or
freed to the pcplists *in addition* to being moved between pcplists and
free lists. When both debug_pagealloc and CONFIG_DEBUG_VM are enabled,
pages are checked when being moved between pcplists and free lists *in
addition* to when allocated from or freed to the pcplists.
When debug_pagealloc is not enabled on boot, the overhead in fast paths
should be virtually none thanks to the use of static key.
Link: http://lkml.kernel.org/r/20190603143451.27353-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-12 06:55:09 +03:00
|
|
|
the patterns before alloc_pages(). Additionally, this option cannot
|
|
|
|
be enabled in combination with hibernation as that would result in
|
|
|
|
incorrect warnings of memory corruption after a resume because free
|
|
|
|
pages are not saved to the suspend image.
|
2011-03-23 02:32:46 +03:00
|
|
|
|
2016-03-16 00:55:30 +03:00
|
|
|
By default this option will have a small overhead, e.g. by not
|
|
|
|
allowing the kernel mapping to be backed by large pages on some
|
|
|
|
architectures. Even bigger overhead comes when the debugging is
|
|
|
|
enabled by DEBUG_PAGEALLOC_ENABLE_DEFAULT or the debug_pagealloc
|
|
|
|
command line parameter.
|
|
|
|
|
|
|
|
config DEBUG_PAGEALLOC_ENABLE_DEFAULT
|
|
|
|
bool "Enable debug page memory allocations by default?"
|
|
|
|
depends on DEBUG_PAGEALLOC
|
2020-06-13 19:50:22 +03:00
|
|
|
help
|
2016-03-16 00:55:30 +03:00
|
|
|
Enable debug page memory allocations by default? This value
|
|
|
|
can be overridden by debug_pagealloc=off|on.
|
|
|
|
|
2019-03-06 02:46:19 +03:00
|
|
|
config PAGE_OWNER
|
|
|
|
bool "Track page owner"
|
|
|
|
depends on DEBUG_KERNEL && STACKTRACE_SUPPORT
|
|
|
|
select DEBUG_FS
|
|
|
|
select STACKTRACE
|
|
|
|
select STACKDEPOT
|
|
|
|
select PAGE_EXTENSION
|
|
|
|
help
|
|
|
|
This keeps track of what call chain is the owner of a page, may
|
|
|
|
help to find bare alloc_page(s) leaks. Even if you include this
|
|
|
|
feature on your build, it is disabled in default. You should pass
|
|
|
|
"page_owner=on" to boot parameter in order to enable it. Eats
|
|
|
|
a fair amount of memory if enabled. See tools/vm/page_owner_sort.c
|
|
|
|
for user-space helper.
|
|
|
|
|
|
|
|
If unsure, say N.
|
|
|
|
|
2009-04-01 02:23:17 +04:00
|
|
|
config PAGE_POISONING
|
2016-03-16 00:56:27 +03:00
|
|
|
bool "Poison pages after freeing"
|
2020-06-13 19:50:22 +03:00
|
|
|
help
|
2016-03-16 00:56:27 +03:00
|
|
|
Fill the pages with poison patterns after free_pages() and verify
|
|
|
|
the patterns before alloc_pages. The filling of the memory helps
|
|
|
|
reduce the risk of information leaks from freed data. This does
|
2018-08-22 07:53:10 +03:00
|
|
|
have a potential performance impact if enabled with the
|
|
|
|
"page_poison=1" kernel boot option.
|
2016-03-16 00:56:27 +03:00
|
|
|
|
|
|
|
Note that "poison" here is not the same thing as the "HWPoison"
|
|
|
|
for CONFIG_MEMORY_FAILURE. This is software poisoning only.
|
|
|
|
|
2020-12-15 06:13:41 +03:00
|
|
|
If you are only interested in sanitization of freed pages without
|
|
|
|
checking the poison pattern on alloc, you can boot the kernel with
|
|
|
|
"init_on_free=1" instead of enabling this.
|
2016-03-16 00:56:27 +03:00
|
|
|
|
2020-12-15 06:13:41 +03:00
|
|
|
If unsure, say N
|
2016-03-16 00:56:30 +03:00
|
|
|
|
mm/page_ref: add tracepoint to track down page reference manipulation
CMA allocation should be guaranteed to succeed by definition, but,
unfortunately, it would be failed sometimes. It is hard to track down
the problem, because it is related to page reference manipulation and we
don't have any facility to analyze it.
This patch adds tracepoints to track down page reference manipulation.
With it, we can find exact reason of failure and can fix the problem.
Following is an example of tracepoint output. (note: this example is
stale version that printing flags as the number. Recent version will
print it as human readable string.)
<...>-9018 [004] 92.678375: page_ref_set: pfn=0x17ac9 flags=0x0 count=1 mapcount=0 mapping=(nil) mt=4 val=1
<...>-9018 [004] 92.678378: kernel_stack:
=> get_page_from_freelist (ffffffff81176659)
=> __alloc_pages_nodemask (ffffffff81176d22)
=> alloc_pages_vma (ffffffff811bf675)
=> handle_mm_fault (ffffffff8119e693)
=> __do_page_fault (ffffffff810631ea)
=> trace_do_page_fault (ffffffff81063543)
=> do_async_page_fault (ffffffff8105c40a)
=> async_page_fault (ffffffff817581d8)
[snip]
<...>-9018 [004] 92.678379: page_ref_mod: pfn=0x17ac9 flags=0x40048 count=2 mapcount=1 mapping=0xffff880015a78dc1 mt=4 val=1
[snip]
...
...
<...>-9131 [001] 93.174468: test_pages_isolated: start_pfn=0x17800 end_pfn=0x17c00 fin_pfn=0x17ac9 ret=fail
[snip]
<...>-9018 [004] 93.174843: page_ref_mod_and_test: pfn=0x17ac9 flags=0x40068 count=0 mapcount=0 mapping=0xffff880015a78dc1 mt=4 val=-1 ret=1
=> release_pages (ffffffff8117c9e4)
=> free_pages_and_swap_cache (ffffffff811b0697)
=> tlb_flush_mmu_free (ffffffff81199616)
=> tlb_finish_mmu (ffffffff8119a62c)
=> exit_mmap (ffffffff811a53f7)
=> mmput (ffffffff81073f47)
=> do_exit (ffffffff810794e9)
=> do_group_exit (ffffffff81079def)
=> SyS_exit_group (ffffffff81079e74)
=> entry_SYSCALL_64_fastpath (ffffffff817560b6)
This output shows that problem comes from exit path. In exit path, to
improve performance, pages are not freed immediately. They are gathered
and processed by batch. During this process, migration cannot be
possible and CMA allocation is failed. This problem is hard to find
without this page reference tracepoint facility.
Enabling this feature bloat kernel text 30 KB in my configuration.
text data bss dec hex filename
12127327 2243616 1507328 15878271 f2487f vmlinux_disabled
12157208 2258880 1507328 15923416 f2f8d8 vmlinux_enabled
Note that, due to header file dependency problem between mm.h and
tracepoint.h, this feature has to open code the static key functions for
tracepoints. Proposed by Steven Rostedt in following link.
https://lkml.org/lkml/2015/12/9/699
[arnd@arndb.de: crypto/async_pq: use __free_page() instead of put_page()]
[iamjoonsoo.kim@lge.com: fix build failure for xtensa]
[akpm@linux-foundation.org: tweak Kconfig text, per Vlastimil]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-18 00:19:29 +03:00
|
|
|
config DEBUG_PAGE_REF
|
|
|
|
bool "Enable tracepoint to track down page reference manipulation"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
depends on TRACEPOINTS
|
2020-06-13 19:50:22 +03:00
|
|
|
help
|
mm/page_ref: add tracepoint to track down page reference manipulation
CMA allocation should be guaranteed to succeed by definition, but,
unfortunately, it would be failed sometimes. It is hard to track down
the problem, because it is related to page reference manipulation and we
don't have any facility to analyze it.
This patch adds tracepoints to track down page reference manipulation.
With it, we can find exact reason of failure and can fix the problem.
Following is an example of tracepoint output. (note: this example is
stale version that printing flags as the number. Recent version will
print it as human readable string.)
<...>-9018 [004] 92.678375: page_ref_set: pfn=0x17ac9 flags=0x0 count=1 mapcount=0 mapping=(nil) mt=4 val=1
<...>-9018 [004] 92.678378: kernel_stack:
=> get_page_from_freelist (ffffffff81176659)
=> __alloc_pages_nodemask (ffffffff81176d22)
=> alloc_pages_vma (ffffffff811bf675)
=> handle_mm_fault (ffffffff8119e693)
=> __do_page_fault (ffffffff810631ea)
=> trace_do_page_fault (ffffffff81063543)
=> do_async_page_fault (ffffffff8105c40a)
=> async_page_fault (ffffffff817581d8)
[snip]
<...>-9018 [004] 92.678379: page_ref_mod: pfn=0x17ac9 flags=0x40048 count=2 mapcount=1 mapping=0xffff880015a78dc1 mt=4 val=1
[snip]
...
...
<...>-9131 [001] 93.174468: test_pages_isolated: start_pfn=0x17800 end_pfn=0x17c00 fin_pfn=0x17ac9 ret=fail
[snip]
<...>-9018 [004] 93.174843: page_ref_mod_and_test: pfn=0x17ac9 flags=0x40068 count=0 mapcount=0 mapping=0xffff880015a78dc1 mt=4 val=-1 ret=1
=> release_pages (ffffffff8117c9e4)
=> free_pages_and_swap_cache (ffffffff811b0697)
=> tlb_flush_mmu_free (ffffffff81199616)
=> tlb_finish_mmu (ffffffff8119a62c)
=> exit_mmap (ffffffff811a53f7)
=> mmput (ffffffff81073f47)
=> do_exit (ffffffff810794e9)
=> do_group_exit (ffffffff81079def)
=> SyS_exit_group (ffffffff81079e74)
=> entry_SYSCALL_64_fastpath (ffffffff817560b6)
This output shows that problem comes from exit path. In exit path, to
improve performance, pages are not freed immediately. They are gathered
and processed by batch. During this process, migration cannot be
possible and CMA allocation is failed. This problem is hard to find
without this page reference tracepoint facility.
Enabling this feature bloat kernel text 30 KB in my configuration.
text data bss dec hex filename
12127327 2243616 1507328 15878271 f2487f vmlinux_disabled
12157208 2258880 1507328 15923416 f2f8d8 vmlinux_enabled
Note that, due to header file dependency problem between mm.h and
tracepoint.h, this feature has to open code the static key functions for
tracepoints. Proposed by Steven Rostedt in following link.
https://lkml.org/lkml/2015/12/9/699
[arnd@arndb.de: crypto/async_pq: use __free_page() instead of put_page()]
[iamjoonsoo.kim@lge.com: fix build failure for xtensa]
[akpm@linux-foundation.org: tweak Kconfig text, per Vlastimil]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-18 00:19:29 +03:00
|
|
|
This is a feature to add tracepoint for tracking down page reference
|
|
|
|
manipulation. This tracking is useful to diagnose functional failure
|
|
|
|
due to migration failures caused by page reference mismatches. Be
|
|
|
|
careful when enabling this feature because it adds about 30 KB to the
|
|
|
|
kernel code. However the runtime performance overhead is virtually
|
|
|
|
nil until the tracepoints are actually enabled.
|
2017-02-28 01:30:22 +03:00
|
|
|
|
|
|
|
config DEBUG_RODATA_TEST
|
|
|
|
bool "Testcase for the marking rodata read-only"
|
|
|
|
depends on STRICT_KERNEL_RWX
|
2020-06-13 19:50:22 +03:00
|
|
|
help
|
2017-02-28 01:30:22 +03:00
|
|
|
This option enables a testcase for the setting rodata read-only.
|
2020-02-04 04:36:20 +03:00
|
|
|
|
2020-06-04 02:03:52 +03:00
|
|
|
config ARCH_HAS_DEBUG_WX
|
|
|
|
bool
|
|
|
|
|
|
|
|
config DEBUG_WX
|
|
|
|
bool "Warn on W+X mappings at boot"
|
|
|
|
depends on ARCH_HAS_DEBUG_WX
|
|
|
|
depends on MMU
|
|
|
|
select PTDUMP_CORE
|
|
|
|
help
|
|
|
|
Generate a warning if any W+X mappings are found at boot.
|
|
|
|
|
|
|
|
This is useful for discovering cases where the kernel is leaving W+X
|
|
|
|
mappings after applying NX, as such mappings are a security risk.
|
|
|
|
|
|
|
|
Look for a message in dmesg output like this:
|
|
|
|
|
|
|
|
<arch>/mm: Checked W+X mappings: passed, no W+X pages found.
|
|
|
|
|
|
|
|
or like this, if the check failed:
|
|
|
|
|
|
|
|
<arch>/mm: Checked W+X mappings: failed, <N> W+X pages found.
|
|
|
|
|
|
|
|
Note that even if the check fails, your kernel is possibly
|
|
|
|
still fine, as W+X mappings are not a security hole in
|
|
|
|
themselves, what they do is that they make the exploitation
|
|
|
|
of other unfixed kernel bugs easier.
|
|
|
|
|
|
|
|
There is no runtime or memory usage effect of this option
|
|
|
|
once the kernel has booted up - it's a one time check.
|
|
|
|
|
|
|
|
If in doubt, say "Y".
|
|
|
|
|
2020-02-04 04:36:20 +03:00
|
|
|
config GENERIC_PTDUMP
|
|
|
|
bool
|
|
|
|
|
|
|
|
config PTDUMP_CORE
|
|
|
|
bool
|
|
|
|
|
|
|
|
config PTDUMP_DEBUGFS
|
|
|
|
bool "Export kernel pagetable layout to userspace via debugfs"
|
|
|
|
depends on DEBUG_KERNEL
|
|
|
|
depends on DEBUG_FS
|
|
|
|
depends on GENERIC_PTDUMP
|
|
|
|
select PTDUMP_CORE
|
|
|
|
help
|
|
|
|
Say Y here if you want to show the kernel pagetable layout in a
|
|
|
|
debugfs file. This information is only useful for kernel developers
|
|
|
|
who are working in architecture specific areas of the kernel.
|
|
|
|
It is probably not a good idea to enable this feature in a production
|
|
|
|
kernel.
|
|
|
|
|
|
|
|
If in doubt, say N.
|