License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 17:07:57 +03:00
|
|
|
/* SPDX-License-Identifier: GPL-2.0 */
|
2005-04-17 02:20:36 +04:00
|
|
|
/*
|
|
|
|
* NUMA memory policies for Linux.
|
|
|
|
* Copyright 2003,2004 Andi Kleen SuSE Labs
|
|
|
|
*/
|
2012-10-13 13:46:48 +04:00
|
|
|
#ifndef _LINUX_MEMPOLICY_H
|
|
|
|
#define _LINUX_MEMPOLICY_H 1
|
2005-04-17 02:20:36 +04:00
|
|
|
|
2020-08-12 04:30:32 +03:00
|
|
|
#include <linux/sched.h>
|
2005-04-17 02:20:36 +04:00
|
|
|
#include <linux/mmzone.h>
|
|
|
|
#include <linux/slab.h>
|
|
|
|
#include <linux/rbtree.h>
|
|
|
|
#include <linux/spinlock.h>
|
2005-10-30 04:15:48 +03:00
|
|
|
#include <linux/nodemask.h>
|
2008-07-24 08:28:22 +04:00
|
|
|
#include <linux/pagemap.h>
|
2012-10-13 13:46:48 +04:00
|
|
|
#include <uapi/linux/mempolicy.h>
|
2005-04-17 02:20:36 +04:00
|
|
|
|
2006-06-08 11:43:41 +04:00
|
|
|
struct mm_struct;
|
2005-04-17 02:20:36 +04:00
|
|
|
|
|
|
|
#ifdef CONFIG_NUMA
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Describe a memory policy.
|
|
|
|
*
|
|
|
|
* A mempolicy can be either associated with a process or with a VMA.
|
|
|
|
* For VMA related allocations the VMA policy is preferred, otherwise
|
|
|
|
* the process policy is used. Interrupts ignore the memory policy
|
|
|
|
* of the current process.
|
|
|
|
*
|
2020-08-12 04:31:19 +03:00
|
|
|
* Locking policy for interleave:
|
2005-04-17 02:20:36 +04:00
|
|
|
* In process context there is no locking because only the process accesses
|
|
|
|
* its own state. All vma manipulation is somewhat protected by a down_read on
|
2020-06-09 07:33:54 +03:00
|
|
|
* mmap_lock.
|
2005-04-17 02:20:36 +04:00
|
|
|
*
|
|
|
|
* Freeing policy:
|
2008-04-28 13:12:18 +04:00
|
|
|
* Mempolicy objects are reference counted. A mempolicy will be freed when
|
2008-04-28 13:13:08 +04:00
|
|
|
* mpol_put() decrements the reference count to zero.
|
2005-04-17 02:20:36 +04:00
|
|
|
*
|
2008-04-28 13:13:09 +04:00
|
|
|
* Duplicating policy objects:
|
|
|
|
* mpol_dup() allocates a new mempolicy and copies the specified mempolicy
|
2008-04-28 13:12:18 +04:00
|
|
|
* to the new storage. The reference count of the new object is initialized
|
2008-04-28 13:13:09 +04:00
|
|
|
* to 1, representing the caller of mpol_dup().
|
2005-04-17 02:20:36 +04:00
|
|
|
*/
|
|
|
|
struct mempolicy {
|
|
|
|
atomic_t refcnt;
|
2008-04-28 13:13:12 +04:00
|
|
|
unsigned short mode; /* See MPOL_* above */
|
2008-04-28 13:12:25 +04:00
|
|
|
unsigned short flags; /* See set_mempolicy() MPOL_F_* above */
|
2021-07-01 04:51:10 +03:00
|
|
|
nodemask_t nodes; /* interleave/bind/perfer */
|
mm/mempolicy: add set_mempolicy_home_node syscall
This syscall can be used to set a home node for the MPOL_BIND and
MPOL_PREFERRED_MANY memory policy. Users should use this syscall after
setting up a memory policy for the specified range as shown below.
mbind(p, nr_pages * page_size, MPOL_BIND, new_nodes->maskp,
new_nodes->size + 1, 0);
sys_set_mempolicy_home_node((unsigned long)p, nr_pages * page_size,
home_node, 0);
The syscall allows specifying a home node/preferred node from which
kernel will fulfill memory allocation requests first.
For address range with MPOL_BIND memory policy, if nodemask specifies
more than one node, page allocations will come from the node in the
nodemask with sufficient free memory that is closest to the home
node/preferred node.
For MPOL_PREFERRED_MANY if the nodemask specifies more than one node,
page allocation will come from the node in the nodemask with sufficient
free memory that is closest to the home node/preferred node. If there
is not enough memory in all the nodes specified in the nodemask, the
allocation will be attempted from the closest numa node to the home node
in the system.
This helps applications to hint at a memory allocation preference node
and fallback to _only_ a set of nodes if the memory is not available on
the preferred node. Fallback allocation is attempted from the node
which is nearest to the preferred node.
This helps applications to have control on memory allocation numa nodes
and avoids default fallback to slow memory NUMA nodes. For example a
system with NUMA nodes 1,2 and 3 with DRAM memory and 10, 11 and 12 of
slow memory
new_nodes = numa_bitmask_alloc(nr_nodes);
numa_bitmask_setbit(new_nodes, 1);
numa_bitmask_setbit(new_nodes, 2);
numa_bitmask_setbit(new_nodes, 3);
p = mmap(NULL, nr_pages * page_size, protflag, mapflag, -1, 0);
mbind(p, nr_pages * page_size, MPOL_BIND, new_nodes->maskp, new_nodes->size + 1, 0);
sys_set_mempolicy_home_node(p, nr_pages * page_size, 2, 0);
This will allocate from nodes closer to node 2 and will make sure the
kernel will only allocate from nodes 1, 2, and 3. Memory will not be
allocated from slow memory nodes 10, 11, and 12. This differs from
default MPOL_BIND behavior in that with default MPOL_BIND the allocation
will be attempted from node closer to the local node. One of the
reasons to specify a home node is to allow allocations from cpu less
NUMA node and its nearby NUMA nodes.
With MPOL_PREFERRED_MANY on the other hand will first try to allocate
from the closest node to node 2 from the node list 1, 2 and 3. If those
nodes don't have enough memory, kernel will allocate from slow memory
node 10, 11 and 12 which ever is closer to node 2.
Link: https://lkml.kernel.org/r/20211202123810.267175-3-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Ben Widawsky <ben.widawsky@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: <linux-api@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 01:08:17 +03:00
|
|
|
int home_node; /* Home node to use for MPOL_BIND and MPOL_PREFERRED_MANY */
|
2021-07-01 04:51:10 +03:00
|
|
|
|
mempolicy: add MPOL_F_STATIC_NODES flag
Add an optional mempolicy mode flag, MPOL_F_STATIC_NODES, that suppresses the
node remap when the policy is rebound.
Adds another member to struct mempolicy, nodemask_t user_nodemask, as part of
a union with cpuset_mems_allowed:
struct mempolicy {
...
union {
nodemask_t cpuset_mems_allowed;
nodemask_t user_nodemask;
} w;
}
that stores the the nodemask that the user passed when he or she created the
mempolicy via set_mempolicy() or mbind(). When using MPOL_F_STATIC_NODES,
which is passed with any mempolicy mode, the user's passed nodemask
intersected with the VMA or task's allowed nodes is always used when
determining the preferred node, setting the MPOL_BIND zonelist, or creating
the interleave nodemask. This happens whenever the policy is rebound,
including when a task's cpuset assignment changes or the cpuset's mems are
changed.
This creates an interesting side-effect in that it allows the mempolicy
"intent" to lie dormant and uneffected until it has access to the node(s) that
it desires. For example, if you currently ask for an interleaved policy over
a set of nodes that you do not have access to, the mempolicy is not created
and the task continues to use the previous policy. With this change, however,
it is possible to create the same mempolicy; it is only effected when access
to nodes in the nodemask is acquired.
It is also possible to mount tmpfs with the static nodemask behavior when
specifying a node or nodemask. To do this, simply add "=static" immediately
following the mempolicy mode at mount time:
mount -o remount mpol=interleave=static:1-3
Also removes mpol_check_policy() and folds its logic into mpol_new() since it
is now obsoleted. The unused vma_mpol_equal() is also removed.
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 13:12:27 +04:00
|
|
|
union {
|
|
|
|
nodemask_t cpuset_mems_allowed; /* relative to these nodes */
|
|
|
|
nodemask_t user_nodemask; /* nodemask passed by user */
|
|
|
|
} w;
|
2005-04-17 02:20:36 +04:00
|
|
|
};
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Support for managing mempolicy data objects (clone, copy, destroy)
|
|
|
|
* The default fast path of a NULL MPOL_DEFAULT policy is always inlined.
|
|
|
|
*/
|
|
|
|
|
2008-04-28 13:13:08 +04:00
|
|
|
extern void __mpol_put(struct mempolicy *pol);
|
|
|
|
static inline void mpol_put(struct mempolicy *pol)
|
2005-04-17 02:20:36 +04:00
|
|
|
{
|
|
|
|
if (pol)
|
2008-04-28 13:13:08 +04:00
|
|
|
__mpol_put(pol);
|
2005-04-17 02:20:36 +04:00
|
|
|
}
|
|
|
|
|
mempolicy: rework mempolicy Reference Counting [yet again]
After further discussion with Christoph Lameter, it has become clear that my
earlier attempts to clean up the mempolicy reference counting were a bit of
overkill in some areas, resulting in superflous ref/unref in what are usually
fast paths. In other areas, further inspection reveals that I botched the
unref for interleave policies.
A separate patch, suitable for upstream/stable trees, fixes up the known
errors in the previous attempt to fix reference counting.
This patch reworks the memory policy referencing counting and, one hopes,
simplifies the code. Maybe I'll get it right this time.
See the update to the numa_memory_policy.txt document for a discussion of
memory policy reference counting that motivates this patch.
Summary:
Lookup of mempolicy, based on (vma, address) need only add a reference for
shared policy, and we need only unref the policy when finished for shared
policies. So, this patch backs out all of the unneeded extra reference
counting added by my previous attempt. It then unrefs only shared policies
when we're finished with them, using the mpol_cond_put() [conditional put]
helper function introduced by this patch.
Note that shmem_swapin() calls read_swap_cache_async() with a dummy vma
containing just the policy. read_swap_cache_async() can call alloc_page_vma()
multiple times, so we can't let alloc_page_vma() unref the shared policy in
this case. To avoid this, we make a copy of any non-null shared policy and
remove the MPOL_F_SHARED flag from the copy. This copy occurs before reading
a page [or multiple pages] from swap, so the overhead should not be an issue
here.
I introduced a new static inline function "mpol_cond_copy()" to copy the
shared policy to an on-stack policy and remove the flags that would require a
conditional free. The current implementation of mpol_cond_copy() assumes that
the struct mempolicy contains no pointers to dynamically allocated structures
that must be duplicated or reference counted during copy.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 13:13:16 +04:00
|
|
|
/*
|
|
|
|
* Does mempolicy pol need explicit unref after use?
|
|
|
|
* Currently only needed for shared policies.
|
|
|
|
*/
|
|
|
|
static inline int mpol_needs_cond_ref(struct mempolicy *pol)
|
|
|
|
{
|
|
|
|
return (pol && (pol->flags & MPOL_F_SHARED));
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void mpol_cond_put(struct mempolicy *pol)
|
|
|
|
{
|
|
|
|
if (mpol_needs_cond_ref(pol))
|
|
|
|
__mpol_put(pol);
|
|
|
|
}
|
|
|
|
|
2008-04-28 13:13:09 +04:00
|
|
|
extern struct mempolicy *__mpol_dup(struct mempolicy *pol);
|
|
|
|
static inline struct mempolicy *mpol_dup(struct mempolicy *pol)
|
2005-04-17 02:20:36 +04:00
|
|
|
{
|
|
|
|
if (pol)
|
2008-04-28 13:13:09 +04:00
|
|
|
pol = __mpol_dup(pol);
|
2005-04-17 02:20:36 +04:00
|
|
|
return pol;
|
|
|
|
}
|
|
|
|
|
|
|
|
#define vma_policy(vma) ((vma)->vm_policy)
|
|
|
|
|
|
|
|
static inline void mpol_get(struct mempolicy *pol)
|
|
|
|
{
|
|
|
|
if (pol)
|
|
|
|
atomic_inc(&pol->refcnt);
|
|
|
|
}
|
|
|
|
|
2012-01-11 03:08:21 +04:00
|
|
|
extern bool __mpol_equal(struct mempolicy *a, struct mempolicy *b);
|
|
|
|
static inline bool mpol_equal(struct mempolicy *a, struct mempolicy *b)
|
2005-04-17 02:20:36 +04:00
|
|
|
{
|
|
|
|
if (a == b)
|
2012-01-11 03:08:21 +04:00
|
|
|
return true;
|
2005-04-17 02:20:36 +04:00
|
|
|
return __mpol_equal(a, b);
|
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Tree of shared policies for a shared memory region.
|
|
|
|
* Maintain the policies in a pseudo mm that contains vmas. The vmas
|
|
|
|
* carry the policy. As a special twist the pseudo mm is indexed in pages, not
|
|
|
|
* bytes, so that we can work with shared memory segments bigger than
|
|
|
|
* unsigned long.
|
|
|
|
*/
|
|
|
|
|
|
|
|
struct sp_node {
|
|
|
|
struct rb_node nd;
|
|
|
|
unsigned long start, end;
|
|
|
|
struct mempolicy *policy;
|
|
|
|
};
|
|
|
|
|
|
|
|
struct shared_policy {
|
|
|
|
struct rb_root root;
|
2016-01-15 02:18:36 +03:00
|
|
|
rwlock_t lock;
|
2005-04-17 02:20:36 +04:00
|
|
|
};
|
|
|
|
|
2013-09-12 01:20:14 +04:00
|
|
|
int vma_dup_policy(struct vm_area_struct *src, struct vm_area_struct *dst);
|
mempolicy: use struct mempolicy pointer in shmem_sb_info
This patch replaces the mempolicy mode, mode_flags, and nodemask in the
shmem_sb_info struct with a struct mempolicy pointer, initialized to NULL.
This removes dependency on the details of mempolicy from shmem.c and hugetlbfs
inode.c and simplifies the interfaces.
mpol_parse_str() in mempolicy.c is changed to return, via a pointer to a
pointer arg, a struct mempolicy pointer on success. For MPOL_DEFAULT, the
returned pointer is NULL. Further, mpol_parse_str() now takes a 'no_context'
argument that causes the input nodemask to be stored in the w.user_nodemask of
the created mempolicy for use when the mempolicy is installed in a tmpfs inode
shared policy tree. At that time, any cpuset contextualization is applied to
the original input nodemask. This preserves the previous behavior where the
input nodemask was stored in the superblock. We can think of the returned
mempolicy as "context free".
Because mpol_parse_str() is now calling mpol_new(), we can remove from
mpol_to_str() the semantic checks that mpol_new() already performs.
Add 'no_context' parameter to mpol_to_str() to specify that it should format
the nodemask in w.user_nodemask for 'bind' and 'interleave' policies.
Change mpol_shared_policy_init() to take a pointer to a "context free" struct
mempolicy and to create a new, "contextualized" mempolicy using the mode,
mode_flags and user_nodemask from the input mempolicy.
Note: we know that the mempolicy passed to mpol_to_str() or
mpol_shared_policy_init() from a tmpfs superblock is "context free". This
is currently the only instance thereof. However, if we found more uses for
this concept, and introduced any ambiguity as to whether a mempolicy was
context free or not, we could add another internal mode flag to identify
context free mempolicies. Then, we could remove the 'no_context' argument
from mpol_to_str().
Added shmem_get_sbmpol() to return a reference counted superblock mempolicy,
if one exists, to pass to mpol_shared_policy_init(). We must add the
reference under the sb stat_lock to prevent races with replacement of the mpol
by remount. This reference is removed in mpol_shared_policy_init().
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: another build fix]
[akpm@linux-foundation.org: yet another build fix]
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 13:13:26 +04:00
|
|
|
void mpol_shared_policy_init(struct shared_policy *sp, struct mempolicy *mpol);
|
2005-04-17 02:20:36 +04:00
|
|
|
int mpol_set_shared_policy(struct shared_policy *info,
|
|
|
|
struct vm_area_struct *vma,
|
|
|
|
struct mempolicy *new);
|
|
|
|
void mpol_free_shared_policy(struct shared_policy *p);
|
|
|
|
struct mempolicy *mpol_shared_policy_lookup(struct shared_policy *sp,
|
|
|
|
unsigned long idx);
|
|
|
|
|
2014-10-10 02:27:50 +04:00
|
|
|
struct mempolicy *get_task_policy(struct task_struct *p);
|
|
|
|
struct mempolicy *__get_vma_policy(struct vm_area_struct *vma,
|
|
|
|
unsigned long addr);
|
2014-10-10 02:27:48 +04:00
|
|
|
bool vma_policy_mof(struct vm_area_struct *vma);
|
2011-05-25 04:12:41 +04:00
|
|
|
|
2005-04-17 02:20:36 +04:00
|
|
|
extern void numa_default_policy(void);
|
|
|
|
extern void numa_policy_init(void);
|
2017-07-07 01:40:06 +03:00
|
|
|
extern void mpol_rebind_task(struct task_struct *tsk, const nodemask_t *new);
|
[PATCH] cpuset: rebind vma mempolicies fix
Fix more of longstanding bug in cpuset/mempolicy interaction.
NUMA mempolicies (mm/mempolicy.c) are constrained by the current tasks cpuset
to just the Memory Nodes allowed by that cpuset. The kernel maintains
internal state for each mempolicy, tracking what nodes are used for the
MPOL_INTERLEAVE, MPOL_BIND or MPOL_PREFERRED policies.
When a tasks cpuset memory placement changes, whether because the cpuset
changed, or because the task was attached to a different cpuset, then the
tasks mempolicies have to be rebound to the new cpuset placement, so as to
preserve the cpuset-relative numbering of the nodes in that policy.
An earlier fix handled such mempolicy rebinding for mempolicies attached to a
task.
This fix rebinds mempolicies attached to vma's (address ranges in a tasks
address space.) Due to the need to hold the task->mm->mmap_sem semaphore while
updating vma's, the rebinding of vma mempolicies has to be done when the
cpuset memory placement is changed, at which time mmap_sem can be safely
acquired. The tasks mempolicy is rebound later, when the task next attempts
to allocate memory and notices that its task->cpuset_mems_generation is
out-of-date with its cpusets mems_generation.
Because walking the tasklist to find all tasks attached to a changing cpuset
requires holding tasklist_lock, a spinlock, one cannot update the vma's of the
affected tasks while doing the tasklist scan. In general, one cannot acquire
a semaphore (which can sleep) while already holding a spinlock (such as
tasklist_lock). So a list of mm references has to be built up during the
tasklist scan, then the tasklist lock dropped, then for each mm, its mmap_sem
acquired, and the vma's in that mm rebound.
Once the tasklist lock is dropped, affected tasks may fork new tasks, before
their mm's are rebound. A kernel global 'cpuset_being_rebound' is set to
point to the cpuset being rebound (there can only be one; cpuset modifications
are done under a global 'manage_sem' semaphore), and the mpol_copy code that
is used to copy a tasks mempolicies during fork catches such forking tasks,
and ensures their children are also rebound.
When a task is moved to a different cpuset, it is easier, as there is only one
task involved. It's mm->vma's are scanned, using the same
mpol_rebind_policy() as used above.
It may happen that both the mpol_copy hook and the update done via the
tasklist scan update the same mm twice. This is ok, as the mempolicies of
each vma in an mm keep track of what mems_allowed they are relative to, and
safely no-op a second request to rebind to the same nodes.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 12:01:59 +03:00
|
|
|
extern void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new);
|
|
|
|
|
2017-07-07 01:40:03 +03:00
|
|
|
extern int huge_node(struct vm_area_struct *vma,
|
2008-04-28 13:12:18 +04:00
|
|
|
unsigned long addr, gfp_t gfp_flags,
|
|
|
|
struct mempolicy **mpol, nodemask_t **nodemask);
|
hugetlb: derive huge pages nodes allowed from task mempolicy
This patch derives a "nodes_allowed" node mask from the numa mempolicy of
the task modifying the number of persistent huge pages to control the
allocation, freeing and adjusting of surplus huge pages when the pool page
count is modified via the new sysctl or sysfs attribute
"nr_hugepages_mempolicy". The nodes_allowed mask is derived as follows:
* For "default" [NULL] task mempolicy, a NULL nodemask_t pointer
is produced. This will cause the hugetlb subsystem to use
node_online_map as the "nodes_allowed". This preserves the
behavior before this patch.
* For "preferred" mempolicy, including explicit local allocation,
a nodemask with the single preferred node will be produced.
"local" policy will NOT track any internode migrations of the
task adjusting nr_hugepages.
* For "bind" and "interleave" policy, the mempolicy's nodemask
will be used.
* Other than to inform the construction of the nodes_allowed node
mask, the actual mempolicy mode is ignored. That is, all modes
behave like interleave over the resulting nodes_allowed mask
with no "fallback".
See the updated documentation [next patch] for more information
about the implications of this patch.
Examples:
Starting with:
Node 0 HugePages_Total: 0
Node 1 HugePages_Total: 0
Node 2 HugePages_Total: 0
Node 3 HugePages_Total: 0
Default behavior [with or without this patch] balances persistent
hugepage allocation across nodes [with sufficient contiguous memory]:
sysctl vm.nr_hugepages[_mempolicy]=32
yields:
Node 0 HugePages_Total: 8
Node 1 HugePages_Total: 8
Node 2 HugePages_Total: 8
Node 3 HugePages_Total: 8
Of course, we only have nr_hugepages_mempolicy with the patch,
but with default mempolicy, nr_hugepages_mempolicy behaves the
same as nr_hugepages.
Applying mempolicy--e.g., with numactl [using '-m' a.k.a.
'--membind' because it allows multiple nodes to be specified
and it's easy to type]--we can allocate huge pages on
individual nodes or sets of nodes. So, starting from the
condition above, with 8 huge pages per node, add 8 more to
node 2 using:
numactl -m 2 sysctl vm.nr_hugepages_mempolicy=40
This yields:
Node 0 HugePages_Total: 8
Node 1 HugePages_Total: 8
Node 2 HugePages_Total: 16
Node 3 HugePages_Total: 8
The incremental 8 huge pages were restricted to node 2 by the
specified mempolicy.
Similarly, we can use mempolicy to free persistent huge pages
from specified nodes:
numactl -m 0,1 sysctl vm.nr_hugepages_mempolicy=32
yields:
Node 0 HugePages_Total: 4
Node 1 HugePages_Total: 4
Node 2 HugePages_Total: 16
Node 3 HugePages_Total: 8
The 8 huge pages freed were balanced over nodes 0 and 1.
[rientjes@google.com: accomodate reworked NODEMASK_ALLOC]
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 04:58:21 +03:00
|
|
|
extern bool init_nodemask_of_mempolicy(nodemask_t *mask);
|
2021-07-01 04:50:56 +03:00
|
|
|
extern bool mempolicy_in_oom_domain(struct task_struct *tsk,
|
2010-08-10 04:18:52 +04:00
|
|
|
const nodemask_t *mask);
|
2020-08-12 04:30:32 +03:00
|
|
|
extern nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy);
|
|
|
|
|
|
|
|
static inline nodemask_t *policy_nodemask_current(gfp_t gfp)
|
|
|
|
{
|
|
|
|
struct mempolicy *mpol = get_task_policy(current);
|
|
|
|
|
|
|
|
return policy_nodemask(gfp, mpol);
|
|
|
|
}
|
|
|
|
|
2014-04-08 02:37:29 +04:00
|
|
|
extern unsigned int mempolicy_slab_node(void);
|
2005-04-17 02:20:36 +04:00
|
|
|
|
2006-09-26 10:31:18 +04:00
|
|
|
extern enum zone_type policy_zone;
|
2006-01-06 11:11:17 +03:00
|
|
|
|
2006-09-26 10:31:18 +04:00
|
|
|
static inline void check_highest_zone(enum zone_type k)
|
2006-01-06 11:11:17 +03:00
|
|
|
{
|
2007-08-23 01:02:05 +04:00
|
|
|
if (k > policy_zone && k != ZONE_MOVABLE)
|
2006-01-06 11:11:17 +03:00
|
|
|
policy_zone = k;
|
|
|
|
}
|
|
|
|
|
2012-05-30 02:06:24 +04:00
|
|
|
int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
|
|
|
|
const nodemask_t *to, int flags);
|
2006-01-08 12:00:51 +03:00
|
|
|
|
mempolicy: rework shmem mpol parsing and display
mm/shmem.c currently contains functions to parse and display memory policy
strings for the tmpfs 'mpol' mount option. Move this to mm/mempolicy.c with
the rest of the mempolicy support. With subsequent patches, we'll be able to
remove knowledge of the details [mode, flags, policy, ...] completely from
shmem.c
1) replace shmem_parse_mpol() in mm/shmem.c with mpol_parse_str() in
mm/mempolicy.c. Rework to use the policy_types[] array [used by
mpol_to_str()] to look up mode by name.
2) use mpol_to_str() to format policy for shmem_show_mpol(). mpol_to_str()
expects a pointer to a struct mempolicy, so temporarily construct one.
This will be replaced with a reference to a struct mempolicy in the tmpfs
superblock in a subsequent patch.
NOTE 1: I changed mpol_to_str() to use a colon ':' rather than an equal
sign '=' as the nodemask delimiter to match mpol_parse_str() and the
tmpfs/shmem mpol mount option formatting that now uses mpol_to_str(). This
is a user visible change to numa_maps, but then the addition of the mode
flags already changed the display. It makes sense to me to have the mounts
and numa_maps display the policy in the same format. However, if anyone
objects strongly, I can pass the desired nodemask delimeter as an arg to
mpol_to_str().
Note 2: Like show_numa_map(), I don't check the return code from
mpol_to_str(). I do use a longer buffer than the one provided by
show_numa_map(), which seems to have sufficed so far.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 13:13:23 +04:00
|
|
|
|
|
|
|
#ifdef CONFIG_TMPFS
|
2013-01-02 14:04:23 +04:00
|
|
|
extern int mpol_parse_str(char *str, struct mempolicy **mpol);
|
2011-05-25 04:12:46 +04:00
|
|
|
#endif
|
mempolicy: rework shmem mpol parsing and display
mm/shmem.c currently contains functions to parse and display memory policy
strings for the tmpfs 'mpol' mount option. Move this to mm/mempolicy.c with
the rest of the mempolicy support. With subsequent patches, we'll be able to
remove knowledge of the details [mode, flags, policy, ...] completely from
shmem.c
1) replace shmem_parse_mpol() in mm/shmem.c with mpol_parse_str() in
mm/mempolicy.c. Rework to use the policy_types[] array [used by
mpol_to_str()] to look up mode by name.
2) use mpol_to_str() to format policy for shmem_show_mpol(). mpol_to_str()
expects a pointer to a struct mempolicy, so temporarily construct one.
This will be replaced with a reference to a struct mempolicy in the tmpfs
superblock in a subsequent patch.
NOTE 1: I changed mpol_to_str() to use a colon ':' rather than an equal
sign '=' as the nodemask delimiter to match mpol_parse_str() and the
tmpfs/shmem mpol mount option formatting that now uses mpol_to_str(). This
is a user visible change to numa_maps, but then the addition of the mode
flags already changed the display. It makes sense to me to have the mounts
and numa_maps display the policy in the same format. However, if anyone
objects strongly, I can pass the desired nodemask delimeter as an arg to
mpol_to_str().
Note 2: Like show_numa_map(), I don't check the return code from
mpol_to_str(). I do use a longer buffer than the one provided by
show_numa_map(), which seems to have sufficed so far.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 13:13:23 +04:00
|
|
|
|
2013-11-13 03:07:28 +04:00
|
|
|
extern void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol);
|
2008-07-24 08:28:22 +04:00
|
|
|
|
|
|
|
/* Check if a vma is migratable */
|
2020-04-02 07:10:52 +03:00
|
|
|
extern bool vma_migratable(struct vm_area_struct *vma);
|
2008-07-24 08:28:22 +04:00
|
|
|
|
2012-10-25 16:16:30 +04:00
|
|
|
extern int mpol_misplaced(struct page *, struct vm_area_struct *, unsigned long);
|
2016-09-02 02:15:07 +03:00
|
|
|
extern void mpol_put_task_policy(struct task_struct *);
|
2012-10-25 16:16:30 +04:00
|
|
|
|
2021-09-03 01:00:13 +03:00
|
|
|
static inline bool mpol_is_preferred_many(struct mempolicy *pol)
|
|
|
|
{
|
|
|
|
return (pol->mode == MPOL_PREFERRED_MANY);
|
|
|
|
}
|
|
|
|
|
|
|
|
|
2005-04-17 02:20:36 +04:00
|
|
|
#else
|
|
|
|
|
|
|
|
struct mempolicy {};
|
|
|
|
|
2012-01-11 03:08:21 +04:00
|
|
|
static inline bool mpol_equal(struct mempolicy *a, struct mempolicy *b)
|
2005-04-17 02:20:36 +04:00
|
|
|
{
|
2012-01-11 03:08:21 +04:00
|
|
|
return true;
|
2005-04-17 02:20:36 +04:00
|
|
|
}
|
|
|
|
|
2008-04-28 13:13:08 +04:00
|
|
|
static inline void mpol_put(struct mempolicy *p)
|
2005-04-17 02:20:36 +04:00
|
|
|
{
|
|
|
|
}
|
|
|
|
|
mempolicy: rework mempolicy Reference Counting [yet again]
After further discussion with Christoph Lameter, it has become clear that my
earlier attempts to clean up the mempolicy reference counting were a bit of
overkill in some areas, resulting in superflous ref/unref in what are usually
fast paths. In other areas, further inspection reveals that I botched the
unref for interleave policies.
A separate patch, suitable for upstream/stable trees, fixes up the known
errors in the previous attempt to fix reference counting.
This patch reworks the memory policy referencing counting and, one hopes,
simplifies the code. Maybe I'll get it right this time.
See the update to the numa_memory_policy.txt document for a discussion of
memory policy reference counting that motivates this patch.
Summary:
Lookup of mempolicy, based on (vma, address) need only add a reference for
shared policy, and we need only unref the policy when finished for shared
policies. So, this patch backs out all of the unneeded extra reference
counting added by my previous attempt. It then unrefs only shared policies
when we're finished with them, using the mpol_cond_put() [conditional put]
helper function introduced by this patch.
Note that shmem_swapin() calls read_swap_cache_async() with a dummy vma
containing just the policy. read_swap_cache_async() can call alloc_page_vma()
multiple times, so we can't let alloc_page_vma() unref the shared policy in
this case. To avoid this, we make a copy of any non-null shared policy and
remove the MPOL_F_SHARED flag from the copy. This copy occurs before reading
a page [or multiple pages] from swap, so the overhead should not be an issue
here.
I introduced a new static inline function "mpol_cond_copy()" to copy the
shared policy to an on-stack policy and remove the flags that would require a
conditional free. The current implementation of mpol_cond_copy() assumes that
the struct mempolicy contains no pointers to dynamically allocated structures
that must be duplicated or reference counted during copy.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 13:13:16 +04:00
|
|
|
static inline void mpol_cond_put(struct mempolicy *pol)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2005-04-17 02:20:36 +04:00
|
|
|
static inline void mpol_get(struct mempolicy *pol)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
|
|
|
struct shared_policy {};
|
|
|
|
|
mempolicy: use struct mempolicy pointer in shmem_sb_info
This patch replaces the mempolicy mode, mode_flags, and nodemask in the
shmem_sb_info struct with a struct mempolicy pointer, initialized to NULL.
This removes dependency on the details of mempolicy from shmem.c and hugetlbfs
inode.c and simplifies the interfaces.
mpol_parse_str() in mempolicy.c is changed to return, via a pointer to a
pointer arg, a struct mempolicy pointer on success. For MPOL_DEFAULT, the
returned pointer is NULL. Further, mpol_parse_str() now takes a 'no_context'
argument that causes the input nodemask to be stored in the w.user_nodemask of
the created mempolicy for use when the mempolicy is installed in a tmpfs inode
shared policy tree. At that time, any cpuset contextualization is applied to
the original input nodemask. This preserves the previous behavior where the
input nodemask was stored in the superblock. We can think of the returned
mempolicy as "context free".
Because mpol_parse_str() is now calling mpol_new(), we can remove from
mpol_to_str() the semantic checks that mpol_new() already performs.
Add 'no_context' parameter to mpol_to_str() to specify that it should format
the nodemask in w.user_nodemask for 'bind' and 'interleave' policies.
Change mpol_shared_policy_init() to take a pointer to a "context free" struct
mempolicy and to create a new, "contextualized" mempolicy using the mode,
mode_flags and user_nodemask from the input mempolicy.
Note: we know that the mempolicy passed to mpol_to_str() or
mpol_shared_policy_init() from a tmpfs superblock is "context free". This
is currently the only instance thereof. However, if we found more uses for
this concept, and introduced any ambiguity as to whether a mempolicy was
context free or not, we could add another internal mode flag to identify
context free mempolicies. Then, we could remove the 'no_context' argument
from mpol_to_str().
Added shmem_get_sbmpol() to return a reference counted superblock mempolicy,
if one exists, to pass to mpol_shared_policy_init(). We must add the
reference under the sb stat_lock to prevent races with replacement of the mpol
by remount. This reference is removed in mpol_shared_policy_init().
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: another build fix]
[akpm@linux-foundation.org: yet another build fix]
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 13:13:26 +04:00
|
|
|
static inline void mpol_shared_policy_init(struct shared_policy *sp,
|
|
|
|
struct mempolicy *mpol)
|
2005-04-17 02:20:36 +04:00
|
|
|
{
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void mpol_free_shared_policy(struct shared_policy *p)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2016-05-20 03:12:44 +03:00
|
|
|
static inline struct mempolicy *
|
|
|
|
mpol_shared_policy_lookup(struct shared_policy *sp, unsigned long idx)
|
|
|
|
{
|
|
|
|
return NULL;
|
|
|
|
}
|
|
|
|
|
2005-04-17 02:20:36 +04:00
|
|
|
#define vma_policy(vma) NULL
|
2013-09-12 01:20:14 +04:00
|
|
|
|
|
|
|
static inline int
|
|
|
|
vma_dup_policy(struct vm_area_struct *src, struct vm_area_struct *dst)
|
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
2005-04-17 02:20:36 +04:00
|
|
|
|
|
|
|
static inline void numa_policy_init(void)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
|
|
|
static inline void numa_default_policy(void)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
[PATCH] cpuset: numa_policy_rebind cleanup
Cleanup, reorganize and make more robust the mempolicy.c code to rebind
mempolicies relative to the containing cpuset after a tasks memory placement
changes.
The real motivator for this cleanup patch is to lay more groundwork for the
upcoming patch to correctly rebind NUMA mempolicies that are attached to vma's
after the containing cpuset memory placement changes.
NUMA mempolicies are constrained by the cpuset their task is a member of.
When either (1) a task is moved to a different cpuset, or (2) the 'mems'
mems_allowed of a cpuset is changed, then the NUMA mempolicies have embedded
node numbers (for MPOL_BIND, MPOL_INTERLEAVE and MPOL_PREFERRED) that need to
be recalculated, relative to their new cpuset placement.
The old code used an unreliable method of determining what was the old
mems_allowed constraining the mempolicy. It just looked at the tasks
mems_allowed value. This sort of worked with the present code, that just
rebinds the -task- mempolicy, and leaves any -vma- mempolicies broken,
referring to the old nodes. But in an upcoming patch, the vma mempolicies
will be rebound as well. Then the order in which the various task and vma
mempolicies are updated will no longer be deterministic, and one can no longer
count on the task->mems_allowed holding the old value for as long as needed.
It's not even clear if the current code was guaranteed to work reliably for
task mempolicies.
So I added a mems_allowed field to each mempolicy, stating exactly what
mems_allowed the policy is relative to, and updated synchronously and reliably
anytime that the mempolicy is rebound.
Also removed a useless wrapper routine, numa_policy_rebind(), and had its
caller, cpuset_update_task_memory_state(), call directly to the rewritten
policy_rebind() routine, and made that rebind routine extern instead of
static, and added a "mpol_" prefix to its name, making it
mpol_rebind_policy().
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 12:01:56 +03:00
|
|
|
static inline void mpol_rebind_task(struct task_struct *tsk,
|
2017-07-07 01:40:06 +03:00
|
|
|
const nodemask_t *new)
|
[PATCH] cpusets: automatic numa mempolicy rebinding
This patch automatically updates a tasks NUMA mempolicy when its cpuset
memory placement changes. It does so within the context of the task,
without any need to support low level external mempolicy manipulation.
If a system is not using cpusets, or if running on a system with just the
root (all-encompassing) cpuset, then this remap is a no-op. Only when a
task is moved between cpusets, or a cpusets memory placement is changed
does the following apply. Otherwise, the main routine below,
rebind_policy() is not even called.
When mixing cpusets, scheduler affinity, and NUMA mempolicies, the
essential role of cpusets is to place jobs (several related tasks) on a set
of CPUs and Memory Nodes, the essential role of sched_setaffinity is to
manage a jobs processor placement within its allowed cpuset, and the
essential role of NUMA mempolicy (mbind, set_mempolicy) is to manage a jobs
memory placement within its allowed cpuset.
However, CPU affinity and NUMA memory placement are managed within the
kernel using absolute system wide numbering, not cpuset relative numbering.
This is ok until a job is migrated to a different cpuset, or what's the
same, a jobs cpuset is moved to different CPUs and Memory Nodes.
Then the CPU affinity and NUMA memory placement of the tasks in the job
need to be updated, to preserve their cpuset-relative position. This can
be done for CPU affinity using sched_setaffinity() from user code, as one
task can modify anothers CPU affinity. This cannot be done from an
external task for NUMA memory placement, as that can only be modified in
the context of the task using it.
However, it easy enough to remap a tasks NUMA mempolicy automatically when
a task is migrated, using the existing cpuset mechanism to trigger a
refresh of a tasks memory placement after its cpuset has changed. All that
is needed is the old and new nodemask, and notice to the task that it needs
to rebind its mempolicy. The tasks mems_allowed has the old mask, the
tasks cpuset has the new mask, and the existing
cpuset_update_current_mems_allowed() mechanism provides the notice. The
bitmap/cpumask/nodemask remap operators provide the cpuset relative
calculations.
This patch leaves open a couple of issues:
1) Updating vma and shmfs/tmpfs/hugetlbfs memory policies:
These mempolicies may reference nodes outside of those allowed to
the current task by its cpuset. Tasks are migrated as part of jobs,
which reside on what might be several cpusets in a subtree. When such
a job is migrated, all NUMA memory policy references to nodes within
that cpuset subtree should be translated, and references to any nodes
outside that subtree should be left untouched. A future patch will
provide the cpuset mechanism needed to mark such subtrees. With that
patch, we will be able to correctly migrate these other memory policies
across a job migration.
2) Updating cpuset, affinity and memory policies in user space:
This is harder. Any placement state stored in user space using
system-wide numbering will be invalidated across a migration. More
work will be required to provide user code with a migration-safe means
to manage its cpuset relative placement, while preserving the current
API's that pass system wide numbers, not cpuset relative numbers across
the kernel-user boundary.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-31 02:02:36 +03:00
|
|
|
{
|
|
|
|
}
|
|
|
|
|
[PATCH] cpuset: rebind vma mempolicies fix
Fix more of longstanding bug in cpuset/mempolicy interaction.
NUMA mempolicies (mm/mempolicy.c) are constrained by the current tasks cpuset
to just the Memory Nodes allowed by that cpuset. The kernel maintains
internal state for each mempolicy, tracking what nodes are used for the
MPOL_INTERLEAVE, MPOL_BIND or MPOL_PREFERRED policies.
When a tasks cpuset memory placement changes, whether because the cpuset
changed, or because the task was attached to a different cpuset, then the
tasks mempolicies have to be rebound to the new cpuset placement, so as to
preserve the cpuset-relative numbering of the nodes in that policy.
An earlier fix handled such mempolicy rebinding for mempolicies attached to a
task.
This fix rebinds mempolicies attached to vma's (address ranges in a tasks
address space.) Due to the need to hold the task->mm->mmap_sem semaphore while
updating vma's, the rebinding of vma mempolicies has to be done when the
cpuset memory placement is changed, at which time mmap_sem can be safely
acquired. The tasks mempolicy is rebound later, when the task next attempts
to allocate memory and notices that its task->cpuset_mems_generation is
out-of-date with its cpusets mems_generation.
Because walking the tasklist to find all tasks attached to a changing cpuset
requires holding tasklist_lock, a spinlock, one cannot update the vma's of the
affected tasks while doing the tasklist scan. In general, one cannot acquire
a semaphore (which can sleep) while already holding a spinlock (such as
tasklist_lock). So a list of mm references has to be built up during the
tasklist scan, then the tasklist lock dropped, then for each mm, its mmap_sem
acquired, and the vma's in that mm rebound.
Once the tasklist lock is dropped, affected tasks may fork new tasks, before
their mm's are rebound. A kernel global 'cpuset_being_rebound' is set to
point to the cpuset being rebound (there can only be one; cpuset modifications
are done under a global 'manage_sem' semaphore), and the mpol_copy code that
is used to copy a tasks mempolicies during fork catches such forking tasks,
and ensures their children are also rebound.
When a task is moved to a different cpuset, it is easier, as there is only one
task involved. It's mm->vma's are scanned, using the same
mpol_rebind_policy() as used above.
It may happen that both the mpol_copy hook and the update done via the
tasklist scan update the same mm twice. This is ok, as the mempolicies of
each vma in an mm keep track of what mems_allowed they are relative to, and
safely no-op a second request to rebind to the same nodes.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 12:01:59 +03:00
|
|
|
static inline void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new)
|
|
|
|
{
|
|
|
|
}
|
|
|
|
|
2017-07-07 01:40:03 +03:00
|
|
|
static inline int huge_node(struct vm_area_struct *vma,
|
2008-04-28 13:12:18 +04:00
|
|
|
unsigned long addr, gfp_t gfp_flags,
|
|
|
|
struct mempolicy **mpol, nodemask_t **nodemask)
|
2006-01-06 11:10:46 +03:00
|
|
|
{
|
2008-04-28 13:12:18 +04:00
|
|
|
*mpol = NULL;
|
|
|
|
*nodemask = NULL;
|
2017-07-07 01:40:03 +03:00
|
|
|
return 0;
|
2006-01-06 11:10:46 +03:00
|
|
|
}
|
|
|
|
|
2010-08-10 04:18:52 +04:00
|
|
|
static inline bool init_nodemask_of_mempolicy(nodemask_t *m)
|
|
|
|
{
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2012-05-30 02:06:24 +04:00
|
|
|
static inline int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
|
|
|
|
const nodemask_t *to, int flags)
|
[PATCH] cpusets: swap migration interface
Add a boolean "memory_migrate" to each cpuset, represented by a file
containing "0" or "1" in each directory below /dev/cpuset.
It defaults to false (file contains "0"). It can be set true by writing
"1" to the file.
If true, then anytime that a task is attached to the cpuset so marked, the
pages of that task will be moved to that cpuset, preserving, to the extent
practical, the cpuset-relative placement of the pages.
Also anytime that a cpuset so marked has its memory placement changed (by
writing to its "mems" file), the tasks in that cpuset will have their pages
moved to the cpusets new nodes, preserving, to the extent practical, the
cpuset-relative placement of the moved pages.
Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 12:00:56 +03:00
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
2006-01-06 11:11:17 +03:00
|
|
|
static inline void check_highest_zone(int k)
|
|
|
|
{
|
|
|
|
}
|
mempolicy: rework shmem mpol parsing and display
mm/shmem.c currently contains functions to parse and display memory policy
strings for the tmpfs 'mpol' mount option. Move this to mm/mempolicy.c with
the rest of the mempolicy support. With subsequent patches, we'll be able to
remove knowledge of the details [mode, flags, policy, ...] completely from
shmem.c
1) replace shmem_parse_mpol() in mm/shmem.c with mpol_parse_str() in
mm/mempolicy.c. Rework to use the policy_types[] array [used by
mpol_to_str()] to look up mode by name.
2) use mpol_to_str() to format policy for shmem_show_mpol(). mpol_to_str()
expects a pointer to a struct mempolicy, so temporarily construct one.
This will be replaced with a reference to a struct mempolicy in the tmpfs
superblock in a subsequent patch.
NOTE 1: I changed mpol_to_str() to use a colon ':' rather than an equal
sign '=' as the nodemask delimiter to match mpol_parse_str() and the
tmpfs/shmem mpol mount option formatting that now uses mpol_to_str(). This
is a user visible change to numa_maps, but then the addition of the mode
flags already changed the display. It makes sense to me to have the mounts
and numa_maps display the policy in the same format. However, if anyone
objects strongly, I can pass the desired nodemask delimeter as an arg to
mpol_to_str().
Note 2: Like show_numa_map(), I don't check the return code from
mpol_to_str(). I do use a longer buffer than the one provided by
show_numa_map(), which seems to have sufficed so far.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 13:13:23 +04:00
|
|
|
|
|
|
|
#ifdef CONFIG_TMPFS
|
2013-01-02 14:04:23 +04:00
|
|
|
static inline int mpol_parse_str(char *str, struct mempolicy **mpol)
|
mempolicy: rework shmem mpol parsing and display
mm/shmem.c currently contains functions to parse and display memory policy
strings for the tmpfs 'mpol' mount option. Move this to mm/mempolicy.c with
the rest of the mempolicy support. With subsequent patches, we'll be able to
remove knowledge of the details [mode, flags, policy, ...] completely from
shmem.c
1) replace shmem_parse_mpol() in mm/shmem.c with mpol_parse_str() in
mm/mempolicy.c. Rework to use the policy_types[] array [used by
mpol_to_str()] to look up mode by name.
2) use mpol_to_str() to format policy for shmem_show_mpol(). mpol_to_str()
expects a pointer to a struct mempolicy, so temporarily construct one.
This will be replaced with a reference to a struct mempolicy in the tmpfs
superblock in a subsequent patch.
NOTE 1: I changed mpol_to_str() to use a colon ':' rather than an equal
sign '=' as the nodemask delimiter to match mpol_parse_str() and the
tmpfs/shmem mpol mount option formatting that now uses mpol_to_str(). This
is a user visible change to numa_maps, but then the addition of the mode
flags already changed the display. It makes sense to me to have the mounts
and numa_maps display the policy in the same format. However, if anyone
objects strongly, I can pass the desired nodemask delimeter as an arg to
mpol_to_str().
Note 2: Like show_numa_map(), I don't check the return code from
mpol_to_str(). I do use a longer buffer than the one provided by
show_numa_map(), which seems to have sufficed so far.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 13:13:23 +04:00
|
|
|
{
|
mempolicy: use struct mempolicy pointer in shmem_sb_info
This patch replaces the mempolicy mode, mode_flags, and nodemask in the
shmem_sb_info struct with a struct mempolicy pointer, initialized to NULL.
This removes dependency on the details of mempolicy from shmem.c and hugetlbfs
inode.c and simplifies the interfaces.
mpol_parse_str() in mempolicy.c is changed to return, via a pointer to a
pointer arg, a struct mempolicy pointer on success. For MPOL_DEFAULT, the
returned pointer is NULL. Further, mpol_parse_str() now takes a 'no_context'
argument that causes the input nodemask to be stored in the w.user_nodemask of
the created mempolicy for use when the mempolicy is installed in a tmpfs inode
shared policy tree. At that time, any cpuset contextualization is applied to
the original input nodemask. This preserves the previous behavior where the
input nodemask was stored in the superblock. We can think of the returned
mempolicy as "context free".
Because mpol_parse_str() is now calling mpol_new(), we can remove from
mpol_to_str() the semantic checks that mpol_new() already performs.
Add 'no_context' parameter to mpol_to_str() to specify that it should format
the nodemask in w.user_nodemask for 'bind' and 'interleave' policies.
Change mpol_shared_policy_init() to take a pointer to a "context free" struct
mempolicy and to create a new, "contextualized" mempolicy using the mode,
mode_flags and user_nodemask from the input mempolicy.
Note: we know that the mempolicy passed to mpol_to_str() or
mpol_shared_policy_init() from a tmpfs superblock is "context free". This
is currently the only instance thereof. However, if we found more uses for
this concept, and introduced any ambiguity as to whether a mempolicy was
context free or not, we could add another internal mode flag to identify
context free mempolicies. Then, we could remove the 'no_context' argument
from mpol_to_str().
Added shmem_get_sbmpol() to return a reference counted superblock mempolicy,
if one exists, to pass to mpol_shared_policy_init(). We must add the
reference under the sb stat_lock to prevent races with replacement of the mpol
by remount. This reference is removed in mpol_shared_policy_init().
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: another build fix]
[akpm@linux-foundation.org: yet another build fix]
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 13:13:26 +04:00
|
|
|
return 1; /* error */
|
mempolicy: rework shmem mpol parsing and display
mm/shmem.c currently contains functions to parse and display memory policy
strings for the tmpfs 'mpol' mount option. Move this to mm/mempolicy.c with
the rest of the mempolicy support. With subsequent patches, we'll be able to
remove knowledge of the details [mode, flags, policy, ...] completely from
shmem.c
1) replace shmem_parse_mpol() in mm/shmem.c with mpol_parse_str() in
mm/mempolicy.c. Rework to use the policy_types[] array [used by
mpol_to_str()] to look up mode by name.
2) use mpol_to_str() to format policy for shmem_show_mpol(). mpol_to_str()
expects a pointer to a struct mempolicy, so temporarily construct one.
This will be replaced with a reference to a struct mempolicy in the tmpfs
superblock in a subsequent patch.
NOTE 1: I changed mpol_to_str() to use a colon ':' rather than an equal
sign '=' as the nodemask delimiter to match mpol_parse_str() and the
tmpfs/shmem mpol mount option formatting that now uses mpol_to_str(). This
is a user visible change to numa_maps, but then the addition of the mode
flags already changed the display. It makes sense to me to have the mounts
and numa_maps display the policy in the same format. However, if anyone
objects strongly, I can pass the desired nodemask delimeter as an arg to
mpol_to_str().
Note 2: Like show_numa_map(), I don't check the return code from
mpol_to_str(). I do use a longer buffer than the one provided by
show_numa_map(), which seems to have sufficed so far.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 13:13:23 +04:00
|
|
|
}
|
2011-05-25 04:12:46 +04:00
|
|
|
#endif
|
mempolicy: rework shmem mpol parsing and display
mm/shmem.c currently contains functions to parse and display memory policy
strings for the tmpfs 'mpol' mount option. Move this to mm/mempolicy.c with
the rest of the mempolicy support. With subsequent patches, we'll be able to
remove knowledge of the details [mode, flags, policy, ...] completely from
shmem.c
1) replace shmem_parse_mpol() in mm/shmem.c with mpol_parse_str() in
mm/mempolicy.c. Rework to use the policy_types[] array [used by
mpol_to_str()] to look up mode by name.
2) use mpol_to_str() to format policy for shmem_show_mpol(). mpol_to_str()
expects a pointer to a struct mempolicy, so temporarily construct one.
This will be replaced with a reference to a struct mempolicy in the tmpfs
superblock in a subsequent patch.
NOTE 1: I changed mpol_to_str() to use a colon ':' rather than an equal
sign '=' as the nodemask delimiter to match mpol_parse_str() and the
tmpfs/shmem mpol mount option formatting that now uses mpol_to_str(). This
is a user visible change to numa_maps, but then the addition of the mode
flags already changed the display. It makes sense to me to have the mounts
and numa_maps display the policy in the same format. However, if anyone
objects strongly, I can pass the desired nodemask delimeter as an arg to
mpol_to_str().
Note 2: Like show_numa_map(), I don't check the return code from
mpol_to_str(). I do use a longer buffer than the one provided by
show_numa_map(), which seems to have sufficed so far.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-28 13:13:23 +04:00
|
|
|
|
2012-10-25 16:16:30 +04:00
|
|
|
static inline int mpol_misplaced(struct page *page, struct vm_area_struct *vma,
|
|
|
|
unsigned long address)
|
|
|
|
{
|
|
|
|
return -1; /* no node preference */
|
|
|
|
}
|
|
|
|
|
2016-09-02 02:15:07 +03:00
|
|
|
static inline void mpol_put_task_policy(struct task_struct *task)
|
|
|
|
{
|
|
|
|
}
|
2020-08-12 04:30:32 +03:00
|
|
|
|
|
|
|
static inline nodemask_t *policy_nodemask_current(gfp_t gfp)
|
|
|
|
{
|
|
|
|
return NULL;
|
|
|
|
}
|
2021-09-03 00:59:33 +03:00
|
|
|
|
2021-09-03 01:00:13 +03:00
|
|
|
static inline bool mpol_is_preferred_many(struct mempolicy *pol)
|
|
|
|
{
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2005-04-17 02:20:36 +04:00
|
|
|
#endif /* CONFIG_NUMA */
|
|
|
|
#endif
|