Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo: - xattr support added. The implementation is shared with tmpfs. The usage is restricted and intended to be used to manage per-cgroup metadata by system software. tmpfs changes are routed through this branch with Hugh's permission. - cgroup subsystem ID handling simplified. * 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Define CGROUP_SUBSYS_COUNT according the configuration cgroup: Assign subsystem IDs during compile time cgroup: Do not depend on a given order when populating the subsys array cgroup: Wrap subsystem selection macro cgroup: Remove CGROUP_BUILTIN_SUBSYS_COUNT cgroup: net_prio: Do not define task_netpioidx() when not selected cgroup: net_cls: Do not define task_cls_classid() when not selected cgroup: net_cls: Move sock_update_classid() declaration to cls_cgroup.h cgroup: trivial fixes for Documentation/cgroups/cgroups.txt xattr: mark variable as uninitialized to make both gcc and smatch happy fs: add missing documentation to simple_xattr functions cgroup: add documentation on extended attributes usage cgroup: rename subsys_bits to subsys_mask cgroup: add xattr support cgroup: revise how we re-populate root directory xattr: extract simple_xattr code from tmpfs
This commit is contained in:
Коммит
c0e8a139a5
|
@ -29,7 +29,8 @@ CONTENTS:
|
|||
3.1 Overview
|
||||
3.2 Synchronization
|
||||
3.3 Subsystem API
|
||||
4. Questions
|
||||
4. Extended attributes usage
|
||||
5. Questions
|
||||
|
||||
1. Control Groups
|
||||
=================
|
||||
|
@ -62,9 +63,9 @@ an instance of the cgroup virtual filesystem associated with it.
|
|||
At any one time there may be multiple active hierarchies of task
|
||||
cgroups. Each hierarchy is a partition of all tasks in the system.
|
||||
|
||||
User level code may create and destroy cgroups by name in an
|
||||
User-level code may create and destroy cgroups by name in an
|
||||
instance of the cgroup virtual file system, specify and query to
|
||||
which cgroup a task is assigned, and list the task pids assigned to
|
||||
which cgroup a task is assigned, and list the task PIDs assigned to
|
||||
a cgroup. Those creations and assignments only affect the hierarchy
|
||||
associated with that instance of the cgroup file system.
|
||||
|
||||
|
@ -72,7 +73,7 @@ On their own, the only use for cgroups is for simple job
|
|||
tracking. The intention is that other subsystems hook into the generic
|
||||
cgroup support to provide new attributes for cgroups, such as
|
||||
accounting/limiting the resources which processes in a cgroup can
|
||||
access. For example, cpusets (see Documentation/cgroups/cpusets.txt) allows
|
||||
access. For example, cpusets (see Documentation/cgroups/cpusets.txt) allow
|
||||
you to associate a set of CPUs and a set of memory nodes with the
|
||||
tasks in each cgroup.
|
||||
|
||||
|
@ -80,11 +81,11 @@ tasks in each cgroup.
|
|||
----------------------------
|
||||
|
||||
There are multiple efforts to provide process aggregations in the
|
||||
Linux kernel, mainly for resource tracking purposes. Such efforts
|
||||
Linux kernel, mainly for resource-tracking purposes. Such efforts
|
||||
include cpusets, CKRM/ResGroups, UserBeanCounters, and virtual server
|
||||
namespaces. These all require the basic notion of a
|
||||
grouping/partitioning of processes, with newly forked processes ending
|
||||
in the same group (cgroup) as their parent process.
|
||||
up in the same group (cgroup) as their parent process.
|
||||
|
||||
The kernel cgroup patch provides the minimum essential kernel
|
||||
mechanisms required to efficiently implement such groups. It has
|
||||
|
@ -127,14 +128,14 @@ following lines:
|
|||
/ \
|
||||
Professors (15%) students (5%)
|
||||
|
||||
Browsers like Firefox/Lynx go into the WWW network class, while (k)nfsd go
|
||||
into NFS network class.
|
||||
Browsers like Firefox/Lynx go into the WWW network class, while (k)nfsd goes
|
||||
into the NFS network class.
|
||||
|
||||
At the same time Firefox/Lynx will share an appropriate CPU/Memory class
|
||||
depending on who launched it (prof/student).
|
||||
|
||||
With the ability to classify tasks differently for different resources
|
||||
(by putting those resource subsystems in different hierarchies) then
|
||||
(by putting those resource subsystems in different hierarchies),
|
||||
the admin can easily set up a script which receives exec notifications
|
||||
and depending on who is launching the browser he can
|
||||
|
||||
|
@ -145,19 +146,19 @@ a separate cgroup for every browser launched and associate it with
|
|||
appropriate network and other resource class. This may lead to
|
||||
proliferation of such cgroups.
|
||||
|
||||
Also lets say that the administrator would like to give enhanced network
|
||||
Also let's say that the administrator would like to give enhanced network
|
||||
access temporarily to a student's browser (since it is night and the user
|
||||
wants to do online gaming :)) OR give one of the students simulation
|
||||
apps enhanced CPU power,
|
||||
wants to do online gaming :)) OR give one of the student's simulation
|
||||
apps enhanced CPU power.
|
||||
|
||||
With ability to write pids directly to resource classes, it's just a
|
||||
matter of :
|
||||
With ability to write PIDs directly to resource classes, it's just a
|
||||
matter of:
|
||||
|
||||
# echo pid > /sys/fs/cgroup/network/<new_class>/tasks
|
||||
(after some time)
|
||||
# echo pid > /sys/fs/cgroup/network/<orig_class>/tasks
|
||||
|
||||
Without this ability, he would have to split the cgroup into
|
||||
Without this ability, the administrator would have to split the cgroup into
|
||||
multiple separate ones and then associate the new cgroups with the
|
||||
new resource classes.
|
||||
|
||||
|
@ -184,20 +185,20 @@ Control Groups extends the kernel as follows:
|
|||
field of each task_struct using the css_set, anchored at
|
||||
css_set->tasks.
|
||||
|
||||
- A cgroup hierarchy filesystem can be mounted for browsing and
|
||||
- A cgroup hierarchy filesystem can be mounted for browsing and
|
||||
manipulation from user space.
|
||||
|
||||
- You can list all the tasks (by pid) attached to any cgroup.
|
||||
- You can list all the tasks (by PID) attached to any cgroup.
|
||||
|
||||
The implementation of cgroups requires a few, simple hooks
|
||||
into the rest of the kernel, none in performance critical paths:
|
||||
into the rest of the kernel, none in performance-critical paths:
|
||||
|
||||
- in init/main.c, to initialize the root cgroups and initial
|
||||
css_set at system boot.
|
||||
|
||||
- in fork and exit, to attach and detach a task from its css_set.
|
||||
|
||||
In addition a new file system, of type "cgroup" may be mounted, to
|
||||
In addition, a new file system of type "cgroup" may be mounted, to
|
||||
enable browsing and modifying the cgroups presently known to the
|
||||
kernel. When mounting a cgroup hierarchy, you may specify a
|
||||
comma-separated list of subsystems to mount as the filesystem mount
|
||||
|
@ -230,13 +231,13 @@ as the path relative to the root of the cgroup file system.
|
|||
Each cgroup is represented by a directory in the cgroup file system
|
||||
containing the following files describing that cgroup:
|
||||
|
||||
- tasks: list of tasks (by pid) attached to that cgroup. This list
|
||||
is not guaranteed to be sorted. Writing a thread id into this file
|
||||
- tasks: list of tasks (by PID) attached to that cgroup. This list
|
||||
is not guaranteed to be sorted. Writing a thread ID into this file
|
||||
moves the thread into this cgroup.
|
||||
- cgroup.procs: list of tgids in the cgroup. This list is not
|
||||
guaranteed to be sorted or free of duplicate tgids, and userspace
|
||||
- cgroup.procs: list of thread group IDs in the cgroup. This list is
|
||||
not guaranteed to be sorted or free of duplicate TGIDs, and userspace
|
||||
should sort/uniquify the list if this property is required.
|
||||
Writing a thread group id into this file moves all threads in that
|
||||
Writing a thread group ID into this file moves all threads in that
|
||||
group into this cgroup.
|
||||
- notify_on_release flag: run the release agent on exit?
|
||||
- release_agent: the path to use for release notifications (this file
|
||||
|
@ -261,7 +262,7 @@ cgroup file system directories.
|
|||
|
||||
When a task is moved from one cgroup to another, it gets a new
|
||||
css_set pointer - if there's an already existing css_set with the
|
||||
desired collection of cgroups then that group is reused, else a new
|
||||
desired collection of cgroups then that group is reused, otherwise a new
|
||||
css_set is allocated. The appropriate existing css_set is located by
|
||||
looking into a hash table.
|
||||
|
||||
|
@ -292,7 +293,7 @@ file system) of the abandoned cgroup. This enables automatic
|
|||
removal of abandoned cgroups. The default value of
|
||||
notify_on_release in the root cgroup at system boot is disabled
|
||||
(0). The default value of other cgroups at creation is the current
|
||||
value of their parents notify_on_release setting. The default value of
|
||||
value of their parents' notify_on_release settings. The default value of
|
||||
a cgroup hierarchy's release_agent path is empty.
|
||||
|
||||
1.5 What does clone_children do ?
|
||||
|
@ -316,7 +317,7 @@ the "cpuset" cgroup subsystem, the steps are something like:
|
|||
4) Create the new cgroup by doing mkdir's and write's (or echo's) in
|
||||
the /sys/fs/cgroup virtual file system.
|
||||
5) Start a task that will be the "founding father" of the new job.
|
||||
6) Attach that task to the new cgroup by writing its pid to the
|
||||
6) Attach that task to the new cgroup by writing its PID to the
|
||||
/sys/fs/cgroup/cpuset/tasks file for that cgroup.
|
||||
7) fork, exec or clone the job tasks from this founding father task.
|
||||
|
||||
|
@ -344,7 +345,7 @@ and then start a subshell 'sh' in that cgroup:
|
|||
2.1 Basic Usage
|
||||
---------------
|
||||
|
||||
Creating, modifying, using the cgroups can be done through the cgroup
|
||||
Creating, modifying, using cgroups can be done through the cgroup
|
||||
virtual filesystem.
|
||||
|
||||
To mount a cgroup hierarchy with all available subsystems, type:
|
||||
|
@ -441,7 +442,7 @@ You can attach the current shell task by echoing 0:
|
|||
# echo 0 > tasks
|
||||
|
||||
You can use the cgroup.procs file instead of the tasks file to move all
|
||||
threads in a threadgroup at once. Echoing the pid of any task in a
|
||||
threads in a threadgroup at once. Echoing the PID of any task in a
|
||||
threadgroup to cgroup.procs causes all tasks in that threadgroup to be
|
||||
be attached to the cgroup. Writing 0 to cgroup.procs moves all tasks
|
||||
in the writing task's threadgroup.
|
||||
|
@ -479,7 +480,7 @@ in /proc/mounts and /proc/<pid>/cgroups.
|
|||
There is mechanism which allows to get notifications about changing
|
||||
status of a cgroup.
|
||||
|
||||
To register new notification handler you need:
|
||||
To register a new notification handler you need to:
|
||||
- create a file descriptor for event notification using eventfd(2);
|
||||
- open a control file to be monitored (e.g. memory.usage_in_bytes);
|
||||
- write "<event_fd> <control_fd> <args>" to cgroup.event_control.
|
||||
|
@ -488,7 +489,7 @@ To register new notification handler you need:
|
|||
eventfd will be woken up by control file implementation or when the
|
||||
cgroup is removed.
|
||||
|
||||
To unregister notification handler just close eventfd.
|
||||
To unregister a notification handler just close eventfd.
|
||||
|
||||
NOTE: Support of notifications should be implemented for the control
|
||||
file. See documentation for the subsystem.
|
||||
|
@ -502,7 +503,7 @@ file. See documentation for the subsystem.
|
|||
Each kernel subsystem that wants to hook into the generic cgroup
|
||||
system needs to create a cgroup_subsys object. This contains
|
||||
various methods, which are callbacks from the cgroup system, along
|
||||
with a subsystem id which will be assigned by the cgroup system.
|
||||
with a subsystem ID which will be assigned by the cgroup system.
|
||||
|
||||
Other fields in the cgroup_subsys object include:
|
||||
|
||||
|
@ -516,7 +517,7 @@ Other fields in the cgroup_subsys object include:
|
|||
at system boot.
|
||||
|
||||
Each cgroup object created by the system has an array of pointers,
|
||||
indexed by subsystem id; this pointer is entirely managed by the
|
||||
indexed by subsystem ID; this pointer is entirely managed by the
|
||||
subsystem; the generic cgroup code will never touch this pointer.
|
||||
|
||||
3.2 Synchronization
|
||||
|
@ -639,7 +640,7 @@ void post_clone(struct cgroup *cgrp)
|
|||
|
||||
Called during cgroup_create() to do any parameter
|
||||
initialization which might be required before a task could attach. For
|
||||
example in cpusets, no task may attach before 'cpus' and 'mems' are set
|
||||
example, in cpusets, no task may attach before 'cpus' and 'mems' are set
|
||||
up.
|
||||
|
||||
void bind(struct cgroup *root)
|
||||
|
@ -650,7 +651,26 @@ and root cgroup. Currently this will only involve movement between
|
|||
the default hierarchy (which never has sub-cgroups) and a hierarchy
|
||||
that is being created/destroyed (and hence has no sub-cgroups).
|
||||
|
||||
4. Questions
|
||||
4. Extended attribute usage
|
||||
===========================
|
||||
|
||||
cgroup filesystem supports certain types of extended attributes in its
|
||||
directories and files. The current supported types are:
|
||||
- Trusted (XATTR_TRUSTED)
|
||||
- Security (XATTR_SECURITY)
|
||||
|
||||
Both require CAP_SYS_ADMIN capability to set.
|
||||
|
||||
Like in tmpfs, the extended attributes in cgroup filesystem are stored
|
||||
using kernel memory and it's advised to keep the usage at minimum. This
|
||||
is the reason why user defined extended attributes are not supported, since
|
||||
any user can do it and there's no limit in the value size.
|
||||
|
||||
The current known users for this feature are SELinux to limit cgroup usage
|
||||
in containers and systemd for assorted meta data like main PID in a cgroup
|
||||
(systemd creates a cgroup per service).
|
||||
|
||||
5. Questions
|
||||
============
|
||||
|
||||
Q: what's up with this '/bin/echo' ?
|
||||
|
@ -660,5 +680,5 @@ A: bash's builtin 'echo' command does not check calls to write() against
|
|||
|
||||
Q: When I attach processes, only the first of the line gets really attached !
|
||||
A: We can only return one error code per call to write(). So you should also
|
||||
put only ONE pid.
|
||||
put only ONE PID.
|
||||
|
||||
|
|
|
@ -68,6 +68,7 @@
|
|||
#include <net/netns/generic.h>
|
||||
#include <net/rtnetlink.h>
|
||||
#include <net/sock.h>
|
||||
#include <net/cls_cgroup.h>
|
||||
|
||||
#include <asm/uaccess.h>
|
||||
|
||||
|
|
180
fs/xattr.c
180
fs/xattr.c
|
@ -791,3 +791,183 @@ EXPORT_SYMBOL(generic_getxattr);
|
|||
EXPORT_SYMBOL(generic_listxattr);
|
||||
EXPORT_SYMBOL(generic_setxattr);
|
||||
EXPORT_SYMBOL(generic_removexattr);
|
||||
|
||||
/*
|
||||
* Allocate new xattr and copy in the value; but leave the name to callers.
|
||||
*/
|
||||
struct simple_xattr *simple_xattr_alloc(const void *value, size_t size)
|
||||
{
|
||||
struct simple_xattr *new_xattr;
|
||||
size_t len;
|
||||
|
||||
/* wrap around? */
|
||||
len = sizeof(*new_xattr) + size;
|
||||
if (len <= sizeof(*new_xattr))
|
||||
return NULL;
|
||||
|
||||
new_xattr = kmalloc(len, GFP_KERNEL);
|
||||
if (!new_xattr)
|
||||
return NULL;
|
||||
|
||||
new_xattr->size = size;
|
||||
memcpy(new_xattr->value, value, size);
|
||||
return new_xattr;
|
||||
}
|
||||
|
||||
/*
|
||||
* xattr GET operation for in-memory/pseudo filesystems
|
||||
*/
|
||||
int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
|
||||
void *buffer, size_t size)
|
||||
{
|
||||
struct simple_xattr *xattr;
|
||||
int ret = -ENODATA;
|
||||
|
||||
spin_lock(&xattrs->lock);
|
||||
list_for_each_entry(xattr, &xattrs->head, list) {
|
||||
if (strcmp(name, xattr->name))
|
||||
continue;
|
||||
|
||||
ret = xattr->size;
|
||||
if (buffer) {
|
||||
if (size < xattr->size)
|
||||
ret = -ERANGE;
|
||||
else
|
||||
memcpy(buffer, xattr->value, xattr->size);
|
||||
}
|
||||
break;
|
||||
}
|
||||
spin_unlock(&xattrs->lock);
|
||||
return ret;
|
||||
}
|
||||
|
||||
static int __simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
|
||||
const void *value, size_t size, int flags)
|
||||
{
|
||||
struct simple_xattr *xattr;
|
||||
struct simple_xattr *uninitialized_var(new_xattr);
|
||||
int err = 0;
|
||||
|
||||
/* value == NULL means remove */
|
||||
if (value) {
|
||||
new_xattr = simple_xattr_alloc(value, size);
|
||||
if (!new_xattr)
|
||||
return -ENOMEM;
|
||||
|
||||
new_xattr->name = kstrdup(name, GFP_KERNEL);
|
||||
if (!new_xattr->name) {
|
||||
kfree(new_xattr);
|
||||
return -ENOMEM;
|
||||
}
|
||||
}
|
||||
|
||||
spin_lock(&xattrs->lock);
|
||||
list_for_each_entry(xattr, &xattrs->head, list) {
|
||||
if (!strcmp(name, xattr->name)) {
|
||||
if (flags & XATTR_CREATE) {
|
||||
xattr = new_xattr;
|
||||
err = -EEXIST;
|
||||
} else if (new_xattr) {
|
||||
list_replace(&xattr->list, &new_xattr->list);
|
||||
} else {
|
||||
list_del(&xattr->list);
|
||||
}
|
||||
goto out;
|
||||
}
|
||||
}
|
||||
if (flags & XATTR_REPLACE) {
|
||||
xattr = new_xattr;
|
||||
err = -ENODATA;
|
||||
} else {
|
||||
list_add(&new_xattr->list, &xattrs->head);
|
||||
xattr = NULL;
|
||||
}
|
||||
out:
|
||||
spin_unlock(&xattrs->lock);
|
||||
if (xattr) {
|
||||
kfree(xattr->name);
|
||||
kfree(xattr);
|
||||
}
|
||||
return err;
|
||||
|
||||
}
|
||||
|
||||
/**
|
||||
* simple_xattr_set - xattr SET operation for in-memory/pseudo filesystems
|
||||
* @xattrs: target simple_xattr list
|
||||
* @name: name of the new extended attribute
|
||||
* @value: value of the new xattr. If %NULL, will remove the attribute
|
||||
* @size: size of the new xattr
|
||||
* @flags: %XATTR_{CREATE|REPLACE}
|
||||
*
|
||||
* %XATTR_CREATE is set, the xattr shouldn't exist already; otherwise fails
|
||||
* with -EEXIST. If %XATTR_REPLACE is set, the xattr should exist;
|
||||
* otherwise, fails with -ENODATA.
|
||||
*
|
||||
* Returns 0 on success, -errno on failure.
|
||||
*/
|
||||
int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
|
||||
const void *value, size_t size, int flags)
|
||||
{
|
||||
if (size == 0)
|
||||
value = ""; /* empty EA, do not remove */
|
||||
return __simple_xattr_set(xattrs, name, value, size, flags);
|
||||
}
|
||||
|
||||
/*
|
||||
* xattr REMOVE operation for in-memory/pseudo filesystems
|
||||
*/
|
||||
int simple_xattr_remove(struct simple_xattrs *xattrs, const char *name)
|
||||
{
|
||||
return __simple_xattr_set(xattrs, name, NULL, 0, XATTR_REPLACE);
|
||||
}
|
||||
|
||||
static bool xattr_is_trusted(const char *name)
|
||||
{
|
||||
return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN);
|
||||
}
|
||||
|
||||
/*
|
||||
* xattr LIST operation for in-memory/pseudo filesystems
|
||||
*/
|
||||
ssize_t simple_xattr_list(struct simple_xattrs *xattrs, char *buffer,
|
||||
size_t size)
|
||||
{
|
||||
bool trusted = capable(CAP_SYS_ADMIN);
|
||||
struct simple_xattr *xattr;
|
||||
size_t used = 0;
|
||||
|
||||
spin_lock(&xattrs->lock);
|
||||
list_for_each_entry(xattr, &xattrs->head, list) {
|
||||
size_t len;
|
||||
|
||||
/* skip "trusted." attributes for unprivileged callers */
|
||||
if (!trusted && xattr_is_trusted(xattr->name))
|
||||
continue;
|
||||
|
||||
len = strlen(xattr->name) + 1;
|
||||
used += len;
|
||||
if (buffer) {
|
||||
if (size < used) {
|
||||
used = -ERANGE;
|
||||
break;
|
||||
}
|
||||
memcpy(buffer, xattr->name, len);
|
||||
buffer += len;
|
||||
}
|
||||
}
|
||||
spin_unlock(&xattrs->lock);
|
||||
|
||||
return used;
|
||||
}
|
||||
|
||||
/*
|
||||
* Adds an extended attribute to the list
|
||||
*/
|
||||
void simple_xattr_list_add(struct simple_xattrs *xattrs,
|
||||
struct simple_xattr *new_xattr)
|
||||
{
|
||||
spin_lock(&xattrs->lock);
|
||||
list_add(&new_xattr->list, &xattrs->head);
|
||||
spin_unlock(&xattrs->lock);
|
||||
}
|
||||
|
|
|
@ -17,6 +17,7 @@
|
|||
#include <linux/rwsem.h>
|
||||
#include <linux/idr.h>
|
||||
#include <linux/workqueue.h>
|
||||
#include <linux/xattr.h>
|
||||
|
||||
#ifdef CONFIG_CGROUPS
|
||||
|
||||
|
@ -45,17 +46,13 @@ extern const struct file_operations proc_cgroup_operations;
|
|||
|
||||
/* Define the enumeration of all builtin cgroup subsystems */
|
||||
#define SUBSYS(_x) _x ## _subsys_id,
|
||||
#define IS_SUBSYS_ENABLED(option) IS_ENABLED(option)
|
||||
enum cgroup_subsys_id {
|
||||
#include <linux/cgroup_subsys.h>
|
||||
CGROUP_BUILTIN_SUBSYS_COUNT
|
||||
CGROUP_SUBSYS_COUNT,
|
||||
};
|
||||
#undef IS_SUBSYS_ENABLED
|
||||
#undef SUBSYS
|
||||
/*
|
||||
* This define indicates the maximum number of subsystems that can be loaded
|
||||
* at once. We limit to this many since cgroupfs_root has subsys_bits to keep
|
||||
* track of all of them.
|
||||
*/
|
||||
#define CGROUP_SUBSYS_COUNT (BITS_PER_BYTE*sizeof(unsigned long))
|
||||
|
||||
/* Per-subsystem/per-cgroup state maintained by the system. */
|
||||
struct cgroup_subsys_state {
|
||||
|
@ -216,6 +213,9 @@ struct cgroup {
|
|||
/* List of events which userspace want to receive */
|
||||
struct list_head event_list;
|
||||
spinlock_t event_list_lock;
|
||||
|
||||
/* directory xattrs */
|
||||
struct simple_xattrs xattrs;
|
||||
};
|
||||
|
||||
/*
|
||||
|
@ -309,6 +309,9 @@ struct cftype {
|
|||
/* CFTYPE_* flags */
|
||||
unsigned int flags;
|
||||
|
||||
/* file xattrs */
|
||||
struct simple_xattrs xattrs;
|
||||
|
||||
int (*open)(struct inode *inode, struct file *file);
|
||||
ssize_t (*read)(struct cgroup *cgrp, struct cftype *cft,
|
||||
struct file *file,
|
||||
|
@ -394,7 +397,7 @@ struct cftype {
|
|||
*/
|
||||
struct cftype_set {
|
||||
struct list_head node; /* chained at subsys->cftsets */
|
||||
const struct cftype *cfts;
|
||||
struct cftype *cfts;
|
||||
};
|
||||
|
||||
struct cgroup_scanner {
|
||||
|
@ -406,8 +409,8 @@ struct cgroup_scanner {
|
|||
void *data;
|
||||
};
|
||||
|
||||
int cgroup_add_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts);
|
||||
int cgroup_rm_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts);
|
||||
int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
|
||||
int cgroup_rm_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
|
||||
|
||||
int cgroup_is_removed(const struct cgroup *cgrp);
|
||||
|
||||
|
@ -521,7 +524,9 @@ struct cgroup_subsys {
|
|||
};
|
||||
|
||||
#define SUBSYS(_x) extern struct cgroup_subsys _x ## _subsys;
|
||||
#define IS_SUBSYS_ENABLED(option) IS_BUILTIN(option)
|
||||
#include <linux/cgroup_subsys.h>
|
||||
#undef IS_SUBSYS_ENABLED
|
||||
#undef SUBSYS
|
||||
|
||||
static inline struct cgroup_subsys_state *cgroup_subsys_state(
|
||||
|
|
|
@ -7,73 +7,73 @@
|
|||
|
||||
/* */
|
||||
|
||||
#ifdef CONFIG_CPUSETS
|
||||
#if IS_SUBSYS_ENABLED(CONFIG_CPUSETS)
|
||||
SUBSYS(cpuset)
|
||||
#endif
|
||||
|
||||
/* */
|
||||
|
||||
#ifdef CONFIG_CGROUP_DEBUG
|
||||
#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_DEBUG)
|
||||
SUBSYS(debug)
|
||||
#endif
|
||||
|
||||
/* */
|
||||
|
||||
#ifdef CONFIG_CGROUP_SCHED
|
||||
#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_SCHED)
|
||||
SUBSYS(cpu_cgroup)
|
||||
#endif
|
||||
|
||||
/* */
|
||||
|
||||
#ifdef CONFIG_CGROUP_CPUACCT
|
||||
#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_CPUACCT)
|
||||
SUBSYS(cpuacct)
|
||||
#endif
|
||||
|
||||
/* */
|
||||
|
||||
#ifdef CONFIG_MEMCG
|
||||
#if IS_SUBSYS_ENABLED(CONFIG_MEMCG)
|
||||
SUBSYS(mem_cgroup)
|
||||
#endif
|
||||
|
||||
/* */
|
||||
|
||||
#ifdef CONFIG_CGROUP_DEVICE
|
||||
#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_DEVICE)
|
||||
SUBSYS(devices)
|
||||
#endif
|
||||
|
||||
/* */
|
||||
|
||||
#ifdef CONFIG_CGROUP_FREEZER
|
||||
#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_FREEZER)
|
||||
SUBSYS(freezer)
|
||||
#endif
|
||||
|
||||
/* */
|
||||
|
||||
#ifdef CONFIG_NET_CLS_CGROUP
|
||||
#if IS_SUBSYS_ENABLED(CONFIG_NET_CLS_CGROUP)
|
||||
SUBSYS(net_cls)
|
||||
#endif
|
||||
|
||||
/* */
|
||||
|
||||
#ifdef CONFIG_BLK_CGROUP
|
||||
#if IS_SUBSYS_ENABLED(CONFIG_BLK_CGROUP)
|
||||
SUBSYS(blkio)
|
||||
#endif
|
||||
|
||||
/* */
|
||||
|
||||
#ifdef CONFIG_CGROUP_PERF
|
||||
#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_PERF)
|
||||
SUBSYS(perf)
|
||||
#endif
|
||||
|
||||
/* */
|
||||
|
||||
#ifdef CONFIG_NETPRIO_CGROUP
|
||||
#if IS_SUBSYS_ENABLED(CONFIG_NETPRIO_CGROUP)
|
||||
SUBSYS(net_prio)
|
||||
#endif
|
||||
|
||||
/* */
|
||||
|
||||
#ifdef CONFIG_CGROUP_HUGETLB
|
||||
#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_HUGETLB)
|
||||
SUBSYS(hugetlb)
|
||||
#endif
|
||||
|
||||
|
|
|
@ -5,6 +5,7 @@
|
|||
#include <linux/mempolicy.h>
|
||||
#include <linux/pagemap.h>
|
||||
#include <linux/percpu_counter.h>
|
||||
#include <linux/xattr.h>
|
||||
|
||||
/* inode in-kernel data */
|
||||
|
||||
|
@ -18,7 +19,7 @@ struct shmem_inode_info {
|
|||
};
|
||||
struct shared_policy policy; /* NUMA memory alloc policy */
|
||||
struct list_head swaplist; /* chain of maybes on swap */
|
||||
struct list_head xattr_list; /* list of shmem_xattr */
|
||||
struct simple_xattrs xattrs; /* list of xattrs */
|
||||
struct inode vfs_inode;
|
||||
};
|
||||
|
||||
|
|
|
@ -59,7 +59,9 @@
|
|||
|
||||
#ifdef __KERNEL__
|
||||
|
||||
#include <linux/slab.h>
|
||||
#include <linux/types.h>
|
||||
#include <linux/spinlock.h>
|
||||
|
||||
struct inode;
|
||||
struct dentry;
|
||||
|
@ -96,6 +98,52 @@ ssize_t vfs_getxattr_alloc(struct dentry *dentry, const char *name,
|
|||
char **xattr_value, size_t size, gfp_t flags);
|
||||
int vfs_xattr_cmp(struct dentry *dentry, const char *xattr_name,
|
||||
const char *value, size_t size, gfp_t flags);
|
||||
|
||||
struct simple_xattrs {
|
||||
struct list_head head;
|
||||
spinlock_t lock;
|
||||
};
|
||||
|
||||
struct simple_xattr {
|
||||
struct list_head list;
|
||||
char *name;
|
||||
size_t size;
|
||||
char value[0];
|
||||
};
|
||||
|
||||
/*
|
||||
* initialize the simple_xattrs structure
|
||||
*/
|
||||
static inline void simple_xattrs_init(struct simple_xattrs *xattrs)
|
||||
{
|
||||
INIT_LIST_HEAD(&xattrs->head);
|
||||
spin_lock_init(&xattrs->lock);
|
||||
}
|
||||
|
||||
/*
|
||||
* free all the xattrs
|
||||
*/
|
||||
static inline void simple_xattrs_free(struct simple_xattrs *xattrs)
|
||||
{
|
||||
struct simple_xattr *xattr, *node;
|
||||
|
||||
list_for_each_entry_safe(xattr, node, &xattrs->head, list) {
|
||||
kfree(xattr->name);
|
||||
kfree(xattr);
|
||||
}
|
||||
}
|
||||
|
||||
struct simple_xattr *simple_xattr_alloc(const void *value, size_t size);
|
||||
int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
|
||||
void *buffer, size_t size);
|
||||
int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
|
||||
const void *value, size_t size, int flags);
|
||||
int simple_xattr_remove(struct simple_xattrs *xattrs, const char *name);
|
||||
ssize_t simple_xattr_list(struct simple_xattrs *xattrs, char *buffer,
|
||||
size_t size);
|
||||
void simple_xattr_list_add(struct simple_xattrs *xattrs,
|
||||
struct simple_xattr *new_xattr);
|
||||
|
||||
#endif /* __KERNEL__ */
|
||||
|
||||
#endif /* _LINUX_XATTR_H */
|
||||
|
|
|
@ -17,14 +17,16 @@
|
|||
#include <linux/hardirq.h>
|
||||
#include <linux/rcupdate.h>
|
||||
|
||||
#ifdef CONFIG_CGROUPS
|
||||
#if IS_ENABLED(CONFIG_NET_CLS_CGROUP)
|
||||
struct cgroup_cls_state
|
||||
{
|
||||
struct cgroup_subsys_state css;
|
||||
u32 classid;
|
||||
};
|
||||
|
||||
#ifdef CONFIG_NET_CLS_CGROUP
|
||||
extern void sock_update_classid(struct sock *sk);
|
||||
|
||||
#if IS_BUILTIN(CONFIG_NET_CLS_CGROUP)
|
||||
static inline u32 task_cls_classid(struct task_struct *p)
|
||||
{
|
||||
int classid;
|
||||
|
@ -39,32 +41,33 @@ static inline u32 task_cls_classid(struct task_struct *p)
|
|||
|
||||
return classid;
|
||||
}
|
||||
#else
|
||||
extern int net_cls_subsys_id;
|
||||
|
||||
#elif IS_MODULE(CONFIG_NET_CLS_CGROUP)
|
||||
static inline u32 task_cls_classid(struct task_struct *p)
|
||||
{
|
||||
int id;
|
||||
struct cgroup_subsys_state *css;
|
||||
u32 classid = 0;
|
||||
|
||||
if (in_interrupt())
|
||||
return 0;
|
||||
|
||||
rcu_read_lock();
|
||||
id = rcu_dereference_index_check(net_cls_subsys_id,
|
||||
rcu_read_lock_held());
|
||||
if (id >= 0)
|
||||
classid = container_of(task_subsys_state(p, id),
|
||||
css = task_subsys_state(p, net_cls_subsys_id);
|
||||
if (css)
|
||||
classid = container_of(css,
|
||||
struct cgroup_cls_state, css)->classid;
|
||||
rcu_read_unlock();
|
||||
|
||||
return classid;
|
||||
}
|
||||
#endif
|
||||
#else
|
||||
#else /* !CGROUP_NET_CLS_CGROUP */
|
||||
static inline void sock_update_classid(struct sock *sk)
|
||||
{
|
||||
}
|
||||
|
||||
static inline u32 task_cls_classid(struct task_struct *p)
|
||||
{
|
||||
return 0;
|
||||
}
|
||||
#endif
|
||||
#endif /* CGROUP_NET_CLS_CGROUP */
|
||||
#endif /* _NET_CLS_CGROUP_H */
|
||||
|
|
|
@ -18,23 +18,18 @@
|
|||
#include <linux/rcupdate.h>
|
||||
|
||||
|
||||
#if IS_ENABLED(CONFIG_NETPRIO_CGROUP)
|
||||
struct netprio_map {
|
||||
struct rcu_head rcu;
|
||||
u32 priomap_len;
|
||||
u32 priomap[];
|
||||
};
|
||||
|
||||
#ifdef CONFIG_CGROUPS
|
||||
|
||||
struct cgroup_netprio_state {
|
||||
struct cgroup_subsys_state css;
|
||||
u32 prioidx;
|
||||
};
|
||||
|
||||
#ifndef CONFIG_NETPRIO_CGROUP
|
||||
extern int net_prio_subsys_id;
|
||||
#endif
|
||||
|
||||
extern void sock_update_netprioidx(struct sock *sk, struct task_struct *task);
|
||||
|
||||
#if IS_BUILTIN(CONFIG_NETPRIO_CGROUP)
|
||||
|
@ -56,33 +51,28 @@ static inline u32 task_netprioidx(struct task_struct *p)
|
|||
|
||||
static inline u32 task_netprioidx(struct task_struct *p)
|
||||
{
|
||||
struct cgroup_netprio_state *state;
|
||||
int subsys_id;
|
||||
struct cgroup_subsys_state *css;
|
||||
u32 idx = 0;
|
||||
|
||||
rcu_read_lock();
|
||||
subsys_id = rcu_dereference_index_check(net_prio_subsys_id,
|
||||
rcu_read_lock_held());
|
||||
if (subsys_id >= 0) {
|
||||
state = container_of(task_subsys_state(p, subsys_id),
|
||||
struct cgroup_netprio_state, css);
|
||||
idx = state->prioidx;
|
||||
}
|
||||
css = task_subsys_state(p, net_prio_subsys_id);
|
||||
if (css)
|
||||
idx = container_of(css,
|
||||
struct cgroup_netprio_state, css)->prioidx;
|
||||
rcu_read_unlock();
|
||||
return idx;
|
||||
}
|
||||
#endif
|
||||
|
||||
#else
|
||||
#else /* !CONFIG_NETPRIO_CGROUP */
|
||||
|
||||
static inline u32 task_netprioidx(struct task_struct *p)
|
||||
{
|
||||
return 0;
|
||||
}
|
||||
|
||||
#define sock_update_netprioidx(sk, task)
|
||||
|
||||
#endif /* CONFIG_NETPRIO_CGROUP */
|
||||
|
||||
#else
|
||||
#define sock_update_netprioidx(sk, task)
|
||||
#endif
|
||||
|
||||
#endif /* _NET_CLS_CGROUP_H */
|
||||
|
|
|
@ -1486,14 +1486,6 @@ extern void *sock_kmalloc(struct sock *sk, int size,
|
|||
extern void sock_kfree_s(struct sock *sk, void *mem, int size);
|
||||
extern void sk_send_sigurg(struct sock *sk);
|
||||
|
||||
#ifdef CONFIG_CGROUPS
|
||||
extern void sock_update_classid(struct sock *sk);
|
||||
#else
|
||||
static inline void sock_update_classid(struct sock *sk)
|
||||
{
|
||||
}
|
||||
#endif
|
||||
|
||||
/*
|
||||
* Functions to fill in entries in struct proto_ops when a protocol
|
||||
* does not implement a particular function.
|
||||
|
|
320
kernel/cgroup.c
320
kernel/cgroup.c
|
@ -88,11 +88,12 @@ static DEFINE_MUTEX(cgroup_root_mutex);
|
|||
|
||||
/*
|
||||
* Generate an array of cgroup subsystem pointers. At boot time, this is
|
||||
* populated up to CGROUP_BUILTIN_SUBSYS_COUNT, and modular subsystems are
|
||||
* populated with the built in subsystems, and modular subsystems are
|
||||
* registered after that. The mutable section of this array is protected by
|
||||
* cgroup_mutex.
|
||||
*/
|
||||
#define SUBSYS(_x) &_x ## _subsys,
|
||||
#define SUBSYS(_x) [_x ## _subsys_id] = &_x ## _subsys,
|
||||
#define IS_SUBSYS_ENABLED(option) IS_BUILTIN(option)
|
||||
static struct cgroup_subsys *subsys[CGROUP_SUBSYS_COUNT] = {
|
||||
#include <linux/cgroup_subsys.h>
|
||||
};
|
||||
|
@ -111,13 +112,13 @@ struct cgroupfs_root {
|
|||
* The bitmask of subsystems intended to be attached to this
|
||||
* hierarchy
|
||||
*/
|
||||
unsigned long subsys_bits;
|
||||
unsigned long subsys_mask;
|
||||
|
||||
/* Unique id for this hierarchy. */
|
||||
int hierarchy_id;
|
||||
|
||||
/* The bitmask of subsystems currently attached to this hierarchy */
|
||||
unsigned long actual_subsys_bits;
|
||||
unsigned long actual_subsys_mask;
|
||||
|
||||
/* A list running through the attached subsystems */
|
||||
struct list_head subsys_list;
|
||||
|
@ -276,7 +277,8 @@ inline int cgroup_is_removed(const struct cgroup *cgrp)
|
|||
|
||||
/* bits in struct cgroupfs_root flags field */
|
||||
enum {
|
||||
ROOT_NOPREFIX, /* mounted subsystems have no named prefix */
|
||||
ROOT_NOPREFIX, /* mounted subsystems have no named prefix */
|
||||
ROOT_XATTR, /* supports extended attributes */
|
||||
};
|
||||
|
||||
static int cgroup_is_releasable(const struct cgroup *cgrp)
|
||||
|
@ -556,7 +558,7 @@ static struct css_set *find_existing_css_set(
|
|||
* won't change, so no need for locking.
|
||||
*/
|
||||
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
|
||||
if (root->subsys_bits & (1UL << i)) {
|
||||
if (root->subsys_mask & (1UL << i)) {
|
||||
/* Subsystem is in this hierarchy. So we want
|
||||
* the subsystem state from the new
|
||||
* cgroup */
|
||||
|
@ -824,7 +826,8 @@ EXPORT_SYMBOL_GPL(cgroup_unlock);
|
|||
static int cgroup_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode);
|
||||
static struct dentry *cgroup_lookup(struct inode *, struct dentry *, unsigned int);
|
||||
static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry);
|
||||
static int cgroup_populate_dir(struct cgroup *cgrp);
|
||||
static int cgroup_populate_dir(struct cgroup *cgrp, bool base_files,
|
||||
unsigned long subsys_mask);
|
||||
static const struct inode_operations cgroup_dir_inode_operations;
|
||||
static const struct file_operations proc_cgroupstats_operations;
|
||||
|
||||
|
@ -912,15 +915,19 @@ static void cgroup_diput(struct dentry *dentry, struct inode *inode)
|
|||
*/
|
||||
BUG_ON(!list_empty(&cgrp->pidlists));
|
||||
|
||||
simple_xattrs_free(&cgrp->xattrs);
|
||||
|
||||
kfree_rcu(cgrp, rcu_head);
|
||||
} else {
|
||||
struct cfent *cfe = __d_cfe(dentry);
|
||||
struct cgroup *cgrp = dentry->d_parent->d_fsdata;
|
||||
struct cftype *cft = cfe->type;
|
||||
|
||||
WARN_ONCE(!list_empty(&cfe->node) &&
|
||||
cgrp != &cgrp->root->top_cgroup,
|
||||
"cfe still linked for %s\n", cfe->type->name);
|
||||
kfree(cfe);
|
||||
simple_xattrs_free(&cft->xattrs);
|
||||
}
|
||||
iput(inode);
|
||||
}
|
||||
|
@ -963,12 +970,29 @@ static int cgroup_rm_file(struct cgroup *cgrp, const struct cftype *cft)
|
|||
return -ENOENT;
|
||||
}
|
||||
|
||||
static void cgroup_clear_directory(struct dentry *dir)
|
||||
/**
|
||||
* cgroup_clear_directory - selective removal of base and subsystem files
|
||||
* @dir: directory containing the files
|
||||
* @base_files: true if the base files should be removed
|
||||
* @subsys_mask: mask of the subsystem ids whose files should be removed
|
||||
*/
|
||||
static void cgroup_clear_directory(struct dentry *dir, bool base_files,
|
||||
unsigned long subsys_mask)
|
||||
{
|
||||
struct cgroup *cgrp = __d_cgrp(dir);
|
||||
struct cgroup_subsys *ss;
|
||||
|
||||
while (!list_empty(&cgrp->files))
|
||||
cgroup_rm_file(cgrp, NULL);
|
||||
for_each_subsys(cgrp->root, ss) {
|
||||
struct cftype_set *set;
|
||||
if (!test_bit(ss->subsys_id, &subsys_mask))
|
||||
continue;
|
||||
list_for_each_entry(set, &ss->cftsets, node)
|
||||
cgroup_rm_file(cgrp, set->cfts);
|
||||
}
|
||||
if (base_files) {
|
||||
while (!list_empty(&cgrp->files))
|
||||
cgroup_rm_file(cgrp, NULL);
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -977,8 +1001,9 @@ static void cgroup_clear_directory(struct dentry *dir)
|
|||
static void cgroup_d_remove_dir(struct dentry *dentry)
|
||||
{
|
||||
struct dentry *parent;
|
||||
struct cgroupfs_root *root = dentry->d_sb->s_fs_info;
|
||||
|
||||
cgroup_clear_directory(dentry);
|
||||
cgroup_clear_directory(dentry, true, root->subsys_mask);
|
||||
|
||||
parent = dentry->d_parent;
|
||||
spin_lock(&parent->d_lock);
|
||||
|
@ -1022,22 +1047,22 @@ void cgroup_release_and_wakeup_rmdir(struct cgroup_subsys_state *css)
|
|||
* returns an error, no reference counts are touched.
|
||||
*/
|
||||
static int rebind_subsystems(struct cgroupfs_root *root,
|
||||
unsigned long final_bits)
|
||||
unsigned long final_subsys_mask)
|
||||
{
|
||||
unsigned long added_bits, removed_bits;
|
||||
unsigned long added_mask, removed_mask;
|
||||
struct cgroup *cgrp = &root->top_cgroup;
|
||||
int i;
|
||||
|
||||
BUG_ON(!mutex_is_locked(&cgroup_mutex));
|
||||
BUG_ON(!mutex_is_locked(&cgroup_root_mutex));
|
||||
|
||||
removed_bits = root->actual_subsys_bits & ~final_bits;
|
||||
added_bits = final_bits & ~root->actual_subsys_bits;
|
||||
removed_mask = root->actual_subsys_mask & ~final_subsys_mask;
|
||||
added_mask = final_subsys_mask & ~root->actual_subsys_mask;
|
||||
/* Check that any added subsystems are currently free */
|
||||
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
|
||||
unsigned long bit = 1UL << i;
|
||||
struct cgroup_subsys *ss = subsys[i];
|
||||
if (!(bit & added_bits))
|
||||
if (!(bit & added_mask))
|
||||
continue;
|
||||
/*
|
||||
* Nobody should tell us to do a subsys that doesn't exist:
|
||||
|
@ -1062,7 +1087,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
|
|||
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
|
||||
struct cgroup_subsys *ss = subsys[i];
|
||||
unsigned long bit = 1UL << i;
|
||||
if (bit & added_bits) {
|
||||
if (bit & added_mask) {
|
||||
/* We're binding this subsystem to this hierarchy */
|
||||
BUG_ON(ss == NULL);
|
||||
BUG_ON(cgrp->subsys[i]);
|
||||
|
@ -1075,7 +1100,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
|
|||
if (ss->bind)
|
||||
ss->bind(cgrp);
|
||||
/* refcount was already taken, and we're keeping it */
|
||||
} else if (bit & removed_bits) {
|
||||
} else if (bit & removed_mask) {
|
||||
/* We're removing this subsystem */
|
||||
BUG_ON(ss == NULL);
|
||||
BUG_ON(cgrp->subsys[i] != dummytop->subsys[i]);
|
||||
|
@ -1088,7 +1113,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
|
|||
list_move(&ss->sibling, &rootnode.subsys_list);
|
||||
/* subsystem is now free - drop reference on module */
|
||||
module_put(ss->module);
|
||||
} else if (bit & final_bits) {
|
||||
} else if (bit & final_subsys_mask) {
|
||||
/* Subsystem state should already exist */
|
||||
BUG_ON(ss == NULL);
|
||||
BUG_ON(!cgrp->subsys[i]);
|
||||
|
@ -1105,7 +1130,7 @@ static int rebind_subsystems(struct cgroupfs_root *root,
|
|||
BUG_ON(cgrp->subsys[i]);
|
||||
}
|
||||
}
|
||||
root->subsys_bits = root->actual_subsys_bits = final_bits;
|
||||
root->subsys_mask = root->actual_subsys_mask = final_subsys_mask;
|
||||
synchronize_rcu();
|
||||
|
||||
return 0;
|
||||
|
@ -1121,6 +1146,8 @@ static int cgroup_show_options(struct seq_file *seq, struct dentry *dentry)
|
|||
seq_printf(seq, ",%s", ss->name);
|
||||
if (test_bit(ROOT_NOPREFIX, &root->flags))
|
||||
seq_puts(seq, ",noprefix");
|
||||
if (test_bit(ROOT_XATTR, &root->flags))
|
||||
seq_puts(seq, ",xattr");
|
||||
if (strlen(root->release_agent_path))
|
||||
seq_printf(seq, ",release_agent=%s", root->release_agent_path);
|
||||
if (clone_children(&root->top_cgroup))
|
||||
|
@ -1132,7 +1159,7 @@ static int cgroup_show_options(struct seq_file *seq, struct dentry *dentry)
|
|||
}
|
||||
|
||||
struct cgroup_sb_opts {
|
||||
unsigned long subsys_bits;
|
||||
unsigned long subsys_mask;
|
||||
unsigned long flags;
|
||||
char *release_agent;
|
||||
bool clone_children;
|
||||
|
@ -1189,6 +1216,10 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
|
|||
opts->clone_children = true;
|
||||
continue;
|
||||
}
|
||||
if (!strcmp(token, "xattr")) {
|
||||
set_bit(ROOT_XATTR, &opts->flags);
|
||||
continue;
|
||||
}
|
||||
if (!strncmp(token, "release_agent=", 14)) {
|
||||
/* Specifying two release agents is forbidden */
|
||||
if (opts->release_agent)
|
||||
|
@ -1237,7 +1268,7 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
|
|||
/* Mutually exclusive option 'all' + subsystem name */
|
||||
if (all_ss)
|
||||
return -EINVAL;
|
||||
set_bit(i, &opts->subsys_bits);
|
||||
set_bit(i, &opts->subsys_mask);
|
||||
one_ss = true;
|
||||
|
||||
break;
|
||||
|
@ -1258,7 +1289,7 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
|
|||
continue;
|
||||
if (ss->disabled)
|
||||
continue;
|
||||
set_bit(i, &opts->subsys_bits);
|
||||
set_bit(i, &opts->subsys_mask);
|
||||
}
|
||||
}
|
||||
|
||||
|
@ -1270,19 +1301,19 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
|
|||
* the cpuset subsystem.
|
||||
*/
|
||||
if (test_bit(ROOT_NOPREFIX, &opts->flags) &&
|
||||
(opts->subsys_bits & mask))
|
||||
(opts->subsys_mask & mask))
|
||||
return -EINVAL;
|
||||
|
||||
|
||||
/* Can't specify "none" and some subsystems */
|
||||
if (opts->subsys_bits && opts->none)
|
||||
if (opts->subsys_mask && opts->none)
|
||||
return -EINVAL;
|
||||
|
||||
/*
|
||||
* We either have to specify by name or by subsystems. (So all
|
||||
* empty hierarchies must have a name).
|
||||
*/
|
||||
if (!opts->subsys_bits && !opts->name)
|
||||
if (!opts->subsys_mask && !opts->name)
|
||||
return -EINVAL;
|
||||
|
||||
/*
|
||||
|
@ -1291,10 +1322,10 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
|
|||
* take duplicate reference counts on a subsystem that's already used,
|
||||
* but rebind_subsystems handles this case.
|
||||
*/
|
||||
for (i = CGROUP_BUILTIN_SUBSYS_COUNT; i < CGROUP_SUBSYS_COUNT; i++) {
|
||||
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
|
||||
unsigned long bit = 1UL << i;
|
||||
|
||||
if (!(bit & opts->subsys_bits))
|
||||
if (!(bit & opts->subsys_mask))
|
||||
continue;
|
||||
if (!try_module_get(subsys[i]->module)) {
|
||||
module_pin_failed = true;
|
||||
|
@ -1307,11 +1338,11 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
|
|||
* raced with a module_delete call, and to the user this is
|
||||
* essentially a "subsystem doesn't exist" case.
|
||||
*/
|
||||
for (i--; i >= CGROUP_BUILTIN_SUBSYS_COUNT; i--) {
|
||||
for (i--; i >= 0; i--) {
|
||||
/* drop refcounts only on the ones we took */
|
||||
unsigned long bit = 1UL << i;
|
||||
|
||||
if (!(bit & opts->subsys_bits))
|
||||
if (!(bit & opts->subsys_mask))
|
||||
continue;
|
||||
module_put(subsys[i]->module);
|
||||
}
|
||||
|
@ -1321,13 +1352,13 @@ static int parse_cgroupfs_options(char *data, struct cgroup_sb_opts *opts)
|
|||
return 0;
|
||||
}
|
||||
|
||||
static void drop_parsed_module_refcounts(unsigned long subsys_bits)
|
||||
static void drop_parsed_module_refcounts(unsigned long subsys_mask)
|
||||
{
|
||||
int i;
|
||||
for (i = CGROUP_BUILTIN_SUBSYS_COUNT; i < CGROUP_SUBSYS_COUNT; i++) {
|
||||
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
|
||||
unsigned long bit = 1UL << i;
|
||||
|
||||
if (!(bit & subsys_bits))
|
||||
if (!(bit & subsys_mask))
|
||||
continue;
|
||||
module_put(subsys[i]->module);
|
||||
}
|
||||
|
@ -1339,6 +1370,7 @@ static int cgroup_remount(struct super_block *sb, int *flags, char *data)
|
|||
struct cgroupfs_root *root = sb->s_fs_info;
|
||||
struct cgroup *cgrp = &root->top_cgroup;
|
||||
struct cgroup_sb_opts opts;
|
||||
unsigned long added_mask, removed_mask;
|
||||
|
||||
mutex_lock(&cgrp->dentry->d_inode->i_mutex);
|
||||
mutex_lock(&cgroup_mutex);
|
||||
|
@ -1350,27 +1382,31 @@ static int cgroup_remount(struct super_block *sb, int *flags, char *data)
|
|||
goto out_unlock;
|
||||
|
||||
/* See feature-removal-schedule.txt */
|
||||
if (opts.subsys_bits != root->actual_subsys_bits || opts.release_agent)
|
||||
if (opts.subsys_mask != root->actual_subsys_mask || opts.release_agent)
|
||||
pr_warning("cgroup: option changes via remount are deprecated (pid=%d comm=%s)\n",
|
||||
task_tgid_nr(current), current->comm);
|
||||
|
||||
added_mask = opts.subsys_mask & ~root->subsys_mask;
|
||||
removed_mask = root->subsys_mask & ~opts.subsys_mask;
|
||||
|
||||
/* Don't allow flags or name to change at remount */
|
||||
if (opts.flags != root->flags ||
|
||||
(opts.name && strcmp(opts.name, root->name))) {
|
||||
ret = -EINVAL;
|
||||
drop_parsed_module_refcounts(opts.subsys_bits);
|
||||
drop_parsed_module_refcounts(opts.subsys_mask);
|
||||
goto out_unlock;
|
||||
}
|
||||
|
||||
ret = rebind_subsystems(root, opts.subsys_bits);
|
||||
ret = rebind_subsystems(root, opts.subsys_mask);
|
||||
if (ret) {
|
||||
drop_parsed_module_refcounts(opts.subsys_bits);
|
||||
drop_parsed_module_refcounts(opts.subsys_mask);
|
||||
goto out_unlock;
|
||||
}
|
||||
|
||||
/* clear out any existing files and repopulate subsystem files */
|
||||
cgroup_clear_directory(cgrp->dentry);
|
||||
cgroup_populate_dir(cgrp);
|
||||
cgroup_clear_directory(cgrp->dentry, false, removed_mask);
|
||||
/* re-populate subsystem files */
|
||||
cgroup_populate_dir(cgrp, false, added_mask);
|
||||
|
||||
if (opts.release_agent)
|
||||
strcpy(root->release_agent_path, opts.release_agent);
|
||||
|
@ -1401,6 +1437,7 @@ static void init_cgroup_housekeeping(struct cgroup *cgrp)
|
|||
mutex_init(&cgrp->pidlist_mutex);
|
||||
INIT_LIST_HEAD(&cgrp->event_list);
|
||||
spin_lock_init(&cgrp->event_list_lock);
|
||||
simple_xattrs_init(&cgrp->xattrs);
|
||||
}
|
||||
|
||||
static void init_cgroup_root(struct cgroupfs_root *root)
|
||||
|
@ -1455,8 +1492,8 @@ static int cgroup_test_super(struct super_block *sb, void *data)
|
|||
* If we asked for subsystems (or explicitly for no
|
||||
* subsystems) then they must match
|
||||
*/
|
||||
if ((opts->subsys_bits || opts->none)
|
||||
&& (opts->subsys_bits != root->subsys_bits))
|
||||
if ((opts->subsys_mask || opts->none)
|
||||
&& (opts->subsys_mask != root->subsys_mask))
|
||||
return 0;
|
||||
|
||||
return 1;
|
||||
|
@ -1466,7 +1503,7 @@ static struct cgroupfs_root *cgroup_root_from_opts(struct cgroup_sb_opts *opts)
|
|||
{
|
||||
struct cgroupfs_root *root;
|
||||
|
||||
if (!opts->subsys_bits && !opts->none)
|
||||
if (!opts->subsys_mask && !opts->none)
|
||||
return NULL;
|
||||
|
||||
root = kzalloc(sizeof(*root), GFP_KERNEL);
|
||||
|
@ -1479,7 +1516,7 @@ static struct cgroupfs_root *cgroup_root_from_opts(struct cgroup_sb_opts *opts)
|
|||
}
|
||||
init_cgroup_root(root);
|
||||
|
||||
root->subsys_bits = opts->subsys_bits;
|
||||
root->subsys_mask = opts->subsys_mask;
|
||||
root->flags = opts->flags;
|
||||
if (opts->release_agent)
|
||||
strcpy(root->release_agent_path, opts->release_agent);
|
||||
|
@ -1511,7 +1548,7 @@ static int cgroup_set_super(struct super_block *sb, void *data)
|
|||
if (!opts->new_root)
|
||||
return -EINVAL;
|
||||
|
||||
BUG_ON(!opts->subsys_bits && !opts->none);
|
||||
BUG_ON(!opts->subsys_mask && !opts->none);
|
||||
|
||||
ret = set_anon_super(sb, NULL);
|
||||
if (ret)
|
||||
|
@ -1629,7 +1666,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
|
|||
if (ret)
|
||||
goto unlock_drop;
|
||||
|
||||
ret = rebind_subsystems(root, root->subsys_bits);
|
||||
ret = rebind_subsystems(root, root->subsys_mask);
|
||||
if (ret == -EBUSY) {
|
||||
free_cg_links(&tmp_cg_links);
|
||||
goto unlock_drop;
|
||||
|
@ -1669,7 +1706,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
|
|||
BUG_ON(root->number_of_cgroups != 1);
|
||||
|
||||
cred = override_creds(&init_cred);
|
||||
cgroup_populate_dir(root_cgrp);
|
||||
cgroup_populate_dir(root_cgrp, true, root->subsys_mask);
|
||||
revert_creds(cred);
|
||||
mutex_unlock(&cgroup_root_mutex);
|
||||
mutex_unlock(&cgroup_mutex);
|
||||
|
@ -1681,7 +1718,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
|
|||
*/
|
||||
cgroup_drop_root(opts.new_root);
|
||||
/* no subsys rebinding, so refcounts don't change */
|
||||
drop_parsed_module_refcounts(opts.subsys_bits);
|
||||
drop_parsed_module_refcounts(opts.subsys_mask);
|
||||
}
|
||||
|
||||
kfree(opts.release_agent);
|
||||
|
@ -1695,7 +1732,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
|
|||
drop_new_super:
|
||||
deactivate_locked_super(sb);
|
||||
drop_modules:
|
||||
drop_parsed_module_refcounts(opts.subsys_bits);
|
||||
drop_parsed_module_refcounts(opts.subsys_mask);
|
||||
out_err:
|
||||
kfree(opts.release_agent);
|
||||
kfree(opts.name);
|
||||
|
@ -1745,6 +1782,8 @@ static void cgroup_kill_sb(struct super_block *sb) {
|
|||
mutex_unlock(&cgroup_root_mutex);
|
||||
mutex_unlock(&cgroup_mutex);
|
||||
|
||||
simple_xattrs_free(&cgrp->xattrs);
|
||||
|
||||
kill_litter_super(sb);
|
||||
cgroup_drop_root(root);
|
||||
}
|
||||
|
@ -2551,6 +2590,64 @@ static int cgroup_rename(struct inode *old_dir, struct dentry *old_dentry,
|
|||
return simple_rename(old_dir, old_dentry, new_dir, new_dentry);
|
||||
}
|
||||
|
||||
static struct simple_xattrs *__d_xattrs(struct dentry *dentry)
|
||||
{
|
||||
if (S_ISDIR(dentry->d_inode->i_mode))
|
||||
return &__d_cgrp(dentry)->xattrs;
|
||||
else
|
||||
return &__d_cft(dentry)->xattrs;
|
||||
}
|
||||
|
||||
static inline int xattr_enabled(struct dentry *dentry)
|
||||
{
|
||||
struct cgroupfs_root *root = dentry->d_sb->s_fs_info;
|
||||
return test_bit(ROOT_XATTR, &root->flags);
|
||||
}
|
||||
|
||||
static bool is_valid_xattr(const char *name)
|
||||
{
|
||||
if (!strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN) ||
|
||||
!strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN))
|
||||
return true;
|
||||
return false;
|
||||
}
|
||||
|
||||
static int cgroup_setxattr(struct dentry *dentry, const char *name,
|
||||
const void *val, size_t size, int flags)
|
||||
{
|
||||
if (!xattr_enabled(dentry))
|
||||
return -EOPNOTSUPP;
|
||||
if (!is_valid_xattr(name))
|
||||
return -EINVAL;
|
||||
return simple_xattr_set(__d_xattrs(dentry), name, val, size, flags);
|
||||
}
|
||||
|
||||
static int cgroup_removexattr(struct dentry *dentry, const char *name)
|
||||
{
|
||||
if (!xattr_enabled(dentry))
|
||||
return -EOPNOTSUPP;
|
||||
if (!is_valid_xattr(name))
|
||||
return -EINVAL;
|
||||
return simple_xattr_remove(__d_xattrs(dentry), name);
|
||||
}
|
||||
|
||||
static ssize_t cgroup_getxattr(struct dentry *dentry, const char *name,
|
||||
void *buf, size_t size)
|
||||
{
|
||||
if (!xattr_enabled(dentry))
|
||||
return -EOPNOTSUPP;
|
||||
if (!is_valid_xattr(name))
|
||||
return -EINVAL;
|
||||
return simple_xattr_get(__d_xattrs(dentry), name, buf, size);
|
||||
}
|
||||
|
||||
static ssize_t cgroup_listxattr(struct dentry *dentry, char *buf, size_t size)
|
||||
{
|
||||
if (!xattr_enabled(dentry))
|
||||
return -EOPNOTSUPP;
|
||||
return simple_xattr_list(__d_xattrs(dentry), buf, size);
|
||||
}
|
||||
|
||||
static const struct file_operations cgroup_file_operations = {
|
||||
.read = cgroup_file_read,
|
||||
.write = cgroup_file_write,
|
||||
|
@ -2559,11 +2656,22 @@ static const struct file_operations cgroup_file_operations = {
|
|||
.release = cgroup_file_release,
|
||||
};
|
||||
|
||||
static const struct inode_operations cgroup_file_inode_operations = {
|
||||
.setxattr = cgroup_setxattr,
|
||||
.getxattr = cgroup_getxattr,
|
||||
.listxattr = cgroup_listxattr,
|
||||
.removexattr = cgroup_removexattr,
|
||||
};
|
||||
|
||||
static const struct inode_operations cgroup_dir_inode_operations = {
|
||||
.lookup = cgroup_lookup,
|
||||
.mkdir = cgroup_mkdir,
|
||||
.rmdir = cgroup_rmdir,
|
||||
.rename = cgroup_rename,
|
||||
.setxattr = cgroup_setxattr,
|
||||
.getxattr = cgroup_getxattr,
|
||||
.listxattr = cgroup_listxattr,
|
||||
.removexattr = cgroup_removexattr,
|
||||
};
|
||||
|
||||
static struct dentry *cgroup_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
|
||||
|
@ -2611,6 +2719,7 @@ static int cgroup_create_file(struct dentry *dentry, umode_t mode,
|
|||
} else if (S_ISREG(mode)) {
|
||||
inode->i_size = 0;
|
||||
inode->i_fop = &cgroup_file_operations;
|
||||
inode->i_op = &cgroup_file_inode_operations;
|
||||
}
|
||||
d_instantiate(dentry, inode);
|
||||
dget(dentry); /* Extra count - pin the dentry in core */
|
||||
|
@ -2671,7 +2780,7 @@ static umode_t cgroup_file_mode(const struct cftype *cft)
|
|||
}
|
||||
|
||||
static int cgroup_add_file(struct cgroup *cgrp, struct cgroup_subsys *subsys,
|
||||
const struct cftype *cft)
|
||||
struct cftype *cft)
|
||||
{
|
||||
struct dentry *dir = cgrp->dentry;
|
||||
struct cgroup *parent = __d_cgrp(dir);
|
||||
|
@ -2681,6 +2790,8 @@ static int cgroup_add_file(struct cgroup *cgrp, struct cgroup_subsys *subsys,
|
|||
umode_t mode;
|
||||
char name[MAX_CGROUP_TYPE_NAMELEN + MAX_CFTYPE_NAME + 2] = { 0 };
|
||||
|
||||
simple_xattrs_init(&cft->xattrs);
|
||||
|
||||
/* does @cft->flags tell us to skip creation on @cgrp? */
|
||||
if ((cft->flags & CFTYPE_NOT_ON_ROOT) && !cgrp->parent)
|
||||
return 0;
|
||||
|
@ -2721,9 +2832,9 @@ out:
|
|||
}
|
||||
|
||||
static int cgroup_addrm_files(struct cgroup *cgrp, struct cgroup_subsys *subsys,
|
||||
const struct cftype cfts[], bool is_add)
|
||||
struct cftype cfts[], bool is_add)
|
||||
{
|
||||
const struct cftype *cft;
|
||||
struct cftype *cft;
|
||||
int err, ret = 0;
|
||||
|
||||
for (cft = cfts; cft->name[0] != '\0'; cft++) {
|
||||
|
@ -2757,7 +2868,7 @@ static void cgroup_cfts_prepare(void)
|
|||
}
|
||||
|
||||
static void cgroup_cfts_commit(struct cgroup_subsys *ss,
|
||||
const struct cftype *cfts, bool is_add)
|
||||
struct cftype *cfts, bool is_add)
|
||||
__releases(&cgroup_mutex) __releases(&cgroup_cft_mutex)
|
||||
{
|
||||
LIST_HEAD(pending);
|
||||
|
@ -2808,7 +2919,7 @@ static void cgroup_cfts_commit(struct cgroup_subsys *ss,
|
|||
* function currently returns 0 as long as @cfts registration is successful
|
||||
* even if some file creation attempts on existing cgroups fail.
|
||||
*/
|
||||
int cgroup_add_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts)
|
||||
int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
|
||||
{
|
||||
struct cftype_set *set;
|
||||
|
||||
|
@ -2838,7 +2949,7 @@ EXPORT_SYMBOL_GPL(cgroup_add_cftypes);
|
|||
* Returns 0 on successful unregistration, -ENOENT if @cfts is not
|
||||
* registered with @ss.
|
||||
*/
|
||||
int cgroup_rm_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts)
|
||||
int cgroup_rm_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
|
||||
{
|
||||
struct cftype_set *set;
|
||||
|
||||
|
@ -3843,18 +3954,29 @@ static struct cftype files[] = {
|
|||
{ } /* terminate */
|
||||
};
|
||||
|
||||
static int cgroup_populate_dir(struct cgroup *cgrp)
|
||||
/**
|
||||
* cgroup_populate_dir - selectively creation of files in a directory
|
||||
* @cgrp: target cgroup
|
||||
* @base_files: true if the base files should be added
|
||||
* @subsys_mask: mask of the subsystem ids whose files should be added
|
||||
*/
|
||||
static int cgroup_populate_dir(struct cgroup *cgrp, bool base_files,
|
||||
unsigned long subsys_mask)
|
||||
{
|
||||
int err;
|
||||
struct cgroup_subsys *ss;
|
||||
|
||||
err = cgroup_addrm_files(cgrp, NULL, files, true);
|
||||
if (err < 0)
|
||||
return err;
|
||||
if (base_files) {
|
||||
err = cgroup_addrm_files(cgrp, NULL, files, true);
|
||||
if (err < 0)
|
||||
return err;
|
||||
}
|
||||
|
||||
/* process cftsets of each subsystem */
|
||||
for_each_subsys(cgrp->root, ss) {
|
||||
struct cftype_set *set;
|
||||
if (!test_bit(ss->subsys_id, &subsys_mask))
|
||||
continue;
|
||||
|
||||
list_for_each_entry(set, &ss->cftsets, node)
|
||||
cgroup_addrm_files(cgrp, ss, set->cfts, true);
|
||||
|
@ -3988,7 +4110,7 @@ static long cgroup_create(struct cgroup *parent, struct dentry *dentry,
|
|||
|
||||
list_add_tail(&cgrp->allcg_node, &root->allcg_list);
|
||||
|
||||
err = cgroup_populate_dir(cgrp);
|
||||
err = cgroup_populate_dir(cgrp, true, root->subsys_mask);
|
||||
/* If err < 0, we have a half-filled directory - oh well ;) */
|
||||
|
||||
mutex_unlock(&cgroup_mutex);
|
||||
|
@@ -4321,8 +4443,7 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
 	 * since cgroup_init_subsys will have already taken care of it.
 	 */
 	if (ss->module == NULL) {
-		/* a few sanity checks */
-		BUG_ON(ss->subsys_id >= CGROUP_BUILTIN_SUBSYS_COUNT);
+		/* a sanity check */
 		BUG_ON(subsys[ss->subsys_id] != ss);
 		return 0;
 	}
@@ -4330,24 +4451,8 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
 	/* init base cftset */
 	cgroup_init_cftsets(ss);
 
-	/*
-	 * need to register a subsys id before anything else - for example,
-	 * init_cgroup_css needs it.
-	 */
 	mutex_lock(&cgroup_mutex);
-	/* find the first empty slot in the array */
-	for (i = CGROUP_BUILTIN_SUBSYS_COUNT; i < CGROUP_SUBSYS_COUNT; i++) {
-		if (subsys[i] == NULL)
-			break;
-	}
-	if (i == CGROUP_SUBSYS_COUNT) {
-		/* maximum number of subsystems already registered! */
-		mutex_unlock(&cgroup_mutex);
-		return -EBUSY;
-	}
-	/* assign ourselves the subsys_id */
-	ss->subsys_id = i;
-	subsys[i] = ss;
+	subsys[ss->subsys_id] = ss;
 
 	/*
 	 * no ss->create seems to need anything important in the ss struct, so
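The open-coded search for a free subsys[] slot disappears because every subsystem, built-in or modular, now arrives with its ID already fixed at build time, so loading a module only has to fill its own slot. The IDs are generated from the list of configured controllers, roughly along these lines; this is a simplified sketch, the real definition lives in include/linux/cgroup.h and pulls the names from linux/cgroup_subsys.h.

/* Simplified sketch of compile-time subsystem IDs. */
#define SUBSYS(_x) _x ## _subsys_id,
enum cgroup_subsys_id {
	SUBSYS(cpuset)		/* only controllers enabled in .config */
	SUBSYS(debug)		/* appear here, one fixed ID each      */
	/* ... */
	CGROUP_SUBSYS_COUNT,
};
#undef SUBSYS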
@@ -4356,7 +4461,7 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
 	css = ss->create(dummytop);
 	if (IS_ERR(css)) {
 		/* failure case - need to deassign the subsys[] slot. */
-		subsys[i] = NULL;
+		subsys[ss->subsys_id] = NULL;
 		mutex_unlock(&cgroup_mutex);
 		return PTR_ERR(css);
 	}
@@ -4372,7 +4477,7 @@ int __init_or_module cgroup_load_subsys(struct cgroup_subsys *ss)
 	if (ret) {
 		dummytop->subsys[ss->subsys_id] = NULL;
 		ss->destroy(dummytop);
-		subsys[i] = NULL;
+		subsys[ss->subsys_id] = NULL;
 		mutex_unlock(&cgroup_mutex);
 		return ret;
 	}
@@ -4439,7 +4544,6 @@ void cgroup_unload_subsys(struct cgroup_subsys *ss)
 
 	mutex_lock(&cgroup_mutex);
 	/* deassign the subsys_id */
-	BUG_ON(ss->subsys_id < CGROUP_BUILTIN_SUBSYS_COUNT);
 	subsys[ss->subsys_id] = NULL;
 
 	/* remove subsystem from rootnode's list of subsystems */
@@ -4502,10 +4606,13 @@ int __init cgroup_init_early(void)
 	for (i = 0; i < CSS_SET_TABLE_SIZE; i++)
 		INIT_HLIST_HEAD(&css_set_table[i]);
 
-	/* at bootup time, we don't worry about modular subsystems */
-	for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
+	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 		struct cgroup_subsys *ss = subsys[i];
 
+		/* at bootup time, we don't worry about modular subsystems */
+		if (!ss || ss->module)
+			continue;
+
 		BUG_ON(!ss->name);
 		BUG_ON(strlen(ss->name) > MAX_CGROUP_TYPE_NAMELEN);
 		BUG_ON(!ss->create);
@@ -4538,9 +4645,12 @@ int __init cgroup_init(void)
 	if (err)
 		return err;
 
-	/* at bootup time, we don't worry about modular subsystems */
-	for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
+	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 		struct cgroup_subsys *ss = subsys[i];
+
+		/* at bootup time, we don't worry about modular subsystems */
+		if (!ss || ss->module)
+			continue;
 		if (!ss->early_init)
 			cgroup_init_subsys(ss);
 		if (ss->use_id)
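With the array sized by CGROUP_SUBSYS_COUNT and indexed by fixed IDs, the slots of modular controllers that are configured but not yet loaded are simply NULL at boot, which is why every early loop now carries the "!ss || ss->module" guard. The iteration pattern, pulled out as a sketch that assumes the file-local subsys[] array of kernel/cgroup.c:

/* Sketch of the boot-time iteration pattern used in the hunks above. */
static void __init demo_for_each_builtin_ss(void (*fn)(struct cgroup_subsys *))
{
	int i;

	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
		struct cgroup_subsys *ss = subsys[i];

		/* NULL slot (module not loaded yet) or modular: skip at boot */
		if (!ss || ss->module)
			continue;

		fn(ss);
	}
}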
@@ -4735,13 +4845,16 @@ void cgroup_fork_callbacks(struct task_struct *child)
 {
 	if (need_forkexit_callback) {
 		int i;
-		/*
-		 * forkexit callbacks are only supported for builtin
-		 * subsystems, and the builtin section of the subsys array is
-		 * immutable, so we don't need to lock the subsys array here.
-		 */
-		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
+		for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 			struct cgroup_subsys *ss = subsys[i];
+
+			/*
+			 * forkexit callbacks are only supported for
+			 * builtin subsystems.
+			 */
+			if (!ss || ss->module)
+				continue;
+
 			if (ss->fork)
 				ss->fork(child);
 		}
@@ -4846,12 +4959,13 @@ void cgroup_exit(struct task_struct *tsk, int run_callbacks)
 	tsk->cgroups = &init_css_set;
 
 	if (run_callbacks && need_forkexit_callback) {
-		/*
-		 * modular subsystems can't use callbacks, so no need to lock
-		 * the subsys array
-		 */
-		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
+		for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 			struct cgroup_subsys *ss = subsys[i];
+
+			/* modular subsystems can't use callbacks */
+			if (!ss || ss->module)
+				continue;
+
 			if (ss->exit) {
 				struct cgroup *old_cgrp =
 					rcu_dereference_raw(cg->subsys[i])->cgroup;
@@ -5037,13 +5151,17 @@ static int __init cgroup_disable(char *str)
 	while ((token = strsep(&str, ",")) != NULL) {
 		if (!*token)
 			continue;
-		/*
-		 * cgroup_disable, being at boot time, can't know about module
-		 * subsystems, so we don't worry about them.
-		 */
-		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
+		for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
 			struct cgroup_subsys *ss = subsys[i];
 
+			/*
+			 * cgroup_disable, being at boot time, can't
+			 * know about module subsystems, so we don't
+			 * worry about them.
+			 */
+			if (!ss || ss->module)
+				continue;
+
 			if (!strcmp(token, ss->name)) {
 				ss->disabled = 1;
 				printk(KERN_INFO "Disabling %s control group"
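The loop above implements the cgroup_disable= boot parameter; as the relocated comment notes, it can only act on built-in controllers. Typical usage on the kernel command line, for example to turn off the memory controller:

	cgroup_disable=memory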
mm/shmem.c (171 changed lines)
@@ -77,13 +77,6 @@ static struct vfsmount *shm_mnt;
 /* Symlink up to this size is kmalloc'ed instead of using a swappable page */
 #define SHORT_SYMLINK_LEN 128
 
-struct shmem_xattr {
-	struct list_head list;	/* anchored by shmem_inode_info->xattr_list */
-	char *name;		/* xattr name */
-	size_t size;
-	char value[0];
-};
-
 /*
  * shmem_fallocate and shmem_writepage communicate via inode->i_private
  * (with i_mutex making sure that it has only one user at a time):
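The private struct shmem_xattr goes away in favour of the simple_xattr objects that this series shares through fs/xattr.c. Their shape in <linux/xattr.h> is roughly as sketched below; treat the exact field layout as indicative rather than verbatim.

struct simple_xattrs {
	struct list_head head;		/* list of simple_xattr entries */
	spinlock_t lock;		/* protects the list */
};

struct simple_xattr {
	struct list_head list;
	char *name;
	size_t size;
	char value[0];			/* value bytes follow the header */
};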
@@ -636,7 +629,6 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 static void shmem_evict_inode(struct inode *inode)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
-	struct shmem_xattr *xattr, *nxattr;
 
 	if (inode->i_mapping->a_ops == &shmem_aops) {
 		shmem_unacct_size(info->flags, inode->i_size);
@@ -650,10 +642,7 @@ static void shmem_evict_inode(struct inode *inode)
 	} else
 		kfree(info->symlink);
 
-	list_for_each_entry_safe(xattr, nxattr, &info->xattr_list, list) {
-		kfree(xattr->name);
-		kfree(xattr);
-	}
+	simple_xattrs_free(&info->xattrs);
 	BUG_ON(inode->i_blocks);
 	shmem_free_inode(inode->i_sb);
 	clear_inode(inode);
@@ -1377,7 +1366,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 		spin_lock_init(&info->lock);
 		info->flags = flags & VM_NORESERVE;
 		INIT_LIST_HEAD(&info->swaplist);
-		INIT_LIST_HEAD(&info->xattr_list);
+		simple_xattrs_init(&info->xattrs);
 		cache_no_acl(inode);
 
 		switch (mode & S_IFMT) {
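Taken together with the eviction hunk above, the per-inode xattr state now follows a plain init/use/free lifecycle. A sketch of how a filesystem wires in the shared helpers; the demo_* names are illustrative, the two calls are exactly the ones used in the hunks:

#include <linux/xattr.h>

struct demo_inode_info {
	struct simple_xattrs xattrs;
	/* ... other per-inode state ... */
};

static void demo_inode_init(struct demo_inode_info *info)
{
	simple_xattrs_init(&info->xattrs);	/* at inode creation */
}

static void demo_inode_evict(struct demo_inode_info *info)
{
	simple_xattrs_free(&info->xattrs);	/* on eviction, frees all entries */
}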
@@ -2059,28 +2048,6 @@ static void shmem_put_link(struct dentry *dentry, struct nameidata *nd, void *co
  * filesystem level, though.
  */
 
-/*
- * Allocate new xattr and copy in the value; but leave the name to callers.
- */
-static struct shmem_xattr *shmem_xattr_alloc(const void *value, size_t size)
-{
-	struct shmem_xattr *new_xattr;
-	size_t len;
-
-	/* wrap around? */
-	len = sizeof(*new_xattr) + size;
-	if (len <= sizeof(*new_xattr))
-		return NULL;
-
-	new_xattr = kmalloc(len, GFP_KERNEL);
-	if (!new_xattr)
-		return NULL;
-
-	new_xattr->size = size;
-	memcpy(new_xattr->value, value, size);
-	return new_xattr;
-}
-
 /*
  * Callback for security_inode_init_security() for acquiring xattrs.
  */
@@ -2090,11 +2057,11 @@ static int shmem_initxattrs(struct inode *inode,
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	const struct xattr *xattr;
-	struct shmem_xattr *new_xattr;
+	struct simple_xattr *new_xattr;
 	size_t len;
 
 	for (xattr = xattr_array; xattr->name != NULL; xattr++) {
-		new_xattr = shmem_xattr_alloc(xattr->value, xattr->value_len);
+		new_xattr = simple_xattr_alloc(xattr->value, xattr->value_len);
 		if (!new_xattr)
 			return -ENOMEM;
 
@@ -2111,91 +2078,12 @@ static int shmem_initxattrs(struct inode *inode,
 		memcpy(new_xattr->name + XATTR_SECURITY_PREFIX_LEN,
 		       xattr->name, len);
 
-		spin_lock(&info->lock);
-		list_add(&new_xattr->list, &info->xattr_list);
-		spin_unlock(&info->lock);
+		simple_xattr_list_add(&info->xattrs, new_xattr);
 	}
 
 	return 0;
 }
 
-static int shmem_xattr_get(struct dentry *dentry, const char *name,
-			   void *buffer, size_t size)
-{
-	struct shmem_inode_info *info;
-	struct shmem_xattr *xattr;
-	int ret = -ENODATA;
-
-	info = SHMEM_I(dentry->d_inode);
-
-	spin_lock(&info->lock);
-	list_for_each_entry(xattr, &info->xattr_list, list) {
-		if (strcmp(name, xattr->name))
-			continue;
-
-		ret = xattr->size;
-		if (buffer) {
-			if (size < xattr->size)
-				ret = -ERANGE;
-			else
-				memcpy(buffer, xattr->value, xattr->size);
-		}
-		break;
-	}
-	spin_unlock(&info->lock);
-	return ret;
-}
-
-static int shmem_xattr_set(struct inode *inode, const char *name,
-			   const void *value, size_t size, int flags)
-{
-	struct shmem_inode_info *info = SHMEM_I(inode);
-	struct shmem_xattr *xattr;
-	struct shmem_xattr *new_xattr = NULL;
-	int err = 0;
-
-	/* value == NULL means remove */
-	if (value) {
-		new_xattr = shmem_xattr_alloc(value, size);
-		if (!new_xattr)
-			return -ENOMEM;
-
-		new_xattr->name = kstrdup(name, GFP_KERNEL);
-		if (!new_xattr->name) {
-			kfree(new_xattr);
-			return -ENOMEM;
-		}
-	}
-
-	spin_lock(&info->lock);
-	list_for_each_entry(xattr, &info->xattr_list, list) {
-		if (!strcmp(name, xattr->name)) {
-			if (flags & XATTR_CREATE) {
-				xattr = new_xattr;
-				err = -EEXIST;
-			} else if (new_xattr) {
-				list_replace(&xattr->list, &new_xattr->list);
-			} else {
-				list_del(&xattr->list);
-			}
-			goto out;
-		}
-	}
-	if (flags & XATTR_REPLACE) {
-		xattr = new_xattr;
-		err = -ENODATA;
-	} else {
-		list_add(&new_xattr->list, &info->xattr_list);
-		xattr = NULL;
-	}
-out:
-	spin_unlock(&info->lock);
-	if (xattr)
-		kfree(xattr->name);
-	kfree(xattr);
-	return err;
-}
-
 static const struct xattr_handler *shmem_xattr_handlers[] = {
 #ifdef CONFIG_TMPFS_POSIX_ACL
 	&generic_acl_access_handler,
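All of the removed list walking now lives behind the shared simple_xattr_get()/simple_xattr_set(), which keep the conventions the shmem code used: the value size on success, -ENODATA for a missing attribute, -ERANGE for a too-small buffer, and the XATTR_CREATE/XATTR_REPLACE error cases; the exact error handling now belongs to the shared helper rather than to this file. A caller-side sketch, with demo_* names invented:

#include <linux/xattr.h>

static int demo_get(struct simple_xattrs *xattrs, const char *name,
		    void *buf, size_t size)
{
	/* returns the value size, or a negative errno such as -ENODATA */
	return simple_xattr_get(xattrs, name, buf, size);
}

static int demo_set(struct simple_xattrs *xattrs, const char *name,
		    const void *value, size_t size, int flags)
{
	/* value == NULL removes the attribute, mirroring the old shmem code */
	return simple_xattr_set(xattrs, name, value, size, flags);
}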
@@ -2226,6 +2114,7 @@ static int shmem_xattr_validate(const char *name)
 static ssize_t shmem_getxattr(struct dentry *dentry, const char *name,
 			      void *buffer, size_t size)
 {
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
 	int err;
 
 	/*
@@ -2240,12 +2129,13 @@ static ssize_t shmem_getxattr(struct dentry *dentry, const char *name,
 	if (err)
 		return err;
 
-	return shmem_xattr_get(dentry, name, buffer, size);
+	return simple_xattr_get(&info->xattrs, name, buffer, size);
 }
 
 static int shmem_setxattr(struct dentry *dentry, const char *name,
 			  const void *value, size_t size, int flags)
 {
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
 	int err;
 
 	/*
@@ -2260,15 +2150,12 @@ static int shmem_setxattr(struct dentry *dentry, const char *name,
 	if (err)
 		return err;
 
-	if (size == 0)
-		value = ""; /* empty EA, do not remove */
-
-	return shmem_xattr_set(dentry->d_inode, name, value, size, flags);
-
+	return simple_xattr_set(&info->xattrs, name, value, size, flags);
 }
 
 static int shmem_removexattr(struct dentry *dentry, const char *name)
 {
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
 	int err;
 
 	/*
@@ -2283,45 +2170,13 @@ static int shmem_removexattr(struct dentry *dentry, const char *name)
 	if (err)
 		return err;
 
-	return shmem_xattr_set(dentry->d_inode, name, NULL, 0, XATTR_REPLACE);
-}
-
-static bool xattr_is_trusted(const char *name)
-{
-	return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN);
+	return simple_xattr_remove(&info->xattrs, name);
 }
 
 static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)
 {
-	bool trusted = capable(CAP_SYS_ADMIN);
-	struct shmem_xattr *xattr;
-	struct shmem_inode_info *info;
-	size_t used = 0;
-
-	info = SHMEM_I(dentry->d_inode);
-
-	spin_lock(&info->lock);
-	list_for_each_entry(xattr, &info->xattr_list, list) {
-		size_t len;
-
-		/* skip "trusted." attributes for unprivileged callers */
-		if (!trusted && xattr_is_trusted(xattr->name))
-			continue;
-
-		len = strlen(xattr->name) + 1;
-		used += len;
-		if (buffer) {
-			if (size < used) {
-				used = -ERANGE;
-				break;
-			}
-			memcpy(buffer, xattr->name, len);
-			buffer += len;
-		}
-	}
-	spin_unlock(&info->lock);
-
-	return used;
+	struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
+	return simple_xattr_list(&info->xattrs, buffer, size);
 }
 #endif /* CONFIG_TMPFS_XATTR */
 
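shmem_listxattr() likewise collapses to a single call; the hand-rolled skipping of "trusted." names for unprivileged callers is presumably carried by the shared helper now rather than by this file (hedging: that behaviour belongs to fs/xattr.c after this series). A caller-side sketch with an invented demo_* name:

static ssize_t demo_listxattr(struct simple_xattrs *xattrs,
			      char *buffer, size_t size)
{
	/* returns the bytes of "name\0" strings emitted, or -ERANGE */
	return simple_xattr_list(xattrs, buffer, size);
}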
net/core/netprio_cgroup.c

@@ -326,9 +326,7 @@ struct cgroup_subsys net_prio_subsys = {
 	.create		= cgrp_create,
 	.destroy	= cgrp_destroy,
 	.attach		= net_prio_attach,
-#ifdef CONFIG_NETPRIO_CGROUP
 	.subsys_id	= net_prio_subsys_id,
-#endif
 	.base_cftypes	= ss_files,
 	.module		= THIS_MODULE
 };
@@ -366,10 +364,6 @@ static int __init init_cgroup_netprio(void)
 	ret = cgroup_load_subsys(&net_prio_subsys);
 	if (ret)
 		goto out;
-#ifndef CONFIG_NETPRIO_CGROUP
-	smp_wmb();
-	net_prio_subsys_id = net_prio_subsys.subsys_id;
-#endif
 
 	register_netdevice_notifier(&netprio_device_notifier);
 
@@ -386,11 +380,6 @@ static void __exit exit_cgroup_netprio(void)
 
 	cgroup_unload_subsys(&net_prio_subsys);
 
-#ifndef CONFIG_NETPRIO_CGROUP
-	net_prio_subsys_id = -1;
-	synchronize_rcu();
-#endif
-
 	rtnl_lock();
 	for_each_netdev(&init_net, dev) {
 		old = rtnl_dereference(dev->priomap);
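Because net_prio_subsys_id is now a compile-time enum constant, the module no longer has to publish it at load time (the smp_wmb() dance removed above) or reset it on unload. Consumers can key off the configuration instead; a hedged sketch of the guard pattern, with demo_* names invented and only the IS_ENABLED() usage taken from the hunks below:

#include <linux/kconfig.h>
#include <linux/types.h>

struct task_struct;

#if IS_ENABLED(CONFIG_NETPRIO_CGROUP)	/* true for both =y and =m */
u32 demo_read_netprioidx(struct task_struct *p);	/* provided elsewhere */
#else
static inline u32 demo_read_netprioidx(struct task_struct *p)
{
	return 0;	/* the controller cannot exist in this build */
}
#endif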
net/core/sock.c

@@ -326,17 +326,6 @@ int __sk_backlog_rcv(struct sock *sk, struct sk_buff *skb)
 }
 EXPORT_SYMBOL(__sk_backlog_rcv);
 
-#if defined(CONFIG_CGROUPS)
-#if !defined(CONFIG_NET_CLS_CGROUP)
-int net_cls_subsys_id = -1;
-EXPORT_SYMBOL_GPL(net_cls_subsys_id);
-#endif
-#if !defined(CONFIG_NETPRIO_CGROUP)
-int net_prio_subsys_id = -1;
-EXPORT_SYMBOL_GPL(net_prio_subsys_id);
-#endif
-#endif
-
 static int sock_set_timeout(long *timeo_p, char __user *optval, int optlen)
 {
 	struct timeval tv;
@@ -1224,6 +1213,7 @@ static void sk_prot_free(struct proto *prot, struct sock *sk)
 }
 
 #ifdef CONFIG_CGROUPS
+#if IS_ENABLED(CONFIG_NET_CLS_CGROUP)
 void sock_update_classid(struct sock *sk)
 {
 	u32 classid;
@@ -1235,7 +1225,9 @@ void sock_update_classid(struct sock *sk)
 	sk->sk_classid = classid;
 }
 EXPORT_SYMBOL(sock_update_classid);
+#endif
 
+#if IS_ENABLED(CONFIG_NETPRIO_CGROUP)
 void sock_update_netprioidx(struct sock *sk, struct task_struct *task)
 {
 	if (in_interrupt())
@@ -1245,6 +1237,7 @@ void sock_update_netprioidx(struct sock *sk, struct task_struct *task)
 }
 EXPORT_SYMBOL_GPL(sock_update_netprioidx);
+#endif
 #endif
 
 /**
  * sk_alloc - All socket objects are allocated here

net/sched/cls_cgroup.c

@@ -77,9 +77,7 @@ struct cgroup_subsys net_cls_subsys = {
 	.name		= "net_cls",
 	.create		= cgrp_create,
 	.destroy	= cgrp_destroy,
-#ifdef CONFIG_NET_CLS_CGROUP
 	.subsys_id	= net_cls_subsys_id,
-#endif
 	.base_cftypes	= ss_files,
 	.module		= THIS_MODULE,
 };
@@ -283,12 +281,6 @@ static int __init init_cgroup_cls(void)
 	if (ret)
 		goto out;
 
-#ifndef CONFIG_NET_CLS_CGROUP
-	/* We can't use rcu_assign_pointer because this is an int. */
-	smp_wmb();
-	net_cls_subsys_id = net_cls_subsys.subsys_id;
-#endif
-
 	ret = register_tcf_proto_ops(&cls_cgroup_ops);
 	if (ret)
 		cgroup_unload_subsys(&net_cls_subsys);
@@ -301,11 +293,6 @@ static void __exit exit_cgroup_cls(void)
 {
 	unregister_tcf_proto_ops(&cls_cgroup_ops);
 
-#ifndef CONFIG_NET_CLS_CGROUP
-	net_cls_subsys_id = -1;
-	synchronize_rcu();
-#endif
-
 	cgroup_unload_subsys(&net_cls_subsys);
 }
 
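The sock.c hunks above wrap sock_update_classid() and sock_update_netprioidx() in IS_ENABLED() guards so the stale fallback definitions of the subsystem IDs can go. A header-side declaration guarded the same way, with an inline no-op fallback for builds without the controller, would look roughly like this (a sketch, not the verbatim kernel header):

struct sock;

#if IS_ENABLED(CONFIG_NET_CLS_CGROUP)
void sock_update_classid(struct sock *sk);
#else
static inline void sock_update_classid(struct sock *sk)
{
	/* controller not configured: nothing to update */
}
#endif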