Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cache QoS (RDT/CAT) updates from Thomas Gleixner:
 "Add support for pseudo-locked cache regions.

  Cache Allocation Technology (CAT) allows on certain CPUs to isolate a
  region of cache and 'lock' it. Cache pseudo-locking builds on the fact
  that a CPU can still read and write data pre-allocated outside its
  current allocated area on cache hit. With cache pseudo-locking data can
  be preloaded into a reserved portion of cache that no application can
  fill, and from that point on will only serve cache hits.

  The cache pseudo-locked memory is made accessible to user space where
  an application can map it into its virtual address space and thus have
  a region of memory with reduced average read latency.

  The locking is not perfect and gets totally screwed by WBINVD and
  similar mechanisms, but it provides a reasonable enhancement for
  certain types of latency sensitive applications.

  The implementation extends the current CAT mechanism and provides a
  generally useful exclusive CAT mode on which it builds the extra
  pseudo-locked regions"

* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
  x86/intel_rdt: Disable PMU access
  x86/intel_rdt: Fix possible circular lock dependency
  x86/intel_rdt: Make CPU information accessible for pseudo-locked regions
  x86/intel_rdt: Support restoration of subset of permissions
  x86/intel_rdt: Fix cleanup of plr structure on error
  x86/intel_rdt: Move pseudo_lock_region_clear()
  x86/intel_rdt: Limit C-states dynamically when pseudo-locking active
  x86/intel_rdt: Support L3 cache performance event of Broadwell
  x86/intel_rdt: More precise L2 hit/miss measurements
  x86/intel_rdt: Create character device exposing pseudo-locked region
  x86/intel_rdt: Create debugfs files for pseudo-locking testing
  x86/intel_rdt: Create resctrl debug area
  x86/intel_rdt: Ensure RDT cleanup on exit
  x86/intel_rdt: Resctrl files reflect pseudo-locked information
  x86/intel_rdt: Support creation/removal of pseudo-locked region
  x86/intel_rdt: Pseudo-lock region creation/removal core
  x86/intel_rdt: Discover supported platforms via prefetch disable bits
  x86/intel_rdt: Add utilities to test pseudo-locked region possibility
  x86/intel_rdt: Split resource group removal in two
  x86/intel_rdt: Enable entering of pseudo-locksetup mode
  ...
This commit is contained in:
30de24c7dd
@@ -29,7 +29,11 @@ mount options are:
L2 and L3 CDP are controlled separately.

RDT features are orthogonal. A particular system may support only
monitoring, only control, or both monitoring and control.
monitoring, only control, or both monitoring and control. Cache
pseudo-locking is a unique way of using cache control to "pin" or
"lock" data in the cache. Details can be found in
"Cache Pseudo-Locking".


The mount succeeds if either of allocation or monitoring is present, but
only those files and directories supported by the system will be created.
@@ -65,6 +69,29 @@ related to allocation:
		some platforms support devices that have their
		own settings for cache use which can over-ride
		these bits.
"bit_usage":	Annotated capacity bitmasks showing how all
		instances of the resource are used. The legend is:
		"0" - Corresponding region is unused. When the system's
		      resources have been allocated and a "0" is found
		      in "bit_usage" it is a sign that resources are
		      wasted.
		"H" - Corresponding region is used by hardware only
		      but available for software use. If a resource
		      has bits set in "shareable_bits" but not all
		      of these bits appear in the resource groups'
		      schematas then the bits appearing in
		      "shareable_bits" but no resource group will
		      be marked as "H".
		"X" - Corresponding region is available for sharing and
		      used by hardware and software. These are the
		      bits that appear in "shareable_bits" as
		      well as a resource group's allocation.
		"S" - Corresponding region is used by software
		      and available for sharing.
		"E" - Corresponding region is used exclusively by
		      one resource group. No sharing allowed.
		"P" - Corresponding region is pseudo-locked. No
		      sharing allowed.
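
As an illustration of how these annotations combine, here is a minimal C
sketch (not part of this patch, but mirroring the precedence used by
rdt_bit_usage_show() in the kernel changes further below) that maps one
bit's usage flags to its legend character:

/* Map one CBM bit's usage flags to its "bit_usage" legend character. */
static char bit_usage_char(int hwb, int swb, int excl, int psl)
{
	if (hwb && swb)
		return 'X';	/* shared by hardware and software */
	if (hwb)
		return 'H';	/* hardware only, free for software use */
	if (swb)
		return 'S';	/* software use, shareable */
	if (excl)
		return 'E';	/* exclusive resource group */
	if (psl)
		return 'P';	/* pseudo-locked region */
	return '0';		/* unused */
}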

Memory bandwidth (MB) subdirectory contains the following files
with respect to allocation:
@@ -151,6 +178,9 @@ All groups contain the following files:
	CPUs to/from this group. As with the tasks file a hierarchy is
	maintained where MON groups may only include CPUs owned by the
	parent CTRL_MON group.
	When the resource group is in pseudo-locked mode this file will
	only be readable, reflecting the CPUs associated with the
	pseudo-locked region.


"cpus_list":
@@ -163,6 +193,21 @@ When control is enabled all CTRL_MON groups will also contain:
	A list of all the resources available to this group.
	Each resource has its own line and format - see below for details.

"size":
	Mirrors the display of the "schemata" file to display the size in
	bytes of each allocation instead of the bits representing the
	allocation.

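To make the "size" arithmetic concrete, here is a small stand-alone C
sketch. The cache size and bitmask length are hypothetical values chosen
to match Example 4 below (1 MiB L2 instances with an 8-bit capacity
bitmask); each CBM bit represents cache_size / cbm_len bytes, as
rdtgroup_cbm_to_size() in the kernel changes below computes:

#include <stdio.h>

/* One CBM bit represents cache_size / cbm_len bytes. */
static unsigned int cbm_to_size(unsigned int cache_size,
				unsigned int cbm_len, unsigned int cbm)
{
	return __builtin_popcount(cbm) * (cache_size / cbm_len);
}

int main(void)
{
	printf("%u\n", cbm_to_size(1048576, 8, 0x3));	/* 262144 */
	printf("%u\n", cbm_to_size(1048576, 8, 0xfc));	/* 786432 */
	return 0;
}
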
"mode":
|
||||
The "mode" of the resource group dictates the sharing of its
|
||||
allocations. A "shareable" resource group allows sharing of its
|
||||
allocations while an "exclusive" resource group does not. A
|
||||
cache pseudo-locked region is created by first writing
|
||||
"pseudo-locksetup" to the "mode" file before writing the cache
|
||||
pseudo-locked region's schemata to the resource group's "schemata"
|
||||
file. On successful pseudo-locked region creation the mode will
|
||||
automatically change to "pseudo-locked".
|
||||
|
||||
When monitoring is enabled all MON groups will also contain:
|
||||
|
||||
"mon_data":
|
||||
@@ -379,6 +424,170 @@ L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
L3CODE:0=fffff;1=fffff;2=fffff;3=fffff

Cache Pseudo-Locking
--------------------
CAT enables a user to specify the amount of cache space that an
application can fill. Cache pseudo-locking builds on the fact that a
CPU can still read and write data pre-allocated outside its current
allocated area on a cache hit. With cache pseudo-locking, data can be
preloaded into a reserved portion of cache that no application can
fill, and from that point on will only serve cache hits. The cache
pseudo-locked memory is made accessible to user space where an
application can map it into its virtual address space and thus have
a region of memory with reduced average read latency.

The creation of a cache pseudo-locked region is triggered by a request
from the user to do so that is accompanied by a schemata of the region
to be pseudo-locked. The cache pseudo-locked region is created as follows:
- Create a CAT allocation CLOSNEW with a CBM matching the schemata
  from the user of the cache region that will contain the pseudo-locked
  memory. This region must not overlap with any current CAT allocation/CLOS
  on the system and no future overlap with this cache region is allowed
  while the pseudo-locked region exists.
- Create a contiguous region of memory of the same size as the cache
  region.
- Flush the cache, disable hardware prefetchers, disable preemption.
- Make CLOSNEW the active CLOS and touch the allocated memory to load
  it into the cache.
- Set the previous CLOS as active.
- At this point the closid CLOSNEW can be released - the cache
  pseudo-locked region is protected as long as its CBM does not appear in
  any CAT allocation. Even though the cache pseudo-locked region will from
  this point on not appear in any CBM of any CLOS an application running with
  any CLOS will be able to access the memory in the pseudo-locked region since
  the region continues to serve cache hits.
- The contiguous region of memory loaded into the cache is exposed to
  user-space as a character device.
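
The "touch the allocated memory" step above amounts to one read per cache
line of the buffer. A minimal sketch of such a fill loop (user-space style
for brevity; the kernel additionally runs it with CLOSNEW active, hardware
prefetchers disabled, and preemption off, none of which is shown here):

#include <stddef.h>

/* Read one byte per cache line so every line is loaded into the cache. */
static void fill_cache_region(const volatile char *kmem, size_t size,
			      size_t line_size)
{
	size_t i;

	for (i = 0; i < size; i += line_size)
		(void)kmem[i];
}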

Cache pseudo-locking increases the probability that data will remain
in the cache via carefully configuring the CAT feature and controlling
application behavior. There is no guarantee that data is placed in
cache. Instructions like INVD, WBINVD, CLFLUSH, etc. can still evict
"locked" data from cache. Power management C-states may shrink or
power off cache. Deeper C-states will automatically be restricted on
pseudo-locked region creation.

It is required that an application using a pseudo-locked region runs
with affinity to the cores (or a subset of the cores) associated
with the cache on which the pseudo-locked region resides. A sanity check
within the code will not allow an application to map pseudo-locked memory
unless it runs with affinity to cores associated with the cache on which the
pseudo-locked region resides. The sanity check is only done during the
initial mmap() handling, there is no enforcement afterwards and the
application itself needs to ensure it remains affine to the correct cores.

Pseudo-locking is accomplished in two stages:
1) During the first stage the system administrator allocates a portion
   of cache that should be dedicated to pseudo-locking. At this time an
   equivalent portion of memory is allocated, loaded into allocated
   cache portion, and exposed as a character device.
2) During the second stage a user-space application maps (mmap()) the
   pseudo-locked memory into its address space.

Cache Pseudo-Locking Interface
------------------------------
A pseudo-locked region is created using the resctrl interface as follows:

1) Create a new resource group by creating a new directory in /sys/fs/resctrl.
2) Change the new resource group's mode to "pseudo-locksetup" by writing
   "pseudo-locksetup" to the "mode" file.
3) Write the schemata of the pseudo-locked region to the "schemata" file. All
   bits within the schemata should be "unused" according to the "bit_usage"
   file.

On successful pseudo-locked region creation the "mode" file will contain
"pseudo-locked" and a new character device with the same name as the resource
group will exist in /dev/pseudo_lock. This character device can be mmap()'ed
by user space in order to obtain access to the pseudo-locked memory region.

An example of cache pseudo-locked region creation and usage can be found below.

Cache Pseudo-Locking Debugging Interface
----------------------------------------
The pseudo-locking debugging interface is enabled by default (if
CONFIG_DEBUG_FS is enabled) and can be found in /sys/kernel/debug/resctrl.

There is no explicit way for the kernel to test if a provided memory
location is present in the cache. The pseudo-locking debugging interface uses
the tracing infrastructure to provide two ways to measure cache residency of
the pseudo-locked region:
1) Memory access latency using the pseudo_lock_mem_latency tracepoint. Data
   from these measurements are best visualized using a hist trigger (see
   example below). In this test the pseudo-locked region is traversed at
   a stride of 32 bytes while hardware prefetchers and preemption
   are disabled. This also provides a substitute visualization of cache
   hits and misses. A sketch of such a timed read loop is shown after
   this list.
2) Cache hit and miss measurements using model specific precision counters if
   available. Depending on the levels of cache on the system the pseudo_lock_l2
   and pseudo_lock_l3 tracepoints are available.
   WARNING: triggering this measurement uses from two (for just L2
   measurements) to four (for L2 and L3 measurements) precision counters on
   the system, if any other measurements are in progress the counters and
   their corresponding event registers will be clobbered.
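
A minimal sketch of the timed read behind the pseudo_lock_mem_latency
measurement described in 1) above. This is a user-space approximation only:
the kernel uses its own ordered TSC reads and runs with prefetchers and
preemption disabled, so the numbers below would be noisier than the kernel's:

#include <stdint.h>
#include <x86intrin.h>

/* Time a single read at one 32-byte-stride position of the region. */
static uint64_t timed_read(const volatile char *p)
{
	unsigned int aux;
	uint64_t start, end;

	start = __rdtscp(&aux);
	(void)*p;
	end = __rdtscp(&aux);
	return end - start;	/* latency in TSC cycles */
}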

When a pseudo-locked region is created a new debugfs directory is created for
it in debugfs as /sys/kernel/debug/resctrl/<newdir>. A single
write-only file, pseudo_lock_measure, is present in this directory. The
measurement on the pseudo-locked region depends on the number, 1 or 2,
written to this debugfs file. Since the measurements are recorded with the
tracing infrastructure the relevant tracepoints need to be enabled before the
measurement is triggered.

Example of latency debugging interface:
In this example a pseudo-locked region named "newlock" was created. Here is
how we can measure the latency in cycles of reading from this region and
visualize this data with a histogram that is available if CONFIG_HIST_TRIGGERS
is set:
# :> /sys/kernel/debug/tracing/trace
# echo 'hist:keys=latency' > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/trigger
# echo 1 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable
# echo 1 > /sys/kernel/debug/resctrl/newlock/pseudo_lock_measure
# echo 0 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/enable
# cat /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_mem_latency/hist

# event histogram
#
# trigger info: hist:keys=latency:vals=hitcount:sort=hitcount:size=2048 [active]
#

{ latency:        456 } hitcount:          1
{ latency:         50 } hitcount:         83
{ latency:         36 } hitcount:         96
{ latency:         44 } hitcount:        174
{ latency:         48 } hitcount:        195
{ latency:         46 } hitcount:        262
{ latency:         42 } hitcount:        693
{ latency:         40 } hitcount:       3204
{ latency:         38 } hitcount:       3484

Totals:
    Hits: 8192
    Entries: 9
    Dropped: 0


Example of cache hits/misses debugging:
In this example a pseudo-locked region named "newlock" was created on the L2
cache of a platform. Here is how we can obtain details of the cache hits
and misses using the platform's precision counters.

# :> /sys/kernel/debug/tracing/trace
# echo 1 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_l2/enable
# echo 2 > /sys/kernel/debug/resctrl/newlock/pseudo_lock_measure
# echo 0 > /sys/kernel/debug/tracing/events/resctrl/pseudo_lock_l2/enable
# cat /sys/kernel/debug/tracing/trace

# tracer: nop
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
 pseudo_lock_mea-1672  [002] ....  3132.860500: pseudo_lock_l2: hits=4097 miss=0


Examples for RDT allocation usage:

Example 1

@@ -502,7 +711,172 @@ siblings and only the real time threads are scheduled on the cores 4-7.

# echo F0 > p0/cpus

4) Locking between applications
Example 4
---------

The resource groups in previous examples were all in the default "shareable"
mode allowing sharing of their cache allocations. If one resource group
configures a cache allocation then nothing prevents another resource group
from overlapping with that allocation.

In this example a new exclusive resource group will be created on an L2 CAT
system with two L2 cache instances that can be configured with an 8-bit
capacity bitmask. The new exclusive resource group will be configured to use
25% of each cache instance.

# mount -t resctrl resctrl /sys/fs/resctrl/
# cd /sys/fs/resctrl

First, we observe that the default group is configured to allocate to all L2
cache:

# cat schemata
L2:0=ff;1=ff

We could attempt to create the new resource group at this point, but it will
fail because of the overlap with the schemata of the default group:
# mkdir p0
# echo 'L2:0=0x3;1=0x3' > p0/schemata
# cat p0/mode
shareable
# echo exclusive > p0/mode
-sh: echo: write error: Invalid argument
# cat info/last_cmd_status
schemata overlaps

To ensure that there is no overlap with another resource group the default
resource group's schemata has to change, making it possible for the new
resource group to become exclusive.
# echo 'L2:0=0xfc;1=0xfc' > schemata
# echo exclusive > p0/mode
# grep . p0/*
p0/cpus:0
p0/mode:exclusive
p0/schemata:L2:0=03;1=03
p0/size:L2:0=262144;1=262144

A new resource group will on creation not overlap with an exclusive resource
group:
# mkdir p1
# grep . p1/*
p1/cpus:0
p1/mode:shareable
p1/schemata:L2:0=fc;1=fc
p1/size:L2:0=786432;1=786432

The bit_usage will reflect how the cache is used:
# cat info/L2/bit_usage
0=SSSSSSEE;1=SSSSSSEE

A resource group cannot be forced to overlap with an exclusive resource group:
# echo 'L2:0=0x1;1=0x1' > p1/schemata
-sh: echo: write error: Invalid argument
# cat info/last_cmd_status
overlaps with exclusive group

Example of Cache Pseudo-Locking
-------------------------------
Lock portion of L2 cache from cache id 1 using CBM 0x3. Pseudo-locked
region is exposed at /dev/pseudo_lock/newlock that can be provided to
application for argument to mmap().

# mount -t resctrl resctrl /sys/fs/resctrl/
# cd /sys/fs/resctrl

Ensure that there are bits available that can be pseudo-locked. Since only
unused bits can be pseudo-locked, the bits to be pseudo-locked need to be
removed from the default resource group's schemata:
# cat info/L2/bit_usage
0=SSSSSSSS;1=SSSSSSSS
# echo 'L2:1=0xfc' > schemata
# cat info/L2/bit_usage
0=SSSSSSSS;1=SSSSSS00

Create a new resource group that will be associated with the pseudo-locked
region, indicate that it will be used for a pseudo-locked region, and
configure the requested pseudo-locked region capacity bitmask:

# mkdir newlock
# echo pseudo-locksetup > newlock/mode
# echo 'L2:1=0x3' > newlock/schemata

On success the resource group's mode will change to pseudo-locked, the
bit_usage will reflect the pseudo-locked region, and the character device
exposing the pseudo-locked region will exist:

# cat newlock/mode
pseudo-locked
# cat info/L2/bit_usage
0=SSSSSSSS;1=SSSSSSPP
# ls -l /dev/pseudo_lock/newlock
crw------- 1 root root 243, 0 Apr 3 05:01 /dev/pseudo_lock/newlock

/*
 * Example code to access one page of pseudo-locked cache region
 * from user space.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/*
 * It is required that the application runs with affinity to only
 * cores associated with the pseudo-locked region. Here the cpu
 * is hardcoded for convenience of example.
 */
static int cpuid = 2;

int main(int argc, char *argv[])
{
	cpu_set_t cpuset;
	long page_size;
	void *mapping;
	int dev_fd;
	int ret;

	page_size = sysconf(_SC_PAGESIZE);

	CPU_ZERO(&cpuset);
	CPU_SET(cpuid, &cpuset);
	ret = sched_setaffinity(0, sizeof(cpuset), &cpuset);
	if (ret < 0) {
		perror("sched_setaffinity");
		exit(EXIT_FAILURE);
	}

	dev_fd = open("/dev/pseudo_lock/newlock", O_RDWR);
	if (dev_fd < 0) {
		perror("open");
		exit(EXIT_FAILURE);
	}

	mapping = mmap(0, page_size, PROT_READ | PROT_WRITE, MAP_SHARED,
		       dev_fd, 0);
	if (mapping == MAP_FAILED) {
		perror("mmap");
		close(dev_fd);
		exit(EXIT_FAILURE);
	}

	/* Application interacts with pseudo-locked memory @mapping */

	ret = munmap(mapping, page_size);
	if (ret < 0) {
		perror("munmap");
		close(dev_fd);
		exit(EXIT_FAILURE);
	}

	close(dev_fd);
	exit(EXIT_SUCCESS);
}

Locking between applications
----------------------------

Certain operations on the resctrl filesystem, composed of read/writes
to/from multiple files, must be atomic.

@@ -510,7 +884,7 @@ to/from multiple files, must be atomic.
As an example, the allocation of an exclusive reservation of L3 cache
involves:

1. Read the cbmmasks from each directory
1. Read the cbmmasks from each directory or the per-resource "bit_usage"
2. Find a contiguous set of bits in the global CBM bitmask that is clear
   in any of the directory cbmmasks
3. Create a new directory
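
A minimal sketch of the locking convention this section describes, taking an
exclusive flock(2) on the resctrl mount point around the multi-file sequence
above (the reads, bit selection, and writes themselves are elided):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/sys/fs/resctrl", O_RDONLY);

	if (fd < 0) {
		perror("open");
		exit(EXIT_FAILURE);
	}
	if (flock(fd, LOCK_EX) < 0) {	/* serialize with other writers */
		perror("flock");
		close(fd);
		exit(EXIT_FAILURE);
	}

	/* ... read "bit_usage", pick free bits, mkdir, write schemata ... */

	flock(fd, LOCK_UN);
	close(fd);
	return 0;
}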

@@ -35,7 +35,9 @@ obj-$(CONFIG_CPU_SUP_CENTAUR) += centaur.o
obj-$(CONFIG_CPU_SUP_TRANSMETA_32) += transmeta.o
obj-$(CONFIG_CPU_SUP_UMC_32) += umc.o

obj-$(CONFIG_INTEL_RDT) += intel_rdt.o intel_rdt_rdtgroup.o intel_rdt_monitor.o intel_rdt_ctrlmondata.o
obj-$(CONFIG_INTEL_RDT) += intel_rdt.o intel_rdt_rdtgroup.o intel_rdt_monitor.o
obj-$(CONFIG_INTEL_RDT) += intel_rdt_ctrlmondata.o intel_rdt_pseudo_lock.o
CFLAGS_intel_rdt_pseudo_lock.o = -I$(src)

obj-$(CONFIG_X86_MCE) += mcheck/
obj-$(CONFIG_MTRR) += mtrr/

@@ -859,6 +859,8 @@ static __init bool get_rdt_resources(void)
	return (rdt_mon_capable || rdt_alloc_capable);
}

static enum cpuhp_state rdt_online;

static int __init intel_rdt_late_init(void)
{
	struct rdt_resource *r;

@@ -880,6 +882,7 @@ static int __init intel_rdt_late_init(void)
		cpuhp_remove_state(state);
		return ret;
	}
	rdt_online = state;

	for_each_alloc_capable_rdt_resource(r)
		pr_info("Intel RDT %s allocation detected\n", r->name);

@@ -891,3 +894,11 @@ static int __init intel_rdt_late_init(void)
}

late_initcall(intel_rdt_late_init);

static void __exit intel_rdt_exit(void)
{
	cpuhp_remove_state(rdt_online);
	rdtgroup_exit();
}

__exitcall(intel_rdt_exit);

@@ -80,6 +80,34 @@ enum rdt_group_type {
	RDT_NUM_GROUP,
};

/**
 * enum rdtgrp_mode - Mode of a RDT resource group
 * @RDT_MODE_SHAREABLE: This resource group allows sharing of its allocations
 * @RDT_MODE_EXCLUSIVE: No sharing of this resource group's allocations allowed
 * @RDT_MODE_PSEUDO_LOCKSETUP: Resource group will be used for Pseudo-Locking
 * @RDT_MODE_PSEUDO_LOCKED: No sharing of this resource group's allocations
 *                          allowed AND the allocations are Cache Pseudo-Locked
 *
 * The mode of a resource group enables control over the allowed overlap
 * between allocations associated with different resource groups (classes
 * of service). User is able to modify the mode of a resource group by
 * writing to the "mode" resctrl file associated with the resource group.
 *
 * The "shareable", "exclusive", and "pseudo-locksetup" modes are set by
 * writing the appropriate text to the "mode" file. A resource group enters
 * "pseudo-locked" mode after the schemata is written while the resource
 * group is in "pseudo-locksetup" mode.
 */
enum rdtgrp_mode {
	RDT_MODE_SHAREABLE = 0,
	RDT_MODE_EXCLUSIVE,
	RDT_MODE_PSEUDO_LOCKSETUP,
	RDT_MODE_PSEUDO_LOCKED,

	/* Must be last */
	RDT_NUM_MODES,
};

/**
 * struct mongroup - store mon group's data in resctrl fs.
 * @mon_data_kn:	kernfs node for the mon_data directory
@@ -94,6 +122,43 @@ struct mongroup {
	u32 rmid;
};

/**
 * struct pseudo_lock_region - pseudo-lock region information
 * @r:			RDT resource to which this pseudo-locked region
 *			belongs
 * @d:			RDT domain to which this pseudo-locked region
 *			belongs
 * @cbm:		bitmask of the pseudo-locked region
 * @lock_thread_wq:	waitqueue used to wait on the pseudo-locking thread
 *			completion
 * @thread_done:	variable used by waitqueue to test if pseudo-locking
 *			thread completed
 * @cpu:		core associated with the cache on which the setup code
 *			will be run
 * @line_size:		size of the cache lines
 * @size:		size of pseudo-locked region in bytes
 * @kmem:		the kernel memory associated with pseudo-locked region
 * @minor:		minor number of character device associated with this
 *			region
 * @debugfs_dir:	pointer to this region's directory in the debugfs
 *			filesystem
 * @pm_reqs:		Power management QoS requests related to this region
 */
struct pseudo_lock_region {
	struct rdt_resource	*r;
	struct rdt_domain	*d;
	u32			cbm;
	wait_queue_head_t	lock_thread_wq;
	int			thread_done;
	int			cpu;
	unsigned int		line_size;
	unsigned int		size;
	void			*kmem;
	unsigned int		minor;
	struct dentry		*debugfs_dir;
	struct list_head	pm_reqs;
};

/**
 * struct rdtgroup - store rdtgroup's data in resctrl file system.
 * @kn:				kernfs node
@@ -106,16 +171,20 @@ struct mongroup {
 * @type:			indicates type of this rdtgroup - either
 *				monitor only or ctrl_mon group
 * @mon:			mongroup related data
 * @mode:			mode of resource group
 * @plr:			pseudo-locked region
 */
struct rdtgroup {
	struct kernfs_node *kn;
	struct list_head rdtgroup_list;
	u32 closid;
	struct cpumask cpu_mask;
	int flags;
	atomic_t waitcount;
	enum rdt_group_type type;
	struct mongroup mon;
	struct kernfs_node		*kn;
	struct list_head		rdtgroup_list;
	u32				closid;
	struct cpumask			cpu_mask;
	int				flags;
	atomic_t			waitcount;
	enum rdt_group_type		type;
	struct mongroup			mon;
	enum rdtgrp_mode		mode;
	struct pseudo_lock_region	*plr;
};

/* rdtgroup.flags */
@@ -148,6 +217,7 @@ extern struct list_head rdt_all_groups;
extern int max_name_width, max_data_width;

int __init rdtgroup_init(void);
void __exit rdtgroup_exit(void);

/**
 * struct rftype - describe each file in the resctrl file system
@@ -216,22 +286,24 @@ struct mbm_state {
 * @mbps_val:	When mba_sc is enabled, this holds the bandwidth in MBps
 * @new_ctrl:	new ctrl value to be loaded
 * @have_new_ctrl: did user provide new_ctrl for this domain
 * @plr:	pseudo-locked region (if any) associated with domain
 */
struct rdt_domain {
	struct list_head list;
	int id;
	struct cpumask cpu_mask;
	unsigned long *rmid_busy_llc;
	struct mbm_state *mbm_total;
	struct mbm_state *mbm_local;
	struct delayed_work mbm_over;
	struct delayed_work cqm_limbo;
	int mbm_work_cpu;
	int cqm_work_cpu;
	u32 *ctrl_val;
	u32 *mbps_val;
	u32 new_ctrl;
	bool have_new_ctrl;
	struct list_head		list;
	int				id;
	struct cpumask			cpu_mask;
	unsigned long			*rmid_busy_llc;
	struct mbm_state		*mbm_total;
	struct mbm_state		*mbm_local;
	struct delayed_work		mbm_over;
	struct delayed_work		cqm_limbo;
	int				mbm_work_cpu;
	int				cqm_work_cpu;
	u32				*ctrl_val;
	u32				*mbps_val;
	u32				new_ctrl;
	bool				have_new_ctrl;
	struct pseudo_lock_region	*plr;
};

/**
@@ -351,7 +423,7 @@ struct rdt_resource {
	struct rdt_cache	cache;
	struct rdt_membw	membw;
	const char		*format_str;
	int (*parse_ctrlval) (char *buf, struct rdt_resource *r,
	int (*parse_ctrlval) (void *data, struct rdt_resource *r,
			      struct rdt_domain *d);
	struct list_head	evt_list;
	int			num_rmid;

@@ -359,8 +431,8 @@ struct rdt_resource {
	unsigned long		fflags;
};

int parse_cbm(char *buf, struct rdt_resource *r, struct rdt_domain *d);
int parse_bw(char *buf, struct rdt_resource *r, struct rdt_domain *d);
int parse_cbm(void *_data, struct rdt_resource *r, struct rdt_domain *d);
int parse_bw(void *_buf, struct rdt_resource *r, struct rdt_domain *d);

extern struct mutex rdtgroup_mutex;

@@ -368,7 +440,7 @@ extern struct rdt_resource rdt_resources_all[];
extern struct rdtgroup rdtgroup_default;
DECLARE_STATIC_KEY_FALSE(rdt_alloc_enable_key);

int __init rdtgroup_init(void);
extern struct dentry *debugfs_resctrl;

enum {
	RDT_RESOURCE_L3,
@@ -439,13 +511,32 @@ void rdt_last_cmd_printf(const char *fmt, ...);
void rdt_ctrl_update(void *arg);
struct rdtgroup *rdtgroup_kn_lock_live(struct kernfs_node *kn);
void rdtgroup_kn_unlock(struct kernfs_node *kn);
int rdtgroup_kn_mode_restrict(struct rdtgroup *r, const char *name);
int rdtgroup_kn_mode_restore(struct rdtgroup *r, const char *name,
			     umode_t mask);
struct rdt_domain *rdt_find_domain(struct rdt_resource *r, int id,
				   struct list_head **pos);
ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
				char *buf, size_t nbytes, loff_t off);
int rdtgroup_schemata_show(struct kernfs_open_file *of,
			   struct seq_file *s, void *v);
bool rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_domain *d,
			   u32 _cbm, int closid, bool exclusive);
unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r, struct rdt_domain *d,
				  u32 cbm);
enum rdtgrp_mode rdtgroup_mode_by_closid(int closid);
int rdtgroup_tasks_assigned(struct rdtgroup *r);
int rdtgroup_locksetup_enter(struct rdtgroup *rdtgrp);
int rdtgroup_locksetup_exit(struct rdtgroup *rdtgrp);
bool rdtgroup_cbm_overlaps_pseudo_locked(struct rdt_domain *d, u32 _cbm);
bool rdtgroup_pseudo_locked_in_hierarchy(struct rdt_domain *d);
int rdt_pseudo_lock_init(void);
void rdt_pseudo_lock_release(void);
int rdtgroup_pseudo_lock_create(struct rdtgroup *rdtgrp);
void rdtgroup_pseudo_lock_remove(struct rdtgroup *rdtgrp);
struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r);
int update_domains(struct rdt_resource *r, int closid);
void closid_free(int closid);
int alloc_rmid(void);
void free_rmid(u32 rmid);
int rdt_get_mon_l3_config(struct rdt_resource *r);

@@ -64,9 +64,10 @@ static bool bw_validate(char *buf, unsigned long *data, struct rdt_resource *r)
	return true;
}

int parse_bw(char *buf, struct rdt_resource *r, struct rdt_domain *d)
int parse_bw(void *_buf, struct rdt_resource *r, struct rdt_domain *d)
{
	unsigned long data;
	char *buf = _buf;

	if (d->have_new_ctrl) {
		rdt_last_cmd_printf("duplicate domain %d\n", d->id);

@@ -87,7 +88,7 @@ int parse_bw(char *buf, struct rdt_resource *r, struct rdt_domain *d)
 * are allowed (e.g. FFFFH, 0FF0H, 003CH, etc.).
 * Additionally Haswell requires at least two bits set.
 */
static bool cbm_validate(char *buf, unsigned long *data, struct rdt_resource *r)
static bool cbm_validate(char *buf, u32 *data, struct rdt_resource *r)
{
	unsigned long first_bit, zero_bit, val;
	unsigned int cbm_len = r->cache.cbm_len;
@@ -122,22 +123,64 @@ static bool cbm_validate(char *buf, unsigned long *data, struct rdt_resource *r)
	return true;
}

struct rdt_cbm_parse_data {
	struct rdtgroup		*rdtgrp;
	char			*buf;
};

/*
 * Read one cache bit mask (hex). Check that it is valid for the current
 * resource type.
 */
int parse_cbm(char *buf, struct rdt_resource *r, struct rdt_domain *d)
int parse_cbm(void *_data, struct rdt_resource *r, struct rdt_domain *d)
{
	unsigned long data;
	struct rdt_cbm_parse_data *data = _data;
	struct rdtgroup *rdtgrp = data->rdtgrp;
	u32 cbm_val;

	if (d->have_new_ctrl) {
		rdt_last_cmd_printf("duplicate domain %d\n", d->id);
		return -EINVAL;
	}

	if (!cbm_validate(buf, &data, r))
	/*
	 * Cannot set up more than one pseudo-locked region in a cache
	 * hierarchy.
	 */
	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP &&
	    rdtgroup_pseudo_locked_in_hierarchy(d)) {
		rdt_last_cmd_printf("pseudo-locked region in hierarchy\n");
		return -EINVAL;
	d->new_ctrl = data;
	}

	if (!cbm_validate(data->buf, &cbm_val, r))
		return -EINVAL;

	if ((rdtgrp->mode == RDT_MODE_EXCLUSIVE ||
	     rdtgrp->mode == RDT_MODE_SHAREABLE) &&
	    rdtgroup_cbm_overlaps_pseudo_locked(d, cbm_val)) {
		rdt_last_cmd_printf("CBM overlaps with pseudo-locked region\n");
		return -EINVAL;
	}

	/*
	 * The CBM may not overlap with the CBM of another closid if
	 * either is exclusive.
	 */
	if (rdtgroup_cbm_overlaps(r, d, cbm_val, rdtgrp->closid, true)) {
		rdt_last_cmd_printf("overlaps with exclusive group\n");
		return -EINVAL;
	}

	if (rdtgroup_cbm_overlaps(r, d, cbm_val, rdtgrp->closid, false)) {
		if (rdtgrp->mode == RDT_MODE_EXCLUSIVE ||
		    rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
			rdt_last_cmd_printf("overlaps with other group\n");
			return -EINVAL;
		}
	}

	d->new_ctrl = cbm_val;
	d->have_new_ctrl = true;

	return 0;
@@ -149,8 +192,10 @@ int parse_cbm(char *buf, struct rdt_resource *r, struct rdt_domain *d)
 * separated by ";". The "id" is in decimal, and must match one of
 * the "id"s for this resource.
 */
static int parse_line(char *line, struct rdt_resource *r)
static int parse_line(char *line, struct rdt_resource *r,
		      struct rdtgroup *rdtgrp)
{
	struct rdt_cbm_parse_data data;
	char *dom = NULL, *id;
	struct rdt_domain *d;
	unsigned long dom_id;

@@ -167,15 +212,32 @@ next:
	dom = strim(dom);
	list_for_each_entry(d, &r->domains, list) {
		if (d->id == dom_id) {
			if (r->parse_ctrlval(dom, r, d))
			data.buf = dom;
			data.rdtgrp = rdtgrp;
			if (r->parse_ctrlval(&data, r, d))
				return -EINVAL;
			if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
				/*
				 * In pseudo-locking setup mode and just
				 * parsed a valid CBM that should be
				 * pseudo-locked. Only one locked region per
				 * resource group and domain so just do
				 * the required initialization for single
				 * region and return.
				 */
				rdtgrp->plr->r = r;
				rdtgrp->plr->d = d;
				rdtgrp->plr->cbm = d->new_ctrl;
				d->plr = rdtgrp->plr;
				return 0;
			}
			goto next;
		}
	}
	return -EINVAL;
}

static int update_domains(struct rdt_resource *r, int closid)
int update_domains(struct rdt_resource *r, int closid)
{
	struct msr_param msr_param;
	cpumask_var_t cpu_mask;
@@ -220,13 +282,14 @@ done:
	return 0;
}

static int rdtgroup_parse_resource(char *resname, char *tok, int closid)
static int rdtgroup_parse_resource(char *resname, char *tok,
				   struct rdtgroup *rdtgrp)
{
	struct rdt_resource *r;

	for_each_alloc_enabled_rdt_resource(r) {
		if (!strcmp(resname, r->name) && closid < r->num_closid)
			return parse_line(tok, r);
		if (!strcmp(resname, r->name) && rdtgrp->closid < r->num_closid)
			return parse_line(tok, r, rdtgrp);
	}
	rdt_last_cmd_printf("unknown/unsupported resource name '%s'\n", resname);
	return -EINVAL;
@@ -239,7 +302,7 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
	struct rdt_domain *dom;
	struct rdt_resource *r;
	char *tok, *resname;
	int closid, ret = 0;
	int ret = 0;

	/* Valid input requires a trailing newline */
	if (nbytes == 0 || buf[nbytes - 1] != '\n')

@@ -253,7 +316,15 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
	}
	rdt_last_cmd_clear();

	closid = rdtgrp->closid;
	/*
	 * No changes to pseudo-locked region allowed. It has to be removed
	 * and re-created instead.
	 */
	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
		ret = -EINVAL;
		rdt_last_cmd_puts("resource group is pseudo-locked\n");
		goto out;
	}

	for_each_alloc_enabled_rdt_resource(r) {
		list_for_each_entry(dom, &r->domains, list)

@@ -272,17 +343,27 @@ ssize_t rdtgroup_schemata_write(struct kernfs_open_file *of,
			ret = -EINVAL;
			goto out;
		}
		ret = rdtgroup_parse_resource(resname, tok, closid);
		ret = rdtgroup_parse_resource(resname, tok, rdtgrp);
		if (ret)
			goto out;
	}

	for_each_alloc_enabled_rdt_resource(r) {
		ret = update_domains(r, closid);
		ret = update_domains(r, rdtgrp->closid);
		if (ret)
			goto out;
	}

	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
		/*
		 * If pseudo-locking fails we keep the resource group in
		 * mode RDT_MODE_PSEUDO_LOCKSETUP with its class of service
		 * active and updated for just the domain the pseudo-locked
		 * region was requested for.
		 */
		ret = rdtgroup_pseudo_lock_create(rdtgrp);
	}

out:
	rdtgroup_kn_unlock(of->kn);
	return ret ?: nbytes;
@@ -318,10 +399,18 @@ int rdtgroup_schemata_show(struct kernfs_open_file *of,

	rdtgrp = rdtgroup_kn_lock_live(of->kn);
	if (rdtgrp) {
		closid = rdtgrp->closid;
		for_each_alloc_enabled_rdt_resource(r) {
			if (closid < r->num_closid)
				show_doms(s, r, closid);
		if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
			for_each_alloc_enabled_rdt_resource(r)
				seq_printf(s, "%s:uninitialized\n", r->name);
		} else if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
			seq_printf(s, "%s:%d=%x\n", rdtgrp->plr->r->name,
				   rdtgrp->plr->d->id, rdtgrp->plr->cbm);
		} else {
			closid = rdtgrp->closid;
			for_each_alloc_enabled_rdt_resource(r) {
				if (closid < r->num_closid)
					show_doms(s, r, closid);
			}
		}
	} else {
		ret = -ENOENT;
(The diff of one file is not shown because of its large size.)

@@ -0,0 +1,43 @@
/* SPDX-License-Identifier: GPL-2.0 */
#undef TRACE_SYSTEM
#define TRACE_SYSTEM resctrl

#if !defined(_TRACE_PSEUDO_LOCK_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_PSEUDO_LOCK_H

#include <linux/tracepoint.h>

TRACE_EVENT(pseudo_lock_mem_latency,
	    TP_PROTO(u32 latency),
	    TP_ARGS(latency),
	    TP_STRUCT__entry(__field(u32, latency)),
	    TP_fast_assign(__entry->latency = latency),
	    TP_printk("latency=%u", __entry->latency)
	   );

TRACE_EVENT(pseudo_lock_l2,
	    TP_PROTO(u64 l2_hits, u64 l2_miss),
	    TP_ARGS(l2_hits, l2_miss),
	    TP_STRUCT__entry(__field(u64, l2_hits)
			     __field(u64, l2_miss)),
	    TP_fast_assign(__entry->l2_hits = l2_hits;
			   __entry->l2_miss = l2_miss;),
	    TP_printk("hits=%llu miss=%llu",
		      __entry->l2_hits, __entry->l2_miss));

TRACE_EVENT(pseudo_lock_l3,
	    TP_PROTO(u64 l3_hits, u64 l3_miss),
	    TP_ARGS(l3_hits, l3_miss),
	    TP_STRUCT__entry(__field(u64, l3_hits)
			     __field(u64, l3_miss)),
	    TP_fast_assign(__entry->l3_hits = l3_hits;
			   __entry->l3_miss = l3_miss;),
	    TP_printk("hits=%llu miss=%llu",
		      __entry->l3_hits, __entry->l3_miss));

#endif /* _TRACE_PSEUDO_LOCK_H */

#undef TRACE_INCLUDE_PATH
#define TRACE_INCLUDE_PATH .
#define TRACE_INCLUDE_FILE intel_rdt_pseudo_lock_event
#include <trace/define_trace.h>

@@ -20,7 +20,9 @@

#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt

#include <linux/cacheinfo.h>
#include <linux/cpu.h>
#include <linux/debugfs.h>
#include <linux/fs.h>
#include <linux/sysfs.h>
#include <linux/kernfs.h>

@@ -55,6 +57,8 @@ static struct kernfs_node *kn_mondata;
static struct seq_buf last_cmd_status;
static char last_cmd_status_buf[512];

struct dentry *debugfs_resctrl;

void rdt_last_cmd_clear(void)
{
	lockdep_assert_held(&rdtgroup_mutex);
@@ -121,11 +125,65 @@ static int closid_alloc(void)
	return closid;
}

static void closid_free(int closid)
void closid_free(int closid)
{
	closid_free_map |= 1 << closid;
}

/**
 * closid_allocated - test if provided closid is in use
 * @closid: closid to be tested
 *
 * Return: true if @closid is currently associated with a resource group,
 * false if @closid is free
 */
static bool closid_allocated(unsigned int closid)
{
	return (closid_free_map & (1 << closid)) == 0;
}

/**
 * rdtgroup_mode_by_closid - Return mode of resource group with closid
 * @closid: closid of the resource group
 *
 * Each resource group is associated with a @closid. Here the mode
 * of a resource group can be queried by searching for it using its closid.
 *
 * Return: mode as &enum rdtgrp_mode of resource group with closid @closid
 */
enum rdtgrp_mode rdtgroup_mode_by_closid(int closid)
{
	struct rdtgroup *rdtgrp;

	list_for_each_entry(rdtgrp, &rdt_all_groups, rdtgroup_list) {
		if (rdtgrp->closid == closid)
			return rdtgrp->mode;
	}

	return RDT_NUM_MODES;
}

static const char * const rdt_mode_str[] = {
	[RDT_MODE_SHAREABLE]		= "shareable",
	[RDT_MODE_EXCLUSIVE]		= "exclusive",
	[RDT_MODE_PSEUDO_LOCKSETUP]	= "pseudo-locksetup",
	[RDT_MODE_PSEUDO_LOCKED]	= "pseudo-locked",
};

/**
 * rdtgroup_mode_str - Return the string representation of mode
 * @mode: the resource group mode as &enum rdtgroup_mode
 *
 * Return: string representation of valid mode, "unknown" otherwise
 */
static const char *rdtgroup_mode_str(enum rdtgrp_mode mode)
{
	if (mode < RDT_MODE_SHAREABLE || mode >= RDT_NUM_MODES)
		return "unknown";

	return rdt_mode_str[mode];
}

/* set uid and gid of rdtgroup dirs and files to that of the creator */
static int rdtgroup_kn_set_ugid(struct kernfs_node *kn)
{
@@ -207,8 +265,12 @@ static int rdtgroup_cpus_show(struct kernfs_open_file *of,
	rdtgrp = rdtgroup_kn_lock_live(of->kn);

	if (rdtgrp) {
		seq_printf(s, is_cpu_list(of) ? "%*pbl\n" : "%*pb\n",
			   cpumask_pr_args(&rdtgrp->cpu_mask));
		if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
			seq_printf(s, is_cpu_list(of) ? "%*pbl\n" : "%*pb\n",
				   cpumask_pr_args(&rdtgrp->plr->d->cpu_mask));
		else
			seq_printf(s, is_cpu_list(of) ? "%*pbl\n" : "%*pb\n",
				   cpumask_pr_args(&rdtgrp->cpu_mask));
	} else {
		ret = -ENOENT;
	}
@@ -394,6 +456,13 @@ static ssize_t rdtgroup_cpus_write(struct kernfs_open_file *of,
		goto unlock;
	}

	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED ||
	    rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
		ret = -EINVAL;
		rdt_last_cmd_puts("pseudo-locking in progress\n");
		goto unlock;
	}

	if (is_cpu_list(of))
		ret = cpulist_parse(buf, newmask);
	else
@@ -509,6 +578,32 @@ static int __rdtgroup_move_task(struct task_struct *tsk,
	return ret;
}

/**
 * rdtgroup_tasks_assigned - Test if tasks have been assigned to resource group
 * @r: Resource group
 *
 * Return: 1 if tasks have been assigned to @r, 0 otherwise
 */
int rdtgroup_tasks_assigned(struct rdtgroup *r)
{
	struct task_struct *p, *t;
	int ret = 0;

	lockdep_assert_held(&rdtgroup_mutex);

	rcu_read_lock();
	for_each_process_thread(p, t) {
		if ((r->type == RDTCTRL_GROUP && t->closid == r->closid) ||
		    (r->type == RDTMON_GROUP && t->rmid == r->mon.rmid)) {
			ret = 1;
			break;
		}
	}
	rcu_read_unlock();

	return ret;
}

static int rdtgroup_task_write_permission(struct task_struct *task,
					  struct kernfs_open_file *of)
{
@@ -570,13 +665,22 @@ static ssize_t rdtgroup_tasks_write(struct kernfs_open_file *of,
	if (kstrtoint(strstrip(buf), 0, &pid) || pid < 0)
		return -EINVAL;
	rdtgrp = rdtgroup_kn_lock_live(of->kn);
	if (!rdtgrp) {
		rdtgroup_kn_unlock(of->kn);
		return -ENOENT;
	}
	rdt_last_cmd_clear();

	if (rdtgrp)
		ret = rdtgroup_move_task(pid, rdtgrp, of);
	else
		ret = -ENOENT;
	if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED ||
	    rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
		ret = -EINVAL;
		rdt_last_cmd_puts("pseudo-locking in progress\n");
		goto unlock;
	}

	ret = rdtgroup_move_task(pid, rdtgrp, of);

unlock:
	rdtgroup_kn_unlock(of->kn);

	return ret ?: nbytes;
@@ -662,6 +766,94 @@ static int rdt_shareable_bits_show(struct kernfs_open_file *of,
	return 0;
}

/**
 * rdt_bit_usage_show - Display current usage of resources
 *
 * A domain is a shared resource that can now be allocated differently. Here
 * we display the current regions of the domain as an annotated bitmask.
 * For each domain of this resource its allocation bitmask
 * is annotated as below to indicate the current usage of the corresponding bit:
 *   0 - currently unused
 *   X - currently available for sharing and used by software and hardware
 *   H - currently used by hardware only but available for software use
 *   S - currently used and shareable by software only
 *   E - currently used exclusively by one resource group
 *   P - currently pseudo-locked by one resource group
 */
static int rdt_bit_usage_show(struct kernfs_open_file *of,
			      struct seq_file *seq, void *v)
{
	struct rdt_resource *r = of->kn->parent->priv;
	u32 sw_shareable = 0, hw_shareable = 0;
	u32 exclusive = 0, pseudo_locked = 0;
	struct rdt_domain *dom;
	int i, hwb, swb, excl, psl;
	enum rdtgrp_mode mode;
	bool sep = false;
	u32 *ctrl;

	mutex_lock(&rdtgroup_mutex);
	hw_shareable = r->cache.shareable_bits;
	list_for_each_entry(dom, &r->domains, list) {
		if (sep)
			seq_putc(seq, ';');
		ctrl = dom->ctrl_val;
		sw_shareable = 0;
		exclusive = 0;
		seq_printf(seq, "%d=", dom->id);
		for (i = 0; i < r->num_closid; i++, ctrl++) {
			if (!closid_allocated(i))
				continue;
			mode = rdtgroup_mode_by_closid(i);
			switch (mode) {
			case RDT_MODE_SHAREABLE:
				sw_shareable |= *ctrl;
				break;
			case RDT_MODE_EXCLUSIVE:
				exclusive |= *ctrl;
				break;
			case RDT_MODE_PSEUDO_LOCKSETUP:
				/*
				 * RDT_MODE_PSEUDO_LOCKSETUP is possible
				 * here but not included since the CBM
				 * associated with this CLOSID in this mode
				 * is not initialized and no task or cpu can be
				 * assigned this CLOSID.
				 */
				break;
			case RDT_MODE_PSEUDO_LOCKED:
			case RDT_NUM_MODES:
				WARN(1,
				     "invalid mode for closid %d\n", i);
				break;
			}
		}
		for (i = r->cache.cbm_len - 1; i >= 0; i--) {
			pseudo_locked = dom->plr ? dom->plr->cbm : 0;
			hwb = test_bit(i, (unsigned long *)&hw_shareable);
			swb = test_bit(i, (unsigned long *)&sw_shareable);
			excl = test_bit(i, (unsigned long *)&exclusive);
			psl = test_bit(i, (unsigned long *)&pseudo_locked);
			if (hwb && swb)
				seq_putc(seq, 'X');
			else if (hwb && !swb)
				seq_putc(seq, 'H');
			else if (!hwb && swb)
				seq_putc(seq, 'S');
			else if (excl)
				seq_putc(seq, 'E');
			else if (psl)
				seq_putc(seq, 'P');
			else /* Unused bits remain */
				seq_putc(seq, '0');
		}
		sep = true;
	}
	seq_putc(seq, '\n');
	mutex_unlock(&rdtgroup_mutex);
	return 0;
}

static int rdt_min_bw_show(struct kernfs_open_file *of,
			   struct seq_file *seq, void *v)
{
@@ -740,6 +932,269 @@ static ssize_t max_threshold_occ_write(struct kernfs_open_file *of,
	return nbytes;
}

/*
 * rdtgroup_mode_show - Display mode of this resource group
 */
static int rdtgroup_mode_show(struct kernfs_open_file *of,
			      struct seq_file *s, void *v)
{
	struct rdtgroup *rdtgrp;

	rdtgrp = rdtgroup_kn_lock_live(of->kn);
	if (!rdtgrp) {
		rdtgroup_kn_unlock(of->kn);
		return -ENOENT;
	}

	seq_printf(s, "%s\n", rdtgroup_mode_str(rdtgrp->mode));

	rdtgroup_kn_unlock(of->kn);
	return 0;
}

/**
 * rdtgroup_cbm_overlaps - Does CBM for intended closid overlap with other
 * @r: Resource to which domain instance @d belongs.
 * @d: The domain instance for which @closid is being tested.
 * @cbm: Capacity bitmask being tested.
 * @closid: Intended closid for @cbm.
 * @exclusive: Only check if overlaps with exclusive resource groups
 *
 * Checks if provided @cbm intended to be used for @closid on domain
 * @d overlaps with any other closids or other hardware usage associated
 * with this domain. If @exclusive is true then only overlaps with
 * resource groups in exclusive mode will be considered. If @exclusive
 * is false then overlaps with any resource group or hardware entities
 * will be considered.
 *
 * Return: false if CBM does not overlap, true if it does.
 */
bool rdtgroup_cbm_overlaps(struct rdt_resource *r, struct rdt_domain *d,
			   u32 _cbm, int closid, bool exclusive)
{
	unsigned long *cbm = (unsigned long *)&_cbm;
	unsigned long *ctrl_b;
	enum rdtgrp_mode mode;
	u32 *ctrl;
	int i;

	/* Check for any overlap with regions used by hardware directly */
	if (!exclusive) {
		if (bitmap_intersects(cbm,
				      (unsigned long *)&r->cache.shareable_bits,
				      r->cache.cbm_len))
			return true;
	}

	/* Check for overlap with other resource groups */
	ctrl = d->ctrl_val;
	for (i = 0; i < r->num_closid; i++, ctrl++) {
		ctrl_b = (unsigned long *)ctrl;
		mode = rdtgroup_mode_by_closid(i);
		if (closid_allocated(i) && i != closid &&
		    mode != RDT_MODE_PSEUDO_LOCKSETUP) {
			if (bitmap_intersects(cbm, ctrl_b, r->cache.cbm_len)) {
				if (exclusive) {
					if (mode == RDT_MODE_EXCLUSIVE)
						return true;
					continue;
				}
				return true;
			}
		}
	}

	return false;
}

/**
 * rdtgroup_mode_test_exclusive - Test if this resource group can be exclusive
 *
 * An exclusive resource group implies that there should be no sharing of
 * its allocated resources. At the time this group is considered to be
 * exclusive this test can determine if its current schemata supports this
 * setting by testing for overlap with all other resource groups.
 *
 * Return: true if resource group can be exclusive, false if there is overlap
 * with allocations of other resource groups and thus this resource group
 * cannot be exclusive.
 */
static bool rdtgroup_mode_test_exclusive(struct rdtgroup *rdtgrp)
{
	int closid = rdtgrp->closid;
	struct rdt_resource *r;
	struct rdt_domain *d;

	for_each_alloc_enabled_rdt_resource(r) {
		list_for_each_entry(d, &r->domains, list) {
			if (rdtgroup_cbm_overlaps(r, d, d->ctrl_val[closid],
						  rdtgrp->closid, false))
				return false;
		}
	}

	return true;
}

/**
 * rdtgroup_mode_write - Modify the resource group's mode
 *
 */
static ssize_t rdtgroup_mode_write(struct kernfs_open_file *of,
				   char *buf, size_t nbytes, loff_t off)
{
	struct rdtgroup *rdtgrp;
	enum rdtgrp_mode mode;
	int ret = 0;

	/* Valid input requires a trailing newline */
	if (nbytes == 0 || buf[nbytes - 1] != '\n')
		return -EINVAL;
	buf[nbytes - 1] = '\0';

	rdtgrp = rdtgroup_kn_lock_live(of->kn);
	if (!rdtgrp) {
		rdtgroup_kn_unlock(of->kn);
		return -ENOENT;
	}

	rdt_last_cmd_clear();

	mode = rdtgrp->mode;

	if ((!strcmp(buf, "shareable") && mode == RDT_MODE_SHAREABLE) ||
	    (!strcmp(buf, "exclusive") && mode == RDT_MODE_EXCLUSIVE) ||
	    (!strcmp(buf, "pseudo-locksetup") &&
	     mode == RDT_MODE_PSEUDO_LOCKSETUP) ||
	    (!strcmp(buf, "pseudo-locked") && mode == RDT_MODE_PSEUDO_LOCKED))
		goto out;

	if (mode == RDT_MODE_PSEUDO_LOCKED) {
		rdt_last_cmd_printf("cannot change pseudo-locked group\n");
		ret = -EINVAL;
		goto out;
	}

	if (!strcmp(buf, "shareable")) {
		if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
			ret = rdtgroup_locksetup_exit(rdtgrp);
			if (ret)
				goto out;
		}
		rdtgrp->mode = RDT_MODE_SHAREABLE;
	} else if (!strcmp(buf, "exclusive")) {
		if (!rdtgroup_mode_test_exclusive(rdtgrp)) {
			rdt_last_cmd_printf("schemata overlaps\n");
			ret = -EINVAL;
			goto out;
		}
		if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
			ret = rdtgroup_locksetup_exit(rdtgrp);
			if (ret)
				goto out;
		}
		rdtgrp->mode = RDT_MODE_EXCLUSIVE;
	} else if (!strcmp(buf, "pseudo-locksetup")) {
		ret = rdtgroup_locksetup_enter(rdtgrp);
		if (ret)
			goto out;
		rdtgrp->mode = RDT_MODE_PSEUDO_LOCKSETUP;
	} else {
		rdt_last_cmd_printf("unknown/unsupported mode\n");
		ret = -EINVAL;
	}

out:
	rdtgroup_kn_unlock(of->kn);
	return ret ?: nbytes;
}

/**
 * rdtgroup_cbm_to_size - Translate CBM to size in bytes
 * @r: RDT resource to which @d belongs.
 * @d: RDT domain instance.
 * @cbm: bitmask for which the size should be computed.
 *
 * The provided bitmask associated with the RDT domain instance @d will be
 * translated into how many bytes it represents. The size in bytes is
 * computed by first dividing the total cache size by the CBM length to
 * determine how many bytes each bit in the bitmask represents. The result
 * is multiplied with the number of bits set in the bitmask.
 */
unsigned int rdtgroup_cbm_to_size(struct rdt_resource *r,
                                  struct rdt_domain *d, u32 cbm)
{
        struct cpu_cacheinfo *ci;
        unsigned int size = 0;
        int num_b, i;

        num_b = bitmap_weight((unsigned long *)&cbm, r->cache.cbm_len);
        ci = get_cpu_cacheinfo(cpumask_any(&d->cpu_mask));
        for (i = 0; i < ci->num_leaves; i++) {
                if (ci->info_list[i].level == r->cache_level) {
                        size = ci->info_list[i].size / r->cache.cbm_len * num_b;
                        break;
                }
        }

        return size;
}

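/*
 * Worked example with assumed numbers: a 1048576 byte (1 MiB) cache
 * with cbm_len = 16 yields 1048576 / 16 = 65536 bytes per CBM bit.
 * For cbm = 0x00f0 the bitmap weight is 4, so the computed size is
 * 4 * 65536 = 262144 bytes.
 */
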
/**
 * rdtgroup_size_show - Display size in bytes of allocated regions
 *
 * The "size" file mirrors the layout of the "schemata" file, printing the
 * size in bytes of each region instead of the capacity bitmask.
 *
 */
static int rdtgroup_size_show(struct kernfs_open_file *of,
                              struct seq_file *s, void *v)
{
        struct rdtgroup *rdtgrp;
        struct rdt_resource *r;
        struct rdt_domain *d;
        unsigned int size;
        bool sep = false;
        u32 cbm;

        rdtgrp = rdtgroup_kn_lock_live(of->kn);
        if (!rdtgrp) {
                rdtgroup_kn_unlock(of->kn);
                return -ENOENT;
        }

        if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
                seq_printf(s, "%*s:", max_name_width, rdtgrp->plr->r->name);
                size = rdtgroup_cbm_to_size(rdtgrp->plr->r,
                                            rdtgrp->plr->d,
                                            rdtgrp->plr->cbm);
                seq_printf(s, "%d=%u\n", rdtgrp->plr->d->id, size);
                goto out;
        }

        for_each_alloc_enabled_rdt_resource(r) {
                sep = false;
                seq_printf(s, "%*s:", max_name_width, r->name);
                list_for_each_entry(d, &r->domains, list) {
                        if (sep)
                                seq_putc(s, ';');
                        if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP) {
                                size = 0;
                        } else {
                                cbm = d->ctrl_val[rdtgrp->closid];
                                size = rdtgroup_cbm_to_size(r, d, cbm);
                        }
                        seq_printf(s, "%d=%u", d->id, size);
                        sep = true;
                }
                seq_putc(s, '\n');
        }

out:
        rdtgroup_kn_unlock(of->kn);

        return 0;
}

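/*
 * Example "size" file output (illustrative values) for a group whose
 * L3 allocation spans two cache domains:
 *
 *   L3:0=4194304;1=4194304
 *
 * A group in pseudo-locksetup mode reports 0 for every domain; a
 * pseudo-locked group reports only the domain backing its region.
 */
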
/* rdtgroup information files for one cache resource. */
static struct rftype res_common_files[] = {
        {

@@ -791,6 +1246,13 @@ static struct rftype res_common_files[] = {
                .seq_show       = rdt_shareable_bits_show,
                .fflags         = RF_CTRL_INFO | RFTYPE_RES_CACHE,
        },
        {
                .name           = "bit_usage",
                .mode           = 0444,
                .kf_ops         = &rdtgroup_kf_single_ops,
                .seq_show       = rdt_bit_usage_show,
                .fflags         = RF_CTRL_INFO | RFTYPE_RES_CACHE,
        },
        {
                .name           = "min_bandwidth",
                .mode           = 0444,

@@ -853,6 +1315,22 @@ static struct rftype res_common_files[] = {
                .seq_show       = rdtgroup_schemata_show,
                .fflags         = RF_CTRL_BASE,
        },
        {
                .name           = "mode",
                .mode           = 0644,
                .kf_ops         = &rdtgroup_kf_single_ops,
                .write          = rdtgroup_mode_write,
                .seq_show       = rdtgroup_mode_show,
                .fflags         = RF_CTRL_BASE,
        },
        {
                .name           = "size",
                .mode           = 0444,
                .kf_ops         = &rdtgroup_kf_single_ops,
                .seq_show       = rdtgroup_size_show,
                .fflags         = RF_CTRL_BASE,
        },

};

static int rdtgroup_add_files(struct kernfs_node *kn, unsigned long fflags)

@@ -883,6 +1361,103 @@ error:
        return ret;
}

/**
 * rdtgroup_kn_mode_restrict - Restrict user access to named resctrl file
 * @r: The resource group with which the file is associated.
 * @name: Name of the file
 *
 * The permissions of the named resctrl file, directory, or link are
 * modified to not allow read, write, or execute by any user.
 *
 * WARNING: This function is intended to communicate to the user that the
 * resctrl file has been locked down - that it is not relevant to the
 * particular state the system finds itself in. It should not be relied
 * on to protect from user access because after the file's permissions
 * are restricted the user can still change the permissions using chmod
 * from the command line.
 *
 * Return: 0 on success, <0 on failure.
 */
int rdtgroup_kn_mode_restrict(struct rdtgroup *r, const char *name)
{
        struct iattr iattr = {.ia_valid = ATTR_MODE,};
        struct kernfs_node *kn;
        int ret = 0;

        kn = kernfs_find_and_get_ns(r->kn, name, NULL);
        if (!kn)
                return -ENOENT;

        switch (kernfs_type(kn)) {
        case KERNFS_DIR:
                iattr.ia_mode = S_IFDIR;
                break;
        case KERNFS_FILE:
                iattr.ia_mode = S_IFREG;
                break;
        case KERNFS_LINK:
                iattr.ia_mode = S_IFLNK;
                break;
        }

        ret = kernfs_setattr(kn, &iattr);
        kernfs_put(kn);
        return ret;
}

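/*
 * Illustrative call (context assumed): while a group is being set up
 * for pseudo-locking, files that are not relevant to that state can
 * be locked down with, e.g.,
 *
 *   rdtgroup_kn_mode_restrict(rdtgrp, "tasks");
 *
 * which leaves only the file type bits (here S_IFREG) in the mode,
 * i.e. permissions 0000.
 */
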
/**
 * rdtgroup_kn_mode_restore - Restore user access to named resctrl file
 * @r: The resource group with which the file is associated.
 * @name: Name of the file
 * @mask: Mask of permissions that should be restored
 *
 * Restore the permissions of the named file. If @name is a directory the
 * permissions of its parent will be used.
 *
 * Return: 0 on success, <0 on failure.
 */
int rdtgroup_kn_mode_restore(struct rdtgroup *r, const char *name,
                             umode_t mask)
{
        struct iattr iattr = {.ia_valid = ATTR_MODE,};
        struct kernfs_node *kn, *parent;
        struct rftype *rfts, *rft;
        int ret, len;

        rfts = res_common_files;
        len = ARRAY_SIZE(res_common_files);

        for (rft = rfts; rft < rfts + len; rft++) {
                if (!strcmp(rft->name, name))
                        iattr.ia_mode = rft->mode & mask;
        }

        kn = kernfs_find_and_get_ns(r->kn, name, NULL);
        if (!kn)
                return -ENOENT;

        switch (kernfs_type(kn)) {
        case KERNFS_DIR:
                parent = kernfs_get_parent(kn);
                if (parent) {
                        iattr.ia_mode |= parent->mode;
                        kernfs_put(parent);
                }
                iattr.ia_mode |= S_IFDIR;
                break;
        case KERNFS_FILE:
                iattr.ia_mode |= S_IFREG;
                break;
        case KERNFS_LINK:
                iattr.ia_mode |= S_IFLNK;
                break;
        }

        ret = kernfs_setattr(kn, &iattr);
        kernfs_put(kn);
        return ret;
}

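/*
 * Illustrative call (context assumed): permissions restricted during
 * pseudo-locksetup can be brought back with
 *
 *   rdtgroup_kn_mode_restore(rdtgrp, "tasks", 0777);
 *
 * where a 0777 mask keeps everything the res_common_files entry
 * allows, while a stricter mask such as 0444 would restore at most
 * read access.
 */
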
static int rdtgroup_mkdir_info_resdir(struct rdt_resource *r, char *name,
                                      unsigned long fflags)
{

@@ -1224,6 +1799,9 @@ void rdtgroup_kn_unlock(struct kernfs_node *kn)

        if (atomic_dec_and_test(&rdtgrp->waitcount) &&
            (rdtgrp->flags & RDT_DELETED)) {
                if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
                    rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
                        rdtgroup_pseudo_lock_remove(rdtgrp);
                kernfs_unbreak_active_protection(kn);
                kernfs_put(rdtgrp->kn);
                kfree(rdtgrp);

@@ -1289,10 +1867,16 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type,
                rdtgroup_default.mon.mon_data_kn = kn_mondata;
        }

        ret = rdt_pseudo_lock_init();
        if (ret) {
                dentry = ERR_PTR(ret);
                goto out_mondata;
        }

        dentry = kernfs_mount(fs_type, flags, rdt_root,
                              RDTGROUP_SUPER_MAGIC, NULL);
        if (IS_ERR(dentry))
                goto out_mondata;
                goto out_psl;

        if (rdt_alloc_capable)
                static_branch_enable_cpuslocked(&rdt_alloc_enable_key);

@@ -1310,6 +1894,8 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type,

        goto out;

out_psl:
        rdt_pseudo_lock_release();
out_mondata:
        if (rdt_mon_capable)
                kernfs_remove(kn_mondata);

@@ -1447,6 +2033,10 @@ static void rmdir_all_sub(void)
                if (rdtgrp == &rdtgroup_default)
                        continue;

                if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
                    rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)
                        rdtgroup_pseudo_lock_remove(rdtgrp);

                /*
                 * Give any CPUs back to the default group. We cannot copy
                 * cpu_online_mask because a CPU might have executed the

@@ -1483,6 +2073,8 @@ static void rdt_kill_sb(struct super_block *sb)
                reset_all_ctrls(r);
        cdp_disable_all();
        rmdir_all_sub();
        rdt_pseudo_lock_release();
        rdtgroup_default.mode = RDT_MODE_SHAREABLE;
        static_branch_disable_cpuslocked(&rdt_alloc_enable_key);
        static_branch_disable_cpuslocked(&rdt_mon_enable_key);
        static_branch_disable_cpuslocked(&rdt_enable_key);

@@ -1682,6 +2274,114 @@ out_destroy:
        return ret;
}

/**
 * cbm_ensure_valid - Enforce validity on provided CBM
 * @_val: Candidate CBM
 * @r: RDT resource to which the CBM belongs
 *
 * The provided CBM represents all cache portions available for use. This
 * may be represented by a bitmap that does not consist of contiguous ones
 * and thus be an invalid CBM.
 * Here the provided CBM is forced to be a valid CBM by only considering
 * the first set of contiguous bits as valid and clearing all other bits.
 * The intention here is to provide a valid default CBM with which a new
 * resource group is initialized. The user can follow this with a
 * modification to the CBM if the default does not satisfy the
 * requirements.
 */
static void cbm_ensure_valid(u32 *_val, struct rdt_resource *r)
{
        /*
         * Convert the u32 _val to an unsigned long required by all the bit
         * operations within this function. No more than 32 bits of this
         * converted value can be accessed because all bit operations are
         * additionally provided with cbm_len that is initialized during
         * hardware enumeration using five bits from the EAX register and
         * thus never can exceed 32 bits.
         */
        unsigned long *val = (unsigned long *)_val;
        unsigned int cbm_len = r->cache.cbm_len;
        unsigned long first_bit, zero_bit;

        if (*val == 0)
                return;

        first_bit = find_first_bit(val, cbm_len);
        zero_bit = find_next_zero_bit(val, cbm_len, first_bit);

        /* Clear any remaining bits to ensure contiguous region */
        bitmap_clear(val, zero_bit, cbm_len - zero_bit);
}

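/*
 * Worked example with assumed values: for cbm_len = 8 and *val = 0xcf
 * (11001111b), find_first_bit() returns 0 and find_next_zero_bit()
 * returns 4, so bits 4-7 are cleared and the resulting valid CBM is
 * 0x0f, the lowest contiguous run of set bits.
 */
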
/**
 * rdtgroup_init_alloc - Initialize the new RDT group's allocations
 *
 * A new RDT group is being created on an allocation capable (CAT)
 * supporting system. Set this group up to start off with all usable
 * allocations. That is, all shareable and unused bits.
 *
 * All-zero CBM is invalid. If there are no more shareable bits available
 * on any domain then the entire allocation will fail.
 */
static int rdtgroup_init_alloc(struct rdtgroup *rdtgrp)
{
        u32 used_b = 0, unused_b = 0;
        u32 closid = rdtgrp->closid;
        struct rdt_resource *r;
        enum rdtgrp_mode mode;
        struct rdt_domain *d;
        int i, ret;
        u32 *ctrl;

        for_each_alloc_enabled_rdt_resource(r) {
                list_for_each_entry(d, &r->domains, list) {
                        d->have_new_ctrl = false;
                        d->new_ctrl = r->cache.shareable_bits;
                        used_b = r->cache.shareable_bits;
                        ctrl = d->ctrl_val;
                        for (i = 0; i < r->num_closid; i++, ctrl++) {
                                if (closid_allocated(i) && i != closid) {
                                        mode = rdtgroup_mode_by_closid(i);
                                        if (mode == RDT_MODE_PSEUDO_LOCKSETUP)
                                                break;
                                        used_b |= *ctrl;
                                        if (mode == RDT_MODE_SHAREABLE)
                                                d->new_ctrl |= *ctrl;
                                }
                        }
                        if (d->plr && d->plr->cbm > 0)
                                used_b |= d->plr->cbm;
                        unused_b = used_b ^ (BIT_MASK(r->cache.cbm_len) - 1);
                        unused_b &= BIT_MASK(r->cache.cbm_len) - 1;
                        d->new_ctrl |= unused_b;
                        /*
                         * Force the initial CBM to be valid, user can
                         * modify the CBM based on system availability.
                         */
                        cbm_ensure_valid(&d->new_ctrl, r);
                        if (bitmap_weight((unsigned long *) &d->new_ctrl,
                                          r->cache.cbm_len) <
                            r->cache.min_cbm_bits) {
                                rdt_last_cmd_printf("no space on %s:%d\n",
                                                    r->name, d->id);
                                return -ENOSPC;
                        }
                        d->have_new_ctrl = true;
                }
        }

        for_each_alloc_enabled_rdt_resource(r) {
                ret = update_domains(r, rdtgrp->closid);
                if (ret < 0) {
                        rdt_last_cmd_puts("failed to initialize allocations\n");
                        return ret;
                }
                rdtgrp->mode = RDT_MODE_SHAREABLE;
        }

        return 0;
}

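/*
 * Worked example with assumed values: cbm_len = 8, shareable_bits =
 * 0x03, one other shareable group with CBM 0x0f and a pseudo-locked
 * region with CBM 0x30. Then used_b becomes 0x3f, unused_b =
 * 0x3f ^ 0xff = 0xc0, and new_ctrl = 0x0f | 0xc0 = 0xcf, which
 * cbm_ensure_valid() trims to the contiguous 0x0f before the
 * bitmap_weight() check against min_cbm_bits.
 */
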
static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
                             struct kernfs_node *prgrp_kn,
                             const char *name, umode_t mode,

@@ -1700,6 +2400,14 @@ static int mkdir_rdt_prepare(struct kernfs_node *parent_kn,
                goto out_unlock;
        }

        if (rtype == RDTMON_GROUP &&
            (prdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
             prdtgrp->mode == RDT_MODE_PSEUDO_LOCKED)) {
                ret = -EINVAL;
                rdt_last_cmd_puts("pseudo-locking in progress\n");
                goto out_unlock;
        }

        /* allocate the rdtgroup. */
        rdtgrp = kzalloc(sizeof(*rdtgrp), GFP_KERNEL);
        if (!rdtgrp) {

@@ -1840,6 +2548,10 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
        ret = 0;

        rdtgrp->closid = closid;
        ret = rdtgroup_init_alloc(rdtgrp);
        if (ret < 0)
                goto out_id_free;

        list_add(&rdtgrp->rdtgroup_list, &rdt_all_groups);

        if (rdt_mon_capable) {

@@ -1850,15 +2562,16 @@ static int rdtgroup_mkdir_ctrl_mon(struct kernfs_node *parent_kn,
                ret = mongroup_create_dir(kn, NULL, "mon_groups", NULL);
                if (ret) {
                        rdt_last_cmd_puts("kernfs subdir error\n");
                        goto out_id_free;
                        goto out_del_list;
                }
        }

        goto out_unlock;

out_del_list:
        list_del(&rdtgrp->rdtgroup_list);
out_id_free:
        closid_free(closid);
        list_del(&rdtgrp->rdtgroup_list);
out_common_fail:
        mkdir_rdt_prepare_clean(rdtgrp);
out_unlock:

@@ -1945,6 +2658,21 @@ static int rdtgroup_rmdir_mon(struct kernfs_node *kn, struct rdtgroup *rdtgrp,
        return 0;
}

static int rdtgroup_ctrl_remove(struct kernfs_node *kn,
                                struct rdtgroup *rdtgrp)
{
        rdtgrp->flags = RDT_DELETED;
        list_del(&rdtgrp->rdtgroup_list);

        /*
         * one extra hold on this, will drop when we kfree(rdtgrp)
         * in rdtgroup_kn_unlock()
         */
        kernfs_get(kn);
        kernfs_remove(rdtgrp->kn);
        return 0;
}

static int rdtgroup_rmdir_ctrl(struct kernfs_node *kn, struct rdtgroup *rdtgrp,
                               cpumask_var_t tmpmask)
{

@@ -1970,7 +2698,6 @@ static int rdtgroup_rmdir_ctrl(struct kernfs_node *kn, struct rdtgroup *rdtgrp,
        cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
        update_closid_rmid(tmpmask, NULL);

        rdtgrp->flags = RDT_DELETED;
        closid_free(rdtgrp->closid);
        free_rmid(rdtgrp->mon.rmid);

@@ -1979,14 +2706,7 @@ static int rdtgroup_rmdir_ctrl(struct kernfs_node *kn, struct rdtgroup *rdtgrp,
         */
        free_all_child_rdtgrp(rdtgrp);

        list_del(&rdtgrp->rdtgroup_list);

        /*
         * one extra hold on this, will drop when we kfree(rdtgrp)
         * in rdtgroup_kn_unlock()
         */
        kernfs_get(kn);
        kernfs_remove(rdtgrp->kn);
        rdtgroup_ctrl_remove(kn, rdtgrp);

        return 0;
}

@@ -2014,13 +2734,19 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
         * If the rdtgroup is a mon group and parent directory
         * is a valid "mon_groups" directory, remove the mon group.
         */
        if (rdtgrp->type == RDTCTRL_GROUP && parent_kn == rdtgroup_default.kn)
                ret = rdtgroup_rmdir_ctrl(kn, rdtgrp, tmpmask);
        else if (rdtgrp->type == RDTMON_GROUP &&
                 is_mon_groups(parent_kn, kn->name))
        if (rdtgrp->type == RDTCTRL_GROUP && parent_kn == rdtgroup_default.kn) {
                if (rdtgrp->mode == RDT_MODE_PSEUDO_LOCKSETUP ||
                    rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
                        ret = rdtgroup_ctrl_remove(kn, rdtgrp);
                } else {
                        ret = rdtgroup_rmdir_ctrl(kn, rdtgrp, tmpmask);
                }
        } else if (rdtgrp->type == RDTMON_GROUP &&
                   is_mon_groups(parent_kn, kn->name)) {
                ret = rdtgroup_rmdir_mon(kn, rdtgrp, tmpmask);
        else
        } else {
                ret = -EPERM;
        }

out:
        rdtgroup_kn_unlock(kn);

@@ -2046,7 +2772,8 @@ static int __init rdtgroup_setup_root(void)
        int ret;

        rdt_root = kernfs_create_root(&rdtgroup_kf_syscall_ops,
                                      KERNFS_ROOT_CREATE_DEACTIVATED,
                                      KERNFS_ROOT_CREATE_DEACTIVATED |
                                      KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK,
                                      &rdtgroup_default);
        if (IS_ERR(rdt_root))
                return PTR_ERR(rdt_root);

@@ -2102,6 +2829,29 @@ int __init rdtgroup_init(void)
        if (ret)
                goto cleanup_mountpoint;

        /*
         * Adding the resctrl debugfs directory here may not be ideal since
         * it would let the resctrl debugfs directory appear on the debugfs
         * filesystem before the resctrl filesystem is mounted.
         * It may also be OK since that would enable debugging of RDT before
         * resctrl is mounted.
         * The reason why the debugfs directory is created here and not in
         * rdt_mount() is because rdt_mount() takes rdtgroup_mutex and
         * during the debugfs directory creation also &sb->s_type->i_mutex_key
         * (the lockdep class of inode->i_rwsem). Other filesystem
         * interactions (e.g. SyS_getdents) have the lock ordering:
         * &sb->s_type->i_mutex_key --> &mm->mmap_sem
         * During mmap(), called with &mm->mmap_sem, the rdtgroup_mutex
         * is taken, thus creating the dependency:
         * &mm->mmap_sem --> rdtgroup_mutex, which can cause
         * issues considering the other two lock dependencies.
         * By creating the debugfs directory here we avoid a dependency
         * that may cause deadlock (even though file operations cannot
         * occur until the filesystem is mounted, but I do not know how to
         * tell lockdep that).
         */
        debugfs_resctrl = debugfs_create_dir("resctrl", NULL);

        return 0;

cleanup_mountpoint:

@@ -2111,3 +2861,11 @@ cleanup_root:

        return ret;
}

void __exit rdtgroup_exit(void)
{
        debugfs_remove_recursive(debugfs_resctrl);
        unregister_filesystem(&rdt_fs_type);
        sysfs_remove_mount_point(fs_kobj, "resctrl");
        kernfs_destroy_root(rdt_root);
}