The source for the Linux kernel used in Windows Subsystem for Linux 2 (WSL2)
Перейти к файлу
Kirill Tkhai e0295238e5 mm/list_lru.c: combine code under the same define
Patch series "Improve shrink_slab() scalability (old complexity was O(n^2), new is O(n))", v8.

This patcheset solves the problem with slow shrink_slab() occuring on
the machines having many shrinkers and memory cgroups (i.e., with many
containers).  The problem is complexity of shrink_slab() is O(n^2) and
it grows too fast with the growth of containers numbers.

Let us have 200 containers, and every container has 10 mounts and 10
cgroups.  All container tasks are isolated, and they don't touch foreign
containers mounts.

In case of global reclaim, a task has to iterate all over the memcgs and
to call all the memcg-aware shrinkers for all of them.  This means, the
task has to visit 200 * 10 = 2000 shrinkers for every memcg, and since
there are 2000 memcgs, the total calls of do_shrink_slab() are 2000 *
2000 = 4000000.

4 million calls are not a number operations, which can takes 1 cpu
cycle.  E.g., super_cache_count() accesses at least two lists, and makes
arifmetical calculations.  Even, if there are no charged objects, we do
these calculations, and replaces cpu caches by read memory.  I observed
nodes spending almost 100% time in kernel, in case of intensive writing
and global reclaim.  The writer consumes pages fast, but it's need to
shrink_slab() before the reclaimer reached shrink pages function (and
frees SWAP_CLUSTER_MAX pages).  Even if there is no writing, the
iterations just waste the time, and slows reclaim down.

Let's see the small test below:

  $echo 1 > /sys/fs/cgroup/memory/memory.use_hierarchy
  $mkdir /sys/fs/cgroup/memory/ct
  $echo 4000M > /sys/fs/cgroup/memory/ct/memory.kmem.limit_in_bytes
  $for i in `seq 0 4000`;
          do mkdir /sys/fs/cgroup/memory/ct/$i;
          echo $$ > /sys/fs/cgroup/memory/ct/$i/cgroup.procs;
          mkdir -p s/$i; mount -t tmpfs $i s/$i; touch s/$i/file;
  done

Then, let's see drop caches time (5 sequential calls):

  $time echo 3 > /proc/sys/vm/drop_caches

  0.00user 13.78system 0:13.78elapsed 99%CPU
  0.00user 5.59system 0:05.60elapsed 99%CPU
  0.00user 5.48system 0:05.48elapsed 99%CPU
  0.00user 8.35system 0:08.35elapsed 99%CPU
  0.00user 8.34system 0:08.35elapsed 99%CPU

The last four calls don't actually shrink anything.  So, the iterations
over slab shrinkers take 5.48 seconds.  Not so good for scalability.

The patchset solves the problem by making shrink_slab() of O(n)
complexity.  There are following functional actions:

1) Assign id to every registered memcg-aware shrinker.

2) Maintain per-memcgroup bitmap of memcg-aware shrinkers, and set a
   shrinker-related bit after the first element is added to lru list
   (also, when removed child memcg elements are reparanted).

3) Split memcg-aware shrinkers and !memcg-aware shrinkers, and call a
   shrinker if its bit is set in memcg's shrinker bitmap.  (Also, there is
   a functionality to clear the bit, after last element is shrinked).

This gives significant performance increase.  The result after patchset
is applied:

  $time echo 3 > /proc/sys/vm/drop_caches

  0.00user 1.10system 0:01.10elapsed 99%CPU
  0.00user 0.00system 0:00.01elapsed 64%CPU
  0.00user 0.01system 0:00.01elapsed 82%CPU
  0.00user 0.00system 0:00.01elapsed 64%CPU
  0.00user 0.01system 0:00.01elapsed 82%CPU

The results show the performance increases at least in 548 times.

So, the patchset makes shrink_slab() of less complexity and improves the
performance in such types of load I pointed.  This will give a profit in
case of !global reclaim case, since there also will be less
do_shrink_slab() calls.

This patch (of 17):

These two pairs of blocks of code are under the same #ifdef #else
#endif.

Link: http://lkml.kernel.org/r/153063052519.1818.9393587113056959488.stgit@localhost.localdomain
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Tested-by: Shakeel Butt <shakeelb@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Sahitya Tummala <stummala@codeaurora.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Roman Gushchin <guro@fb.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Waiman Long <longman@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Li RongQing <lirongqing@baidu.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-17 16:20:30 -07:00
Documentation tools/vm/page-types.c: add support for idle page tracking 2018-08-17 16:20:28 -07:00
LICENSES LICENSES: Add Linux-OpenIB license text 2018-04-27 16:41:53 -06:00
arch mm: convert return type of handle_mm_fault() caller to vm_fault_t 2018-08-17 16:20:28 -07:00
block for-4.19/block-20180812 2018-08-14 10:23:25 -07:00
certs Replace magic for trusting the secondary keyring with #define 2018-08-16 09:57:20 -07:00
crypto Replace magic for trusting the secondary keyring with #define 2018-08-16 09:57:20 -07:00
drivers mm: provide a fallback for PAGE_KERNEL_RO for architectures 2018-08-17 16:20:29 -07:00
firmware kbuild: remove all dummy assignments to obj- 2017-11-18 11:46:06 +09:00
fs fs, mm: account buffer_head to kmemcg 2018-08-17 16:20:30 -07:00
include memcg, oom: move out_of_memory back to the charge path 2018-08-17 16:20:30 -07:00
init Consolidation of Kconfig files by Christoph Hellwig. 2018-08-15 13:05:12 -07:00
ipc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-08-15 15:04:25 -07:00
kernel kernel/memremap, kasan: make ZONE_DEVICE with work with KASAN 2018-08-17 16:20:30 -07:00
lib SCSI misc on 20180815 2018-08-15 22:06:26 -07:00
mm mm/list_lru.c: combine code under the same define 2018-08-17 16:20:30 -07:00
net Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2018-08-15 16:01:47 -07:00
samples Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2018-08-15 15:04:25 -07:00
scripts scripts: add Python 3 compatibility to spdxcheck.py 2018-08-17 16:20:27 -07:00
security Merge branch 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security 2018-08-15 22:54:12 -07:00
sound ALSA: update dell-wmi mic-mute registration to new world order 2018-08-15 19:08:10 -07:00
tools tools/vm/page-types.c: add support for idle page tracking 2018-08-17 16:20:28 -07:00
usr kbuild: rename built-in.o to built-in.a 2018-03-26 02:01:19 +09:00
virt arm64 updates for 4.19 2018-08-14 16:39:13 -07:00
.clang-format clang-format: Set IndentWrappedFunctionNames false 2018-08-01 18:38:51 +02:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Add hch to .get_maintainer.ignore 2015-08-21 14:30:10 -07:00
.gitattributes .gitattributes: set git diff driver for C source code files 2016-10-07 18:46:30 -07:00
.gitignore Kbuild updates for v4.17 (2nd) 2018-04-15 17:21:30 -07:00
.mailmap mailmap: remap some of my email addresses to kernel.org address 2018-08-06 13:15:16 -04:00
COPYING COPYING: use the new text with points to the license files 2018-03-23 12:41:45 -06:00
CREDITS 9p: remove Ron Minnich from MAINTAINERS 2018-08-17 16:20:26 -07:00
Kbuild Kbuild updates for v4.15 2017-11-17 17:45:29 -08:00
Kconfig kconfig: move the "Executable file formats" menu to fs/Kconfig.binfmt 2018-08-02 08:06:55 +09:00
MAINTAINERS 9p: add Dominique Martinet to MAINTAINERS 2018-08-17 16:20:27 -07:00
Makefile Kconfig updates for v4.19 2018-08-15 12:50:10 -07:00
README Docs: Added a pointer to the formatted docs to README 2018-03-21 09:02:53 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
See Documentation/00-INDEX for a list of what is contained in each file.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.