WSL2-Linux-Kernel/fs
David Howells 8feae13110 NOMMU: Make VMAs per MM as for MMU-mode linux
Make VMAs per mm_struct as for MMU-mode linux.  This solves two problems:

 (1) In SYSV SHM where nattch for a segment does not reflect the number of
     shmat's (and forks) done.

 (2) In mmap() where the VMA's vm_mm is set to point to the parent mm by an
     exec'ing process when VM_EXECUTABLE is specified, regardless of the fact
     that a VMA might be shared and already have its vm_mm assigned to another
     process or a dead process.

A new struct (vm_region) is introduced to track a mapped region and to remember
the circumstances under which it may be shared and the vm_list_struct structure
is discarded as it's no longer required.

This patch makes the following additional changes:

 (1) Regions are now allocated with alloc_pages() rather than kmalloc() and
     with no recourse to __GFP_COMP, so the pages are not composite.  Instead,
     each page has a reference on it held by the region.  Anything else that is
     interested in such a page will have to get a reference on it to retain it.
     When the pages are released due to unmapping, each page is passed to
     put_page() and will be freed when the page usage count reaches zero.

 (2) Excess pages are trimmed after an allocation as the allocation must be
     made as a power-of-2 quantity of pages.

 (3) VMAs are added to the parent MM's R/B tree and mmap lists.  As an MM may
     end up with overlapping VMAs within the tree, the VMA struct address is
     appended to the sort key.

 (4) Non-anonymous VMAs are now added to the backing inode's prio list.

 (5) Holes may be punched in anonymous VMAs with munmap(), releasing parts of
     the backing region.  The VMA and region structs will be split if
     necessary.

 (6) sys_shmdt() only releases one attachment to a SYSV IPC shared memory
     segment instead of all the attachments at that addresss.  Multiple
     shmat()'s return the same address under NOMMU-mode instead of different
     virtual addresses as under MMU-mode.

 (7) Core dumping for ELF-FDPIC requires fewer exceptions for NOMMU-mode.

 (8) /proc/maps is now the global list of mapped regions, and may list bits
     that aren't actually mapped anywhere.

 (9) /proc/meminfo gains a line (tagged "MmapCopy") that indicates the amount
     of RAM currently allocated by mmap to hold mappable regions that can't be
     mapped directly.  These are copies of the backing device or file if not
     anonymous.

These changes make NOMMU mode more similar to MMU mode.  The downside is that
NOMMU mode requires some extra memory to track things over NOMMU without this
patch (VMAs are no longer shared, and there are now region structs).

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Mike Frysinger <vapier.adi@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
2009-01-08 12:04:47 +00:00
..
9p Merge branch 'next' into for-linus 2008-12-25 11:40:09 +11:00
adfs vfs: Use const for kernel parser table 2008-10-13 10:10:37 -07:00
affs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2009-01-05 18:32:06 -08:00
afs fs: symlink write_begin allocation context fix 2009-01-04 13:33:20 -08:00
autofs zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
autofs4 autofs4: fix string validation check order 2009-01-06 15:59:23 -08:00
befs befs: ensure fast symlinks are NUL-terminated 2008-12-31 18:07:40 -05:00
bfs bfs: check that filesystem fits on the blockdevice 2009-01-06 15:59:31 -08:00
cifs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2009-01-05 18:32:06 -08:00
coda add a vfs_fsync helper 2009-01-05 11:54:28 -05:00
configfs zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
cramfs zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
debugfs debugfs: add helpers for exporting a size_t simple value 2009-01-07 10:00:16 -08:00
devpts zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
dlm Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm 2009-01-05 19:02:09 -08:00
ecryptfs fs/ecryptfs/inode.c: cleanup kerneldoc 2009-01-06 15:59:22 -08:00
efs [PATCH] switch all filesystems over to d_obtain_alias 2008-10-23 05:13:01 -04:00
exportfs Merge branch 'next' into for-linus 2008-12-25 11:40:09 +11:00
ext2 nfsd race fixes: ext2 2008-12-31 18:07:43 -05:00
ext3 ext3: Add default allocation routines for quota structures 2009-01-05 08:40:25 -08:00
ext4 percpu_counter: FBC_BATCH should be a variable 2009-01-06 15:59:13 -08:00
fat Merge git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6 2008-12-30 20:33:34 -08:00
freevxfs freevxfs: ensure fast symlinks are NUL-terminated 2008-12-31 18:07:40 -05:00
fuse Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse 2009-01-06 17:01:20 -08:00
gfs2 GFS2: Fix typo in gfs_page_mkwrite() 2009-01-07 08:58:28 +00:00
hfs CRED: Wrap task credential accesses in the HFS filesystem 2008-11-14 10:38:54 +11:00
hfsplus CRED: Wrap task credential accesses in the HFSplus filesystem 2008-11-14 10:38:54 +11:00
hostfs fs: symlink write_begin allocation context fix 2009-01-04 13:33:20 -08:00
hpfs CRED: Wrap task credential accesses in the HPFS filesystem 2008-11-14 10:38:55 +11:00
hppfs CRED: Use creds in file structs 2008-11-14 10:39:25 +11:00
hugetlbfs hugetlb: unsigned ret cannot be negative 2009-01-06 15:59:08 -08:00
isofs isofs check for NULL ->i_op in root directory is dead code 2009-01-05 11:53:38 -05:00
jbd jbd: don't give up looking for space so easily in __log_wait_for_space 2008-11-06 22:37:59 -05:00
jbd2 jbd2: Add buffer triggers 2009-01-05 08:40:30 -08:00
jffs2 fs: symlink write_begin allocation context fix 2009-01-04 13:33:20 -08:00
jfs fix the treatment of jfs special inodes 2009-01-05 11:54:29 -05:00
lockd NLM: Clean up flow of control in make_socks() function 2009-01-07 15:40:44 -05:00
minix minix: fix add link's wrong position calculation 2009-01-06 15:59:27 -08:00
ncpfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2009-01-07 11:31:52 -08:00
nfs fs: symlink write_begin allocation context fix 2009-01-04 13:33:20 -08:00
nfs_common SUNRPC: nfsacl_encode/nfsacl_decode should be exported as GPL-only 2008-12-23 15:21:32 -05:00
nfsd nfsd: last_byte_offset 2009-01-07 17:38:31 -05:00
nls remove CONFIG_KMOD from fs 2008-10-17 02:38:36 +11:00
notify inotify: fix type errors in interfaces 2009-01-05 11:54:29 -05:00
ntfs ntfs: don't NULL i_op 2009-01-05 11:54:27 -05:00
ocfs2 trivial: fix then -> than typos in comments and documentation 2009-01-06 11:28:06 +01:00
omfs zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
openpromfs zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
partitions block: struct device - replace bus_id with dev_name(), dev_set_name() 2009-01-06 10:44:43 -08:00
proc NOMMU: Make VMAs per MM as for MMU-mode linux 2009-01-08 12:04:47 +00:00
qnx4 SL*B: drop kmem cache argument from constructor 2008-07-26 12:00:07 -07:00
ramfs NOMMU: Fix cleanup handling in ramfs_nommu_get_umapped_area() 2009-01-08 12:04:46 +00:00
reiserfs Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 2009-01-05 18:32:43 -08:00
romfs zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
smbfs fs: symlink write_begin allocation context fix 2009-01-04 13:33:20 -08:00
sysfs zero i_uid/i_gid on inode allocation 2009-01-05 11:54:28 -05:00
sysv sysv: ensure fast symlinks are NUL-terminated 2008-12-31 18:07:39 -05:00
ubifs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2009-01-07 11:31:52 -08:00
udf Merge branch 'master' into next 2008-12-04 17:16:36 +11:00
ufs CRED: Wrap task credential accesses in the UFS filesystem 2008-11-14 10:39:04 +11:00
xfs trivial: fix then -> than typos in comments and documentation 2009-01-06 11:28:06 +01:00
Kconfig fs: use menuconfig to control the Misc. filesystems menu 2009-01-06 15:59:12 -08:00
Kconfig.binfmt add CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS 2008-10-20 08:52:39 -07:00
Makefile quota: Split off quota tree handling into a separate file 2009-01-05 08:40:21 -08:00
aio.c aio: make the lookup_ioctx() lockless 2008-12-29 08:29:50 +01:00
anon_inodes.c anon_inodes: use fops->owner for module refcount 2008-12-31 16:55:44 +02:00
attr.c CRED: Wrap task credential accesses in the filesystem subsystem 2008-11-14 10:39:05 +11:00
bad_inode.c kill ->dir_notify() 2008-12-31 18:07:43 -05:00
binfmt_aout.c sanitize ifdefs in binfmt_aout 2009-01-03 11:45:54 -08:00
binfmt_elf.c Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6 2008-12-28 12:33:21 -08:00
binfmt_elf_fdpic.c NOMMU: Make VMAs per MM as for MMU-mode linux 2009-01-08 12:04:47 +00:00
binfmt_em86.c Allow recursion in binfmt_script and binfmt_misc 2008-10-16 11:21:38 -07:00
binfmt_flat.c CRED: Make execve() take advantage of copy-on-write credentials 2008-11-14 10:39:24 +11:00
binfmt_misc.c fs/binfmt_misc.c: add terminating newline to /proc/sys/fs/binfmt_misc/status 2009-01-06 15:59:19 -08:00
binfmt_script.c Allow recursion in binfmt_script and binfmt_misc 2008-10-16 11:21:38 -07:00
binfmt_som.c CRED: Make execve() take advantage of copy-on-write credentials 2008-11-14 10:39:24 +11:00
bio-integrity.c bio: allow individual slabs in the bio_set 2008-12-29 08:29:23 +01:00
bio.c bio: get rid of bio_vec clearing 2008-12-29 08:29:53 +01:00
block_dev.c fs: fix function param name in kernel-doc 2009-01-06 15:59:14 -08:00
buffer.c block_write_begin(): remove useless goto 2009-01-06 15:59:08 -08:00
char_dev.c fs: fix name overwrite in __register_chrdev_region() 2009-01-06 15:59:13 -08:00
compat.c add missing accounting calls to compat_sys_{readv,writev} 2009-01-06 15:59:13 -08:00
compat_binfmt_elf.c
compat_ioctl.c remove unused #include <linux/dirent.h>'s 2008-07-25 10:53:34 -07:00
dcache.c filp_cachep can be static in fs/file_table.c 2008-12-31 18:07:42 -05:00
dcookies.c shrink struct dentry 2008-12-31 18:07:38 -05:00
direct-io.c fs: truncate blocks outside i_size after O_DIRECT write error 2009-01-06 15:59:06 -08:00
dquot.c quota: Export dquot_alloc() and dquot_destroy() functions 2009-01-05 08:40:25 -08:00
drop_caches.c vfs: skip inodes without pages to free in drop_pagecache_sb() 2008-04-29 08:06:05 -07:00
eventfd.c flag parameters: check magic constants 2008-07-24 10:47:29 -07:00
eventpoll.c epoll: introduce resource usage limits 2008-12-01 19:55:24 -08:00
exec.c fs/exec.c: make do_coredump() void 2009-01-06 15:59:29 -08:00
fcntl.c Merge branch 'next' into for-linus 2008-12-25 11:40:09 +11:00
fifo.c [PATCH] introduce fmode_t, do annotations 2008-10-21 07:47:06 -04:00
file.c [PATCH] merge locate_fd() and get_unused_fd() 2008-08-01 11:25:23 -04:00
file_table.c filp_cachep can be static in fs/file_table.c 2008-12-31 18:07:42 -05:00
filesystems.c vfs: remove duplicate code in get_fs_type() 2009-01-05 11:54:29 -05:00
fs-writeback.c fs: sys_sync fix 2009-01-06 15:59:09 -08:00
generic_acl.c
inode.c async: make the final inode deletion an asynchronous event 2009-01-07 08:47:24 -08:00
internal.h CRED: Make execve() take advantage of copy-on-write credentials 2008-11-14 10:39:24 +11:00
ioctl.c GFS2: Support for FIEMAP ioctl 2009-01-05 07:38:46 +00:00
ioprio.c CRED: Use RCU to access another task's creds and to release a task's own creds 2008-11-14 10:39:19 +11:00
libfs.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2009-01-05 18:32:06 -08:00
locks.c CRED: Wrap task credential accesses in the filesystem subsystem 2008-11-14 10:39:05 +11:00
mbcache.c
mpage.c do_mpage_readpage(): remove useless clear_buffer_mapped() call 2009-01-06 15:59:01 -08:00
namei.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2009-01-05 18:32:06 -08:00
namespace.c fs/namespace.c: drop code after return 2008-12-31 18:07:38 -05:00
nfsctl.c pass a struct path * to may_open 2008-12-31 18:07:41 -05:00
no-block.c
open.c inode->i_op is never NULL 2009-01-05 11:54:28 -05:00
pipe.c sanitize audit_fd_pair() 2009-01-04 15:14:41 -05:00
pnode.c [patch 7/7] vfs: mountinfo: show dominating group id 2008-04-23 00:05:09 -04:00
pnode.h [patch 7/7] vfs: mountinfo: show dominating group id 2008-04-23 00:05:09 -04:00
posix_acl.c CRED: Wrap task credential accesses in the filesystem subsystem 2008-11-14 10:39:05 +11:00
quota.c quota: Introduce DQUOT_QUOTA_SYS_FILE flag 2009-01-05 08:36:57 -08:00
quota_tree.c quota: Split off quota tree handling into a separate file 2009-01-05 08:40:21 -08:00
quota_tree.h quota: Split off quota tree handling into a separate file 2009-01-05 08:40:21 -08:00
quota_v1.c quota: Move quotaio_v[12].h from include/linux/ to fs/ 2009-01-05 08:36:58 -08:00
quota_v2.c quota: Convert union in mem_dqinfo to a pointer 2009-01-05 08:40:21 -08:00
quotaio_v1.h quota: Move quotaio_v[12].h from include/linux/ to fs/ 2009-01-05 08:36:58 -08:00
quotaio_v2.h quota: Split off quota tree handling into a separate file 2009-01-05 08:40:21 -08:00
read_write.c vfs: lseek(fd, 0, SEEK_CUR) race condition 2009-01-05 11:53:07 -05:00
read_write.h
readdir.c [PATCH] prepare vfs_readdir() callers to returning filldir result 2008-10-23 05:13:10 -04:00
select.c poll: allow f_op->poll to sleep 2009-01-06 15:59:12 -08:00
seq_file.c Merge branch 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2009-01-03 12:04:39 -08:00
signalfd.c flag parameters: check magic constants 2008-07-24 10:47:29 -07:00
splice.c fs: remove prepare_write/commit_write 2008-10-30 11:38:45 -07:00
stack.c
stat.c inode->i_op is never NULL 2009-01-05 11:54:28 -05:00
super.c async: make the final inode deletion an asynchronous event 2009-01-07 08:47:24 -08:00
sync.c mm: do_sync_mapping_range integrity fix 2009-01-06 15:59:00 -08:00
timerfd.c hrtimer: convert timerfd to the new hrtimer apis 2008-09-05 21:35:09 -07:00
utimes.c [PATCH] sanitize __user_walk_fd() et.al. 2008-07-26 20:53:34 -04:00
xattr.c inode->i_op is never NULL 2009-01-05 11:54:28 -05:00
xattr_acl.c