WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
David Drysdale	27d6ec7ad6	x86: hook up execveat system call Hook up x86-64, i386 and x32 ABIs. Signed-off-by: David Drysdale <drysdale@google.com> Cc: Meredydd Luff <meredydd@senatehouse.org> Cc: Shuah Khan <shuah.kh@samsung.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Rich Felker <dalias@aerifal.cx> Cc: Christoph Hellwig <hch@infradead.org> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-12-13 12:42:51 -08:00
Alexei Starovoitov	749730ce42	bpf: enable bpf syscall on x64 and i386 done as separate commit to ease conflict resolution Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-09-26 15:05:14 -04:00
David Herrmann	9183df25fe	shm: add memfd_create() syscall memfd_create() is similar to mmap(MAP_ANON), but returns a file-descriptor that you can pass to mmap(). It can support sealing and avoids any connection to user-visible mount-points. Thus, it's not subject to quotas on mounted file-systems, but can be used like malloc()'ed memory, but with a file-descriptor to it. memfd_create() returns the raw shmem file, so calls like ftruncate() can be used to modify the underlying inode. Also calls like fstat() will return proper information and mark the file as regular file. If you want sealing, you can specify MFD_ALLOW_SEALING. Otherwise, sealing is not supported (like on all other regular files). Compared to O_TMPFILE, it does not require a tmpfs mount-point and is not subject to a filesystem size limit. It is still properly accounted to memcg limits, though, and to the same overcommit or no-overcommit accounting as all user memory. Signed-off-by: David Herrmann <dh.herrmann@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Ryan Lortie <desrt@desrt.ca> Cc: Lennart Poettering <lennart@poettering.net> Cc: Daniel Mack <zonque@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2014-08-08 15:57:31 -07:00
Linus Torvalds	f4f142ed4e	Cleanups and bug fixes to /dev/random, add a new getrandom(2) system call, which is a superset of OpenBSD's getentropy(2) call, for use with userspace crypto libraries such as LibreSSL. Also add the ability to have a kernel thread to pull entropy from hardware rng devices into /dev/random. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJT4VkhAAoJENNvdpvBGATwGMwP/0DvcJnk8Xg2pE67GrBlkL4V ltDYZBUNI3Z9YqPFMbN02kt8jBJ4o8NVrD9XXSAmk0NbNV6pc4SdGUU7BBcms4BF DX4CasmQS1EMKOxsszlvEbj9Q25u9ODJhUKsr1ZQKe3wfjx1gKRQ1QHHcrqgbGc0 tjkBU/TW+8daza6dGYrUrO34BPeN5Y4xbBG5WmVOLGgbDH7J3ZKGzkG21R5zHraI tPJzZ3KGj+Cf1TtamBOpyF+SLqM7qi43JY/1l8LfDzJgJhB3NxOR1ig/Pk6z1qLi 2xYm1hb+EQqJGaToMXEl5fLLcYfnJmLYD/dWNq/pOVXFqC5cGxYIH1h+Nwzywvy3 hVqh4yDU5HXgu8mOMPPc23azicJflZwCNq0vTTDE+orYnb8n9Sbg0l+rUQ45BZua tVfGKT1LZuYtM0axYQ4fIfqS9bxsyRJcF6HNNaEMQJsm0V0prwlz0hXkaod1uOJd CwOn9+CpZUGCgj5paRS+zTOtcl39+X1tIhcWTHEDMpMzIqnk8KpkLGqCDisBZNBF UbjEaTA8w6tBxRX5FZ9qdmRFvsxCJH7nOxmmsaIOZ/7QXQHQNrxI2+v6yd4HWJAw yZnaVR5o6sojKc8zp9nOXQ219G1zvt4l6XyTqIP+gKWJGDKGCsMXXzEg1OchO+rI Oo8s5+ytZB9qei7QwLAf =wLqJ -----END PGP SIGNATURE----- Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random Pull randomness updates from Ted Ts'o: "Cleanups and bug fixes to /dev/random, add a new getrandom(2) system call, which is a superset of OpenBSD's getentropy(2) call, for use with userspace crypto libraries such as LibreSSL. Also add the ability to have a kernel thread to pull entropy from hardware rng devices into /dev/random" * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: hwrng: Pass entropy to add_hwgenerator_randomness() in bits, not bytes random: limit the contribution of the hw rng to at most half random: introduce getrandom(2) system call hw_random: fix sparse warning (NULL vs 0 for pointer) random: use registers from interrupted code for CPU's w/o a cycle counter hwrng: add per-device entropy derating hwrng: create filler thread random: add_hwgenerator_randomness() for feeding entropy from devices random: use an improved fast_mix() function random: clean up interrupt entropy accounting for archs w/o cycle counters random: only update the last_pulled time if we actually transferred entropy random: remove unneeded hash of a portion of the entropy pool random: always update the entropy pool under the spinlock	2014-08-06 08:16:24 -07:00
Theodore Ts'o	c6e9d6f388	random: introduce getrandom(2) system call The getrandom(2) system call was requested by the LibreSSL Portable developers. It is analoguous to the getentropy(2) system call in OpenBSD. The rationale of this system call is to provide resiliance against file descriptor exhaustion attacks, where the attacker consumes all available file descriptors, forcing the use of the fallback code where /dev/[u]random is not available. Since the fallback code is often not well-tested, it is better to eliminate this potential failure mode entirely. The other feature provided by this new system call is the ability to request randomness from the /dev/urandom entropy pool, but to block until at least 128 bits of entropy has been accumulated in the /dev/urandom entropy pool. Historically, the emphasis in the /dev/urandom development has been to ensure that urandom pool is initialized as quickly as possible after system boot, and preferably before the init scripts start execution. This is because changing /dev/urandom reads to block represents an interface change that could potentially break userspace which is not acceptable. In practice, on most x86 desktop and server systems, in general the entropy pool can be initialized before it is needed (and in modern kernels, we will printk a warning message if not). However, on an embedded system, this may not be the case. And so with this new interface, we can provide the functionality of blocking until the urandom pool has been initialized. Any userspace program which uses this new functionality must take care to assure that if it is used during the boot process, that it will not cause the init scripts or other portions of the system startup to hang indefinitely. SYNOPSIS #include <linux/random.h> int getrandom(void buf, size_t buflen, unsigned int flags); DESCRIPTION The system call getrandom() fills the buffer pointed to by buf with up to buflen random bytes which can be used to seed user space random number generators (i.e., DRBG's) or for other cryptographic uses. It should not be used for Monte Carlo simulations or other programs/algorithms which are doing probabilistic sampling. If the GRND_RANDOM flags bit is set, then draw from the /dev/random pool instead of the /dev/urandom pool. The /dev/random pool is limited based on the entropy that can be obtained from environmental noise, so if there is insufficient entropy, the requested number of bytes may not be returned. If there is no entropy available at all, getrandom(2) will either block, or return an error with errno set to EAGAIN if the GRND_NONBLOCK bit is set in flags. If the GRND_RANDOM bit is not set, then the /dev/urandom pool will be used. Unlike using read(2) to fetch data from /dev/urandom, if the urandom pool has not been sufficiently initialized, getrandom(2) will block (or return -1 with the errno set to EAGAIN if the GRND_NONBLOCK bit is set in flags). The getentropy(2) system call in OpenBSD can be emulated using the following function: int getentropy(void buf, size_t buflen) { int ret; if (buflen > 256) goto failure; ret = getrandom(buf, buflen, 0); if (ret < 0) return ret; if (ret == buflen) return 0; failure: errno = EIO; return -1; } RETURN VALUE On success, the number of bytes that was filled in the buf is returned. This may not be all the bytes requested by the caller via buflen if insufficient entropy was present in the /dev/random pool, or if the system call was interrupted by a signal. On error, -1 is returned, and errno is set appropriately. ERRORS EINVAL An invalid flag was passed to getrandom(2) EFAULT buf is outside the accessible address space. EAGAIN The requested entropy was not available, and getentropy(2) would have blocked if the GRND_NONBLOCK flag was not set. EINTR While blocked waiting for entropy, the call was interrupted by a signal handler; see the description of how interrupted read(2) calls on "slow" devices are handled with and without the SA_RESTART flag in the signal(7) man page. NOTES For small requests (buflen <= 256) getrandom(2) will not return EINTR when reading from the urandom pool once the entropy pool has been initialized, and it will return all of the bytes that have been requested. This is the recommended way to use getrandom(2), and is designed for compatibility with OpenBSD's getentropy() system call. However, if you are using GRND_RANDOM, then getrandom(2) may block until the entropy accounting determines that sufficient environmental noise has been gathered such that getrandom(2) will be operating as a NRBG instead of a DRBG for those people who are working in the NIST SP 800-90 regime. Since it may block for a long time, these guarantees do not apply. The user may want to interrupt a hanging process using a signal, so blocking until all of the requested bytes are returned would be unfriendly. For this reason, the user of getrandom(2) MUST always check the return value, in case it returns some error, or if fewer bytes than requested was returned. In the case of !GRND_RANDOM and small request, the latter should never happen, but the careful userspace code (and all crypto code should be careful) should check for this anyway! Finally, unless you are doing long-term key generation (and perhaps not even then), you probably shouldn't be using GRND_RANDOM. The cryptographic algorithms used for /dev/urandom are quite conservative, and so should be sufficient for all purposes. The disadvantage of GRND_RANDOM is that it can block, and the increased complexity required to deal with partially fulfilled getrandom(2) requests. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Zach Brown <zab@zabbo.net>	2014-08-05 16:41:22 -04:00
Kees Cook	48dc92b9fc	seccomp: add "seccomp" syscall This adds the new "seccomp" syscall with both an "operation" and "flags" parameter for future expansion. The third argument is a pointer value, used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...). In addition to the TSYNC flag later in this patch series, there is a non-zero chance that this syscall could be used for configuring a fixed argument area for seccomp-tracer-aware processes to pass syscall arguments in the future. Hence, the use of "seccomp" not simply "seccomp_add_filter" for this syscall. Additionally, this syscall uses operation, flags, and user pointer for arguments because strictly passing arguments via a user pointer would mean seccomp itself would be unable to trivially filter the seccomp syscall itself. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>	2014-07-18 12:13:37 -07:00
Miklos Szeredi	ab0a9358ec	i386: Wire up the renameat2() syscall The renameat2() system call was only wired up for x86-64. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Link: http://lkml.kernel.org/r/1397211951-20549-2-git-send-email-miklos@szeredi.hu Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2014-04-11 13:59:58 -07:00
Dario Faggioli	d50dde5a10	sched: Add new scheduler syscalls to support an extended scheduling parameters ABI Add the syscalls needed for supporting scheduling algorithms with extended scheduling parameters (e.g., SCHED_DEADLINE). In general, it makes possible to specify a periodic/sporadic task, that executes for a given amount of runtime at each instance, and is scheduled according to the urgency of their own timing constraints, i.e.: - a (maximum/typical) instance execution time, - a minimum interval between consecutive instances, - a time constraint by which each instance must be completed. Thus, both the data structure that holds the scheduling parameters of the tasks and the system calls dealing with it must be extended. Unfortunately, modifying the existing struct sched_param would break the ABI and result in potentially serious compatibility issues with legacy binaries. For these reasons, this patch: - defines the new struct sched_attr, containing all the fields that are necessary for specifying a task in the computational model described above; - defines and implements the new scheduling related syscalls that manipulate it, i.e., sched_setattr() and sched_getattr(). Syscalls are introduced for x86 (32 and 64 bits) and ARM only, as a proof of concept and for developing and testing purposes. Making them available on other architectures is straightforward. Since no "user" for these new parameters is introduced in this patch, the implementation of the new system calls is just identical to their already existing counterpart. Future patches that implement scheduling policies able to exploit the new data structure must also take care of modifying the sched_*attr() calls accordingly with their own purposes. Signed-off-by: Dario Faggioli <raistlin@linux.it> [ Rewrote to use sched_attr. ] Signed-off-by: Juri Lelli <juri.lelli@gmail.com> [ Removed sched_setscheduler2() for now. ] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1383831828-15501-3-git-send-email-juri.lelli@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2014-01-13 13:41:04 +01:00
Al Viro	91c2e0bcae	unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-05-09 13:46:38 -04:00
Al Viro	07b053457b	x86: sys32_kill and sys32_mprotect are pointless their argument types are identical to those of sys_kill and sys_mprotect resp., so we are not doing any kind of argument validation, etc. in those - they turn into unconditional branches to corresponding syscalls. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-03-03 23:00:33 -05:00
Al Viro	56e41d3c5a	merge compat sys_ipc instances Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-03-03 23:00:27 -05:00
Al Viro	d5dc77bfee	consolidate compat lookup_dcookie() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-03-03 23:00:23 -05:00
Al Viro	19f4fc3aee	convert sendfile{,64} to COMPAT_SYSCALL_DEFINE Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-03-03 22:58:46 -05:00
Al Viro	3f6d078d4a	fix compat truncate/ftruncate Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-25 09:24:55 -05:00
Al Viro	561c673197	switch lseek to COMPAT_SYSCALL_DEFINE Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-24 10:52:26 -05:00
Al Viro	5b3eb3ade4	x86: switch to generic old sigaction Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:27 -05:00
Al Viro	29fd448084	x86: switch to generic compat rt_sigaction() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:26 -05:00
Al Viro	d7c43e4afb	x86: switch to generic compat sched_rr_get_interval() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:26 -05:00
Al Viro	15ce1f7154	x86,um: switch to generic old sigsuspend() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:26 -05:00
Al Viro	7b83d1a297	x86: switch to generic compat rt_sigqueueinfo() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:25 -05:00
Al Viro	f45adb0499	x86: switch to generic compat rt_sigpending() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:25 -05:00
Al Viro	49cb25e929	x86: get rid of pt_regs argument in vm86/vm86old Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:24 -05:00
Al Viro	3fe26fa34d	x86: get rid of pt_regs argument in sigreturn variants Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:24 -05:00
Al Viro	b3af11afe0	x86: get rid of pt_regs argument of iopl(2) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2013-02-03 18:16:24 -05:00
Linus Torvalds	54d46ea993	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull signal handling cleanups from Al Viro: "sigaltstack infrastructure + conversion for x86, alpha and um, COMPAT_SYSCALL_DEFINE infrastructure. Note that there are several conflicts between "unify SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline; resolution is trivial - just remove definitions of SS_ONSTACK and SS_DISABLED from arch//uapi/asm/signal.h; they are all identical and include/uapi/linux/signal.h contains the unified variant." Fixed up conflicts as per Al. 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: alpha: switch to generic sigaltstack new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those generic compat_sys_sigaltstack() introduce generic sys_sigaltstack(), switch x86 and um to it new helper: compat_user_stack_pointer() new helper: restore_altstack() unify SS_ONSTACK/SS_DISABLE definitions new helper: current_user_stack_pointer() missing user_stack_pointer() instances Bury the conditionals from kernel_thread/kernel_execve series COMPAT_SYSCALL_DEFINE: infrastructure	2012-12-20 18:05:28 -08:00
Al Viro	9026843952	generic compat_sys_sigaltstack() Again, conditional on CONFIG_GENERIC_SIGALTSTACK Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-12-19 18:07:41 -05:00
Al Viro	6bf9adfc90	introduce generic sys_sigaltstack(), switch x86 and um to it Conditional on CONFIG_GENERIC_SIGALTSTACK; architectures that do not select it are completely unaffected Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-12-19 18:07:40 -05:00
Linus Torvalds	7a684c452e	Nothing all that exciting; a new module-from-fd syscall for those who want to verify the source of the module (ChromeOS) and/or use standard IMA on it or other security hooks. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJQ0VKlAAoJENkgDmzRrbjxjuEQALVHpD1cSmryOzVwkNn7rVGP PV3KVbUs+qzUCm2c3AafIIlSBm2LOUl+cR3uNC7di8aHarRF3VHkK2OQ4Fx97ECd KKBqAyY3R0q1mAKujb/MWwiK0YgosEDIOzGGn2yQhNFsxKqnMB02P4j82IO7+g+w Cc3XuDyWHoH2I+ySgz0Q8NHAqufD/DMZUKud7jw2Lsv6PuICJ1Oqgl/Gd/muxort 4a5tV3tjhRGywHS/8b2fbDUXkybC5NKK0FN+gyoaROmJ/THeHEQDGXZT9bc2vmVx HvRy/5k8dzQ6LAJ2mLnPvy0pmv0u7NYMvjxTxxUlUkFMkYuVticikQfwSYDbDPt4 mbsLxchpgi8z4x8HltEERffCX5tldo/5hz1uemqhqIsMRIrRFnlHkSIgkGjVHf2u LXQBLT8uTm6C0VyNQPrI/hUZzIax7WtKbPSoK9lmExNbKqloEFh/mVXvfQxei2kp wnUZcnmPIqSvw7b4CWu7HibMYu2VvGBgm3YIfJRi4AQme1mzFYLpZoxF5Pj+Ykbt T//Hb1EsNQTTFCg7MZhnJSAw/EVUvNDUoullORClyqw6+xxjVKqWpPJgYDRfWOlJ Xa+s7DNrL+Oo1WWR8l5ruoQszbR8szIyeyPKKxRUcQj2zsqghoWuzKAx2saSEw3W pNkoJU+dGC7kG/yVAS8N =uoJj -----END PGP SIGNATURE----- Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module update from Rusty Russell: "Nothing all that exciting; a new module-from-fd syscall for those who want to verify the source of the module (ChromeOS) and/or use standard IMA on it or other security hooks." * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: MODSIGN: Fix kbuild output when using default extra_certificates MODSIGN: Avoid using .incbin in C source modules: don't hand 0 to vmalloc. module: Remove a extra null character at the top of module->strtab. ASN.1: Use the ASN1_LONG_TAG and ASN1_INDEFINITE_LENGTH constants ASN.1: Define indefinite length marker constant moduleparam: use __UNIQUE_ID() __UNIQUE_ID() MODSIGN: Add modules_sign make target powerpc: add finit_module syscall. ima: support new kernel module syscall add finit_module syscall to asm-generic ARM: add finit_module syscall to ARM security: introduce kernel_module_from_file hook module: add flags arg to sys_finit_module() module: add syscall to load module from fd	2012-12-19 07:55:08 -08:00
Kees Cook	34e1169d99	module: add syscall to load module from fd As part of the effort to create a stronger boundary between root and kernel, Chrome OS wants to be able to enforce that kernel modules are being loaded only from our read-only crypto-hash verified (dm_verity) root filesystem. Since the init_module syscall hands the kernel a module as a memory blob, no reasoning about the origin of the blob can be made. Earlier proposals for appending signatures to kernel modules would not be useful in Chrome OS, since it would involve adding an additional set of keys to our kernel and builds for no good reason: we already trust the contents of our root filesystem. We don't need to verify those kernel modules a second time. Having to do signature checking on module loading would slow us down and be redundant. All we need to know is where a module is coming from so we can say yes/no to loading it. If a file descriptor is used as the source of a kernel module, many more things can be reasoned about. In Chrome OS's case, we could enforce that the module lives on the filesystem we expect it to live on. In the case of IMA (or other LSMs), it would be possible, for example, to examine extended attributes that may contain signatures over the contents of the module. This introduces a new syscall (on x86), similar to init_module, that has only two arguments. The first argument is used as a file descriptor to the module and the second argument is a pointer to the NULL terminated string of module arguments. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (merge fixes)	2012-12-14 13:05:22 +10:30
Al Viro	1d4b4b2994	x86, um: switch to generic fork/vfork/clone Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-11-28 22:13:44 -05:00
Al Viro	6783eaa2e1	x86, um/x86: switch to generic sys_execve and kernel_execve 32bit wrapper is lost on that; 64bit one is not, since we need to arrange for full pt_regs on stack when we call sys_execve() and we need to load callee-saved ones from there afterwards. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2012-09-30 22:53:32 -04:00
Cyrill Gorcunov	d97b46a646	syscalls, x86: add __NR_kcmp syscall While doing the checkpoint-restore in the user space one need to determine whether various kernel objects (like mm_struct-s of file_struct-s) are shared between tasks and restore this state. The 2nd step can be solved by using appropriate CLONE_ flags and the unshare syscall, while there's currently no ways for solving the 1st one. One of the ways for checking whether two tasks share e.g. mm_struct is to provide some mm_struct ID of a task to its proc file, but showing such info considered to be not that good for security reasons. Thus after some debates we end up in conclusion that using that named 'comparison' syscall might be the best candidate. So here is it -- __NR_kcmp. It takes up to 5 arguments - the pids of the two tasks (which characteristics should be compared), the comparison type and (in case of comparison of files) two file descriptors. Lookups for pids are done in the caller's PID namespace only. At moment only x86 is supported and tested. [akpm@linux-foundation.org: fix up selftests, warnings] [akpm@linux-foundation.org: include errno.h] [akpm@linux-foundation.org: tweak comment text] Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Andrey Vagin <avagin@openvz.org> Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Glauber Costa <glommer@parallels.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Tejun Heo <tj@kernel.org> Cc: Matt Helsley <matthltc@us.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Vasiliy Kulikov <segoon@openwall.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Valdis.Kletnieks@vt.edu Cc: Michal Marek <mmarek@suse.cz> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-05-31 17:49:32 -07:00
Linus Torvalds	a591afc01d	Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x32 support for x86-64 from Ingo Molnar: "This tree introduces the X32 binary format and execution mode for x86: 32-bit data space binaries using 64-bit instructions and 64-bit kernel syscalls. This allows applications whose working set fits into a 32 bits address space to make use of 64-bit instructions while using a 32-bit address space with shorter pointers, more compressed data structures, etc." Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c} * 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits) x32: Fix alignment fail in struct compat_siginfo x32: Fix stupid ia32/x32 inversion in the siginfo format x32: Add ptrace for x32 x32: Switch to a 64-bit clock_t x32: Provide separate is_ia32_task() and is_x32_task() predicates x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls x86/x32: Fix the binutils auto-detect x32: Warn and disable rather than error if binutils too old x32: Only clear TIF_X32 flag once x32: Make sure TS_COMPAT is cleared for x32 tasks fs: Remove missed ->fds_bits from cessation use of fd_set structs internally fs: Fix close_on_exec pointer in alloc_fdtable x32: Drop non-__vdso weak symbols from the x32 VDSO x32: Fix coding style violations in the x32 VDSO code x32: Add x32 VDSO support x32: Allow x32 to be configured x32: If configured, add x32 system calls to system call tables x32: Handle process creation x32: Signal-related system calls x86: Add #ifdef CONFIG_COMPAT to <asm/sys_ia32.h> ...	2012-03-29 18:12:23 -07:00
Thierry Reding	13354dc412	x86-32: Fix typo for mq_getsetattr in syscall table Syscall 282 was mistakenly named mq_getsetaddr instead of mq_getsetattr. When building uClibc against the Linux kernel this would result in a shared library that doesn't provide the mq_getattr() and mq_setattr() functions. Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de> Link: http://lkml.kernel.org/r/1332366608-2695-2-git-send-email-thierry.reding@avionic-design.de Cc: <stable@vger.kernel.org> v3.3 Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-03-22 12:42:41 -07:00
H. Peter Anvin	2c73ce7346	x86-64, ia32: Drop sys32_rt_sigprocmask On x86, the only difference between sys_rt_sigprocmask and sys32_rt_sigprocmask is the alignment of the data structures. However, x86 allows data accesses with arbitrary alignment, and therefore there is no reason for this code to be different. Reported-by: Gregory M. Lueck <gregory.m.lueck@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>	2012-02-20 12:48:49 -08:00
H. Peter Anvin	d181764ccf	x86: Machine-readable syscall tables and scripts to process them Create a simple set of syscall tables and scripts to turn them into both header files (unistd_*.h) and macros for generating the system call tables. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>	2011-11-17 13:35:36 -08:00

36 Коммитов