Seth Forshee reports that in 4.8-rcN some automounts are failing
because the requesting the automount changed.
The relevant call path is:
follow_automount()
->d_automount
autofs4_d_automount
autofs4_mount_wait
autofs4_wait
In autofs4_wait wq_uid and wq_gid are set to current_uid() and
current_gid respectively. With follow_automount now overriding creds
uid that we export to userspace changes and that breaks existing
setups.
To remove the regression set wq_uid and wq_gid from
current_real_cred()->uid and current_real_cred()->gid respectively.
This restores the current behavior as current->real_cred is identical
to current->cred except when override creds are used.
Cc: stable@vger.kernel.org
Fixes: aeaa4a79ff ("fs: Call d_automount with the filesystems creds")
Reported-by: Seth Forshee <seth.forshee@canonical.com>
Tested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
CAI Qian <caiqian@redhat.com> pointed out that the semantics
of shared subtrees make it possible to create an exponentially
increasing number of mounts in a mount namespace.
mkdir /tmp/1 /tmp/2
mount --make-rshared /
for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done
Will create create 2^20 or 1048576 mounts, which is a practical problem
as some people have managed to hit this by accident.
As such CVE-2016-6213 was assigned.
Ian Kent <raven@themaw.net> described the situation for autofs users
as follows:
> The number of mounts for direct mount maps is usually not very large because of
> the way they are implemented, large direct mount maps can have performance
> problems. There can be anywhere from a few (likely case a few hundred) to less
> than 10000, plus mounts that have been triggered and not yet expired.
>
> Indirect mounts have one autofs mount at the root plus the number of mounts that
> have been triggered and not yet expired.
>
> The number of autofs indirect map entries can range from a few to the common
> case of several thousand and in rare cases up to between 30000 and 50000. I've
> not heard of people with maps larger than 50000 entries.
>
> The larger the number of map entries the greater the possibility for a large
> number of active mounts so it's not hard to expect cases of a 1000 or somewhat
> more active mounts.
So I am setting the default number of mounts allowed per mount
namespace at 100,000. This is more than enough for any use case I
know of, but small enough to quickly stop an exponential increase
in mounts. Which should be perfect to catch misconfigurations and
malfunctioning programs.
For anyone who needs a higher limit this can be changed by writing
to the new /proc/sys/fs/mount-max sysctl.
Tested-by: CAI Qian <caiqian@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
With the newly enforced limit on the number of namespaces,
we get a build warning if CONFIG_NETNS is disabled:
net/core/net_namespace.c:273:13: error: 'dec_net_namespaces' defined but not used [-Werror=unused-function]
net/core/net_namespace.c:268:24: error: 'inc_net_namespaces' defined but not used [-Werror=unused-function]
This moves the two added functions inside the #ifdef that guards
their callers.
Fixes: 703286608a ("netns: Add a limit on the number of net namespaces")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Move mntget from the very beginning of __ns_get_path to
the success path of __ns_get_path, and remove the mntget
calls.
This removes the possibility that there will be a mntget/mntput
pair of __ns_get_path has to retry, and generally simplifies the code.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
From: Andrey Vagin <avagin@openvz.org>
Each namespace has an owning user namespace and now there is not way
to discover these relationships.
Pid and user namepaces are hierarchical. There is no way to discover
parent-child relationships too.
Why we may want to know relationships between namespaces?
One use would be visualization, in order to understand the running
system. Another would be to answer the question: what capability does
process X have to perform operations on a resource governed by namespace
Y?
One more use-case (which usually called abnormal) is checkpoint/restart.
In CRIU we are going to dump and restore nested namespaces.
There [1] was a discussion about which interface to choose to determing
relationships between namespaces.
Eric suggested to add two ioctl-s [2]:
> Grumble, Grumble. I think this may actually a case for creating ioctls
> for these two cases. Now that random nsfs file descriptors are bind
> mountable the original reason for using proc files is not as pressing.
>
> One ioctl for the user namespace that owns a file descriptor.
> One ioctl for the parent namespace of a namespace file descriptor.
Here is an implementaions of these ioctl-s.
$ man man7/namespaces.7
...
Since Linux 4.X, the following ioctl(2) calls are supported for
namespace file descriptors. The correct syntax is:
fd = ioctl(ns_fd, ioctl_type);
where ioctl_type is one of the following:
NS_GET_USERNS
Returns a file descriptor that refers to an owning user names‐
pace.
NS_GET_PARENT
Returns a file descriptor that refers to a parent namespace.
This ioctl(2) can be used for pid and user namespaces. For
user namespaces, NS_GET_PARENT and NS_GET_USERNS have the same
meaning.
In addition to generic ioctl(2) errors, the following specific ones
can occur:
EINVAL NS_GET_PARENT was called for a nonhierarchical namespace.
EPERM The requested namespace is outside of the current namespace
scope.
[1] https://lkml.org/lkml/2016/7/6/158
[2] https://lkml.org/lkml/2016/7/9/101
Changes for v2:
* don't return ENOENT for init_user_ns and init_pid_ns. There is nothing
outside of the init namespace, so we can return EPERM in this case too.
> The fewer special cases the easier the code is to get
> correct, and the easier it is to read. // Eric
Changes for v3:
* rename ns->get_owner() to ns->owner(). get_* usually means that it
grabs a reference.
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com>
Cc: "W. Trevor King" <wking@tremily.us>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
There are two new ioctl-s:
One ioctl for the user namespace that owns a file descriptor.
One ioctl for the parent namespace of a namespace file descriptor.
The test checks that these ioctl-s works and that they handle a case
when a target namespace is outside of the current process namespace.
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Pid and user namepaces are hierarchical. There is no way to discover
parent-child relationships.
In a future we will use this interface to dump and restore nested
namespaces.
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Each namespace has an owning user namespace and now there is not way
to discover these relationships.
Understending namespaces relationships allows to answer the question:
what capability does process X have to perform operations on a resource
governed by namespace Y?
After a long discussion, Eric W. Biederman proposed to use ioctl-s for
this purpose.
The NS_GET_USERNS ioctl returns a file descriptor to an owning user
namespace.
It returns EPERM if a target namespace is outside of a current user
namespace.
v2: rename parent to relative
v3: Add a missing mntput when returning -EAGAIN --EWB
Acked-by: Serge Hallyn <serge@hallyn.com>
Link: https://lkml.org/lkml/2016/7/6/158
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Return -EPERM if an owning user namespace is outside of a process
current user namespace.
v2: In a first version ns_get_owner returned ENOENT for init_user_ns.
This special cases was removed from this version. There is nothing
outside of init_user_ns, so we can return EPERM.
v3: rename ns->get_owner() to ns->owner(). get_* usually means that it
grabs a reference.
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
In 99.99% of the cases only root in a user namespace can mount /dev/pts
and in those cases the owner of /dev/pts/ptmx will remain root.root
In the oddball case where someone else has CAP_SYS_ADMIN this code
modifies the /dev/pts mount code to use current_fsuid and current_fsgid
as the values to use when creating the /dev/ptmx inode. As is done
when any other file is created.
This is a code simplification, and it allows running without a root
user entirely.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
devpts does not and never will have anything to sync
so don't bother calling sync_filesystems on remount.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Now that all of the work of setting up a superblock has been moved to
devpts_fill_super simplify devpts_mount by calling mount_nodev instead
of rolling mount_nodev by hand.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The current error codes returned when a the per user per user
namespace limit are hit (EINVAL, EUSERS, and ENFILE) are wrong. I
asked for advice on linux-api and it we made clear that those were
the wrong error code, but a correct effor code was not suggested.
The best general error code I have found for hitting a resource limit
is ENOSPC. It is not perfect but as it is unambiguous it will serve
until someone comes up with a better error code.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
v2: Fixed the very obvious lack of setting ucounts
on struct mnt_ns reported by Andrei Vagin, and the kbuild
test report.
Reported-by: Andrei Vagin <avagin@openvz.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The same kind of recursive sane default limit and policy
countrol that has been implemented for the user namespace
is desirable for the other namespaces, so generalize
the user namespace refernce count into a ucount.
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Add a structure that is per user and per user ns and use it to hold
the count of user namespaces. This makes prevents one user from
creating denying service to another user by creating the maximum
number of user namespaces.
Rename the sysctl export of the maximum count from
/proc/sys/userns/max_user_namespaces to /proc/sys/user/max_user_namespaces
to reflect that the count is now per user.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Export the export the maximum number of user namespaces as
/proc/sys/userns/max_user_namespaces.
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Limit per userns sysctls to only be opened for write by a holder
of CAP_SYS_RESOURCE.
Add all of the necessary boilerplate for having per user namespace
sysctls.
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Add the necessary boiler plate to move freeing of user namespaces into
work queue and thus into process context where things can sleep.
This is a necessary precursor to per user namespace sysctls.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Passing nsproxy into sysctl_table_root.lookup was a premature
optimization in attempt to avoid depending on current. The
directory /proc/self/sys has not appeared and if and when
it does this code will need to be reviewed closely and reworked
anyway. So remove the premature optimization.
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Pull more block fixes from Jens Axboe:
"As mentioned in the pull the other day, a few more fixes for this
round, all related to the bio op changes in this series.
Two fixes, and then a cleanup, renaming bio->bi_rw to bio->bi_opf. I
wanted to do that change right after or right before -rc1, so that
risk of conflict was reduced. I just rebased the series on top of
current master, and no new ->bi_rw usage has snuck in"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: rename bio bi_rw to bi_opf
target: iblock_execute_sync_cache() should use bio_set_op_attrs()
mm: make __swap_writepage() use bio_set_op_attrs()
block/mm: make bdev_ops->rw_page() take a bool for read/write
Pull drm zpos property support from Dave Airlie:
"This tree was waiting on some media stuff I hadn't had time to get a
stable branchpoint off, so I just waited until it was all in your tree
first.
It's been around a bit on the list and shouldn't affect anything
outside adding the generic API and moving some ARM drivers to using
it"
* tag 'drm-for-v4.8-zpos' of git://people.freedesktop.org/~airlied/linux:
drm: rcar: use generic code for managing zpos plane property
drm/exynos: use generic code for managing zpos plane property
drm: sti: use generic zpos for plane
drm: add generic zpos property
Since commit 63a4cc2486, bio->bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokeness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.
No intended functional changes in this commit.
Signed-off-by: Jens Axboe <axboe@fb.com>
The original commit missed this function, it needs to mark it a
write flush.
Cc: Mike Christie <mchristi@redhat.com>
Fixes: e742fc32fc ("target: use bio op accessors")
Signed-off-by: Jens Axboe <axboe@fb.com>
Commit abf545484d changed it from an 'rw' flags type to the
newer ops based interface, but now we're effectively leaking
some bdev internals to the rest of the kernel. Since we only
care about whether it's a read or a write at that level, just
pass in a bool 'is_write' parameter instead.
Then we can also move op_is_write() and friends back under
CONFIG_BLOCK protection.
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
"make help" if sphinx isn't present.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXo8sIAAoJEI3ONVYwIuV6po0P/0ZZo+YF0GrPvOHr7uuUqAND
0+4WRfSsT74z5Rn/W3apeX6CM7IGBMSR2zM89E2nWmbE2Uo7bIbrwj6C+Y6gMMfd
aws0Xi9899Jr6hVkeFVZ9foze+M2yc3tE1vFBby035uW3Zwyz2XHzaU/9vyLOLkJ
c7jhqCWebqFEqOSWtw2FZYegt2oHSjUsQgGCh3kk2pCU+DzLHntwbblJLeMuTy+h
zPVxTTBcBkUZcIjpkSvhqc/oCLCiWKLElmwxPBwfpNU9UlE0rol2Lx1eLClxadFl
QVlb1UAIjPcLnHQoM8dL9NR0tkfGopIDuNCL26GU5ie9N4zurOj5a6hj+G5mZKLB
tsMqIw+N7ig5FnaQhaCx3oN/VMZ0djxURu9XvKsHBmOCd2Bp8SDoqpCkTwCqCxcN
DVdUjpS1WUT9w2A1jhH32mx+23eRwJa5oaTFpM3Y0z7Bl9N40ScY2WJcgBKWqHgx
LRROJAzNOPojbBkwTDNsRValwgtutCcqaRw5mNQTp3YjjmltmqylCvJH3AST+z5r
CmMDO96O3rUGsCZYoBhxafC2FUUh5RkUwQq/Cy8nrioMookE3Yd5A9DN6wWQ2pTt
tev/z6s3ov8dygeF6u3noOHCa8GPUpSHO62FyHUKYnn6Tl8xh3x7rmUkUqrJZi5a
dnXOZzp34eVhev5xDeDN
=iD7L
-----END PGP SIGNATURE-----
Merge tag 'doc-4.8-fixes' of git://git.lwn.net/linux
Pull documentation fixes from Jonathan Corbet:
"Three fixes for the docs build, including removing an annoying warning
on 'make help' if sphinx isn't present"
* tag 'doc-4.8-fixes' of git://git.lwn.net/linux:
DocBook: use DOCBOOKS="" to ignore DocBooks instead of IGNORE_DOCBOOKS=1
Documenation: update cgroup's document path
Documentation/sphinx: do not warn about missing tools in 'make help'
First off, the intention of this pull is to declare that I'll be the
binfmt_misc maintainer (mainly on the grounds of you touched it last,
it's yours). There's no MAINTAINERS entry, but get_maintainers.pl
will now finger me.
The update itself is to allow architecture emulation containers to
function such that the emulation binary can be housed outside the
container itself. The container and fs parts both have acks from
relevant experts.
The change is user visible. To use the new feature you have to add an
F option to your binfmt_misc configuration. However, the existing
tools, like systemd-binfmt work with this without modification.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJXmW5WAAoJEAVr7HOZEZN4K1QQAKgx5MPkoTU3QKKgzaMBBnWH
pSMdoN8BhVSwENE/YJGMEyLaRa0zmrHVtFcnH2CHQE/GoXNnaej9l3LtBIwJ9K2P
nrv4Rlhla5BxjhDkg8IWf3iG7iKDDHGZoyuVPx4dwxHFK1yCNH4SDeHaJCKK5qsC
aLltMJMRnjsgJvBUC01dCUlp8srkWywHcyk9M9ic/Fr5vJ6JzdUr6/Md29eHmAXe
NgCGwkVgSDiKfnTGZjIMsAtpwPsJ6RqBWQTcTdM/mkIpqwrMiVuaVOHqu2cmMU2i
j4cQE6rQpy3sedDKZbHBQMOfYJNT4QYgYGuvyIWce9EPkIpOWHzQ7kYPJ/A/jZCE
lN37TeyodbUDCnyuKk1YOrTBjJ0qdtc4FXJ1aq5s92GkgDs+LtxMdGzKDf3yUGiU
W0TsE/wVy4rmEaeiyut33661ud4vivP4WklWK1Y+bklQcIcKQKKWnOCnDFDR5vuz
CbL5ykVcJb3F28YhGYHvGLeXl0YcR3SwngWnnPCDPtBCeSirohuKb1SEe21C/RaB
rm9S27d+LcKCXJyCqKh8BGsqroZ0iSZQI0Lbdqt+BCuuBw2rQhGStDeccDDUp9jg
MOwpQwabjEseK0n75+hZ2SFS5Q+TQ6pccMlUJIDiBKWmRly8NpKlSKKWvBX8obIe
0Gq6hgX1IwQnXI1O8QMC
=6OjN
-----END PGP SIGNATURE-----
Merge tag 'binfmt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/binfmt_misc
Pull binfmt_misc update from James Bottomley:
"This update is to allow architecture emulation containers to function
such that the emulation binary can be housed outside the container
itself. The container and fs parts both have acks from relevant
experts.
To use the new feature you have to add an F option to your binfmt_misc
configuration"
From the docs:
"The usual behaviour of binfmt_misc is to spawn the binary lazily when
the misc format file is invoked. However, this doesn't work very well
in the face of mount namespaces and changeroots, so the F mode opens
the binary as soon as the emulation is installed and uses the opened
image to spawn the emulator, meaning it is always available once
installed, regardless of how the environment changes"
* tag 'binfmt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/binfmt_misc:
binfmt_misc: add F option description to documentation
binfmt_misc: add persistent opened binary handler for containers
fs: add filp_clone_open API
In most cases, EPERM is returned on immutable inode, and there're only a
few places returning EACCES. I noticed this when running LTP on
overlayfs, setxattr03 failed due to unexpected EACCES on immutable
inode.
So converting all EACCES to EPERM on immutable inode.
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull more vfs updates from Al Viro:
"Assorted cleanups and fixes.
In the "trivial API change" department - ->d_compare() losing 'parent'
argument"
* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
cachefiles: Fix race between inactivating and culling a cache object
9p: use clone_fid()
9p: fix braino introduced in "9p: new helper - v9fs_parent_fid()"
vfs: make dentry_needs_remove_privs() internal
vfs: remove file_needs_remove_privs()
vfs: fix deadlock in file_remove_privs() on overlayfs
get rid of 'parent' argument of ->d_compare()
cifs, msdos, vfat, hfs+: don't bother with parent in ->d_compare()
affs ->d_compare(): don't bother with ->d_inode
fold _d_rehash() and __d_rehash() together
fold dentry_rcuwalk_invalidate() into its only remaining caller
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXpRdVAAoJEK3oKUf0dfod2tkP/24f1Znl9OQEPHoSZty9nXF0
dSjOzE2lHbR4xjjuYjbn1siFnIX0A5nPPqleBYmt3gatiO+24vE1BiNWjM6Y/y7r
3KHENRqmfSj26ha6wl/TUNaKnuFooBcQ0BaHI1IExFROitOSvZgPJPSrk29AH/Er
OVJkaoi3N3o9mrfUpF9/M55Yi/DhQiPBYxkqcXvaqcakbL91EIj5TLZ72MJqgfje
d6og33zxb21EDx9eIJEA0cWX4MLO2UQqFAuiJLzk2RkSAm6vRjbRJyYGG9jv81tP
9ZX1gAw47v0qk3nPVyAgbi862ukYCYzmr1g2b4S2b0UKLXxQb8Fw8D2mRbFXl2wg
wq0nKLg9jwsd8Yo7k8qOrUI9nl/E9Ytmj8t92Y49XvPjtsVFZREoCw3ojyjmlyZA
9BywL5BzMHF6SsXe6LBGJpoebrxCnq5176FREBnpmH7UHM0BcWa4YSekQShwg3DW
PFlBOxk5saz4Ktr5V3YUY+G6XgZ/AXWKlDox5+dESLIOgG0hyzbiVbPNSTQgDrnR
m9yUJPef1NQj2JWSZbqKn7FSZDO6/IT2aeokn1KuoaDJww5HC80juyB1VThmpZnl
QJGN6nmsYDVCLYjbT6scAzyGMYw9ZVhTM7eEk3kqAtCBf/nEyqJM+H0HYUDjfg9B
cG5cRtZNDDkc30lFezJX
=nXKv
-----END PGP SIGNATURE-----
Merge tag 'xfs-rmap-for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
Pull more xfs updates from Dave Chinner:
"This is the second part of the XFS updates for this merge cycle, and
contains the new reverse block mapping feature for XFS.
Reverse mapping allows us to track the owner of a specific block on
disk precisely. It is implemented as a set of btrees (one per
allocation group) that track the owners of allocated extents.
Effectively it is a "used space tree" that is updated when we allocate
or free extents. i.e. it is coherent with the free space btrees we
already maintain and never overlaps with them.
This reverse mapping infrastructure is the building block of several
upcoming features - reflink, copy-on-write data, dedupe, online
metadata and data scrubbing, highly accurate bad sector/data loss
reporting to users, and significantly improved reconstruction of
damaged and corrupted filesystems. There's a lot of new stuff coming
along in the next couple of cycles,a nd it all builds in the rmap
infrastructure.
As such, it's a huge chunk of new code with new on-disk format
features and internal infrastructure. It warns at mount time as an
experimental feature and that it may eat data (as we do with all new
on-disk features until they stabilise). We have not released
userspace suport for it yet - userspace support currently requires
download from Darrick's xfsprogs repo and build from source, so the
access to this feature is really developer/tester only at this point.
Initial userspace support will be released at the same time kernel
with this code in it is released.
The new rmap enabled code regresses 3 xfstests - all are ENOSPC
related corner cases, one of which Darrick posted a fix for a few
hours ago. The other two are fixed by infrastructure that is part of
the upcoming reflink patchset. This new ENOSPC infrastructure
requires a on-disk format tweak required to keep mount times in
check - we need to keep an on-disk count of allocated rmapbt blocks so
we don't have to scan the entire btrees at mount time to count them.
This is currently being tested and will be part of the fixes sent in
the next week or two so users will not be exposed to this change"
* tag 'xfs-rmap-for-linus-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (52 commits)
xfs: move (and rename) the deferred bmap-free tracepoints
xfs: collapse single use static functions
xfs: remove unnecessary parentheses from log redo item recovery functions
xfs: remove the extents array from the rmap update done log item
xfs: in btree_lshift, only allocate temporary cursor when needed
xfs: remove unnecesary lshift/rshift key initialization
xfs: remove the get*keys and update_keys btree ops pointers
xfs: enable the rmap btree functionality
xfs: don't update rmapbt when fixing agfl
xfs: disable XFS_IOC_SWAPEXT when rmap btree is enabled
xfs: add rmap btree block detection to log recovery
xfs: add rmap btree geometry feature flag
xfs: propagate bmap updates to rmapbt
xfs: enable the xfs_defer mechanism to process rmaps to update
xfs: log rmap intent items
xfs: create rmap update intent log items
xfs: add rmap btree insert and delete helpers
xfs: convert unwritten status of reverse mappings
xfs: remove an extent from the rmap btree
xfs: add an extent to the rmap btree
...
Pull qstr constification updates from Al Viro:
"Fairly self-contained bunch - surprising lot of places passes struct
qstr * as an argument when const struct qstr * would suffice; it
complicates analysis for no good reason.
I'd prefer to feed that separately from the assorted fixes (those are
in #for-linus and with somewhat trickier topology)"
* 'work.const-qstr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
qstr: constify instances in adfs
qstr: constify instances in lustre
qstr: constify instances in f2fs
qstr: constify instances in ext2
qstr: constify instances in vfat
qstr: constify instances in procfs
qstr: constify instances in fuse
qstr constify instances in fs/dcache.c
qstr: constify instances in nfs
qstr: constify instances in ocfs2
qstr: constify instances in autofs4
qstr: constify instances in hfs
qstr: constify instances in hfsplus
qstr: constify instances in logfs
qstr: constify dentry_init_security
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXpbwiAAoJEAhfPr2O5OEV8fsP/RgrPl3LqFZyRgFBigWgumQ1
JdWSVs0Vhi7/D7B+MfUI5vdSygzSHQo9bukrMdwGEXYiG6VNzFc2pCI6Fklu1Att
qbiVIotsecTlXdBFoDpxpA8AfvPCHFCjh3ufFWldMPjMTNp6aDg1OBcod160tDx7
IABoB2r0Tdvwy5Y1szdxjaxisptab3T7HYJKaXhkClEjxlp+zrCmro9XHNE7s4SB
Z5UrvBoIcUCw8oSQ2b9g4+2Q7sTE8wWHugAhVaeoT8rBp3xSCMgxmxArYUaELNst
Ce4IzJAefoJ9tQyN2eQp3sKE+wviqZ1Ywct3D5YIs4nju4HbZGmqa9CDjg6JRBCQ
0TWWwayyVPrLgm3uXx264wsTjBa2t3fP3wZTwlAoITFqEZJGx9afMmOAoejJch2E
O2hFpVkig+axHAZdqhgRLlXkhX6hLuI3BSrRHiLMloatuHij7XLAxFOD0Fu4ZVq0
OkxswOZU/Dh69uYd01hYIvUTN7fzZbxxaR3yM/134TtxnqMJy/kbjohVMmKafn6E
LV+TAZye2cEgNnMRJHWQMkxmmvRax9GQgdK8d2BKnbVsO6alyzzhU3ShGw4f12lt
rUu8PGC1iAgHpc5kwYULOPLXC7Wio2V7AmdG/XWzxnMBHN7bfcXec6ipjbyQYMVr
9Xz3fGaMCdoBDxCZsVMD
=06UE
-----END PGP SIGNATURE-----
Merge tag 'media/v4.8-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull mailcap fixlets from Mauro Carvalho Chehab:
"A small fixup for my and Shuah's entries in .mailcap.
Basically, those entries were with a syntax that makes
get_maintainer.pl to do the wrong thing"
* tag 'media/v4.8-6' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
.mailmap: Correct entries for Mauro Carvalho Chehab and Shuah Khan
- New vsock device support in host and guest
- Platform IOMMU support in host and guest,
including compatibility quirks for legacy systems.
- Misc fixes and cleanups.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXofvbAAoJECgfDbjSjVRpUTIH/iEoK9h636tBayXy0PXkPby0
6fMaRFy6H1HgEttgDhJE8Pqg/ba3qaW9Em0fHyFq7Mp2waFHAZ8hAT8phC6TAK3c
CIBnfzyyuI8u3N9SnNOfelPVcwCBfuALuuTsXB/rwKbYQEVv+U5Rdt3Vyx9+lXkj
P005klz7PfqxFhQrrnj4Eh7VawtHwmMuLH8YoWpCZpM71dHPo6eL+3ftKwhH2boo
qK86uVprwba03Pewpm13vQnotemfVfUUkjXd4EJpG3dx7E0KZosuj0ZG9OV8mPGQ
Cl2gBdUhocdJgeUnAHmf6tumYi9KFlYfy6xLy44YMmN7FL3E9nQjaKZp25UKfiM=
=ztIm
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio/vhost updates from Michael Tsirkin:
- new vsock device support in host and guest
- platform IOMMU support in host and guest, including compatibility
quirks for legacy systems.
- misc fixes and cleanups.
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
VSOCK: Use kvfree()
vhost: split out vringh Kconfig
vhost: detect 32 bit integer wrap around
vhost: new device IOTLB API
vhost: drop vringh dependency
vhost: convert pre sorted vhost memory array to interval tree
vhost: introduce vhost memory accessors
VSOCK: Add Makefile and Kconfig
VSOCK: Introduce vhost_vsock.ko
VSOCK: Introduce virtio_transport.ko
VSOCK: Introduce virtio_vsock_common.ko
VSOCK: defer sock removal to transports
VSOCK: transport-specific vsock_transport functions
vhost: drop vringh dependency
vop: pull in vhost Kconfig
virtio: new feature to detect IOMMU device quirk
balloon: check the number of available pages in leak balloon
vhost: lockless enqueuing
vhost: simplify work flushing
* x86 nested virt tweak and OOPS fix
* Simplify pvclock code (vdso bits acked by Andy Lutomirski).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJXpKOnAAoJEL/70l94x66D5z4H/R2660Vy3brQrI8lGxCtkXJt
AVe8PwI8nDfYJ/UkMZ2KcHPSvy+sHW2ydaZXYNqXHVBeTaUxiPW9rTgK61ebypGL
1tPOgJ3kGZF6XEdAz6gS8LniNFc+D3W6Y6sRylkEsqPj39/hxe7QMoOMSCQ9imbW
WMIx7/81i1EMw6oi+9FVtq+yHCpvyfFnD8t1TDsYWOReVn1J15SxbEs4Ih+hBMLz
HZ5DEjp9cAmzeR7GLje5eH1t6TEEoNb1MNgFWuscoAsDf8D9DKqRB9s0hC+TLFYn
oZbGSqjQwu3/VMblgedinH6X9MTm8V0zW29ToGnDcoO00AUmdlNmXSaZUhvT/Rs=
=H5cD
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM updates from Paolo Bonzini:
- ARM bugfix and MSI injection support
- x86 nested virt tweak and OOPS fix
- Simplify pvclock code (vdso bits acked by Andy Lutomirski).
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
nvmx: mark ept single context invalidation as supported
nvmx: remove comment about missing nested vpid support
KVM: lapic: fix access preemption timer stuff even if kernel_irqchip=off
KVM: documentation: fix KVM_CAP_X2APIC_API information
x86: vdso: use __pvclock_read_cycles
pvclock: introduce seqcount-like API
arm64: KVM: Set cpsr before spsr on fault injection
KVM: arm: vgic-irqfd: Workaround changing kvm_set_routing_entry prototype
KVM: arm/arm64: Enable MSI routing
KVM: arm/arm64: Enable irqchip routing
KVM: Move kvm_setup_default/empty_irq_routing declaration in arch specific header
KVM: irqchip: Convey devid to kvm_set_msi
KVM: Add devid in kvm_kernel_irq_routing_entry
KVM: api: Pass the devid in the msi routing entry
Pull MIPS updates from Ralf Baechle:
"This is the main pull request for MIPS for 4.8. Also includes is a
minor SSB cleanup as SSB code traditionally is merged through the MIPS
tree:
ATH25:
- MIPS: Add default configuration for ath25
Boot:
- For zboot, copy appended dtb to the end of the kernel
- store the appended dtb address in a variable
BPF:
- Fix off by one error in offset allocation
Cobalt code:
- Fix typos
Core code:
- debugfs_create_file returns NULL on error, so don't use IS_ERR for
testing for errors.
- Fix double locking issue in RM7000 S-cache code. This would only
affect RM7000 ARC systems on reboot.
- Fix page table corruption on THP permission changes.
- Use compat_sys_keyctl for 32 bit userspace on 64 bit kernels.
David says, there are no compatibility issues raised by this fix.
- Move some signal code around.
- Rewrite r4k count/compare clockevent device registration such that
min_delta_ticks/max_delta_ticks files are guaranteed to be
initialized.
- Only register r4k count/compare as clockevent device if we can
assume the clock to be constant.
- Fix MSA asm warnings in control reg accessors
- uasm and tlbex fixes and tweaking.
- Print segment physical address when EU=1.
- Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO.
- CP: Allow booting by VP other than VP 0
- Cache handling fixes and optimizations for r4k class caches
- Add hotplug support for R6 processors
- Cleanup hotplug bits in kconfig
- traps: return correct si code for accessing nonmapped addresses
- Remove cpu_has_safe_index_cacheops
Lantiq:
- Register IRQ handler for virtual IRQ number
- Fix EIU interrupt loading code
- Use the real EXIN count
- Fix build error.
Loongson 3:
- Increase HPET_MIN_PROG_DELTA and decrease HPET_MIN_CYCLES
Octeon:
- Delete built-in DTB pruning code for D-Link DSR-1000N.
- Clean up GPIO definitions in dlink_dsr-1000n.dts.
- Add more LEDs to the DSR-100n DTS
- Fix off by one in octeon_irq_gpio_map()
- Typo fixes
- Enable SATA by default in cavium_octeon_defconfig
- Support readq/writeq()
- Remove forced mappings of USB interrupts.
- Ensure DMA descriptors are always in the low 4GB
- Improve USB reset code for OCTEON II.
Pistachio:
- Add maintainers entry for pistachio SoC Support
- Remove plat_setup_iocoherency
Ralink:
- Fix pwm UART in spis group pinmux.
SSB:
- Change bare unsigned to unsigned int to suit coding style
Tools:
- Fix reloc tool compiler warnings.
Other:
- Delete use of ARCH_WANT_OPTIONAL_GPIOLIB"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (61 commits)
MIPS: mm: Fix definition of R6 cache instruction
MIPS: tools: Fix relocs tool compiler warnings
MIPS: Cobalt: Fix typo
MIPS: Octeon: Fix typo
MIPS: Lantiq: Fix build failure
MIPS: Use CPHYSADDR to implement mips32 __pa
MIPS: Octeon: Dlink_dsr-1000n.dts: add more leds.
MIPS: Octeon: Clean up GPIO definitions in dlink_dsr-1000n.dts.
MIPS: Octeon: Delete built-in DTB pruning code for D-Link DSR-1000N.
MIPS: store the appended dtb address in a variable
MIPS: ZBOOT: copy appended dtb to the end of the kernel
MIPS: ralink: fix spis group pinmux
MIPS: Factor o32 specific code into signal_o32.c
MIPS: non-exec stack & heap when non-exec PT_GNU_STACK is present
MIPS: Use per-mm page to execute branch delay slot instructions
MIPS: Modify error handling
MIPS: c-r4k: Use SMP calls for CM indexed cache ops
MIPS: c-r4k: Avoid small flush_icache_range SMP calls
MIPS: c-r4k: Local flush_icache_range cache op override
MIPS: c-r4k: Split r4k_flush_kernel_vmap_range()
...
Pull x86 fixes from Ingo Molnar:
"Two fixes and a cleanup-fix, to the syscall entry code and to ptrace"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/syscalls/64: Add compat_sys_keyctl for 32-bit userspace
x86/ptrace: Stop setting TS_COMPAT in ptrace code
x86/vdso: Error out if the vDSO isn't a valid DSO
support for the J-Core J2 processor, an open source synthesizable
reimplementation of the SH-2 ISA, resolve a longstanding sigcontext
ABI mismatch issue, and fix various bugs including nommu-specific
issues and minor regressions introduced in 4.6.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJXpQweAAoJELcQ+SIFb8Ha2vgH/Rm3YTHEgGeQhRvBle8DTPNv
l9xBCQ6UMVb9T8C5nyP0jdioAVDQr7gh7sv2c7inIjN8hQh16DXFtNV8X6G3b0jv
OC0+rBmcYjpO7gGC/L2sRE8ghuNpoIBJFojZy6bwOIvpF6EDMAZ9bAU/VFbY28so
nCUdEo0gAmrdqGyHRfEJke7D7AKPvpAnN/cmRcvNQPhkkzKjRSNg5rHLthmvAKyp
1ChASb3YYPTgOY09izD8JUp4rk7v7q2smpgfeZfGQhIN/w6QKpv5OIqe+vrm1iKU
B6q5gBHS7Y2VYilp1zKQedLM9ZthH6rnpkB25RzyH655uTwf//6ihyP3kEwlPkc=
=wwNa
-----END PGP SIGNATURE-----
Merge tag 'sh-for-4.8' of git://git.libc.org/linux-sh
Pull arch/sh updates from Rich Felker:
"These changes improve device tree support (including builtin DTB), add
support for the J-Core J2 processor, an open source synthesizable
reimplementation of the SH-2 ISA, resolve a longstanding sigcontext
ABI mismatch issue, and fix various bugs including nommu-specific
issues and minor regressions introduced in 4.6.
The J-Core arch support is included here but to be usable it needs
drivers that are waiting on approval/inclusion from their subsystem
maintainers"
* tag 'sh-for-4.8' of git://git.libc.org/linux-sh: (23 commits)
sh: add device tree source for J2 FPGA on Mimas v2 board
sh: add defconfig for J-Core J2
sh: use common clock framework with device tree boards
sh: system call wire up
sh: Delete unnecessary checks before the function call "mempool_destroy"
sh: do not perform IPI-based cache flush except on boards that need it
sh: add SMP support for J2
sh: SMP support for SH2 entry.S
sh: add working futex atomic ops on userspace addresses for smp
sh: add J2 atomics using the cas.l instruction
sh: add AT_HWCAP flag for J-Core cas.l instruction
sh: add support for J-Core J2 processor
sh: fix build regression with CONFIG_OF && !CONFIG_OF_FLATTREE
sh: allow clocksource drivers to register sched_clock backends
sh: make heartbeat driver explicitly non-modular
sh: make board-secureedge5410 explicitly non-modular
sh: make mm/asids-debugfs explicitly non-modular
sh: make time.c explicitly non-modular
sh: fix futex/robust_list on nommu models
sh: disable aliased page logic on NOMMU models
...
- Fix HugeTLB leak due to CoW and PTE_RDONLY mismatch
- Avoid accessing unmapped FDT fields when checking validity
- Correctly account for vDSO AUX entry in ARCH_DLINFO
- Fix kallsyms with absolute expressions in linker script
- Kill unnecessary symbol-based relocs in vmlinux
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJXpFZ5AAoJELescNyEwWM0PI4IALsTuHRzClOSMDLiqMUj8t+5
WNAcqybxAjCOVxAHckhweju++TeJBxcRH1nvBoNwiHIdHTv4fq1TZ3PeEq9kWMg5
JbKjYjvd9dW8k6LXMya8iXCYtG3kzbNejkNpOTVebC86yvas1IiEjNb/ztPdhJeM
HBSOkhfk8RcskfNxhuscZzGXbbdH9/R+XSTNRHN/RwCZH8PlInmduD9BbMvDhZyP
NLFonD2IgQ4as1kYG/HdIcw0BamHiURjd043+gyoqMvm7JjPksRzlQnr91SMkX17
LykXjHYPi2Me3aTrZ1NtkUNd5FHLHZ6/b9Wg6nA19d5KWkd3ER9uSJqGxkkbnt0=
=dtGK
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
- fix HugeTLB leak due to CoW and PTE_RDONLY mismatch
- avoid accessing unmapped FDT fields when checking validity
- correctly account for vDSO AUX entry in ARCH_DLINFO
- fix kallsyms with absolute expressions in linker script
- kill unnecessary symbol-based relocs in vmlinux
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Fix copy-on-write referencing in HugeTLB
arm64: mm: avoid fdt_check_header() before the FDT is fully mapped
arm64: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO
arm64: relocatable: suppress R_AARCH64_ABS64 relocations in vmlinux
arm64: vmlinux.lds: make __rela_offset and __dynsym_offset ABSOLUTE