wake_oom_reaper has allowed only 1 oom victim to be queued. The main
reason for that was the simplicity as other solutions would require some
way of queuing. The current approach is racy and that was deemed
sufficient as the oom_reaper is considered a best effort approach to
help with oom handling when the OOM victim cannot terminate in a
reasonable time. The race could lead to missing an oom victim which can
get stuck
out_of_memory
wake_oom_reaper
cmpxchg // OK
oom_reaper
oom_reap_task
__oom_reap_task
oom_victim terminates
atomic_inc_not_zero // fail
out_of_memory
wake_oom_reaper
cmpxchg // fails
task_to_reap = NULL
This race requires 2 OOM invocations in a short time period which is not
very likely but certainly not impossible. E.g. the original victim
might have not released a lot of memory for some reason.
The situation would improve considerably if wake_oom_reaper used a more
robust queuing. This is what this patch implements. This means adding
oom_reaper_list list_head into task_struct (eat a hole before embeded
thread_struct for that purpose) and a oom_reaper_lock spinlock for
queuing synchronization. wake_oom_reaper will then add the task on the
queue and oom_reaper will dequeue it.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Inform about the successful/failed oom_reaper attempts and dump all the
held locks to tell us more who is blocking the progress.
[akpm@linux-foundation.org: fix CONFIG_MMU=n build]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When oom_reaper manages to unmap all the eligible vmas there shouldn't
be much of the freable memory held by the oom victim left anymore so it
makes sense to clear the TIF_MEMDIE flag for the victim and allow the
OOM killer to select another task.
The lack of TIF_MEMDIE also means that the victim cannot access memory
reserves anymore but that shouldn't be a problem because it would get
the access again if it needs to allocate and hits the OOM killer again
due to the fatal_signal_pending resp. PF_EXITING check. We can safely
hide the task from the OOM killer because it is clearly not a good
candidate anymore as everyhing reclaimable has been torn down already.
This patch will allow to cap the time an OOM victim can keep TIF_MEMDIE
and thus hold off further global OOM killer actions granted the oom
reaper is able to take mmap_sem for the associated mm struct. This is
not guaranteed now but further steps should make sure that mmap_sem for
write should be blocked killable which will help to reduce such a lock
contention. This is not done by this patch.
Note that exit_oom_victim might be called on a remote task from
__oom_reap_task now so we have to check and clear the flag atomically
otherwise we might race and underflow oom_victims or wake up waiters too
early.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch (of 5):
This is based on the idea from Mel Gorman discussed during LSFMM 2015
and independently brought up by Oleg Nesterov.
The OOM killer currently allows to kill only a single task in a good
hope that the task will terminate in a reasonable time and frees up its
memory. Such a task (oom victim) will get an access to memory reserves
via mark_oom_victim to allow a forward progress should there be a need
for additional memory during exit path.
It has been shown (e.g. by Tetsuo Handa) that it is not that hard to
construct workloads which break the core assumption mentioned above and
the OOM victim might take unbounded amount of time to exit because it
might be blocked in the uninterruptible state waiting for an event (e.g.
lock) which is blocked by another task looping in the page allocator.
This patch reduces the probability of such a lockup by introducing a
specialized kernel thread (oom_reaper) which tries to reclaim additional
memory by preemptively reaping the anonymous or swapped out memory owned
by the oom victim under an assumption that such a memory won't be needed
when its owner is killed and kicked from the userspace anyway. There is
one notable exception to this, though, if the OOM victim was in the
process of coredumping the result would be incomplete. This is
considered a reasonable constrain because the overall system health is
more important than debugability of a particular application.
A kernel thread has been chosen because we need a reliable way of
invocation so workqueue context is not appropriate because all the
workers might be busy (e.g. allocating memory). Kswapd which sounds
like another good fit is not appropriate as well because it might get
blocked on locks during reclaim as well.
oom_reaper has to take mmap_sem on the target task for reading so the
solution is not 100% because the semaphore might be held or blocked for
write but the probability is reduced considerably wrt. basically any
lock blocking forward progress as described above. In order to prevent
from blocking on the lock without any forward progress we are using only
a trylock and retry 10 times with a short sleep in between. Users of
mmap_sem which need it for write should be carefully reviewed to use
_killable waiting as much as possible and reduce allocations requests
done with the lock held to absolute minimum to reduce the risk even
further.
The API between oom killer and oom reaper is quite trivial.
wake_oom_reaper updates mm_to_reap with cmpxchg to guarantee only
NULL->mm transition and oom_reaper clear this atomically once it is done
with the work. This means that only a single mm_struct can be reaped at
the time. As the operation is potentially disruptive we are trying to
limit it to the ncessary minimum and the reaper blocks any updates while
it operates on an mm. mm_struct is pinned by mm_count to allow parallel
exit_mmap and a race is detected by atomic_inc_not_zero(mm_users).
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Mel Gorman <mgorman@suse.de>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This will be needed in the patch "mm, oom: introduce oom reaper".
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The permissions of this file were modified by commit (f447671b9e PM /
AVS: rockchip-io: add io selectors and supplies for rk3399) by mistake,
so fix them.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use KMEM_CACHE() instead of kmem_cache_create() to simplify the code.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Use kmem_cache_zalloc() instead of kmem_cache_alloc() with flag GFP_ZERO.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Use KMEM_CACHE() instead of kmem_cache_create() to simplify the code.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
If dentry has no lease, ceph_d_revalidate() previously return 0.
This causes VFS to invalidate the dentry and create a new dentry
for later lookup. Invalidating a dentry also detach any underneath
mount points. So mount point inside cephfs can disapear mystically
(even the mount point is not modified by other hosts).
The fix is using lookup request to revalidate dentry without lease.
This can partly solve the mount points disapear issue (as long as
the mount point is not modified by other hosts)
Signed-off-by: Yan, Zheng <zyan@redhat.com>
When security is enabled, security module can call filesystem's
getxattr/setxattr callbacks during d_instantiate(). For cephfs,
d_instantiate() is usually called by MDS' dispatch thread, while
handling MDS reply. If the MDS reply does not include xattrs and
corresponding caps, getxattr/setxattr need to send a new request
to MDS and waits for the reply. This makes MDS' dispatch sleep,
nobody handles later MDS replies.
The fix is make sure lookup/atomic_open reply include xattrs and
corresponding caps. So getxattr can be handled by cached xattrs.
This requires some modification to both MDS and request message.
(Client tells MDS what caps it wants; MDS encodes proper caps in
the reply)
Smack security module may call setxattr during d_instantiate().
Unlike getxattr, we can't force MDS to issue CEPH_CAP_XATTR_EXCL
to us. So just make setxattr return error when called by MDS'
dispatch thread.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
If page->mapping is NULL, releasepage() callback does not get called.
Remove the unnecessary NULL check to make static code analysis tool
happy
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Readdir cache uses page cache to save dentry pointers. When adding
dentry pointers to middle of a page, we need to make sure the page
already exists. Otherwise the beginning part of the page will be
invalid pointers.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Don't open-code sizeof_footer() in read_partial_message() and
ceph_msg_revoke(). Also, after switching to sizeof_footer(), it's now
possible to use con_out_kvec_add() in prepare_write_message_footer().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
ceph_empty_snapc->num_snaps == 0 at all times. Passing such a snapc to
ceph_osdc_alloc_request() (possibly through ceph_osdc_new_request()) is
equivalent to passing NULL, as ceph_osdc_alloc_request() uses it only
for sizing the request message.
Further, in all four cases the subsequent ceph_osdc_build_request() is
passed NULL for snapc, meaning that 0 is encoded for seq and num_snaps
and making ceph_empty_snapc entirely useless. The two cases where it
actually mattered were removed in commits 8605609049 ("ceph: avoid
sending unnessesary FLUSHSNAP message") and 23078637e0 ("ceph: fix
queuing inode to mdsdir's snaprealm").
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
A negative value rc compared to the positive value ENOENT in the
finish_read() function.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_fs_time() instead.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
This patch makes ceph_writepages_start() try using single OSD request
to write all dirty pages within a strip unit. When a nonconsecutive
dirty page is found, ceph_writepages_start() tries starting a new write
operation to existing OSD request. If it succeeds, it uses the new
operation to writeback the dirty page.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
This helper duplicates last extent operation in OSD request, then
adjusts the new extent operation's offset and length. The helper
is for scatterd page writeback, which adds nonconsecutive dirty
pages to single OSD request.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Turn r_ops into a flexible array member to enable large, consisting of
up to 16 ops, OSD requests. The use case is scattered writeback in
cephfs and, as far as the kernel client is concerned, 16 is just a made
up number.
r_ops had size 3 for copyup+hint+write, but copyup is really a special
case - it can only happen once. ceph_osd_request_cache is therefore
stuffed with num_ops=2 requests, anything bigger than that is allocated
with kmalloc(). req_mempool is backed by ceph_osd_request_cache, which
means either num_ops=1 or num_ops=2 for use_mempool=true - all existing
users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with
that.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
ceph_osd_request_cache was introduced a long time ago. Also, osd_req
is about to get a flexible array member, which ceph_osd_request_cache
is going to be aware of.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Although msg_size is calculated correctly, the terms are grouped in
a misleading way - snaps appears to not have room for a u32 length.
Move calculation closer to its use and regroup terms.
No functional change.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
This avoids defining large array of r_reply_op_{len,result} in
in struct ceph_osd_request.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
When rbytes mount option is enabled, directory size is recursive
size. Recursive size is not updated instantly. This can cause
directory size to change between successive stat(1)
Signed-off-by: Yan, Zheng <zyan@redhat.com>
This can happen if __close_session() in ceph_monc_stop() races with
a connection reset. We need to ignore such faults, otherwise it's
likely we would take !hunting, call __schedule_delayed() and end up
with delayed_work() executing on invalid memory, among other things.
The (two!) con->private tests are useless, as nothing ever clears
con->private. Nuke them.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Doing __schedule_delayed() in the hunting branch is pointless, as the
tick will have already been scheduled by then.
What we need to do instead is *reschedule* it in the !hunting branch,
after reopen_session() changes hunt_mult, which affects the delay.
This helps with spacing out connection attempts and avoiding things
like two back-to-back attempts followed by a longer period of waiting
around.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
hunting is now set in __open_session() and cleared in finish_hunting(),
instead of all around. The "session lost" message is printed not only
on connection resets, but also on keepalive timeouts.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Unless we are in the process of setting up a client (i.e. connecting to
the monitor cluster for the first time), apply a backoff: every time we
want to reopen a session, increase our timeout by a multiple (currently
2); when we complete the connection, reduce that multipler by 50%.
Mirrors ceph.git commit 794c86fd289bd62a35ed14368fa096c46736e9a2.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Split ping interval and ping timeout: ping interval is 10s; keepalive
timeout is 30s.
Make monc_ping_timeout a constant while at it - it's not actually
exported as a mount option (and the rest of tick-related settings won't
be either), so it's got no place in ceph_options.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Don't try to reconnect to the same monitor when we fail to establish
a session within a timeout or it's lost.
For that, pick_new_mon() needs to see the old value of cur_mon, so
don't clear it in __close_session() - all calls to __close_session()
but one are followed by __open_session() anyway. __open_session() is
only called when a new session needs to be established, so the "already
open?" branch, which is now in the way, is simply dropped.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
It is currently hard-coded in the mon_client that mdsmap and monmap
subs are continuous, while osdmap sub is always "onetime". To better
handle full clusters/pools in the osd_client, we need to be able to
issue continuous osdmap subs. Revamp subs code to allow us to specify
for each sub whether it should be continuous or not.
Although not strictly required for the above, switch to SUBSCRIBE2
protocol while at it, eliminating the ambiguity between a request for
"every map since X" and a request for "just the latest" when we don't
have a map yet (i.e. have epoch 0). SUBSCRIBE2 feature bit is now
required - it's been supported since pre-argonaut (2010).
Move "got mdsmap" call to the end of ceph_mdsc_handle_map() - calling
in before we validate the epoch and successfully install the new map
can mess up mon_client sub state.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Coupling hunting state with subscribe state is not a good idea. Clear
hunting when we complete the authentication handshake.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Our debugfs dir name is a concatenation of cluster fsid and client
unique ID ("global_id"). It used to be the case that we learned
global_id first, nowadays we always learn fsid first - the monmap is
sent before any auth replies are. ceph_debugfs_client_init() call in
ceph_monc_handle_map() is therefore never executed and can be removed.
Its counterpart in handle_auth_reply() doesn't really belong there
either: having to do monc->client and unlocking early to work around
lockdep is a testament to that. Move it into __ceph_open_session(),
where it can be called unconditionally.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
This reverts commit e7223f1860.
It causes problems when a ppdev tries to register before the parport
driver has been registered with the device model. That will trigger the
BUG_ON(!drv->bus->p);
at drivers/base/driver.c:153. The call chain is
kernel_init ->
kernel_init_freeable ->
do_one_initcall ->
ppdev_init ->
__parport_register_driver ->
driver_register *BOOM*
Reported-by: kernel test robot <fengguang.wu@intel.com>
Reported-by: Ross Zwisler <zwisler@gmail.com>
Reported-by: Petr Mladek <pmladek@suse.com>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Occurrences of timeval were supposed to be eliminated last round,
now remove a last forgotten one.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJW9PnoAAoJEHnzb7JUXXnQlBAQAOZELI61TS2XAk+94EvxODgY
FrTOzZFGDUlXlkdyI7Smlla1D6LwV47FCRpZYTuf4z1xD5+nT9RcvxjiaAGUZxmJ
swhdei74LPRIW7OfVtODz7kLsNLBJM1Le6+kVfugxDPxaOMFn7oqn7TsLakg005l
gyzhP7NnZm15IuUnGpjjZgx1utNvVeMZ+0GdhUfTzVHPyHzYh0iquYC8yCnW2AfT
Z9PN5j5XKdE/sJDTtWwq2hvSaDfUr6cybNkk+jYGIn9qxtTdUebg+ZddoU0aLo6L
DsJPUdAQ2mo9rrjJchR5v+jbbZ2AYDFFUz8GfJceUvc2dcllm7WQsYQO4AsRM0A7
YllY8xzkK11Mid7ZxoHpj6F8c6ZJztPZ2qT/H58KgGngkn4frkPCGMkverTRF2Yp
D/bxQ/sRifK3kLWxTjHEcomifXasvHtj4Cwfklcu1RqCJuKgMawhrtjktjqTZKFC
vfG9OoRb6xV2RJtAnDVz+QuaGklPLP/NekQzFGryHYWHCEjI62sH8+WUhVkOs+m5
1/23N4mqJ/sOurp3Gj0a19lKiAbAAYHMp2v8vLWC7dfQ7VhJC/K0vEQm0HL0Bvwf
FqIUg6QqfVYjckdL9Fh/0XCMKrOF9Qb59JWOWXPnFjuc7KD2/s/vpfgby6cP+14/
D1NrNVkp2bEiLZ9avdSL
=S0zD
-----END PGP SIGNATURE-----
Merge tag 'firewire-update2' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
Pull firewire leftover from Stefan Richter:
"Occurrences of timeval were supposed to be eliminated last round, now
remove a last forgotten one"
* tag 'firewire-update2' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: nosy: Replace timeval with timespec64
Pull drm fixes from Dave Airlie:
"Just a couple of dma-buf related fixes and some amdgpu fixes, along
with a regression fix for radeon off but default feature, but makes my
30" monitor happy again"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux:
drm/radeon/mst: cleanup code indentation
drm/radeon/mst: fix regression in lane/link handling.
drm/amdgpu: add invalidate_page callback for userptrs
drm/amdgpu: Revert "remove the userptr rmn->lock"
drm/amdgpu: clean up path handling for powerplay
drm/amd/powerplay: fix memory leak of tdp_table
dma-buf/fence: fix fence_is_later v2
dma-buf: Update docs for SYNC ioctl
drm: remove excess description
dma-buf, drm, ion: Propagate error code from dma_buf_start_cpu_access()
drm/atmel-hlcdc: use helper to get crtc state
drm/atomic: use helper to get crtc state
There are only three patches this time, most other changes to
files in include/asm-generic tend to go through the tree of whoever
depends on the change.
Two patches are cleanups for stuff that is no longer needed,
the main change is to adapt the generic version of BUG_ON()
for CONFIG_BUG=n to make it behave consistently with BUG().
This avoids undefined behavior along with a number of warnings
about that undefined behavior in randconfig builds when
we keep going on after hitting a BUG_ON().
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIVAwUAVvRXDGCrR//JCVInAQKUFRAAmp23pohv08LZzXL8Qu7XfFN+b1RkZ936
WYBeiA9PEWufQs2hgXaEUXy0onO7ah4cs2NWfkaBPyxT+I9mN+ThdzqVrlTE+AEO
2K0f2RaZANC238zB86Yv/YvTj7FegH0DDdMBq/P06vlYdgBegx49U3pMpguxl3d0
/q9MyqTzo9j4uOEK4ix4/Dko+4eKIS5Y/xeb0TkeKA6HiBVzAhGLZFl+eMku07Bf
ap8B705hBDXSBFeWcK9AvKjHZCM+FCkb+C3TXo9x5tUu8g5OIG1t962OQvT9ldsP
rvo5ppRh/TAY2Z9chN3cKrsvshbHiZ9uRzeksCunL+SK+dOhEIPCVzLXndQpi3RD
NgeNKgo6gKYdle44pEj0EH2ktuvr0u8sbjQg9SY2miC1H4DmEbCakSqtQegHXTKd
chJ6xyNiQXktdfo0pFOtCA2gjqiAriugttBqUtGcK9zRqjGGpP5hOUQVm3jR7UMp
Hjb+oj5o+Gjz5J1t5zsjbhFINDCHAgXRzqqaoT9RfE9+QlUftUhu+N9KVFgzhe9I
93VHaqgGIRoi856BO7UZSaMGhy7ljm1nQ18jP9aZl/tBco0kpd3AO8og9dJ0u2j+
3fEqAHH30ia8GJCfIDnolxTL6uaqcCIeAoLgGcmn+QZS7ka+tD+000rtgd2pdy9/
gy/VPpFG064=
=8tPL
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"There are only three patches this time, most other changes to files in
include/asm-generic tend to go through the tree of whoever depends on
the change.
Two patches are cleanups for stuff that is no longer needed, the main
change is to adapt the generic version of BUG_ON() for CONFIG_BUG=n to
make it behave consistently with BUG().
This avoids undefined behavior along with a number of warnings about
that undefined behavior in randconfig builds when we keep going on
after hitting a BUG_ON()"
* tag 'asm-generic-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
asm-generic: remove old nonatomic-io wrapper files
asm-generic: default BUG_ON(x) to if(x)BUG()
asm-generic: page.h: Remove useless get_user_page and free_user_page
some amd fixes
* 'drm-next-4.6' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon/mst: cleanup code indentation
drm/radeon/mst: fix regression in lane/link handling.
drm/amdgpu: add invalidate_page callback for userptrs
drm/amdgpu: Revert "remove the userptr rmn->lock"
drm/amdgpu: clean up path handling for powerplay
drm/amd/powerplay: fix memory leak of tdp_table
- Fix for an intel_pstate driver issue related to the handling of
MSR updates uncovered by the recent cpufreq rework (Rafael Wysocki).
- cpufreq core cleanups related to starting governors and frequency
synchronization during resume from system suspend and a locking
fix for cpufreq_quick_get() (Rafael Wysocki, Richard Cochran).
- acpi-cpufreq and powernv cpufreq driver updates (Jisheng Zhang,
Michael Neuling, Richard Cochran, Shilpasri Bhat).
- intel_idle driver update preventing some Skylake-H systems
from hanging during initialization by disabling deep C-states
mishandled by the platform in the problematic configurations (Len
Brown).
- Intel Xeon Phi Processor x200 support for intel_idle (Dasaratharaman
Chandramouli).
- cpuidle menu governor updates to make it always honor PM QoS
latency constraints (and prevent C1 from being used as the
fallback C-state on x86 when they are set below its exit latency)
and to restore the previous behavior to fall back to C1 if the next
timer event is set far enough in the future that was changed in 4.4
which led to an energy consumption regression (Rik van Riel, Rafael
Wysocki).
- New device ID for a future AMD UART controller in the ACPI driver
for AMD SoCs (Wang Hongcheng).
- Rockchip rk3399 support for the rockchip-io-domain adaptive voltage
scaling (AVS) driver (David Wu).
- ACPI PCI resources management fix for the handling of IO space
resources on architectures where the IO space is memory mapped
(IA64 and ARM64) broken by the introduction of common ACPI
resources parsing for PCI host bridges in 4.4 (Lorenzo Pieralisi).
- Fix for the ACPI backend of the generic device properties API
to make it parse non-device (data node only) children of an
ACPI device correctly (Irina Tirdea).
- Fixes for the handling of global suspend flags (introduced in 4.4)
during hibernation and resume from it (Lukas Wunner).
- Support for obtaining configuration information from Device Trees
in the PM clocks framework (Jon Hunter).
- ACPI _DSM helper code and devfreq framework cleanups (Colin Ian
King, Geert Uytterhoeven).
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJW9JaRAAoJEILEb/54YlRx/GAQAJujANWilWHZYm24a9JDcIE9
rsNZIC/FdeBVilPtRTZQnig/Pj32Z4Jm7IZ/DLOq0Deu1YK/9uv3y59M3BcX6WyL
H5VR80L8geUJZ7RRk0WfM5D4X82ovzwpE/kWt2Z7HDuvJSCBmFBZOvNrXbaRncKD
jIvat/p6uCuxt5c08+ebnBLQ6tOs8wLTWiCx3fO128GIrGRGN2xFV6hzRWVGnJ4g
WXGAR+AdLxRMZz4PPmqdTfRj4TNSR071GjKyaeKfZUjQGAsf5O9A77JFjeNVomDx
g1K37Byid2bTByzVavlEXPJZ7eKb5dAhlo7IJ9HAcOAXChLqH2Czjrpd+1XjR9MF
SV/78rCnF8eet83QYLbGV/Mzf7gbJP2Xp6wiaM22VAPpGe+sYfphJoQka9XRTfId
OgAjyYMYdWAKo5DhxVNI8WyN0W5dsoBFPxnaUFhHSGDCIJH7Ksy20m6y3plG2Bxf
ahoiQhmd9ohjtB5JbRnf4MY0hjekp8Srdf+DoNKsk/+JscIyROpYY3msQ3smUKo+
f628MC/wAosMpSV+l+KOYkbjCbtB49IabWtZ//NVD9hYB3E1f6aTN59yFbWB+1rp
L7Y8iaxzSkyJy/yYVuBal3rSk356+BvvoXBlLXmBsyu1TMlcDjALIYztSiTVT5MB
RZBhgNwdkxNCYJfU3ex+
=hUVj
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-4.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management and ACPI updates from Rafael Wysocki:
"The second batch of power management and ACPI updates for v4.6.
Included are fixups on top of the previous PM/ACPI pull request and
other material that didn't make into it but still should go into 4.6.
Among other things, there's a fix for an intel_pstate driver issue
uncovered by recent cpufreq changes, a workaround for a boot hang on
Skylake-H related to the handling of deep C-states by the platform and
a PCI/ACPI fix for the handling of IO port resources on non-x86
architectures plus some new device IDs and similar.
Specifics:
- Fix for an intel_pstate driver issue related to the handling of MSR
updates uncovered by the recent cpufreq rework (Rafael Wysocki).
- cpufreq core cleanups related to starting governors and frequency
synchronization during resume from system suspend and a locking fix
for cpufreq_quick_get() (Rafael Wysocki, Richard Cochran).
- acpi-cpufreq and powernv cpufreq driver updates (Jisheng Zhang,
Michael Neuling, Richard Cochran, Shilpasri Bhat).
- intel_idle driver update preventing some Skylake-H systems from
hanging during initialization by disabling deep C-states mishandled
by the platform in the problematic configurations (Len Brown).
- Intel Xeon Phi Processor x200 support for intel_idle
(Dasaratharaman Chandramouli).
- cpuidle menu governor updates to make it always honor PM QoS
latency constraints (and prevent C1 from being used as the fallback
C-state on x86 when they are set below its exit latency) and to
restore the previous behavior to fall back to C1 if the next timer
event is set far enough in the future that was changed in 4.4 which
led to an energy consumption regression (Rik van Riel, Rafael
Wysocki).
- New device ID for a future AMD UART controller in the ACPI driver
for AMD SoCs (Wang Hongcheng).
- Rockchip rk3399 support for the rockchip-io-domain adaptive voltage
scaling (AVS) driver (David Wu).
- ACPI PCI resources management fix for the handling of IO space
resources on architectures where the IO space is memory mapped
(IA64 and ARM64) broken by the introduction of common ACPI
resources parsing for PCI host bridges in 4.4 (Lorenzo Pieralisi).
- Fix for the ACPI backend of the generic device properties API to
make it parse non-device (data node only) children of an ACPI
device correctly (Irina Tirdea).
- Fixes for the handling of global suspend flags (introduced in 4.4)
during hibernation and resume from it (Lukas Wunner).
- Support for obtaining configuration information from Device Trees
in the PM clocks framework (Jon Hunter).
- ACPI _DSM helper code and devfreq framework cleanups (Colin Ian
King, Geert Uytterhoeven)"
* tag 'pm+acpi-4.6-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
PM / AVS: rockchip-io: add io selectors and supplies for rk3399
intel_idle: Support for Intel Xeon Phi Processor x200 Product Family
intel_idle: prevent SKL-H boot failure when C8+C9+C10 enabled
ACPI / PM: Runtime resume devices when waking from hibernate
PM / sleep: Clear pm_suspend_global_flags upon hibernate
cpufreq: governor: Always schedule work on the CPU running update
cpufreq: Always update current frequency before startig governor
cpufreq: Introduce cpufreq_update_current_freq()
cpufreq: Introduce cpufreq_start_governor()
cpufreq: powernv: Add sysfs attributes to show throttle stats
cpufreq: acpi-cpufreq: make Intel/AMD MSR access, io port access static
PCI: ACPI: IA64: fix IO port generic range check
ACPI / util: cast data to u64 before shifting to fix sign extension
cpufreq: powernv: Define per_cpu chip pointer to optimize hot-path
cpuidle: menu: Fall back to polling if next timer event is near
cpufreq: acpi-cpufreq: Clean up hot plug notifier callback
intel_pstate: Do not call wrmsrl_on_cpu() with disabled interrupts
cpufreq: Make cpufreq_quick_get() safe to call
ACPI / property: fix data node parsing in acpi_get_next_subnode()
ACPI / APD: Add device HID for future AMD UART controller
...
Drivers:
- abx80x: handle both XT and RC oscillators, XT failure bit and autocalibration
- m41t80: avoid out of range year values
- rv8803: workaround an i2c HW issue
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJW9JdpAAoJEKbNnwlvZCyz9hYP/RU/8fJtdO2veoY+xLVta7UV
mJahSrMn7CMkZeVh7vy/T8tIJDNB+G4TqkG6V8kT41yltzTZszn/Qm/t6vrEcwY3
GXj11NtIiOeo6UgGeEKj0feM9Imy/OtHvCfXc7XmTV78f0DYE5MvH+7rTXmFzNm0
qHE+aGPfp2CGYS8HhDYwedvXo9qphSCfbU3akm2PSdFfOkoCuIBndJgLkKsqdKXY
St9hN0Z7xo3FayxtTbELftnW1xkvgsJiGcU19b3Uo89V3NS1L+5eJkhI7cx/kQxz
bMlBBKkvpdmRwy2ng42NGipLXEXwTTOqx4ZlhKLjwhGAF1m4R9ThrCrrXrBT01LM
8TwO5HNqUEViQYXMSK95UF2F5ldE5S+kfYUjopdoV1N6/04Uz+X03o9+zMAJq4JH
ZWjcLIYXb1lvmt8KzUD847O69001NtWC0zVgHoQJ3CldguyGX11BYzkRCxRn/u9y
OWEivF8K0hXYw0Dcmnls6/lrDP1xQc7s6BiFg00UTEjGAmlZVR3eSNBxCXXUneyR
nsv1dlvB9+mcVFFpEQI/MFFNdoBBx3KLRIE+pFJIadQ4yaWkUhrOJpTW4Y646QAj
1VMDV5Z6961sMqVkyqNMZqaqnVgrZY0DZWS9PMzYlNnrDSIXu3svu78xq+foBy0+
OuOHjLcIIKxFSRysMZu7
=PMVI
-----END PGP SIGNATURE-----
Merge tag 'rtc-4.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull more RTC updates from Alexandre Belloni:
"A second pull request for v4.6 with a few fixesi before -rc1. The new
features for abx80x actually make the RTC behave correctly.
Drivers:
- abx80x: handle both XT and RC oscillators, XT failure bit and
autocalibration
- m41t80: avoid out of range year values
- rv8803: workaround an i2c HW issue"
* tag 'rtc-4.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
rtc: abx80x: handle the oscillator failure bit
rtc: abx80x: handle autocalibration
rtc: rv8803: workaround i2c HW issue
rtc: mcp795: add devicetree support
rtc: asm9260: remove incorrect __init/__exit annotations
rtc: m41t80: avoid out of range year values
rtc: s3c: Don't print an error on probe deferral
rtc: rv3029: stop mentioning rv3029c2
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJW9GjwAAoJEMsfJm/On5mBI8MP/jOIu9g+53OnDp4rjKaurPxZ
uq/cAiSsOb58jjmoAjGnsjn6PHkF9og6gkH9UoKePcr4f7VbTixPu9mYj1RYEqaE
p603ASRyRXEEM0RgdQVrNrQbKnAD8COlh66FmvFYFQ6moqbZW3SAv7KqfUghXtKU
TfPjDVUcxVx6AZ+bsgKi0XeueMgtK7viGxKUEqgayvyQ/JVnHZZKRkJRH5p41QgY
MOt7MG51WDeEYI8jycXiUrLoyh4H2VQmmwJY7tosyBuRqHdM/Jnv1jjBmiskpZoQ
NlQ3/8YyDG4KQxWCjmDHxrIUVR9rVfD+pW4dpOWAnps08byJtdo2tiLWM3x6v0g2
I9HuPqNYf6013a03iCztymKhQZFrP6KZzVzUN7Zq1g6sgM+VCl8VTqB1n2CyI3OC
rL842jP2CZ3D2Z2JVSW1ryiw5Ogk6Qc/H7OWPVxtVESkMOwuYBxc1gWmN9Ps1Rdz
2OjkCEA/e00o4xqzJikLjlLDgyJYNw7lOnPAgKP3jxEgc44t7K381740jNbazPCZ
uTv6DcPBUx66D0Qfpv/AKRlNkFjP3GUVKxb87ENkQaNTBjIuh5x3Qorvd6QoKq8l
iRHjFWr6+juW2Pubm/+yAzj+BFQIsphHeLnYVrusF9fMC+5EwbRmzJxYqxgfEq3h
+W76SO4jrkZmy44iwPC4
=I4zr
-----END PGP SIGNATURE-----
Merge tag 'hwmon-for-linus-v4.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull more hwmon updates from Guenter Roeck:
"Update hwmon mailing list and web page"
* tag 'hwmon-for-linus-v4.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
MAINTAINERS: Update mailing list and web page for hwmon subsystem
Pull block fixes from Jens Axboe:
"Final round of fixes for this merge window - some of this has come up
after the initial pull request, and some of it was put in a post-merge
branch before the merge window.
This contains:
- Fix for a bad check for an error on dma mapping in the mtip32xx
driver, from Alexey Khoroshilov.
- A set of fixes for lightnvm, from Javier, Matias, and Wenwei.
- An NVMe completion record corruption fix from Marta, ensuring that
we read things in the right order.
- Two writeback fixes from Tejun, marked for stable@ as well.
- A blk-mq sw queue iterator fix from Thomas, fixing an oops for
sparse CPU maps. They hit this in the hot plug/unplug rework"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvme: avoid cqe corruption when update at the same time as read
writeback, cgroup: fix use of the wrong bdi_writeback which mismatches the inode
writeback, cgroup: fix premature wb_put() in locked_inode_to_wb_and_lock_list()
blk-mq: Use proper cpumask iterator
mtip32xx: fix checks for dma mapping errors
lightnvm: do not load L2P table if not supported
lightnvm: do not reserve lun on l2p loading
nvme: lightnvm: return ppa completion status
lightnvm: add a bitmap of luns
lightnvm: specify target's logical address area
null_blk: add lightnvm null_blk device to the nullb_list