Let's use the more common "unusable".
This patch was originally written and posted by Boris. I am including it
in this patch series.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: WANG Chao <chaowang@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
memfd_create() is similar to mmap(MAP_ANON), but returns a file-descriptor
that you can pass to mmap(). It can support sealing and avoids any
connection to user-visible mount-points. Thus, it's not subject to quotas
on mounted file-systems, but can be used like malloc()'ed memory, but with
a file-descriptor to it.
memfd_create() returns the raw shmem file, so calls like ftruncate() can
be used to modify the underlying inode. Also calls like fstat() will
return proper information and mark the file as regular file. If you want
sealing, you can specify MFD_ALLOW_SEALING. Otherwise, sealing is not
supported (like on all other regular files).
Compared to O_TMPFILE, it does not require a tmpfs mount-point and is not
subject to a filesystem size limit. It is still properly accounted to
memcg limits, though, and to the same overcommit or no-overcommit
accounting as all user memory.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If two processes share a common memory region, they usually want some
guarantees to allow safe access. This often includes:
- one side cannot overwrite data while the other reads it
- one side cannot shrink the buffer while the other accesses it
- one side cannot grow the buffer beyond previously set boundaries
If there is a trust-relationship between both parties, there is no need
for policy enforcement. However, if there's no trust relationship (eg.,
for general-purpose IPC) sharing memory-regions is highly fragile and
often not possible without local copies. Look at the following two
use-cases:
1) A graphics client wants to share its rendering-buffer with a
graphics-server. The memory-region is allocated by the client for
read/write access and a second FD is passed to the server. While
scanning out from the memory region, the server has no guarantee that
the client doesn't shrink the buffer at any time, requiring rather
cumbersome SIGBUS handling.
2) A process wants to perform an RPC on another process. To avoid huge
bandwidth consumption, zero-copy is preferred. After a message is
assembled in-memory and a FD is passed to the remote side, both sides
want to be sure that neither modifies this shared copy, anymore. The
source may have put sensible data into the message without a separate
copy and the target may want to parse the message inline, to avoid a
local copy.
While SIGBUS handling, POSIX mandatory locking and MAP_DENYWRITE provide
ways to achieve most of this, the first one is unproportionally ugly to
use in libraries and the latter two are broken/racy or even disabled due
to denial of service attacks.
This patch introduces the concept of SEALING. If you seal a file, a
specific set of operations is blocked on that file forever. Unlike locks,
seals can only be set, never removed. Hence, once you verified a specific
set of seals is set, you're guaranteed that no-one can perform the blocked
operations on this file, anymore.
An initial set of SEALS is introduced by this patch:
- SHRINK: If SEAL_SHRINK is set, the file in question cannot be reduced
in size. This affects ftruncate() and open(O_TRUNC).
- GROW: If SEAL_GROW is set, the file in question cannot be increased
in size. This affects ftruncate(), fallocate() and write().
- WRITE: If SEAL_WRITE is set, no write operations (besides resizing)
are possible. This affects fallocate(PUNCH_HOLE), mmap() and
write().
- SEAL: If SEAL_SEAL is set, no further seals can be added to a file.
This basically prevents the F_ADD_SEAL operation on a file and
can be set to prevent others from adding further seals that you
don't want.
The described use-cases can easily use these seals to provide safe use
without any trust-relationship:
1) The graphics server can verify that a passed file-descriptor has
SEAL_SHRINK set. This allows safe scanout, while the client is
allowed to increase buffer size for window-resizing on-the-fly.
Concurrent writes are explicitly allowed.
2) For general-purpose IPC, both processes can verify that SEAL_SHRINK,
SEAL_GROW and SEAL_WRITE are set. This guarantees that neither
process can modify the data while the other side parses it.
Furthermore, it guarantees that even with writable FDs passed to the
peer, it cannot increase the size to hit memory-limits of the source
process (in case the file-storage is accounted to the source).
The new API is an extension to fcntl(), adding two new commands:
F_GET_SEALS: Return a bitset describing the seals on the file. This
can be called on any FD if the underlying file supports
sealing.
F_ADD_SEALS: Change the seals of a given file. This requires WRITE
access to the file and F_SEAL_SEAL may not already be set.
Furthermore, the underlying file must support sealing and
there may not be any existing shared mapping of that file.
Otherwise, EBADF/EPERM is returned.
The given seals are _added_ to the existing set of seals
on the file. You cannot remove seals again.
The fcntl() handler is currently specific to shmem and disabled on all
files. A file needs to explicitly support sealing for this interface to
work. A separate syscall is added in a follow-up, which creates files that
support sealing. There is no intention to support this on other
file-systems. Semantics are unclear for non-volatile files and we lack any
use-case right now. Therefore, the implementation is specific to shmem.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch (of 6):
The i_mmap_writable field counts existing writable mappings of an
address_space. To allow drivers to prevent new writable mappings, make
this counter signed and prevent new writable mappings if it is negative.
This is modelled after i_writecount and DENYWRITE.
This will be required by the shmem-sealing infrastructure to prevent any
new writable mappings after the WRITE seal has been set. In case there
exists a writable mapping, this operation will fail with EBUSY.
Note that we rely on the fact that iff you already own a writable mapping,
you can increase the counter without using the helpers. This is the same
that we do for i_writecount.
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Ryan Lortie <desrt@desrt.ca>
Cc: Lennart Poettering <lennart@poettering.net>
Cc: Daniel Mack <zonque@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The core mm code will provide a default gate area based on
FIXADDR_USER_START and FIXADDR_USER_END if
!defined(__HAVE_ARCH_GATE_AREA) && defined(AT_SYSINFO_EHDR).
This default is only useful for ia64. arm64, ppc, s390, sh, tile, 64-bit
UML, and x86_32 have their own code just to disable it. arm, 32-bit UML,
and x86_64 have gate areas, but they have their own implementations.
This gets rid of the default and moves the code into ia64.
This should save some code on architectures without a gate area: it's now
possible to inline the gate_area functions in the default case.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Nathan Lynch <nathan_lynch@mentor.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [in principle]
Acked-by: Richard Weinberger <richard@nod.at> [for um]
Acked-by: Will Deacon <will.deacon@arm.com> [for arm64]
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Nathan Lynch <Nathan_Lynch@mentor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rather than have architectures #define ARCH_HAS_SG_CHAIN in an
architecture specific scatterlist.h, make it a proper Kconfig option and
use that instead. At same time, remove the header files are are now
mostly useless and just include asm-generic/scatterlist.h.
[sfr@canb.auug.org.au: powerpc files now need asm/dma.h]
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86]
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [powerpc]
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A small cleanup while changing adjacent code. Extern is not needed for
functions and only one declaration had it so remove it from the odd line.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Jack Miller <millerjo@us.ibm.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is small set of patches our team has had kicking around for a few
versions internally that fixes tasks getting hung on shm_exit when there
are many threads hammering it at once.
Anton wrote a simple test to cause the issue:
http://ozlabs.org/~anton/junkcode/bust_shm_exit.c
Before applying this patchset, this test code will cause either hanging
tracebacks or pthread out of memory errors.
After this patchset, it will still produce output like:
root@somehost:~# ./bust_shm_exit 1024 160
...
INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 116, t=2111 jiffies, g=241, c=240, q=7113)
INFO: Stall ended before state dump start
...
But the task will continue to run along happily, so we consider this an
improvement over hanging, even if it's a bit noisy.
This patch (of 3):
exit_shm obtains the ipc_ns shm rwsem for write and holds it while it
walks every shared memory segment in the namespace. Thus the amount of
work is related to the number of shm segments in the namespace not the
number of segments that might need to be cleaned.
In addition, this occurs after the task has been notified the thread has
exited, so the number of tasks waiting for the ns shm rwsem can grow
without bound until memory is exausted.
Add a list to the task struct of all shmids allocated by this task. Init
the list head in copy_process. Use the ns->rwsem for locking. Add
segments after id is added, remove before removing from id.
On unshare of NEW_IPCNS orphan any ids as if the task had exited, similar
to handling of semaphore undo.
I chose a define for the init sequence since its a simple list init,
otherwise it would require a function call to avoid include loops between
the semaphore code and the task struct. Converting the list_del to
list_del_init for the unshare cases would remove the exit followed by
init, but I left it blow up if not inited.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Jack Miller <millerjo@us.ibm.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now with 64bit bzImage and kexec tools, we support ramdisk that size is
bigger than 2g, as we could put it above 4G.
Found compressed initramfs image could not be decompressed properly. It
turns out that image length is int during decompress detection, and it
will become < 0 when length is more than 2G. Furthermore, during
decompressing len as int is used for inbuf count, that has problem too.
Change len to long, that should be ok as on 32 bit platform long is
32bits.
Tested with following compressed initramfs image as root with kexec.
gzip, bzip2, xz, lzma, lzop, lz4.
run time for populate_rootfs():
size name Nehalem-EX Westmere-EX Ivybridge-EX
9034400256 root_img : 26s 24s 30s
3561095057 root_img.lz4 : 28s 27s 27s
3459554629 root_img.lzo : 29s 29s 28s
3219399480 root_img.gz : 64s 62s 49s
2251594592 root_img.xz : 262s 260s 183s
2226366598 root_img.lzma: 386s 376s 277s
2901482513 root_img.bz2 : 635s 599s
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Rashika Kheria <rashika.kheria@gmail.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Cc: P J P <ppandit@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: "Daniel M. Weeks" <dan@danweeks.net>
Cc: Alexandre Courbot <acourbot@nvidia.com>
Cc: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This taint flag will be set if the system has ever entered a softlockup
state. Similar to TAINT_WARN it is useful to know whether or not the
system has been in a softlockup state when debugging.
[akpm@linux-foundation.org: apply the taint before calling panic()]
Signed-off-by: Josh Hunt <johunt@akamai.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the final user, and the typedef itself.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add RapidIO DMA interface routines that directly use reference to the mport
device object and/or target device destination ID as parameters.
This allows to perform RapidIO DMA transfer requests by modules that do not
have an access to the RapidIO device list.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Stef van Os <stef.van.os@prodrive-technologies.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's only used in fork.c:mm_init().
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm initialization on fork/exec is spread all over the place, which makes
the code look inconsistent.
We have mm_init(), which is supposed to init/nullify mm's internals, but
it doesn't init all the fields it should:
- on fork ->mmap,mm_rb,vmacache_seqnum,map_count,mm_cpumask,locked_vm
are zeroed in dup_mmap();
- on fork ->pmd_huge_pte is zeroed in dup_mm(), immediately before
calling mm_init();
- ->cpu_vm_mask_var ptr is initialized by mm_init_cpumask(), which is
called before mm_init() on both fork and exec;
- ->context is initialized by init_new_context(), which is called after
mm_init() on both fork and exec;
Let's consolidate all the initializations in mm_init() to make the code
look cleaner.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
proc_uid_seq_operations, proc_gid_seq_operations and
proc_projid_seq_operations are only called in proc_id_map_open with
seq_open as const struct seq_operations so we can constify the 3
structures and update proc_id_map_open prototype.
text data bss dec hex filename
6817 404 1984 9205 23f5 kernel/user_namespace.o-before
6913 308 1984 9205 23f5 kernel/user_namespace.o-after
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 6b208e3f6e ("mm: memcg: remove unused node/section info from
pc->flags") deleted lookup_cgroup_page() but left a prototype for it.
Kill the vestigial prototype.
Signed-off-by: Greg Thelen <gthelen@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's not used anywhere today, so let's remove it.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add forward declarations for struct pglist_data, mem_cgroup.
Remove __init, __meminit from function prototypes and inline functions.
Remove redundant inclusion of bit_spinlock.h.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pages are now uncharged at release time, and all sources of batched
uncharges operate on lists of pages. Directly use those lists, and
get rid of the per-task batching state.
This also batches statistics accounting, in addition to the res
counter charges, to reduce IRQ-disabling and re-enabling.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The memcg uncharging code that is involved towards the end of a page's
lifetime - truncation, reclaim, swapout, migration - is impressively
complicated and fragile.
Because anonymous and file pages were always charged before they had their
page->mapping established, uncharges had to happen when the page type
could still be known from the context; as in unmap for anonymous, page
cache removal for file and shmem pages, and swap cache truncation for swap
pages. However, these operations happen well before the page is actually
freed, and so a lot of synchronization is necessary:
- Charging, uncharging, page migration, and charge migration all need
to take a per-page bit spinlock as they could race with uncharging.
- Swap cache truncation happens during both swap-in and swap-out, and
possibly repeatedly before the page is actually freed. This means
that the memcg swapout code is called from many contexts that make
no sense and it has to figure out the direction from page state to
make sure memory and memory+swap are always correctly charged.
- On page migration, the old page might be unmapped but then reused,
so memcg code has to prevent untimely uncharging in that case.
Because this code - which should be a simple charge transfer - is so
special-cased, it is not reusable for replace_page_cache().
But now that charged pages always have a page->mapping, introduce
mem_cgroup_uncharge(), which is called after the final put_page(), when we
know for sure that nobody is looking at the page anymore.
For page migration, introduce mem_cgroup_migrate(), which is called after
the migration is successful and the new page is fully rmapped. Because
the old page is no longer uncharged after migration, prevent double
charges by decoupling the page's memcg association (PCG_USED and
pc->mem_cgroup) from the page holding an actual charge. The new bits
PCG_MEM and PCG_MEMSW represent the respective charges and are transferred
to the new page during migration.
mem_cgroup_migrate() is suitable for replace_page_cache() as well,
which gets rid of mem_cgroup_replace_page_cache(). However, care
needs to be taken because both the source and the target page can
already be charged and on the LRU when fuse is splicing: grab the page
lock on the charge moving side to prevent changing pc->mem_cgroup of a
page under migration. Also, the lruvecs of both pages change as we
uncharge the old and charge the new during migration, and putback may
race with us, so grab the lru lock and isolate the pages iff on LRU to
prevent races and ensure the pages are on the right lruvec afterward.
Swap accounting is massively simplified: because the page is no longer
uncharged as early as swap cache deletion, a new mem_cgroup_swapout() can
transfer the page's memory+swap charge (PCG_MEMSW) to the swap entry
before the final put_page() in page reclaim.
Finally, page_cgroup changes are now protected by whatever protection the
page itself offers: anonymous pages are charged under the page table lock,
whereas page cache insertions, swapin, and migration hold the page lock.
Uncharging happens under full exclusion with no outstanding references.
Charging and uncharging also ensure that the page is off-LRU, which
serializes against charge migration. Remove the very costly page_cgroup
lock and set pc->flags non-atomically.
[mhocko@suse.cz: mem_cgroup_charge_statistics needs preempt_disable]
[vdavydov@parallels.com: fix flags definition]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Tested-by: Jet Chen <jet.chen@intel.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These patches rework memcg charge lifetime to integrate more naturally
with the lifetime of user pages. This drastically simplifies the code and
reduces charging and uncharging overhead. The most expensive part of
charging and uncharging is the page_cgroup bit spinlock, which is removed
entirely after this series.
Here are the top-10 profile entries of a stress test that reads a 128G
sparse file on a freshly booted box, without even a dedicated cgroup (i.e.
executing in the root memcg). Before:
15.36% cat [kernel.kallsyms] [k] copy_user_generic_string
13.31% cat [kernel.kallsyms] [k] memset
11.48% cat [kernel.kallsyms] [k] do_mpage_readpage
4.23% cat [kernel.kallsyms] [k] get_page_from_freelist
2.38% cat [kernel.kallsyms] [k] put_page
2.32% cat [kernel.kallsyms] [k] __mem_cgroup_commit_charge
2.18% kswapd0 [kernel.kallsyms] [k] __mem_cgroup_uncharge_common
1.92% kswapd0 [kernel.kallsyms] [k] shrink_page_list
1.86% cat [kernel.kallsyms] [k] __radix_tree_lookup
1.62% cat [kernel.kallsyms] [k] __pagevec_lru_add_fn
After:
15.67% cat [kernel.kallsyms] [k] copy_user_generic_string
13.48% cat [kernel.kallsyms] [k] memset
11.42% cat [kernel.kallsyms] [k] do_mpage_readpage
3.98% cat [kernel.kallsyms] [k] get_page_from_freelist
2.46% cat [kernel.kallsyms] [k] put_page
2.13% kswapd0 [kernel.kallsyms] [k] shrink_page_list
1.88% cat [kernel.kallsyms] [k] __radix_tree_lookup
1.67% cat [kernel.kallsyms] [k] __pagevec_lru_add_fn
1.39% kswapd0 [kernel.kallsyms] [k] free_pcppages_bulk
1.30% cat [kernel.kallsyms] [k] kfree
As you can see, the memcg footprint has shrunk quite a bit.
text data bss dec hex filename
37970 9892 400 48262 bc86 mm/memcontrol.o.old
35239 9892 400 45531 b1db mm/memcontrol.o
This patch (of 4):
The memcg charge API charges pages before they are rmapped - i.e. have an
actual "type" - and so every callsite needs its own set of charge and
uncharge functions to know what type is being operated on. Worse,
uncharge has to happen from a context that is still type-specific, rather
than at the end of the page's lifetime with exclusive access, and so
requires a lot of synchronization.
Rewrite the charge API to provide a generic set of try_charge(),
commit_charge() and cancel_charge() transaction operations, much like
what's currently done for swap-in:
mem_cgroup_try_charge() attempts to reserve a charge, reclaiming
pages from the memcg if necessary.
mem_cgroup_commit_charge() commits the page to the charge once it
has a valid page->mapping and PageAnon() reliably tells the type.
mem_cgroup_cancel_charge() aborts the transaction.
This reduces the charge API and enables subsequent patches to
drastically simplify uncharging.
As pages need to be committed after rmap is established but before they
are added to the LRU, page_add_new_anon_rmap() must stop doing LRU
additions again. Revive lru_cache_add_active_or_unevictable().
[hughd@google.com: fix shmem_unuse]
[hughd@google.com: Add comments on the private use of -EAGAIN]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A handful of driver-related changes. We've had a bunch of them going in through
other branches as well, so it's only a part of what we really have this release.
Larger pieces are:
* Removal of a now unused PWM driver for atmel
- This includes AVR32 changes that have been appropriately acked.
* Performance counter support for the arm CCN interconnect
* OMAP mailbox driver cleanups and consolidation
* PCI and SATA PHY drivers for SPEAr 13xx platforms
* Redefinition (with backwards compatibility!) of PCI DT bindings for Tegra to
better model regulators/power.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJT5DrJAAoJEIwa5zzehBx3wkkP/iwEfEK5mMon9KEe4DcKTKNq
Z6xyWuMQNHKdfBFpABs6AsHQCKDc7KK6gN3+2zLLHEJ4XGDPZ2g2NaX3oRPJlaay
BDK7rQfIZyi4tmbOnlEv1BDTYgirYBPMwk9RyNo/04Ug3W+Y67aSVo44zkNFBWaJ
GbcX/zYsrsfvdawuQMW6V/A835s3Kq5Zhv1ikPr8gDDjswZRBAT6i7FYpBSHQ8K8
bH6C1891Xit6rxXSLXJyrtM8CAet7PtLTqNr/IKdUaJnGD+fJm5EonxW+g8gvhN8
gOEkm3nM60++kdDlzZCQVNr0m1+ih6NNCr6bDLO6rIRpAJM2O+YrN1rWuZaJOu1A
pIvifk+wWHT+o52pXk8g9fK4n/ZJydK3IBzDePHMrIROOEiW5tLE3WA+u3NSfMfH
WegMt9E2dcB+5gXPeejZ9gFbAHnh2S1oVTZfCYXtuOHrYiEU9U0FA3eRYvJEE2po
k8sdiOn7Vc65O1QZ+xZNbLABpAHaye7X2evOJyhSutzHE/AtUvT4vuCAZ0tggXyD
E1qVKngVW/NvcoFbwYeidq4bOVgiAEn3idZgF5gEq1mq7LzetXUQAcZAOQfLWHLQ
RrXufS7Ez8pSCG74y0AFReVfQH2PgWHPqGUGj99NXgQauexc/vR1Hc5Iqb8liGNJ
n6i8RqvvQ4KYcmHEXDIT
=fsP6
-----END PGP SIGNATURE-----
Merge tag 'drivers-for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC driver changes from Olof Johansson:
"A handful of driver-related changes. We've had a bunch of them going
in through other branches as well, so it's only a part of what we
really have this release.
Larger pieces are:
- Removal of a now unused PWM driver for atmel
[ This includes AVR32 changes that have been appropriately acked ]
- Performance counter support for the arm CCN interconnect
- OMAP mailbox driver cleanups and consolidation
- PCI and SATA PHY drivers for SPEAr 13xx platforms
- Redefinition (with backwards compatibility!) of PCI DT bindings for
Tegra to better model regulators/power"
Note: this merge also fixes up the semantic conflict with the new
calling convention for devm_phy_create(), see commit f0ed817638 ("phy:
core: Let node ptr of PHY point to PHY and not of PHY provider") that
came in through Greg's USB tree.
Semantic merge patch by Stephen Rothwell <sfr@canb.auug.org.au> through
the next tree.
* tag 'drivers-for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (38 commits)
bus: arm-ccn: Fix error handling at event allocation
mailbox/omap: add a parent structure for every IP instance
mailbox/omap: remove the private mailbox structure
mailbox/omap: consolidate OMAP mailbox driver
mailbox/omap: simplify the fifo assignment by using macros
mailbox/omap: remove omap_mbox_type_t from mailbox ops
mailbox/omap: remove OMAP1 mailbox driver
mailbox/omap: use devm_* interfaces
bus: ARM CCN: add PERF_EVENTS dependency
bus: ARM CCN PMU driver
PCI: spear: Remove spear13xx_pcie_remove()
PCI: spear: Fix Section mismatch compilation warning for probe()
ARM: tegra: Remove legacy PCIe power supply properties
PCI: tegra: Remove deprecated power supply properties
PCI: tegra: Implement accurate power supply scheme
ARM: SPEAr13xx: Update defconfigs
ARM: SPEAr13xx: Add pcie and miphy DT nodes
ARM: SPEAr13xx: Add bindings and dt node for misc block
ARM: SPEAr13xx: Fix static mapping table
phy: Add drivers for PCIe and SATA phy on SPEAr13xx
...
This is the bulk of new SoC enablement and other platform changes for 3.17:
* Samsung S5PV210 has been converted to DT and multiplatform
* Clock drivers and bindings for some of the lower-end i.MX 1/2 platforms
* Kirkwood, one of the popular Marvell platforms, is folded into the
mvebu platform code, removing mach-kirkwood.
* Hwmod data for TI AM43xx and DRA7 platforms.
* More additions of Renesas shmobile platform support
* Removal of plat-samsung contents that can be removed with S5PV210 being
multiplatform/DT-enabled and the other two old platforms being removed.
New platforms (most with only basic support right now):
* Hisilicon X5HD2 settop box chipset is introduced
* Mediatek MT6589 (mobile chipset) is introduced
* Broadcom BCM7xxx settop box chipset is introduced
+ as usual a lot other pieces all over the platform code.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJT5Dp+AAoJEIwa5zzehBx3w1sP/0vjT/LQOmC8Lv8RW2Ley2ua
hNu3HcNPnT/N40JEdU9YNv3q0fdxGgcfKj011CNN+49zPSUf1xduk2wfCAk9yV50
8Sbt1PfDGm1YyUugGN420CzI431pPoM1OGXHZHkAmg+2J286RtUi3NckB//QDbCY
QhEjhpYc9SXhAOCGwmB4ab7thOljOFSPzKTLMTu3+PNI5zRPRgkDkt6w9XlsAYmB
nuR271BnzsROkMzAjycwaJ3kdim7wqrMRfk8g96o0jHSF5qf4zsT5uWYYAjTxdUQ
8Ajz6zjeHe4+95TwTDcq+lCX6rDLZgwkvCAc6hFbeg0uR7Dyek0h6XMEYtwdjaiU
KNPwOENrYdENNDAGRpkFp1x4h/rY9Plfru0bBo5o6t7aPBvmNeCDzRtlTtLiUNDV
dG8sfDMtrS/wFHVjylDSQ60Mb+wuW0XneC8D7chY/iRhIllUYi6YXXvt+/tH5C20
oYDOWqqcDFSb0sJhE5pn4KBV82ZaHx9jMBWGLl+erg2sDX/SK8SxOkLqKYZKtKB5
0leOGE3Y+C70xt3G9HftLz2sAvvt+C8UPsApPT+dHNE401TWJOYx6LphPkQKjeeK
P1iwKi+It3l+FaBypgJy/LeMQRy7EyvDBK2I5WoVL/R2qq14EmP1ui3Tthjj0bhq
tBBof6P9c8OnRVj1Lz3R
=5TJ6
-----END PGP SIGNATURE-----
Merge tag 'soc-for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC platform changes from Olof Johansson:
"This is the bulk of new SoC enablement and other platform changes for
3.17:
- Samsung S5PV210 has been converted to DT and multiplatform
- Clock drivers and bindings for some of the lower-end i.MX 1/2
platforms
- Kirkwood, one of the popular Marvell platforms, is folded into the
mvebu platform code, removing mach-kirkwood
- Hwmod data for TI AM43xx and DRA7 platforms
- More additions of Renesas shmobile platform support
- Removal of plat-samsung contents that can be removed with S5PV210
being multiplatform/DT-enabled and the other two old platforms
being removed
New platforms (most with only basic support right now):
- Hisilicon X5HD2 settop box chipset is introduced
- Mediatek MT6589 (mobile chipset) is introduced
- Broadcom BCM7xxx settop box chipset is introduced
+ as usual a lot other pieces all over the platform code"
* tag 'soc-for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (240 commits)
ARM: hisi: remove smp from machine descriptor
power: reset: move hisilicon reboot code
ARM: dts: Add hix5hd2-dkb dts file.
ARM: debug: Rename Hi3716 to HIX5HD2
ARM: hisi: enable hix5hd2 SoC
ARM: hisi: add ARCH_HISI
MAINTAINERS: add entry for Broadcom ARM STB architecture
ARM: brcmstb: select GISB arbiter and interrupt drivers
ARM: brcmstb: add infrastructure for ARM-based Broadcom STB SoCs
ARM: configs: enable SMP in bcm_defconfig
ARM: add SMP support for Broadcom mobile SoCs
Documentation: arm: misc updates to Marvell EBU SoC status
Documentation: arm: add URLs to public datasheets for the Marvell Armada XP SoC
ARM: mvebu: fix build without platforms selected
ARM: mvebu: add cpuidle support for Armada 38x
ARM: mvebu: add cpuidle support for Armada 370
cpuidle: mvebu: add Armada 38x support
cpuidle: mvebu: add Armada 370 support
cpuidle: mvebu: rename the driver from armada-370-xp to mvebu-v7
ARM: mvebu: export the SCU address
...
This merge window brings a good size of cleanups on various
platforms. Among the bigger ones:
* Removal of Samsung s5pc100 and s5p64xx platforms. Both of these have
lacked active support for quite a while, and after asking around nobody
showed interest in keeping them around. If needed, they could be
resurrected in the future but it's more likely that we would prefer
reintroduction of them as DT and multiplatform-enabled platforms
instead.
* OMAP4 controller code register define diet. They defined a lot of registers
that were never actually used, etc.
* Move of some of the Tegra platform code (PMC, APBIO, fuse, powergate)
to drivers/soc so it can be shared with 64-bit code. This also converts them
over to traditional driver models where possible.
* Removal of legacy gpio-samsung driver, since the last users have been
removed (moved to pinctrl)
Plus a bunch of smaller changes for various platforms that sort of
dissapear in the diffstat for the above. clps711x cleanups, shmobile
header file refactoring/moves for multiplatform friendliness, some misc
cleanups, etc.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJT5DYPAAoJEIwa5zzehBx37egQAIiatNiLLqZnfo3rwGADRz/a
POfPovktj68aPcobyzoyhFtToMqGvi9PpysyFTIQD2HJFG+5BtiIAuqtg0875zDe
EpBWgsfugrm0YktJWAtUerj60oAmNPbKfaEm1cOOWuM2lb2mV+QkRrwSTAgsqkT7
927BzMXKKBRPOVLL0RYhoF8EXa0Eg8kCqAHP8fJrzVYkRp+UrZJDnGiUP1XmWJN+
VXQMu5SEjcPMtqT7+tfX455RfREHJfBcJ1ZN/dPF8HMWDwClQG0lyc6hifh1MxwO
8DjIZNkfZeKqgDqVyC17re7pc7p8md5HL8WXbrKpK0A9vQ5bRexbPHxcwJ1T/C2Y
465H+st5XXbuzV1gbMwjK1/ycsH0tCyffckk8Yl/2e1Fs7GgPNbAELtTdl+5vV1Y
xmDXkyo/9WlRM3LQ23IGKwW7VzN86EfWVuShssfro0fO7xDdb4OOYLdQI+4bCG+h
ytQYun1vU32OEyNik5RVNQuZaMrv2c93a3bID4owwuPHPmYOPVUQaqnRX/0E51eA
aHZYbk2GlUOV3Kq5aSS4iyLg1Yj+I9/NeH9U+A4nc+PQ5FlgGToaVSCuYuw4DqbP
AAG+sqQHbkBMvDPobQz/yd1qZbAb4eLhGy11XK1t5S65rApWI55GwNXnvbyxqt8x
wpmxJTASGxcfuZZgKXm7
=gbcE
-----END PGP SIGNATURE-----
Merge tag 'cleanup-for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC cleanups from Olof Johansson:
"This merge window brings a good size of cleanups on various platforms.
Among the bigger ones:
- Removal of Samsung s5pc100 and s5p64xx platforms. Both of these
have lacked active support for quite a while, and after asking
around nobody showed interest in keeping them around. If needed,
they could be resurrected in the future but it's more likely that
we would prefer reintroduction of them as DT and
multiplatform-enabled platforms instead.
- OMAP4 controller code register define diet. They defined a lot of
registers that were never actually used, etc.
- Move of some of the Tegra platform code (PMC, APBIO, fuse,
powergate) to drivers/soc so it can be shared with 64-bit code.
This also converts them over to traditional driver models where
possible.
- Removal of legacy gpio-samsung driver, since the last users have
been removed (moved to pinctrl)
Plus a bunch of smaller changes for various platforms that sort of
dissapear in the diffstat for the above. clps711x cleanups, shmobile
header file refactoring/moves for multiplatform friendliness, some
misc cleanups, etc"
* tag 'cleanup-for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (117 commits)
drivers: CCI: Correct use of ! and &
video: clcd-versatile: Depend on ARM
video: fix up versatile CLCD helper move
MAINTAINERS: Add sdhci-st file to ARCH/STI architecture
ARM: EXYNOS: Fix build breakge with PM_SLEEP=n
MAINTAINERS: Remove Kirkwood
ARM: tegra: Convert PMC to a driver
soc/tegra: fuse: Set up in early initcall
ARM: tegra: Always lock the CPU reset vector
ARM: tegra: Setup CPU hotplug in a pure initcall
soc/tegra: Implement runtime check for Tegra SoCs
soc/tegra: fuse: fix dummy functions
soc/tegra: fuse: move APB DMA into Tegra20 fuse driver
soc/tegra: Add efuse and apbmisc bindings
soc/tegra: Add efuse driver for Tegra
ARM: tegra: move fuse exports to soc/tegra/fuse.h
ARM: tegra: export apb dma readl/writel
ARM: tegra: Use a function to get the chip ID
ARM: tegra: Sort includes alphabetically
ARM: tegra: Move includes to include/soc/tegra
...
The existing vfio_pci_open() fails upon error returned from
vfio_spapr_pci_eeh_open(), which breaks POWER7's P5IOC2 PHB
support which this patch brings back.
The patch fixes the issue by dropping the return value of
vfio_spapr_pci_eeh_open().
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
The VFIO related components could be built as dynamic modules.
Unfortunately, CONFIG_EEH can't be configured to "m". The patch
fixes the build errors when configuring VFIO related components
as dynamic modules as follows:
CC [M] drivers/vfio/vfio_iommu_spapr_tce.o
In file included from drivers/vfio/vfio.c:33:0:
include/linux/vfio.h:101:43: warning: ‘struct pci_dev’ declared \
inside parameter list [enabled by default]
:
WRAP arch/powerpc/boot/zImage.pseries
WRAP arch/powerpc/boot/zImage.maple
WRAP arch/powerpc/boot/zImage.pmac
WRAP arch/powerpc/boot/zImage.epapr
MODPOST 1818 modules
ERROR: ".vfio_spapr_iommu_eeh_ioctl" [drivers/vfio/vfio_iommu_spapr_tce.ko]\
undefined!
ERROR: ".vfio_spapr_pci_eeh_open" [drivers/vfio/pci/vfio-pci.ko] undefined!
ERROR: ".vfio_spapr_pci_eeh_release" [drivers/vfio/pci/vfio-pci.ko] undefined!
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Some semi-mt drivers use the slots in a manual way, but may still
want to call parts of the frame synchronization logic. This patch
makes input_mt_drop_unused callable from those drivers.
Signed-off-by: Henrik Rydberg <rydberg@euromail.se>
Reviewed-by: Benson Leung <bleung@chromium.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
When CONFIG_TRACING is not enabled, there's no reason to save the trace
strings either by the linker or as a static variable that can be
referenced later. Simply pass back the string that is given to
tracepoint_string().
Had to move the define to include/linux/tracepoint.h so that it is still
visible when CONFIG_TRACING is not set.
Link: http://lkml.kernel.org/p/1406318733-26754-2-git-send-email-nicolas.pitre@linaro.org
Suggested-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Pull DRM updates from Dave Airlie:
"Like all good pull reqs this ends with a revert, so it must mean we
tested it,
[ Ed. That's _one_ way of looking at it ]
This pull is missing nouveau, Ben has been stuck trying to track down
a very longstanding bug that revealed itself due to some other
changes. I've asked him to send you a direct pull request for nouveau
once he cleans things up. I'm away until Monday so don't want to
delay things, you can make a decision on that when he sends it, I have
my phone so I can ack things just not really merge much.
It has one trivial conflict with your tree in armada_drv.c, and also
the pull request contains some component changes that are already in
your tree, the base tree from Russell went via Greg's tree already,
but some stuff still shows up in here that doesn't when I merge my
tree into yours.
Otherwise all pretty standard graphics fare, one new driver and
changes all over the place.
New drivers:
- sti kms driver for STMicroelectronics chipsets stih416 and stih407.
core:
- lots of cleanups to the drm core
- DP MST helper code merged
- universal cursor planes.
- render nodes enabled by default
panel:
- better panel interfaces
- new panel support
- non-continuous cock advertising ability
ttm:
- shrinker fixes
i915:
- hopefully ditched UMS support
- runtime pm fixes
- psr tracking and locking - now enabled by default
- userptr fixes
- backlight brightness fixes
- MST support merged
- runtime PM for dpms
- primary planes locking fixes
- gen8 hw semaphore support
- fbc fixes
- runtime PM on SOix sleep state hw.
- mmio base page flipping
- lots of vlv/chv fixes.
- universal cursor planes
radeon:
- Hawaii fixes
- display scalar support for non-fixed mode displays
- new firmware format support
- dpm on more asics by default
- GPUVM improvements
- uncached and wc GTT buffers
- BOs > visible VRAM
exynos:
- i80 interface support
- module auto-loading
- ipp driver consolidated.
armada:
- irq handling in crtc layer only
- crtc renumbering
- add component support
- DT interaction changes.
tegra:
- load as module fixes
- eDP bpp and sync polarity fixed
- DSI non-continuous clock mode support
- better support for importing buffers from nouveau
msm:
- mdp5/adq8084 v1.3 hw enablement
- devicetree clk changse
- ifc6410 board working
tda998x:
- component support
- DT documentation update
vmwgfx:
- fix compat shader namespace"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (551 commits)
Revert "drm: drop redundant drm_file->is_master"
drm/panel: simple: Use devm_gpiod_get_optional()
drm/dsi: Replace upcasting macro by function
drm/panel: ld9040: Replace upcasting macro by function
drm/exynos: dp: Modify driver to support drm_panel
drm/exynos: Move DP setup into commit()
drm/panel: simple: Add AUO B133HTN01 panel support
drm/panel: simple: Support delays in panel functions
drm/panel: simple: Add proper definition for prepare and unprepare
drm/panel: s6e8aa0: Add proper definition for prepare and unprepare
drm/panel: ld9040: Add proper definition for prepare and unprepare
drm/tegra: Add support for panel prepare and unprepare routines
drm/exynos: dsi: Add support for panel prepare and unprepare routines
drm/exynos: dpi: Add support for panel prepare and unprepare routines
drm/panel: simple: Add dummy prepare and unprepare routines
drm/panel: s6e8aa0: Add dummy prepare and unprepare routines
drm/panel: ld9040: Add dummy prepare and unprepare routines
drm/panel: Provide convenience wrapper for .get_modes()
drm/panel: add .prepare() and .unprepare() functions
drm/panel: simple: Remove simple-panel compatible
...
development cycle:
- Get rid of the .disable() callback from the driver callback
vtable. This callback was abused and counterintuitive since
a pin or group of pins can be said to always be in some
setting, and never really disabled. We now only enable a
certain muxing, and move between some certain muxings, we
never "disable" a mux setting.
- Some janitorial moving the MSM, Samsung and Nomadik and
drivers to their own subdirectories for a clearer view in
the subsystem. This will continue.
- Kill off the use of the return value from gpiochip_remove(),
this will be done in parallel in the GPIO subsystem and
hopefully not trigger too many unchecked return value
warnings before we get rid of this altogether.
- A huge set of changes and improvements to the Allwinner
sunxi drivers especially for their latest A23 and A31 SoCs,
and some ground work for the new sun8i platform family.
- A large set of Rockchip driver improvements adding support
for the RK3288 SoC.
- Advances in migration of older Freescale platforms to pin
control, especially i.MX1.
- Samsung and Exynos improvements.
- Support for the Qualcomm MSM8960 SoC.
- Use the gpiolib irqchip helpers for the ST SPEAr and
Intel Baytrail drivers.
- A bunch of nice janitorial work done with cppcheck.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJT41HwAAoJEEEQszewGV1zVN0QAMJOJcsjYyHG+b/y0upJ5n1c
tyPelyxKrpGUUTvTsO5LiqvoIfa2E/DrwXupRAC4zAXvb+x3TUGkiluK4Yxl5e56
AqjePnSydqqHiRZOK4Q06W7VwGUoxLltDmDPTcra+DAaijIeKUPMQE1MvcPxisMe
IR7PZN58JYCG3ZV5yjwfBBxcRAm8KiiwHvQdBywPIGGvvmpy1X+U96U869nQgUH2
70lpJVPx75bhyAFk99bE9nAnroZeRR7mvijjf26ssyAFNqeJ0K7Xlom+NtpHdiw0
lsDKdBiAWVbZON/7Pc24gpHzhBoIYdA/6LxPA8Xz4QVFRmfxmNkZhuXZnZ7Dbuj2
xv9HtnjExqjZcfeNyUlO0iQDEQIUN/oPkaBS2G8DNZ/bmQqC8EzkIFh6F72KO1s2
7FU214LcuBYuAa3HvNLmgtjSkgou8tTMj58rnZ1XDr2mI9tzlrwI3i6ZrJZWKDur
NIoRAcUZkFiMpXxqLbk4UXzDvuJgrzaFiQ2PkxTXAlC2DjXz+gXPzPIOSD5LTaHE
k3WvZfuGK2iPoKeDHaLx2qEl9PoD5hz1JH3o+bgOKkRZG2gEJWd02JwhuPyRPLfc
TeBgmdYS2t2MBS21VsqoObw8336oCHJu7tDAxTkAalLsZy9MV2WqThhZoYggZ/Rq
yMqCr8vfd2pRVtwYe7Wc
=t1pR
-----END PGP SIGNATURE-----
Merge tag 'pinctrl-v3.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pinctrl updates from Linus Walleij:
"This is the bulk pin control changes for the v3.17 merge development
cycle:
- get rid of the .disable() callback from the driver callback vtable.
This callback was abused and counterintuitive since a pin or group
of pins can be said to always be in some setting, and never really
disabled. We now only enable a certain muxing, and move between
some certain muxings, we never "disable" a mux setting
- some janitorial moving the MSM, Samsung and Nomadik and drivers to
their own subdirectories for a clearer view in the subsystem. This
will continue
- kill off the use of the return value from gpiochip_remove(), this
will be done in parallel in the GPIO subsystem and hopefully not
trigger too many unchecked return value warnings before we get rid
of this altogether
- a huge set of changes and improvements to the Allwinner sunxi
drivers especially for their latest A23 and A31 SoCs, and some
ground work for the new sun8i platform family
- a large set of Rockchip driver improvements adding support for the
RK3288 SoC
- advances in migration of older Freescale platforms to pin control,
especially i.MX1
- Samsung and Exynos improvements
- support for the Qualcomm MSM8960 SoC
- use the gpiolib irqchip helpers for the ST SPEAr and Intel Baytrail
drivers
- a bunch of nice janitorial work done with cppcheck"
* tag 'pinctrl-v3.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (61 commits)
pinctrl: baytrail: Convert to use gpiolib irqchip
pinctrl: sunxi: number gpio ranges starting from 0
pinctrl: sunxi: use gpiolib API to mark a GPIO used as an IRQ
pinctrl: rockchip: add drive-strength control for rk3288
pinctrl: rockchip: add separate type for rk3288
pinctrl: rockchip: set is_generic in pinconf_ops
pinctrl: msm: drop negativity check on unsigned value
pinctrl: remove all usage of gpio_remove ret val in driver/pinctl
pinctrl: qcom: Make muxing of gpio function explicit
pinctrl: nomadik: move all Nomadik drivers to subdir
pinctrl: samsung: Group all drivers in a sub-dir
sh-pfc: sh73a0: Introduce the use of devm_regulator_register
sh-pfc: Add renesas,pfc-r8a7791 to binding documentation
pinctrl: msm: move all qualcomm drivers to subdir
pinctrl: msm: Add msm8960 definitions
pinctrl: samsung: Allow pin value to be initialized using pinfunc
pinctrl: samsung: Allow grouping multiple pinmux/pinconf nodes
pinctrl: exynos: Consolidate irq_chips of GPIO and WKUP EINTs
pinctrl: samsung: Handle GPIO request and free using pinctrl helpers
pinctrl: samsung: Decouple direction setting from pinctrl
...
- Checkpatch fixes throughout the subsystem
- Use Regmap to handle IRQs in max77686, extcon-max77693 and mc13xxx-core
- Use DMA in rtsx_pcr
- Restrict building on unsupported architectures on timberdale, cs5535
- SPI hardening in cros_ec_spi
- More robust error handing in asic3, cros_ec, ab8500-debugfs,
max77686 and pcf50633-core
- Reorder PM runtime and regulator handing during shutdown in arizona
- Enable wakeup in cros_ec_spi
- Unused variable/code clean-up in pm8921-core, cros_ec, htc-i2cpld,
tps65912-spi, wm5110-tables and ab8500-debugfs
- Add regulator handing into suspend() in sec-core
- Remove pointless wrapper functions in extcon-max77693 and i2c-cros-ec-tunnel
- Use cross-architecture friendly data sizes in stmpe-i2c, arizona,
max77686 and tps65910
- Device Tree documentation updates throughout
- Provide power management support in max77686
- Few OF clean-ups in max77686
- Use manged resources in tps6105x
== New drivers/supported devices ==
- Add support for s2mpu02 to sec-core
- Add support for Allwinner A32 to sun6i-prcm
- Add support for Maxim 77802 in max77686
- Add support for DA9063 AD in da9063
- Add new driver for Intel PMICs (generic) and specifically Crystal Cove
== (Re-)moved drivers ==
- Move out keyboard functionality cros_ec ==> input/keyboard/cros_ec_keyb
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJT40p7AAoJEFGvii+H/HdhTo0P/1GuZyvCAJCeqt2oN1gcloIe
Hgf5rEo/PVPh3T9vHA7GCbWhgtdxfJI8FxrQYvU7Dw5cEMlmvl5p/ZHNPIProv97
uI59JO67roLpXZP+aYX8BzXcplYkaR/ah16o/ePtaOCwGrXDz+TtJiHEVVN/8bAG
PWsdcDNBC8byP7BZ/8zFdu6pX4800eRZ0KgeBH+u4k6UDor7M6LkQrxF1hJhU1Bv
z14Q2wKQufhbcyEtQWcYc6M8hignD1Ioyd4I8mnEJs0EUiABfGUEk/K/G4Z5Q7Sv
eRIEPZCd1CEBKD5JQcPXyE1QGdG9GiD15PLmctPA4VY1V+9c5/Hoq0TLoxlAQNWA
gUr7WSqJ+KT2Nch0WVr/MdP8l0jPYfboWbsd/apj4GK0/9quwJNkGUxx0mCdCXyg
9ylitwUrmlrd4CEKjybfEuTQB52Jvcdq24fnNYHHn1TGppZH6w7LVvdwSW7UcjF0
Y48hTImYYnVAlWl5lE5xVQTWD/3hseAcoWTsdSORSWJbkCfAhJUg/Gn5bH/Fkwhs
/aWYPvkF+m47PoudZ9Z8qB5OTO4uz/Q9uEBBf2/k4Yy95vl2IZdy9VqS5tYG67e7
LLdAZvG5hjEwDi3OwcwGSdZ/kRB5Hgq/YvpqjItle86CKj0ECdAqL/PfqLISgJq9
x3zSuWMRLcNoyhc2HnBj
=2cNI
-----END PGP SIGNATURE-----
Merge tag 'mfd-for-linus-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD update from Lee Jones:
"Changes to existing drivers:
- checkpatch fixes throughout the subsystem
- use Regmap to handle IRQs in max77686, extcon-max77693 and
mc13xxx-core
- use DMA in rtsx_pcr
- restrict building on unsupported architectures on timberdale,
cs5535
- SPI hardening in cros_ec_spi
- more robust error handing in asic3, cros_ec, ab8500-debugfs,
max77686 and pcf50633-core
- reorder PM runtime and regulator handing during shutdown in arizona
- enable wakeup in cros_ec_spi
- unused variable/code clean-up in pm8921-core, cros_ec, htc-i2cpld,
tps65912-spi, wm5110-tables and ab8500-debugfs
- add regulator handing into suspend() in sec-core
- remove pointless wrapper functions in extcon-max77693 and
i2c-cros-ec-tunnel
- use cross-architecture friendly data sizes in stmpe-i2c, arizona,
max77686 and tps65910
- devicetree documentation updates throughout
- provide power management support in max77686
- few OF clean-ups in max77686
- use manged resources in tps6105x
New drivers/supported devices:
- add support for s2mpu02 to sec-core
- add support for Allwinner A32 to sun6i-prcm
- add support for Maxim 77802 in max77686
- add support for DA9063 AD in da9063
- new driver for Intel PMICs (generic) and specifically Crystal Cove
(Re-)moved drivers ==
- move out keyboard functionality cros_ec ==> input/keyboard/cros_ec_keyb"
* tag 'mfd-for-linus-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (101 commits)
MAINTAINERS: Update MFD repo location
mfd: omap-usb-host: Fix improper mask use.
mfd: arizona: Only free the CTRLIF_ERR IRQ if we requested it
mfd: arizona: Add missing handling for ISRC3 under/overclocked
mfd: wm5110: Add new interrupt register definitions
mfd: arizona: Rename thermal shutdown interrupt
mfd: wm5110: Add in the output done interrupts
mfd: wm5110: Remove non-existant interrupts
mfd: tps65912-spi: Remove unused variable
mfd: htc-i2cpld: Remove unused code
mfd: da9063: Add support for AD silicon variant
mfd: arizona: Map MICVDD from extcon device to the Arizona core
mfd: arizona: Add MICVDD to mapped regulators for wm8997
mfd: max77686: Ensure device type IDs are architecture agnostic
mfd: max77686: Add Maxim 77802 PMIC support
mfd: tps6105x: Use managed resources when allocating memory
mfd: wm8997-tables: Suppress 'line over 80 chars' warnings
mfd: kempld-core: Correct a variety of checkpatch warnings
mfd: ipaq-micro: Fix coding style errors/warnings reported by checkpatch
mfd: si476x-cmd: Remedy checkpatch style complains
...
There are a few d_obtain_alias callers that are using it to get the
root of a filesystem which may already have an alias somewhere else.
This is not the same as the filehandle-lookup case, and none of them
actually need DCACHE_DISCONNECTED set.
It isn't really a serious problem, but it would really be clearer if we
reserved DCACHE_DISCONNECTED for those cases where it's actually needed.
In the btrfs case this was causing a spurious printk from
nfsd/nfsfh.c:fh_verify when it found an unexpected DCACHE_DISCONNECTED
dentry. Josef worked around this by unsetting DCACHE_DISCONNECTED
manually in 3a0dfa6a12 "Btrfs: unset DCACHE_DISCONNECTED when mounting
default subvol", and this replaces that workaround.
Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Rather than playing silly buggers with vfsmount refcounts, just have
acct_on() ask fs/namespace.c for internal clone of file->f_path.mnt
and replace it with said clone. Then attach the pin to original
vfsmount. Voila - the clone will be alive until the file gets closed,
making sure that underlying superblock remains active, etc., and
we can drop the original vfsmount, so that it's not kept busy.
If the file lives until the final mntput of the original vfsmount,
we'll notice that there's an fs_pin (one in bsd_acct_struct that
holds that file) and mnt_pin_kill() will take it out. Since
->kill() is synchronous, we won't proceed past that point until
these files are closed (and private clones of our vfsmount are
gone), so we get the same ordering warranties we used to get.
mnt_pin()/mnt_unpin()/->mnt_pinned is gone now, and good riddance -
it never became usable outside of kernel/acct.c (and racy wrt
umount even there).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
These externs belong in fs/internal.h. Rename (they are not acct-specific
anymore) and move them over there.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add a new field to fs_pin - kill(pin). That's what umount and r/o remount
will be calling for all pins attached to vfsmount and superblock resp.
Called after bumping the refcount, so it won't go away under us. Dropping
the refcount is responsibility of the instance. All generic stuff moved to
fs/fs_pin.c; the next step will rip all the knowledge of kernel/acct.c from
fs/super.c and fs/namespace.c. After that - death to mnt_pin(); it was
intended to be usable as generic mechanism for code that wants to attach
objects to vfsmount, so that they would not make the sucker busy and
would get killed on umount. Never got it right; it remained acct.c-specific
all along. Now it's very close to being killable.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Put these suckers on per-vfsmount and per-superblock lists instead.
Note: right now it's still acct_lock for everything, but that's
going to change.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
they had small conflicts (respectively within KVM documentation,
and with 3.16-rc changes). Since they were all within the subsystem,
I took care of them.
Stephen Rothwell reported some snags in PPC builds, but they are all
fixed now; the latest linux-next report was clean.
New features for ARM include:
- KVM VGIC v2 emulation on GICv3 hardware
- Big-Endian support for arm/arm64 (guest and host)
- Debug Architecture support for arm64 (arm32 is on Christoffer's todo list)
And for PPC:
- Book3S: Good number of LE host fixes, enable HV on LE
- Book3S HV: Add in-guest debug support
This release drops support for KVM on the PPC440. As a result, the
PPC merge removes more lines than it adds. :)
I also included an x86 change, since Davidlohr tied it to an independent
bug report and the reporter quickly provided a Tested-by; there was no
reason to wait for -rc2.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJT4iIJAAoJEBvWZb6bTYbyZqoP/3Wxy8NWPFJ8HGt81NHlGnDS
a9UbL7EibcOEG+aaKqmtBglTD5YDiGBDNCxxiSJaDHt+grLN4fsWIliJob1nJFoO
90f89EWN2XjeCrJXA5nUoeg5tpc5OoYKsiP6pTgzIwkP8vvs/H1+zpcTS/UmYsr/
qipVMMsM+zZeHWZcSbqjW88z7YqIn1sr5282wJ85cbyv4KGizb/G4dyPuDqLb6np
hkAD8Ah6VV2suQ2FSy7G2fg20R0vglUi60hkEHLoCBPVqJCl7SmC8MvxNbjBnP8S
J36R0R0u1wHYKzAGooLJGVOZ/o/gSiVqKX+++L2EvJBN+kuA6u/7fxLyBT+LwDAE
IF/Aln5rpg1fe+eywvhz86WljTVEQ8bO1zVsIQUPY+/ZOPedZHMwyvXft8ogbjSp
2m9OJ/3e8Aggh0OeHpCDoeow+QDUXvX0YdCw+2Yh0p+7VMXqkyp0QEiBu38jrusC
rB3VNifJbDSWLKdG9LfCAPHnxZD2XYEwv2WFBo6KQOGMGHfx0GXpCOL/jQihrhA6
HtEG5Bs3lvnHQemdpUZ58xojiABbMaUPdcnPXQQEp23WhZzrfLMLzqVG0VYnhSsC
9pi7MJj8c31rqx5WU2oRM28i/BvNxN0NCtkDpineO5s3f89Ws1xnwxqlm38AKP0J
irJQTYFEqec+GM9JK1rG
=hyQP
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull second round of KVM changes from Paolo Bonzini:
"Here are the PPC and ARM changes for KVM, which I separated because
they had small conflicts (respectively within KVM documentation, and
with 3.16-rc changes). Since they were all within the subsystem, I
took care of them.
Stephen Rothwell reported some snags in PPC builds, but they are all
fixed now; the latest linux-next report was clean.
New features for ARM include:
- KVM VGIC v2 emulation on GICv3 hardware
- Big-Endian support for arm/arm64 (guest and host)
- Debug Architecture support for arm64 (arm32 is on Christoffer's todo list)
And for PPC:
- Book3S: Good number of LE host fixes, enable HV on LE
- Book3S HV: Add in-guest debug support
This release drops support for KVM on the PPC440. As a result, the
PPC merge removes more lines than it adds. :)
I also included an x86 change, since Davidlohr tied it to an
independent bug report and the reporter quickly provided a Tested-by;
there was no reason to wait for -rc2"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (122 commits)
KVM: Move more code under CONFIG_HAVE_KVM_IRQFD
KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use
KVM: nVMX: Fix nested vmexit ack intr before load vmcs01
KVM: PPC: Enable IRQFD support for the XICS interrupt controller
KVM: Give IRQFD its own separate enabling Kconfig option
KVM: Move irq notifier implementation into eventfd.c
KVM: Move all accesses to kvm::irq_routing into irqchip.c
KVM: irqchip: Provide and use accessors for irq routing table
KVM: Don't keep reference to irq routing table in irqfd struct
KVM: PPC: drop duplicate tracepoint
arm64: KVM: fix 64bit CP15 VM access for 32bit guests
KVM: arm64: GICv3: mandate page-aligned GICV region
arm64: KVM: GICv3: move system register access to msr_s/mrs_s
KVM: PPC: PR: Handle FSCR feature deselects
KVM: PPC: HV: Remove generic instruction emulation
KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr
KVM: PPC: Remove DCR handling
KVM: PPC: Expose helper functions for data/inst faults
KVM: PPC: Separate loadstore emulation from priv emulation
KVM: PPC: Handle magic page in kvmppc_ld/st
...
Pull powerpc updates from Ben Herrenschmidt:
"This is the powerpc new goodies for 3.17. The short story:
The biggest bit is Michael removing all of pre-POWER4 processor
support from the 64-bit kernel. POWER3 and rs64. This gets rid of a
ton of old cruft that has been bitrotting in a long while. It was
broken for quite a few versions already and nobody noticed. Nobody
uses those machines anymore. While at it, he cleaned up a bunch of
old dusty cabinets, getting rid of a skeletton or two.
Then, we have some base VFIO support for KVM, which allows assigning
of PCI devices to KVM guests, support for large 64-bit BARs on
"powernv" platforms, support for HMI (Hardware Management Interrupts)
on those same platforms, some sparse-vmemmap improvements (for memory
hotplug),
There is the usual batch of Freescale embedded updates (summary in the
merge commit) and fixes here or there, I think that's it for the
highlights"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (102 commits)
powerpc/eeh: Export eeh_iommu_group_to_pe()
powerpc/eeh: Add missing #ifdef CONFIG_IOMMU_API
powerpc: Reduce scariness of interrupt frames in stack traces
powerpc: start loop at section start of start in vmemmap_populated()
powerpc: implement vmemmap_free()
powerpc: implement vmemmap_remove_mapping() for BOOK3S
powerpc: implement vmemmap_list_free()
powerpc: Fail remap_4k_pfn() if PFN doesn't fit inside PTE
powerpc/book3s: Fix endianess issue for HMI handling on napping cpus.
powerpc/book3s: handle HMIs for cpus in nap mode.
powerpc/powernv: Invoke opal call to handle hmi.
powerpc/book3s: Add basic infrastructure to handle HMI in Linux.
powerpc/iommu: Fix comments with it_page_shift
powerpc/powernv: Handle compound PE in config accessors
powerpc/powernv: Handle compound PE for EEH
powerpc/powernv: Handle compound PE
powerpc/powernv: Split ioda_eeh_get_state()
powerpc/powernv: Allow to freeze PE
powerpc/powernv: Enable M64 aperatus for PHB3
powerpc/eeh: Aux PE data for error log
...
linux/types.h and linux/list.h should be included so the typed used in
the header file are always properly declared.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Reported-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Mostly some cleanup all over the place. Pitch alignment limitations of
the display controller are now honored and job submission is 64-bit
safe.
The SOR output (used for eDP) properly configures sync signal polarities
according to the display mode rather than hard-coding them to some value
and the number of bits per color is now taken from the panel rather than
hard-coded to properly support 24-bit vs. 18-bit panels.
The DSI controller now properly supports non-continuous clock mode.
GEM objects can now have their flags and tiling mode modified via IOCTLs
to allow buffers imported from Nouveau to be properly displayed. Newer
generations of the Tegra display controller can also detile block linear
buffers at scan-out time.
Finally the driver now properly exports MODULE_DEVICE_TABLEs to allow it
to be automatically loaded when built as a module.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJT4kgqAAoJEN0jrNd/PrOhwSUP/ja3Szs8mdKzRtDcmUla7pIU
g6dVTrXibruWQtB4Mf0Zv/7vVHDAuVW3q01lRHamud9g+Ci74YDsd99AJuguBQqy
m0wNm2GFye64pcc/QdpmpH6zUEtt/eULxKYWco3jxGaMtH+v051qIyXE7JhS2KNo
vE0XUV6bQPQWfhBr94IqJL79lO/DaNOKYuGATh3g6o/D/R+EFjNod7vZ7wGjpzoG
LFqCsTfMA80sv/S19GLCwX/m/DbVVjPHPl1AxRNz5KhWXguo+teQ+0hKJ3t635IE
xkW9GJ5Ghuq524sLNY+kEwAiuGVOotFIFqy9lPFNQllp7tyyK0uaWaQHbINYPso9
3QW4qdGfoioYzg6OzpLMQLwiHwl37z9oaDRodb+tZr9DGfrZ2GihTJwZIqLezBhD
KGwDUb6/mexuotn6bg3QavFGgGKxrzb/rmiC/o4wJie6gSw3fJLzhOs1fgcus5p8
NcSRWkl90wbuoVE2h9I3mdHi4A+dAkAR9K6e8UxQCGfJwfisSeYABqh/UvRxNs3u
mp8fIBbEmUKr6Kznaehw6VBy4xd6yp6hpjMJjPClNS4oErXuzhvZdoKjma7Jg6wj
7KFNSs1R1euNRjRX2J7ITPz7sIkjac5Ms2ejeniG5+RpqWp4c54G2MxuhvbFlVEA
GOZYuKjSn6IEOkR5cx/X
=qYgE
-----END PGP SIGNATURE-----
Merge tag 'drm/tegra/for-3.17-rc1' of git://anongit.freedesktop.org/tegra/linux into drm-next
drm/tegra: Changes for v3.17-rc1
Mostly some cleanup all over the place. Pitch alignment limitations of
the display controller are now honored and job submission is 64-bit
safe.
The SOR output (used for eDP) properly configures sync signal polarities
according to the display mode rather than hard-coding them to some value
and the number of bits per color is now taken from the panel rather than
hard-coded to properly support 24-bit vs. 18-bit panels.
The DSI controller now properly supports non-continuous clock mode.
GEM objects can now have their flags and tiling mode modified via IOCTLs
to allow buffers imported from Nouveau to be properly displayed. Newer
generations of the Tegra display controller can also detile block linear
buffers at scan-out time.
Finally the driver now properly exports MODULE_DEVICE_TABLEs to allow it
to be automatically loaded when built as a module.
* tag 'drm/tegra/for-3.17-rc1' of git://anongit.freedesktop.org/tegra/linux:
drm/tegra: add MODULE_DEVICE_TABLEs
drm/tegra: dc - Reset controller on driver remove
drm/tegra: Properly align stride for framebuffers
drm/tegra: sor - Configure proper sync polarities
drm/tegra: sor - Use bits-per-color from panel
drm/tegra: Make job submission 64-bit safe
drm/tegra: Allow non-authenticated processes to create buffer objects
drm/tegra: Add SET/GET_FLAGS IOCTLs
drm/tegra: Add SET/GET_TILING IOCTLs
drm/tegra: Implement more tiling modes
drm/tegra: dsi - Handle non-continuous clock flag
drm/tegra: sor - missing unlock on error
Merge incoming from Andrew Morton:
- Various misc things.
- arch/sh updates.
- Part of ocfs2. Review is slow.
- Slab updates.
- Most of -mm.
- printk updates.
- lib/ updates.
- checkpatch updates.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (226 commits)
checkpatch: update $declaration_macros, add uninitialized_var
checkpatch: warn on missing spaces in broken up quoted
checkpatch: fix false positives for --strict "space after cast" test
checkpatch: fix false positive MISSING_BREAK warnings with --file
checkpatch: add test for native c90 types in unusual order
checkpatch: add signed generic types
checkpatch: add short int to c variable types
checkpatch: add for_each tests to indentation and brace tests
checkpatch: fix brace style misuses of else and while
checkpatch: add --fix option for a couple OPEN_BRACE misuses
checkpatch: use the correct indentation for which()
checkpatch: add fix_insert_line and fix_delete_line helpers
checkpatch: add ability to insert and delete lines to patch/file
checkpatch: add an index variable for fixed lines
checkpatch: warn on break after goto or return with same tab indentation
checkpatch: emit a warning on file add/move/delete
checkpatch: add test for commit id formatting style in commit log
checkpatch: emit fewer kmalloc_array/kcalloc conversion warnings
checkpatch: improve "no space after cast" test
checkpatch: allow multiple const * types
...
Pull HID updates from Jiri Kosina:
"Some highlights:
- hid-sony improvements of Sixaxis device support by Antonio Ospite
- hid-hyperv driven devices can now be used as wakeup source, by
Dexuan Cui
- hid-lenovo driver is now more generic and supports more devices, by
Jamie Lentin
- hid-huion now supports wider range of tablets, by Nikolai
Kondrashov
- other various unsorted fixes and device ID additions"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (30 commits)
HID: hyperv: register as a wakeup source
HID: sony: Default initialize all elements of the LED max_brightness array to 1
HID: huion: Fix sparse warnings
HID: usbhid: Use flag HID_DISCONNECTED when a usb device is removed
HID: ignore jabra gn9350e
HID: cp2112: add I2C mode
HID: use multi input quirk for 22b9:2968
HID: rmi: only bind the hid-rmi driver to the mouse interface of composite USB devices
HID: rmi: check that report ids exist in the report_id_hash before accessing their size
HID: lenovo: Add support for Compact (BT|USB) keyboard
HID: lenovo: Don't call function in condition, show error codes
HID: lenovo: Prepare support for adding other devices
HID: lenovo: Rename hid-lenovo-tpkbd to hid-lenovo
HID: huion: Handle tablets with UC-Logic vendor ID
HID: huion: Switch to generating report descriptor
HID: huion: Don't ignore other interfaces
HID: huion: Use "tablet" instead of specific model
HID: add quirk for 0x04d9:0xa096 device
HID: i2c-hid: call the hid driver's suspend and resume callbacks
HID: rmi: change logging level of log messages related to unexpected reports
...
- ACPICA update to upstream version 20140724. That includes
ACPI 5.1 material (support for the _CCA and _DSD predefined names,
changes related to the DMAR and PCCT tables and ARM support among
other things) and cleanups related to using ACPICA's header files.
A major part of it is related to acpidump and the core code used
by that utility. Changes from Bob Moore, David E Box, Lv Zheng,
Sascha Wildner, Tomasz Nowicki, Hanjun Guo.
- Radix trees for memory bitmaps used by the hibernation core from
Joerg Roedel.
- Support for waking up the system from suspend-to-idle (also known
as the "freeze" sleep state) using ACPI-based PCI wakeup signaling
(Rafael J Wysocki).
- Fixes for issues related to ACPI button events (Rafael J Wysocki).
- New device ID for an ACPI-enumerated device included into the
Wildcat Point PCH from Jie Yang.
- ACPI video updates related to backlight handling from Hans de Goede
and Linus Torvalds.
- Preliminary changes needed to support ACPI on ARM from Hanjun Guo
and Graeme Gregory.
- ACPI PNP core cleanups from Arjun Sreedharan and Zhang Rui.
- Cleanups related to ACPI_COMPANION() and ACPI_HANDLE() macros
(Rafael J Wysocki).
- ACPI-based device hotplug cleanups from Wei Yongjun and
Rafael J Wysocki.
- Cleanups and improvements related to system suspend from
Lan Tianyu, Randy Dunlap and Rafael J Wysocki.
- ACPI battery cleanup from Wei Yongjun.
- cpufreq core fixes from Viresh Kumar.
- Elimination of a deadband effect from the cpufreq ondemand
governor and intel_pstate driver cleanups from Stratos Karafotis.
- 350MHz CPU support for the powernow-k6 cpufreq driver from
Mikulas Patocka.
- Fix for the imx6 cpufreq driver from Anson Huang.
- cpuidle core and governor cleanups from Daniel Lezcano,
Sandeep Tripathy and Mohammad Merajul Islam Molla.
- Build fix for the big_little cpuidle driver from Sachin Kamat.
- Configuration fix for the Operation Performance Points (OPP)
framework from Mark Brown.
- APM cleanup from Jean Delvare.
- cpupower utility fixes and cleanups from Peter Senna Tschudin,
Andrey Utkin, Himangi Saraogi, Rickard Strandqvist, Thomas Renninger.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJT4nhtAAoJEILEb/54YlRxtZEP/2rtVQFSFdAW8l0Xm1SeSsl4
EnZpSNT1TFn+NdG23vSIot5Jzdz1/dLfeoJEbXpoVt4DPC9/PK4HPlv5FEDQYfh5
srftvvGcAva969sXzSBRNUeR+M8Yd2RdoYCfmqTEUjzf8GJLL4jC0VAIwMtsQklt
EbiQX8JaHQS7RIql7MDg1N2vaTo+zxkf39Kkcl56usmO/uATP7cAPjFreF/xQ3d8
OyBhz1cOXIhPw7bd9Dv9AgpJzA8WFpktDYEgy2sluBWMv+mLYjdZRCFkfpIRzmea
pt+hJDeAy8ZL6/bjWCzz2x6wG7uJdDLblreI28sgnJx/VHR3Co6u4H1BqUBj18ct
CHV6zQ55WFmx9/uJqBtwFy333HS2ysJziC5ucwmg8QjkvAn4RK8S0qHMfRvSSaHj
F9ejnHGxyrc3zzfsngUf/VXIp67FReaavyKX3LYxjHjMPZDMw2xCtCWEpUs52l2o
fAbkv8YFBbUalIv0RtELH5XnKQ2ggMP8UgvT74KyfXU6LaliH8lEV20FFjMgwrPI
sMr2xk04eS8mNRNAXL8OMMwvh6DY/Qsmb7BVg58RIw6CdHeFJl834yztzcf7+j56
4oUmA16QYBCFA3udGQ3Tb07mi8XTfrMdTOGA0koQG9tjswKXuLUXUk9WAXZe4vml
ItRpZKE86BCs3mLJMYre
=ZODv
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and power management updates from Rafael Wysocki:
"Again, ACPICA leads the pack (47 commits), followed by cpufreq (18
commits) and system suspend/hibernation (9 commits).
From the new code perspective, the ACPICA update brings ACPI 5.1 to
the table, including a new device configuration object called _DSD
(Device Specific Data) that will hopefully help us to operate device
properties like Device Trees do (at least to some extent) and changes
related to supporting ACPI on ARM.
Apart from that we have hibernation changes making it use radix trees
to store memory bitmaps which should speed up some operations carried
out by it quite significantly. We also have some power management
changes related to suspend-to-idle (the "freeze" sleep state) support
and more preliminary changes needed to support ACPI on ARM (outside of
ACPICA).
The rest is fixes and cleanups pretty much everywhere.
Specifics:
- ACPICA update to upstream version 20140724. That includes ACPI 5.1
material (support for the _CCA and _DSD predefined names, changes
related to the DMAR and PCCT tables and ARM support among other
things) and cleanups related to using ACPICA's header files. A
major part of it is related to acpidump and the core code used by
that utility. Changes from Bob Moore, David E Box, Lv Zheng,
Sascha Wildner, Tomasz Nowicki, Hanjun Guo.
- Radix trees for memory bitmaps used by the hibernation core from
Joerg Roedel.
- Support for waking up the system from suspend-to-idle (also known
as the "freeze" sleep state) using ACPI-based PCI wakeup signaling
(Rafael J Wysocki).
- Fixes for issues related to ACPI button events (Rafael J Wysocki).
- New device ID for an ACPI-enumerated device included into the
Wildcat Point PCH from Jie Yang.
- ACPI video updates related to backlight handling from Hans de Goede
and Linus Torvalds.
- Preliminary changes needed to support ACPI on ARM from Hanjun Guo
and Graeme Gregory.
- ACPI PNP core cleanups from Arjun Sreedharan and Zhang Rui.
- Cleanups related to ACPI_COMPANION() and ACPI_HANDLE() macros
(Rafael J Wysocki).
- ACPI-based device hotplug cleanups from Wei Yongjun and Rafael J
Wysocki.
- Cleanups and improvements related to system suspend from Lan
Tianyu, Randy Dunlap and Rafael J Wysocki.
- ACPI battery cleanup from Wei Yongjun.
- cpufreq core fixes from Viresh Kumar.
- Elimination of a deadband effect from the cpufreq ondemand governor
and intel_pstate driver cleanups from Stratos Karafotis.
- 350MHz CPU support for the powernow-k6 cpufreq driver from Mikulas
Patocka.
- Fix for the imx6 cpufreq driver from Anson Huang.
- cpuidle core and governor cleanups from Daniel Lezcano, Sandeep
Tripathy and Mohammad Merajul Islam Molla.
- Build fix for the big_little cpuidle driver from Sachin Kamat.
- Configuration fix for the Operation Performance Points (OPP)
framework from Mark Brown.
- APM cleanup from Jean Delvare.
- cpupower utility fixes and cleanups from Peter Senna Tschudin,
Andrey Utkin, Himangi Saraogi, Rickard Strandqvist, Thomas
Renninger"
* tag 'pm+acpi-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (118 commits)
ACPI / LPSS: add LPSS device for Wildcat Point PCH
ACPI / PNP: Replace faulty is_hex_digit() by isxdigit()
ACPICA: Update version to 20140724.
ACPICA: ACPI 5.1: Update for PCCT table changes.
ACPICA/ARM: ACPI 5.1: Update for GTDT table changes.
ACPICA/ARM: ACPI 5.1: Update for MADT changes.
ACPICA/ARM: ACPI 5.1: Update for FADT changes.
ACPICA: ACPI 5.1: Support for the _CCA predifined name.
ACPICA: ACPI 5.1: New notify value for System Affinity Update.
ACPICA: ACPI 5.1: Support for the _DSD predefined name.
ACPICA: Debug object: Add current value of Timer() to debug line prefix.
ACPICA: acpihelp: Add UUID support, restructure some existing files.
ACPICA: Utilities: Fix local printf issue.
ACPICA: Tables: Update for DMAR table changes.
ACPICA: Remove some extraneous printf arguments.
ACPICA: Update for comments/formatting. No functional changes.
ACPICA: Disassembler: Add support for the ToUUID opererator (macro).
ACPICA: Remove a redundant cast to acpi_size for ACPI_OFFSET() macro.
ACPICA: Work around an ancient GCC bug.
ACPI / processor: Make it possible to get local x2apic id via _MAT
...
This patch set consists of the usual driver updates (ufs, storvsc, pm8001
hpsa). It also has removal of the user space target driver code (everyone is
using LIO now), a partial PCI MSI-X update, more multi-queue updates,
conversion to 64 bit LUNs (so we could theoretically cope with any LUN
returned by a device) and placeholder support for the ZBC device type (Shingle
drives), plus an assortment of minor updates and bug fixes.
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJT4mS9AAoJEDeqqVYsXL0Mq34H/2AeXiM8GEVO3PIsBtF3TFZ9
poJvAyb8t//+VwAIVLHU9wrssIrIcyvNQmNHH/InGt5rOaXwGQRsnEc73bBtot4b
aC1t+hAnp2Ddvu6phmyUg7iY2GmQhAoZmeaj7krGIu2XgtLGiPg26eSsgk4Yv/U9
cuULEuOc/UnTj3w5VK8SvpyXMybVF6oQhSrS1slOglfFwPTlTI/NHU9xo7Wc3qHT
VifHXNphIvye5EH8zwtKX5p8qCrFW0pevJwyfPz7Hp2CTA9XYKx3SoeOh+n9F9ez
udBBggg7Vb1tb4mPKUoZ78UrtCVdFSCmesBU/RJe7cIh8daKaO5MVr3WPSx2JhM=
=yGai
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This patch set consists of the usual driver updates (ufs, storvsc,
pm8001 hpsa). It also has removal of the user space target driver
code (everyone is using LIO now), a partial PCI MSI-X update, more
multi-queue updates, conversion to 64 bit LUNs (so we could
theoretically cope with any LUN returned by a device) and placeholder
support for the ZBC device type (Shingle drives), plus an assortment
of minor updates and bug fixes"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (143 commits)
scsi: do not issue SCSI RSOC command to Promise Vtrak E610f
vmw_pvscsi: Use pci_enable_msix_exact() instead of pci_enable_msix()
pm8001: Fix invalid return when request_irq() failed
lpfc: Remove superfluous call to pci_disable_msix()
isci: Use pci_enable_msix_exact() instead of pci_enable_msix()
bfa: Use pci_enable_msix_exact() instead of pci_enable_msix()
bfa: Cleanup bfad_setup_intr() function
bfa: Do not call pci_enable_msix() after it failed once
fnic: Use pci_enable_msix_exact() instead of pci_enable_msix()
scsi: use short driver name for per-driver cmd slab caches
scsi_debug: support scsi-mq, queues and locks
Drivers: add blist flags
scsi: ufs: fix endianness sparse warnings
scsi: ufs: make undeclared functions static
bnx2i: Update driver version to 2.7.10.1
pm8001: fix a memory leak in nvmd_resp
pm8001: fix update_flash
pm8001: fix a memory leak in flash_update
pm8001: Cleaning up uninitialized variables
pm8001: Fix to remove null pointer checks that could never happen
...
There've been many updates in ASoC side at this time, especially the
framework enhancement for multiple CODECs on a single DAI and more
componentization works. The only major change in ALSA core is the
addition of timestamp type in sw_params field. This should behave in
backward compatible way. Other than that, there are lots of small
changes and new drivers in wide range, including a large code cut in
HD-audio driver for deprecated static quirks. Some highlights are
below:
ALSA Core:
- Add the new timestamp type field to sw_params to choose
MONOTONIC_RAW type
HD-audio:
- Continued conversion to standard printk macros, generic code
cleanups
- Removal of obsoleted static quirk codes for Conexant and C-Media
codecs
- Fixups for HP Envy TS, Dell XPS 15, HP and Dell mute/mic LED,
Gigabyte BXBT-2807 mobo
- Intel Braswell support
ASoC:
- Support for multiple CODECs attached to a single DAI, enabling
systems with for example multiple DAC/speaker drivers on a single
link, contributed by Benoit Cousson based on work from Misael Lopez
Cruz
- Support for byte controls larger than 256 bytes based on the use of
TLVs contributed by Omair Mohammed Abdullah
- More componentisation work from Lars-Peter Clausen
- The remainder of the conversions of CODEC drivers to params_width()
by Mark Brown
- Drivers for Cirrus Logic CS4265, Freescale i.MX ASRC blocks, Realtek
RT286 and RT5670, Rockchip RK3xxx I2S controllers and Texas
Instruments TAS2552
- Lots of updates and fixes, especially to the DaVinci, Intel,
Freescale, Realtek, and rcar drivers
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJT4fj0AAoJEGwxgFQ9KSmkXQ0QAIiRmVg40aiJoEdOLGgzNZtq
r/nXj69AuB6JSy0hKbFyyijjCcRpyCCGvjDYlogjT75M3c35Npz/m85oZHx2tajD
SB5OA+QxO4EQ3C0GjITIRHJROm4MM8/rnbnNYTsWnEGRkobTFTl0rHbSkA85RGFt
0zZqqs1R0s/nO9PMQ+5PA5x9xVFiZs2COeCK0CFA9s2ACf/hbxJBRIqYpIFWOo78
9L41jBOFuC/hIb4qwjgmsCWbKe1KQysTAf+Wty0CKipJ6VhfCbPn1Qn1zXGeUOxc
mj4eZ6LpJTrVMr/UN02c5vgPOiaBrQ7fWZo3dVHLlIjC6cEI1tUvNYAin7CMEzx8
DUsvo9p30OheA+ijc9wKaYFY6YmmJZRtpnnMd39i0oPG+bhvoV7vjXjJSB1sLJt1
o82xLpVL4Th8H+DMDVwA7UIBvvZGZBusw1qsNGfcOPrmExi4ScGhA0gSOO6W2y1z
VQLRbiXB/HtJGxeqWL6RqJOcLBOlJNmsk4UZMOSCu2OZrWd5I8MuRrNWeHDqhX1H
+VDEJVhFmM21vMpnobzEPxWsMgTVIAVf3Thh+WgaPxL4Krh0vkpZsgZk16VVmy/o
OJJF3n41FND4n9zSjOe4MkuL8UCOUpKCaBdqj9K1s6UKwOEKuDNslyT/zqutRWK5
x1uApU5y+E4iQT/b7cmA
=RL72
-----END PGP SIGNATURE-----
Merge tag 'sound-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"There've been many updates in ASoC side at this time, especially the
framework enhancement for multiple CODECs on a single DAI and more
componentization works.
The only major change in ALSA core is the addition of timestamp type
in sw_params field. This should behave in backward compatible way.
Other than that, there are lots of small changes and new drivers in
wide range, including a large code cut in HD-audio driver for
deprecated static quirks. Some highlights are below:
ALSA Core:
- Add the new timestamp type field to sw_params to choose
MONOTONIC_RAW type
HD-audio:
- Continued conversion to standard printk macros, generic code
cleanups
- Removal of obsoleted static quirk codes for Conexant and C-Media
codecs
- Fixups for HP Envy TS, Dell XPS 15, HP and Dell mute/mic LED,
Gigabyte BXBT-2807 mobo
- Intel Braswell support
ASoC:
- Support for multiple CODECs attached to a single DAI, enabling
systems with for example multiple DAC/speaker drivers on a single
link, contributed by Benoit Cousson based on work from Misael Lopez
Cruz
- Support for byte controls larger than 256 bytes based on the use of
TLVs contributed by Omair Mohammed Abdullah
- More componentisation work from Lars-Peter Clausen
- The remainder of the conversions of CODEC drivers to params_width()
by Mark Brown
- Drivers for Cirrus Logic CS4265, Freescale i.MX ASRC blocks,
Realtek RT286 and RT5670, Rockchip RK3xxx I2S controllers and Texas
Instruments TAS2552
- Lots of updates and fixes, especially to the DaVinci, Intel,
Freescale, Realtek, and rcar drivers"
* tag 'sound-3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (402 commits)
ALSA: usb-audio: Whitespace cleanups for sound/usb/midi.*
ALSA: usb-audio: Respond to suspend and resume callbacks for MIDI input
sound/oss/pss: Remove typedefs pss_mixerdata and pss_confdata
sound/oss/opl3: Remove typedef opl_devinfo
ALSA: fireworks: fix specifiers in format strings for propper output
ASoC: imx-audmux: Use uintptr_t for port numbers
ASoC: davinci: Enable menuconfig entry for McASP
ASoC: fsl_asrc: Don't access members of config before checking it
ASoC: fsl_sarc_dma: Check pair before using it
ASoC: adau1977: Fix truncation warning on 64 bit architectures
ALSA: virtuoso: add Xonar Essence STX II support
ALSA: riptide: fix %d confusingly prefixed with 0x in format strings
ALSA: fireworks: fix %d confusingly prefixed with 0x in format strings
ALSA: hda - add codec ID for Braswell display audio codec
ALSA: hda - add PCI IDs for Intel Braswell
ALSA: usb-audio: Adjust Gamecom 780 volume level
ALSA: usb-audio: improve dmesg source grepability
ASoC: rt5670: Fix duplicate const warnings
ASoC: rt5670: Staticise non-exported symbols
ASoC: Intel: update stream only on stream IPC msgs
...
Apparently, bitmap_andnot is supposed to return whether the new bitmap
is empty. But it didn't take potential garbage bits in the last word
into account.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Apparently, bitmap_and is supposed to return whether the new bitmap is
empty. But it didn't take potential garbage bits in the last word into
account.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is no guarantee that *src does not contain garbage bits outside
the lower nbits, so we need to mask it before the shift-and-assign.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Changing the pos parameter of __reg_op to unsigned allows the compiler
to generate slightly smaller and simpler code. Also update its callers
bitmap_*_region to receive and pass unsigned int. The return types of
bitmap_find_free_region and bitmap_allocate_region are still int to
allow a negative error code to be returned. An int is certainly capable
of representing any realistic return value.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The compiler can generate slightly smaller and simpler code when it
knows that "start" is non-negative.
Also, use the names "start" and "len" for the two parameters for
consistency with bitmap_set.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The compiler can generate slightly smaller and simpler code when it
knows that "start" is non-negative.
Also, use the names "start" and "len" for the two parameters in both
header file and implementation, instead of the previous mix.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The compiler can generate slightly smaller and simpler code when it
knows that "nbits" is non-negative. Since no-one passes a negative
bit-count, this shouldn't affect the semantics.
I didn't change the return type, since that might change the semantics
of some expression containing a call to bitmap_weight(). Certainly an
int is capable of holding the result.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The compiler can generate slightly smaller and simpler code when it
knows that "nbits" is non-negative. Since no-one passes a negative
bit-count, this shouldn't affect the semantics.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The compiler can generate slightly smaller and simpler code when it
knows that "nbits" is non-negative. Since no-one passes a negative
bit-count, this shouldn't affect the semantics.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This change is only for consistency with the changes to the other
bitmap_* functions; it doesn't change the size of the generated code:
inside BITS_TO_LONGS there is a sizeof(long), which causes bits to be
interpreted as unsigned anyway.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since the extra bits are "don't care", there is no reason to mask the
last word to the used bits when complementing. This shaves off yet a
few bytes.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The compiler can generate slightly smaller and simpler code when it
knows that "nbits" is non-negative. Since no-one passes a negative
bit-count, this shouldn't affect the semantics.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The compiler can generate slightly smaller and simpler code when it
knows that "nbits" is non-negative. Since no-one passes a negative
bit-count, this shouldn't affect the semantics.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The compiler can generate slightly smaller and simpler code when it
knows that "nbits" is non-negative. Since no-one passes a negative
bit-count, this shouldn't affect the semantics.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many functions in lib/bitmap.c start with an expression such as lim =
bits/BITS_PER_LONG. Since bits has type (signed) int, and since gcc
cannot know that it is in fact non-negative, it generates worse code
than it could. These patches, mostly consisting of changing various
parameters to unsigned, gives a slight overall code reduction:
add/remove: 1/1 grow/shrink: 8/16 up/down: 251/-414 (-163)
function old new delta
tick_device_uses_broadcast 335 425 +90
__irq_alloc_descs 498 554 +56
__bitmap_andnot 73 115 +42
__bitmap_and 70 101 +31
bitmap_weight - 11 +11
copy_hugetlb_page_range 752 762 +10
follow_hugetlb_page 846 854 +8
hugetlb_init 1415 1417 +2
hugetlb_nrpages_setup 130 131 +1
hugetlb_add_hstate 377 376 -1
bitmap_allocate_region 82 80 -2
select_task_rq_fair 2202 2191 -11
hweight_long 66 55 -11
__reg_op 230 219 -11
dm_stats_message 2849 2833 -16
bitmap_parselist 92 74 -18
__bitmap_weight 115 97 -18
__bitmap_subset 153 129 -24
__bitmap_full 128 104 -24
__bitmap_empty 120 96 -24
bitmap_set 179 149 -30
bitmap_clear 185 155 -30
__bitmap_equal 136 105 -31
__bitmap_intersects 148 108 -40
__bitmap_complement 109 67 -42
tick_device_setup_broadcast_func.isra 81 - -81
[The increases in __bitmap_and{,not} are due to bug fixes 17/18,18/18.
No idea why bitmap_weight suddenly appears.] While 163 bytes treewide is
insignificant, I believe the bitmap functions are often called with
locks held, so saving even a few cycles might be worth it.
While making these changes, I found a few other things that might be
worth including. 16,17,18 are actual bug fixes. The rest shouldn't
change the behaviour of any of the functions, provided no-one passed
negative nbits values. If something should come up, it should be fairly
bisectable.
A few issues I thought about, but didn't know what to do with:
* Many of the functions misbehave if nbits is compile-time 0; the
out-of-line functions generally handle 0 correctly. bitmap_fill() is
particularly bad, whether the 0 is known at compile time or not. It
would probably be nice to add detection of at least compile-time 0 and
handle that appropriately.
* I didn't change __bitmap_shift_{left,right} to use unsigned because I
want to fully understand why the algorithm works before making that
change. However, AFAICT, they behave correctly for all (positive) shift
amounts. This is not the case for the small_const_nbits versions. If
for example nbits = n = BITS_PER_LONG, the shift operators turn into
no-ops (at least on x86), so one get *dst = *src, whereas one would
expect to get *dst=0. That difference in behaviour is somewhat
annoying.
This patch (of 18):
The compiler can generate slightly smaller and simpler code when it
knows that "nbits" is non-negative. Since no-one passes a negative
bit-count, this shouldn't affect the semantics.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's been nearly 3 years now since commit 55036ba76b ("lib: rename
pack_hex_byte() to hex_byte_pack()") so it's time to remove this
deprecated and unused static inline.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a helper function from drivers/ata/libata_core.c, where it is
used to blacklist particular device models. It's being moved to lib/ so
other drivers may use it for the same purpose.
This implementation in non-recursive, so is safe for the kernel stack.
[akpm@linux-foundation.org: fix sparse warning]
Signed-off-by: George Spelvin <linux@horizon.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cleanup unused `if 0'-ed functions, which have been dead since 2006
(commits 87c2ce3b93 ("lib/zlib*: cleanups") by Adrian Bunk and
4f3865fb57 ("zlib_inflate: Upgrade library code to a recent version")
by Richard Purdie):
- zlib_deflateSetDictionary
- zlib_deflateParams
- zlib_deflateCopy
- zlib_inflateSync
- zlib_syncsearch
- zlib_inflateSetDictionary
- zlib_inflatePrime
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The name was modified from hlist_add_after() to hlist_add_behind() when
adjusting the order of arguments to match the one with
klist_add_after(). This is necessary to break old code when it would
use it the wrong way.
Make klist follow this naming scheme for consistency.
Signed-off-by: Ken Helias <kenhelias@firemail.de>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All other add functions for lists have the new item as first argument
and the position where it is added as second argument. This was changed
for no good reason in this function and makes using it unnecessary
confusing.
The name was changed to hlist_add_behind() to cause unconverted code to
generate a compile error instead of using the wrong parameter order.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Ken Helias <kenhelias@firemail.de>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [intel driver bits]
Cc: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The argument names for hlist_add_after() are poorly chosen because they
look the same as the ones for hlist_add_before() but have to be used
differently.
hlist_add_after_rcu() has made a better choice.
Signed-off-by: Ken Helias <kenhelias@firemail.de>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit a8fe19ebfb ("kernel/printk: use symbolic defines for console
loglevels") makes consistent use of symbolic values for printk() log
levels.
The naming scheme used is different from the one used for
DEFAULT_MESSAGE_LOGLEVEL though. Change that symbol name to be
MESSAGE_LOGLEVEL_DEFAULT for consistency. And because the value of that
symbol comes from a similarly-named config option, rename
CONFIG_DEFAULT_MESSAGE_LOGLEVEL as well.
Signed-off-by: Alex Elder <elder@linaro.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jan Kara <jack@suse.cz>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The DEFINE_SIMPLE_ATTRIBUTE macro should not end in a ; Fix the one use
in the kernel tree that did not have a semicolon.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Luca Tettamanti <kronos.it@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add zpool api.
zpool provides an interface for memory storage, typically of compressed
memory. Users can select what backend to use; currently the only
implementations are zbud, a low density implementation with up to two
compressed pages per storage page, and zsmalloc, a higher density
implementation with multiple compressed pages per storage page.
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Tested-by: Seth Jennings <sjennings@variantweb.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Weijie Yang <weijie.yang@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change the type of the zbud_alloc() size param from unsigned int to
size_t.
Technically, this should not make any difference, as the zbud
implementation already restricts the size to well within either type's
limits; but as zsmalloc (and kmalloc) use size_t, and zpool will use
size_t, this brings the size parameter type in line with zsmalloc/zpool.
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Acked-by: Seth Jennings <sjennings@variantweb.net>
Tested-by: Seth Jennings <sjennings@variantweb.net>
Cc: Weijie Yang <weijie.yang@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When kernel device drivers or subsystems want to bind their lifespan to
t= he lifespan of the mm_struct, they usually use one of the following
methods:
1. Manually calling a function in the interested kernel module. The
funct= ion call needs to be placed in mmput. This method was rejected
by several ker= nel maintainers.
2. Registering to the mmu notifier release mechanism.
The problem with the latter approach is that the mmu_notifier_release
cal= lback is called from__mmu_notifier_release (called from exit_mmap).
That functi= on iterates over the list of mmu notifiers and don't expect
the release call= back function to remove itself from the list.
Therefore, the callback function= in the kernel module can't release the
mmu_notifier_object, which is actuall= y the kernel module's object
itself. As a result, the destruction of the kernel module's object must
to be done in a delayed fashion.
This patch adds support for this delayed callback, by adding a new
mmu_notifier_call_srcu function that receives a function ptr and calls
th= at function with call_srcu. In that function, the kernel module
releases its object. To use mmu_notifier_call_srcu, the calling module
needs to call b= efore that a new function called
mmu_notifier_unregister_no_release that as its= name implies,
unregisters a notifier without calling its notifier release call= back.
This patch also adds a function that will call barrier_srcu so those
kern= el modules can sync with mmu_notifier.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__kmap_atomic_idx is per_cpu variable. Each CPU can use KM_TYPE_NR
entries from FIXMAP i.e. from 0 to KM_TYPE_NR - 1. Allowing
__kmap_atomic_idx to over- shoot to KM_TYPE_NR can mess up with next
CPU's 0th entry which is a bug. Hence BUG_ON if __kmap_atomic_idx >=
KM_TYPE_NR.
Fix the off-by-on in this test.
Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
try_set_zonelist_oom() and clear_zonelist_oom() are not named properly
to imply that they require locking semantics to avoid out_of_memory()
being reordered.
zone_scan_lock is required for both functions to ensure that there is
proper locking synchronization.
Rename try_set_zonelist_oom() to oom_zonelist_trylock() and rename
clear_zonelist_oom() to oom_zonelist_unlock() to imply there is proper
locking semantics.
At the same time, convert oom_zonelist_trylock() to return bool instead
of int since only success and failure are tested.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With memoryless node support being worked on, it's possible that for
optimizations that a node may not have a non-NULL zonelist. When
CONFIG_NUMA is enabled and node 0 is memoryless, this means the zonelist
for first_online_node may become NULL.
The oom killer requires a zonelist that includes all memory zones for
the sysrq trigger and pagefault out of memory handler.
Ensure that a non-NULL zonelist is always passed to the oom killer.
[akpm@linux-foundation.org: fix non-numa build]
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This series of patches fixes a problem when adding memory in bad manner.
For example: for a x86_64 machine booted with "mem=400M" and with 2GiB
memory installed, following commands cause problem:
# echo 0x40000000 > /sys/devices/system/memory/probe
[ 28.613895] init_memory_mapping: [mem 0x40000000-0x47ffffff]
# echo 0x48000000 > /sys/devices/system/memory/probe
[ 28.693675] init_memory_mapping: [mem 0x48000000-0x4fffffff]
# echo online_movable > /sys/devices/system/memory/memory9/state
# echo 0x50000000 > /sys/devices/system/memory/probe
[ 29.084090] init_memory_mapping: [mem 0x50000000-0x57ffffff]
# echo 0x58000000 > /sys/devices/system/memory/probe
[ 29.151880] init_memory_mapping: [mem 0x58000000-0x5fffffff]
# echo online_movable > /sys/devices/system/memory/memory11/state
# echo online> /sys/devices/system/memory/memory8/state
# echo online> /sys/devices/system/memory/memory10/state
# echo offline> /sys/devices/system/memory/memory9/state
[ 30.558819] Offlined Pages 32768
# free
total used free shared buffers cached
Mem: 780588 18014398509432020 830552 0 0 51180
-/+ buffers/cache: 18014398509380840 881732
Swap: 0 0 0
This is because the above commands probe higher memory after online a
section with online_movable, which causes ZONE_HIGHMEM (or ZONE_NORMAL
for systems without ZONE_HIGHMEM) overlaps ZONE_MOVABLE.
After the second online_movable, the problem can be observed from
zoneinfo:
# cat /proc/zoneinfo
...
Node 0, zone Movable
pages free 65491
min 250
low 312
high 375
scanned 0
spanned 18446744073709518848
present 65536
managed 65536
...
This series of patches solve the problem by checking ZONE_MOVABLE when
choosing zone for new memory. If new memory is inside or higher than
ZONE_MOVABLE, makes it go there instead.
After applying this series of patches, following are free and zoneinfo
result (after offlining memory9):
bash-4.2# free
total used free shared buffers cached
Mem: 780956 80112 700844 0 0 51180
-/+ buffers/cache: 28932 752024
Swap: 0 0 0
bash-4.2# cat /proc/zoneinfo
Node 0, zone DMA
pages free 3389
min 14
low 17
high 21
scanned 0
spanned 4095
present 3998
managed 3977
nr_free_pages 3389
...
start_pfn: 1
inactive_ratio: 1
Node 0, zone DMA32
pages free 73724
min 341
low 426
high 511
scanned 0
spanned 98304
present 98304
managed 92958
nr_free_pages 73724
...
start_pfn: 4096
inactive_ratio: 1
Node 0, zone Normal
pages free 32630
min 120
low 150
high 180
scanned 0
spanned 32768
present 32768
managed 32768
nr_free_pages 32630
...
start_pfn: 262144
inactive_ratio: 1
Node 0, zone Movable
pages free 65476
min 241
low 301
high 361
scanned 0
spanned 98304
present 65536
managed 65536
nr_free_pages 65476
...
start_pfn: 294912
inactive_ratio: 1
This patch (of 7):
Introduce zone_for_memory() in arch independent code for
arch_add_memory() use.
Many arch_add_memory() function simply selects ZONE_HIGHMEM or
ZONE_NORMAL and add new memory into it. However, with the existance of
ZONE_MOVABLE, the selection method should be carefully considered: if
new, higher memory is added after ZONE_MOVABLE is setup, the default
zone and ZONE_MOVABLE may overlap each other.
should_add_memory_movable() checks the status of ZONE_MOVABLE. If it
has already contain memory, compare the address of new memory and
movable memory. If new memory is higher than movable, it should be
added into ZONE_MOVABLE instead of default zone.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "Mel Gorman" <mgorman@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a comment describing the circumstances in which
__lock_page_or_retry() will or will not release the mmap_sem when
returning 0.
Add comments to lock_page_or_retry()'s callers (filemap_fault(),
do_swap_page()) noting the impact on VM_FAULT_RETRY returns.
Add comments on up the call tree, particularly replacing the false "We
return with mmap_sem still held" comments.
Signed-off-by: Paul Cassella <cassella@cray.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The fair zone allocation policy round-robins allocations between zones
within a node to avoid age inversion problems during reclaim. If the
first allocation fails, the batch counts are reset and a second attempt
made before entering the slow path.
One assumption made with this scheme is that batches expire at roughly
the same time and the resets each time are justified. This assumption
does not hold when zones reach their low watermark as the batches will
be consumed at uneven rates. Allocation failure due to watermark
depletion result in additional zonelist scans for the reset and another
watermark check before hitting the slowpath.
On UMA, the benefit is negligible -- around 0.25%. On 4-socket NUMA
machine it's variable due to the variability of measuring overhead with
the vmstat changes. The system CPU overhead comparison looks like
3.16.0-rc3 3.16.0-rc3 3.16.0-rc3
vanilla vmstat-v5 lowercost-v5
User 746.94 774.56 802.00
System 65336.22 32847.27 40852.33
Elapsed 27553.52 27415.04 27368.46
However it is worth noting that the overall benchmark still completed
faster and intuitively it makes sense to take as few passes as possible
through the zonelists.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zone->pages_scanned is a write-intensive cache line during page reclaim
and it's also updated during page free. Move the counter into vmstat to
take advantage of the per-cpu updates and do not update it in the free
paths unless necessary.
On a small UMA machine running tiobench the difference is marginal. On
a 4-node machine the overhead is more noticable. Note that automatic
NUMA balancing was disabled for this test as otherwise the system CPU
overhead is unpredictable.
3.16.0-rc3 3.16.0-rc3 3.16.0-rc3
vanillarearrange-v5 vmstat-v5
User 746.94 759.78 774.56
System 65336.22 58350.98 32847.27
Elapsed 27553.52 27282.02 27415.04
Note that the overhead reduction will vary depending on where exactly
pages are allocated and freed.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The arrangement of struct zone has changed over time and now it has
reached the point where there is some inappropriate sharing going on.
On x86-64 for example
o The zone->node field is shared with the zone lock and zone->node is
accessed frequently from the page allocator due to the fair zone
allocation policy.
o span_seqlock is almost never used by shares a line with free_area
o Some zone statistics share a cache line with the LRU lock so
reclaim-intensive and allocator-intensive workloads can bounce the cache
line on a stat update
This patch rearranges struct zone to put read-only and read-mostly
fields together and then splits the page allocator intensive fields, the
zone statistics and the page reclaim intensive fields into their own
cache lines. Note that the type of lowmem_reserve changes due to the
watermark calculations being signed and avoiding a signed/unsigned
conversion there.
On the test configuration I used the overall size of struct zone shrunk
by one cache line. On smaller machines, this is not likely to be
noticable. However, on a 4-node NUMA machine running tiobench the
system CPU overhead is reduced by this patch.
3.16.0-rc3 3.16.0-rc3
vanillarearrange-v5r9
User 746.94 759.78
System 65336.22 58350.98
Elapsed 27553.52 27282.02
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently map_vm_area() takes (struct page *** pages) as third argument,
and after mapping, it moves (*pages) to point to (*pages +
nr_mappped_pages).
It looks like this kind of increment is useless to its caller these
days. The callers don't care about the increments and actually they're
trying to avoid this by passing another copy to map_vm_area().
The caller can always guarantee all the pages can be mapped into vm_area
as specified in first argument and the caller only cares about whether
map_vm_area() fails or not.
This patch cleans up the pointer movement in map_vm_area() and updates
its callers accordingly.
Signed-off-by: WANG Chao <chaowang@redhat.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 71e3aac072 ("thp: transparent hugepage core") adds
copy_pte_range prototype to huge_mm.h. I'm not sure why (or if) this
function have been used outside of memory.c, but it currently isn't.
This patch makes copy_pte_range() static again.
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
They are unnecessary: "zero" can be used in place of "hugetlb_zero" and
passing extra2 == NULL is equivalent to infinity.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Do we really need an exported alias for __SetPageReferenced()? Its
callers better know what they're doing, in which case the page would not
be already marked referenced. Kill init_page_accessed(), just
__SetPageReferenced() inline.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Prabhakar Lad <prabhakar.csengg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It was missing...
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In original code, zone_movable_is_highmem() assumes ZONE_MOVABLE not
highmem if CONFIG_HAVE_MEMBLOCK_NODE_MAP is not set. In online_pages,
it extracts pages from the previous zone before ZONE_MOVABLE. Which is
logically inconsistent:
If HAVE_MEMBLOCK_NODE_MAP is turned off but HIGHMEM is on,
zone_movable_is_highmem() makes movable zone not highmem, but
online_pages() extracts pages from ZONE_HIGHMEM.
This inconsistency doesn't cause real problem currently, because all
architectures support online_pages also have HAVE_MEMBLOCK_NODE_MAP.
However, fixing it makes code clear, and also helps futher coding.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Zhang Zhen <zhangzhen@huawei.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- PAGEFLAG_FALSE only defines TEST, make it define SET and CLEAR as
well, analogous to PAGEFLAG.
- Define TESTSETFLAG_FALSE, analogous to TESTSETFLAG.
- Define TESTSCFLAG_FALSE, analogous to TESTSCFLAG
- Make PG_mlocked accessors the same on both MMU and !MMU setups
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conventionally, we put output param to the end of param list and put the
'base' ahead of 'size', but cma_declare_contiguous() doesn't look like
that, so change it.
Additionally, move down cma_areas reference code to the position where
it is really needed.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Gleb Natapov <gleb@kernel.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, there are two users on CMA functionality, one is the DMA
subsystem and the other is the KVM on powerpc. They have their own code
to manage CMA reserved area even if they looks really similar. From my
guess, it is caused by some needs on bitmap management. KVM side wants
to maintain bitmap not for 1 page, but for more size. Eventually it use
bitmap where one bit represents 64 pages.
When I implement CMA related patches, I should change those two places
to apply my change and it seem to be painful to me. I want to change
this situation and reduce future code management overhead through this
patch.
This change could also help developer who want to use CMA in their new
feature development, since they can use CMA easily without copying &
pasting this reserved area management code.
In previous patches, we have prepared some features to generalize CMA
reserved area management and now it's time to do it. This patch moves
core functions to mm/cma.c and change DMA APIs to use these functions.
There is no functional change in DMA APIs.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Alexander Graf <agraf@suse.de>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Gleb Natapov <gleb@kernel.org>
Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In store_mem_state(), we have:
...
334 else if (!strncmp(buf, "offline", min_t(int, count, 7)))
335 online_type = -1;
...
355 case -1:
356 ret = device_offline(&mem->dev);
357 break;
...
Here, "offline" is hard coded as -1.
This patch does the following renaming:
ONLINE_KEEP -> MMOP_ONLINE_KEEP
ONLINE_KERNEL -> MMOP_ONLINE_KERNEL
ONLINE_MOVABLE -> MMOP_ONLINE_MOVABLE
and introduces MMOP_OFFLINE = -1 to avoid hard coding.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Hu Tao <hutao@cn.fujitsu.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
memblock_set_bottom_up() is only called by __init
cmdline_parse_movable_node() and __init numa_init().
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
alloc_pages_exact_nid() is only called by __meminit alloc_page_cgroup()
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 8581679424 ("fanotify: Fix use after free for permission
events") introduced a double free issue for permission events which are
pending in group's notification queue while group is being destroyed.
These events are freed from fanotify_handle_event() but they are not
removed from groups notification queue and thus they get freed again
from fsnotify_flush_notify().
Fix the problem by removing permission events from notification queue
before freeing them if we skip processing access response. Also expand
comments in fanotify_release() to explain group shutdown in detail.
Fixes: 8581679424
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Douglas Leeder <douglas.leeder@sophos.com>
Tested-by: Douglas Leeder <douglas.leeder@sophos.com>
Reported-by: Heinrich Schuchard <xypron.glpk@gmx.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rename fsnotify_add_notify_event() to fsnotify_add_event() since the
"notify" part is duplicit. Rename fsnotify_remove_notify_event() and
fsnotify_peek_notify_event() to fsnotify_remove_first_event() and
fsnotify_peek_first_event() respectively since "notify" part is duplicit
and they really look at the first event in the queue.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>