NODE_NOT_IN_PAGE_FLAGS is defined in mm.h when the node information is not
stored in the page flags bitmap.
Unfortunately, there's a typo in one of the checks for it. This patch
fixes it (s/NODE_NOT_IN_PAGEFLAGS/NODE_NOT_IN_PAGE_FLAGS/). Since this
has been around for ages, I doubt it's been causing any serious problems.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The kernel already exposes the user desired thresholds in /proc/sys/vm
with dirty_background_ratio and background_ratio. But the kernel may
alter the number requested without giving the user any indication that is
the case.
Knowing the actual ratios the kernel is honoring can help app developers
understand how their buffered IO will be sent to the disk.
$ grep threshold /proc/vmstat
nr_dirty_threshold 409111
nr_dirty_background_threshold 818223
Signed-off-by: Michael Rubin <mrubin@google.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For NUMA node systems it is important to have visibility in memory
characteristics. Two of the /proc/vmstat values "nr_written" and
"nr_dirtied" are added here.
# cat /sys/devices/system/node/node20/vmstat
nr_written 0
nr_dirtied 0
Signed-off-by: Michael Rubin <mrubin@google.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To help developers and applications gain visibility into writeback
behaviour adding two entries to vm_stat_items and /proc/vmstat. This will
allow us to track the "written" and "dirtied" counts.
# grep nr_dirtied /proc/vmstat
nr_dirtied 3747
# grep nr_written /proc/vmstat
nr_written 3618
Signed-off-by: Michael Rubin <mrubin@google.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To help developers and applications gain visibility into writeback
behaviour this patch adds two counters to /proc/vmstat.
# grep nr_dirtied /proc/vmstat
nr_dirtied 3747
# grep nr_written /proc/vmstat
nr_written 3618
These entries allow user apps to understand writeback behaviour over time
and learn how it is impacting their performance. Currently there is no
way to inspect dirty and writeback speed over time. It's not possible for
nr_dirty/nr_writeback.
These entries are necessary to give visibility into writeback behaviour.
We have /proc/diskstats which lets us understand the io in the block
layer. We have blktrace for more in depth understanding. We have
e2fsprogs and debugsfs to give insight into the file systems behaviour,
but we don't offer our users the ability understand what writeback is
doing. There is no way to know how active it is over the whole system, if
it's falling behind or to quantify it's efforts. With these values
exported users can easily see how much data applications are sending
through writeback and also at what rates writeback is processing this
data. Comparing the rates of change between the two allow developers to
see when writeback is not able to keep up with incoming traffic and the
rate of dirty memory being sent to the IO back end. This allows folks to
understand their io workloads and track kernel issues. Non kernel
engineers at Google often use these counters to solve puzzling performance
problems.
Patch #4 adds a pernode vmstat file with nr_dirtied and nr_written
Patch #5 add writeback thresholds to /proc/vmstat
Currently these values are in debugfs. But they should be promoted to
/proc since they are useful for developers who are writing databases
and file servers and are not debugging the kernel.
The output is as below:
# grep threshold /proc/vmstat
nr_pages_dirty_threshold 409111
nr_pages_dirty_background_threshold 818223
This patch:
This allows code outside of the mm core to safely manipulate page
writeback state and not worry about the other accounting. Not using these
routines means that some code will lose track of the accounting and we get
bugs.
Modify nilfs2 to use interface.
Signed-off-by: Michael Rubin <mrubin@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Jiro SEKIBA <jir@unicus.jp>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Function check_range may return ERR_PTR(...). Check for it.
Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ying Han reported that backing aging of anon pages in no swap system
causes unnecessary TLB flush.
When I sent a patch(69c8548175), I wanted this patch but Rik pointed out
and allowed aging of anon pages to give a chance to promote from inactive
to active LRU.
It has a two problem.
1) non-swap system
Never make sense to age anon pages.
2) swap configured but still doesn't swapon
It doesn't make sense to age anon pages until swap-on time. But it's
arguable. If we have aged anon pages by swapon, VM have moved anon pages
from active to inactive. And in the time swapon by admin, the VM can't
reclaim hot pages so we can protect hot pages swapout.
But let's think about it. When does swap-on happen? It depends on admin.
we can't expect it. Nonetheless, we have done aging of anon pages to
protect hot pages swapout. It means we lost run time overhead when below
high watermark but gain hot page swap-[in/out] overhead when VM decide
swapout. Is it true? Let's think more detail. We don't promote anon
pages in case of non-swap system. So even though VM does aging of anon
pages, the pages would be in inactive LRU for a long time. It means many
of pages in there would mark access bit again. So access bit hot/code
separation would be pointless.
This patch prevents unnecessary anon pages demotion in not-yet-swapon and
non-configured swap system. Even, in non-configuared swap system
inactive_anon_is_low can be compiled out.
It could make side effect that hot anon pages could swap out when admin
does swap on. But I think sooner or later it would be steady state. So
it's not a big problem.
We could lose someting but gain more thing(TLB flush and unnecessary
function call to demote anon pages).
Signed-off-by: Ying Han <yinghan@google.com>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now, sysfs interface of memory hotplug shows whether the section is
removable or not. But it checks only migrateype of pages and doesn't
check details of cluster of pages.
Next, memory hotplug's set_migratetype_isolate() has the same kind of
check, too.
This patch adds the function __count_unmovable_pages() and makes above 2
checks to use the same logic. Then, is_removable and hotremove code uses
the same logic. No changes in the hotremove logic itself.
TODO: need to find a way to check RECLAMABLE. But, considering bit,
calling shrink_slab() against a range before starting memory hotremove
sounds better. If so, this patch's logic doesn't need to be changed.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reported-by: Michal Hocko <mhocko@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Even if notifier cannot find any pages, it doesn't mean no pages are
available...And, if there are no notifiers registered, this condition will
be always true and memory hotplug will show -EBUSY.
This is a bug but not critical.
In most case, a pageblock which will be offlined is MIGRATE_MOVABLE This
"notifier" is called only when the pageblock is _not_ MIGRATE_MOVABLE.
But if not MIGRATE_MOVABLE, it's common case that memory hotplug will
fail. So, no one notice this bug.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Presently update_nr_listpages() doesn't have a role. That's because lists
passed is always empty just after calling migrate_pages. The
migrate_pages cleans up page list which have failed to migrate before
returning by aaa994b3.
[PATCH] page migration: handle freeing of pages in migrate_pages()
Do not leave pages on the lists passed to migrate_pages(). Seems that we will
not need any postprocessing of pages. This will simplify the handling of
pages by the callers of migrate_pages().
At that time, we thought we don't need any postprocessing of pages. But
the situation is changed. The compaction need to know the number of
failed to migrate for COMPACTPAGEFAILED stat
This patch makes new rule for caller of migrate_pages to call
putback_lru_pages. So caller need to clean up the lists so it has a
chance to postprocess the pages. [suggested by Christoph Lameter]
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andi Kleen <andi@firstfloor.org>
Reviewed-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Non-NUMA systems do never create these files anyway, since they are only
created by driver subsystem when NUMA is configured.
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This removes more dead code that was somehow missed by commit 0d99519efe
(writeback: remove unused nonblocking and congestion checks). There are
no behavior change except for the removal of two entries from one of the
ext4 tracing interface.
The nonblocking checks in ->writepages are no longer used because the
flusher now prefer to block on get_request_wait() than to skip inodes on
IO congestion. The latter will lead to more seeky IO.
The nonblocking checks in ->writepage are no longer used because it's
redundant with the WB_SYNC_NONE check.
We no long set ->nonblocking in VM page out and page migration, because
a) it's effectively redundant with WB_SYNC_NONE in current code
b) it's old semantic of "Don't get stuck on request queues" is mis-behavior:
that would skip some dirty inodes on congestion and page out others, which
is unfair in terms of LRU age.
Inspired by Christoph Hellwig. Thanks!
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: David Howells <dhowells@redhat.com>
Cc: Sage Weil <sage@newdream.net>
Cc: Steve French <sfrench@samba.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The locking order in oom_adjust_write() and oom_score_adj_write() for
task->alloc_lock and task->sighand->siglock is reversed, and lockdep
notices that irqs could encounter an ABBA scenario.
This fixes the locking order so that we always take task_lock(task) prior
to lock_task_sighand(task).
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's better to use proper error handling in oom_adjust_write() and
oom_score_adj_write() instead of duplicating the locking order on various
exit paths.
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's necessary to kill all threads that share an oom killed task's mm if
the goal is to lead to future memory freeing.
This patch reintroduces the code removed in 8c5cd6f3 (oom: oom_kill
doesn't kill vfork parent (or child)) since it is obsoleted.
It's now guaranteed that any task passed to oom_kill_task() does not share
an mm with any thread that is unkillable. Thus, we're safe to issue a
SIGKILL to any thread sharing the same mm.
This is especially necessary to solve an mm->mmap_sem livelock issue
whereas an oom killed thread must acquire the lock in the exit path while
another thread is holding it in the page allocator while trying to
allocate memory itself (and will preempt the oom killer since a task was
already killed). Since tasks with pending fatal signals are now granted
access to memory reserves, the thread holding the lock may quickly
allocate and release the lock so that the oom killed task may exit.
This mainly is for threads that are cloned with CLONE_VM but not
CLONE_THREAD, so they are in a different thread group. Non-NPTL threads
exist in the wild and this change is necessary to prevent the livelock in
such cases. We care more about preventing the livelock than incurring the
additional tasklist in the oom killer when a task has been killed.
Systems that are sufficiently large to not want the tasklist scan in the
oom killer in the first place already have the option of enabling
/proc/sys/vm/oom_kill_allocating_task, which was designed specifically for
that purpose.
This code had existed in the oom killer for over eight years dating back
to the 2.4 kernel.
[akpm@linux-foundation.org: add nice comment]
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The oom killer's goal is to kill a memory-hogging task so that it may
exit, free its memory, and allow the current context to allocate the
memory that triggered it in the first place. Thus, killing a task is
pointless if other threads sharing its mm cannot be killed because of its
/proc/pid/oom_adj or /proc/pid/oom_score_adj value.
This patch checks whether any other thread sharing p->mm has an
oom_score_adj of OOM_SCORE_ADJ_MIN. If so, the thread cannot be killed
and oom_badness(p) returns 0, meaning it's unkillable.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ying Han <yinghan@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's pointless to kill a task if another thread sharing its mm cannot be
killed to allow future memory freeing. A subsequent patch will prevent
kills in such cases, but first it's necessary to have a way to flag a task
that shares memory with an OOM_DISABLE task that doesn't incur an
additional tasklist scan, which would make select_bad_process() an O(n^2)
function.
This patch adds an atomic counter to struct mm_struct that follows how
many threads attached to it have an oom_score_adj of OOM_SCORE_ADJ_MIN.
They cannot be killed by the kernel, so their memory cannot be freed in
oom conditions.
This only requires task_lock() on the task that we're operating on, it
does not require mm->mmap_sem since task_lock() pins the mm and the
operation is atomic.
[rientjes@google.com: changelog and sys_unshare() code]
[rientjes@google.com: protect oom_disable_count with task_lock in fork]
[rientjes@google.com: use old_mm for oom_disable_count in exec]
Signed-off-by: Ying Han <yinghan@google.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We use vmcore in our production kernel for a long time, it is pretty
stable now. So I don't think we need to mark it as experimental any more.
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit df9ee292 ("Fix IRQ flag handling naming") changed the IRQ flag
handling naming scheme and broke UML:
In file included from arch/um/include/asm/fixmap.h:5,
from arch/um/include/shared/um_uaccess.h:10,
from arch/um/include/asm/uaccess.h:41,
from arch/um/include/asm/thread_info.h:13,
from include/linux/thread_info.h:56,
from include/linux/preempt.h:9,
from include/linux/spinlock.h:50,
from include/linux/seqlock.h:29,
from include/linux/time.h:8,
from include/linux/stat.h:60,
from include/linux/module.h:10,
from init/main.c:13:
arch/um/include/asm/system.h:11:1: warning: "local_save_flags" redefined
This patch brings the new scheme to UML and makes it work again.
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This helper is wrong: it coerces signed values into unsigned ones, so code
such as
if (kfifo_alloc(...) < 0) {
error
}
will fail to detect the error.
So let's disable __kfifo_must_check_helper() for 2.6.36.
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
365b1818 ("add f_flags to struct statfs(64)") resized f_spare within
struct statfs which caused a UML crash. There is no need to copy f_spare.
Signed-off-by: Richard Weinberger <richard@nod.at>
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Tested-by: Toralf Förster <toralf.foerster@gmx.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a bug fix. Some SPI connected devices using 16/24 bit accesses,
previously failed, now work.
This typo slipped in after testing, during some restructuring.
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Chris Verges <chrisv@cyberswitching.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The linker script cleanup that I did in commit 5d150a97f9 ("um: Clean up
linker script using standard macros.") (2.6.32) accidentally introduced an
ALIGN(PAGE_SIZE) when converting to use INIT_TEXT_SECTION; Richard
Weinberger reported that this causes the kernel to segfault with
CONFIG_STATIC_LINK=y.
I'm not certain why this extra alignment is a problem, but it seems likely
it is because previously
__init_begin = _stext = _text = _sinittext
and with the extra ALIGN(PAGE_SIZE), _sinittext becomes different from the
rest. So there is likely a bug here where something is assuming that
_sinittext is the same as one of those other symbols. But reverting the
accidental change fixes the regression, so it seems worth committing that
now.
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Reported-by: Richard Weinberger <richard@nod.at>
Cc: Jeff Dike <jdike@addtoit.com>
Tested by: Antoine Martin <antoine@nagafix.co.uk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Under some workloads, some channel messages have been observed being
delayed on the sending side past the point where the receiving side has
been able to tear down its partition structures.
This condition is already detected in xpc_handle_activate_IRQ_uv(), but
that information is not given to xpc_handle_activate_mq_msg_uv(). As a
result, xpc_handle_activate_mq_msg_uv() assumes the structures still exist
and references them, causing a NULL-pointer deref.
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes a issue which was introduced by fe2cc53e ("uml: track and make
up lost ticks").
timeval_to_ns() returns long long and not int. Due to that UML's timer
did not work properlt and caused timer freezes.
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a bug in commit 6dda9d55 ("page allocator: reduce fragmentation
in buddy allocator by adding buddies that are merging to the tail of the
free lists") that means a buddy at order MAX_ORDER is checked for merging.
A page of this order never exists so at times, an effectively random
piece of memory is being checked.
Alan Curry has reported that this is causing memory corruption in
userspace data on a PPC32 platform (http://lkml.org/lkml/2010/10/9/32).
It is not clear why this is happening. It could be a cache coherency
problem where pages mapped in both user and kernel space are getting
different cache lines due to the bad read from kernel space
(http://lkml.org/lkml/2010/10/13/179). It could also be that there are
some special registers being io-remapped at the end of the memmap array
and that a read has special meaning on them. Compiler bugs have been
ruled out because the assembly before and after the patch looks relatively
harmless.
This patch fixes the problem by ensuring we are not reading a possibly
invalid location of memory. It's not clear why the read causes corruption
but one way or the other it is a buggy read.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Corrado Zoccolo <czoccolo@gmail.com>
Reported-by: Alan Curry <pacman@kosh.dhis.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This comment landed in the wrong place.
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
scan_lru_pages returns pfn. So, it's type should be "unsigned long"
not "int".
Note: I guess this has been work until now because memory hotplug tester's
machine has not very big memory....
physical address < 32bit << PAGE_SHIFT.
Reported-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.infradead.org/battery-2.6:
power_supply: Makefile cleanup
bq27x00_battery: Add missing kfree(di->bus) in bq27x00_battery_remove()
power_supply: Introduce maximum current property
power_supply: Add types for USB chargers
ds2782_battery: Fix units
power_supply: Add driver for TWL4030/TPS65950 BCI charger
bq20z75: Add support for more power supply properties
wm831x_power: Add missing kfree(wm831x_power) in wm831x_power_remove()
jz4740-battery: Add missing kfree(jz_battery) in jz_battery_remove()
ds2760_battery: Add missing kfree(di) in ds2760_battery_remove()
olpc_battery: Fix endian neutral breakage for s16 values
ds2760_battery: Fix W1 and W1_SLAVE_DS2760 dependency
pcf50633-charger: Add missing sysfs_remove_group()
power_supply: Add driver for TI BQ20Z75 gas gauge IC
wm831x_power: Remove duplicate chg mask
omap: rx51: Add support for USB chargers
power_supply: Add isp1704 charger detection driver
* 'linux_next' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/i7core: (34 commits)
i7core_edac: return -ENODEV when devices were already probed
i7core_edac: properly terminate pci_dev_table
i7core_edac: Avoid PCI refcount to reach zero on successive load/reload
i7core_edac: Fix refcount error at PCI devices
i7core_edac: it is safe to i7core_unregister_mci() when mci=NULL
i7core_edac: Fix an oops at i7core probe
i7core_edac: Remove unused member channels in i7core_pvt
i7core_edac: Remove unused arg csrow from get_dimm_config
i7core_edac: Reduce args of i7core_register_mci
i7core_edac: Introduce i7core_unregister_mci
i7core_edac: Use saved pointers
i7core_edac: Check probe counter in i7core_remove
i7core_edac: Call pci_dev_put() when alloc_i7core_dev() failed
i7core_edac: Fix error path of i7core_register_mci
i7core_edac: Fix order of lines in i7core_register_mci
i7core_edac: Always do get/put for all devices
i7core_edac: Introduce i7core_pci_ctl_create/release
i7core_edac: Introduce free_i7core_dev
i7core_edac: Introduce alloc_i7core_dev
i7core_edac: Reduce args of i7core_get_onedevice
...
* 'for_linus' of git://github.com/at91linux/linux-2.6-at91:
AT91: rtc: enable built-in RTC in Kconfig for at91sam9g45 family
at91/atmel-mci: inclusion of sd/mmc driver in at91sam9g45 chip and board
AT91: pm: make sure that r0 is 0 when dealing with cache operations
AT91: pm: use plain cpu_do_idle() for "wait for interrupt"
AT91: reset: extend alternate reset procedure to several chips
AT91: reset routine cleanup, remove not needed icache flush
AT91: trivial: align comment of at91sam9g20_reset with one more tab
AT91: Fix AT91SAM9G20 reset as per the errata in the data sheet
AT91: add board support for Pcontrol_G20
* 'for-linus' of git://gitorious.org/linux-omap-dss2/linux:
OMAP: DSS2: don't power off a panel twice
OMAP: DSS2: OMAPFB: Allow usage of def_vrfb only for omap2,3
OMAP: DSS2: OMAPFB: make VRFB depends on OMAP2,3
OMAP: DSS2: OMAPFB: Allow FB_OMAP2 to build without VRFB
arm/omap: simplify conditional
OMAP: DSS2: DSI: Remove extra iounmap in error path
OMAP: DSS2: Use dss_features framework on DSS2 code
OMAP: DSS2: Introduce dss_features files
video/omap: remove mux.h include
ARM: omap/fb: move get_fbmem_region() to .init.text
ARM: omap/fb: move omapfb_reserve_sram to .init.text
ARM: omap/fb: move omap_init_fb to .init.text
OMAP: DSS2: OMAPFB: swap front and back porches for both hsync and vsync
OMAP: DSS2: make filter coefficient tables human readable
OMAP: DSS2: Add SPI dependency to Kconfig of ACX565AKM panel
* 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: (99 commits)
svcrpc: svc_tcp_sendto XPT_DEAD check is redundant
svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue
svcrpc: assume svc_delete_xprt() called only once
svcrpc: never clear XPT_BUSY on dead xprt
nfsd4: fix connection allocation in sequence()
nfsd4: only require krb5 principal for NFSv4.0 callbacks
nfsd4: move minorversion to client
nfsd4: delay session removal till free_client
nfsd4: separate callback change and callback probe
nfsd4: callback program number is per-session
nfsd4: track backchannel connections
nfsd4: confirm only on succesful create_session
nfsd4: make backchannel sequence number per-session
nfsd4: use client pointer to backchannel session
nfsd4: move callback setup into session init code
nfsd4: don't cache seq_misordered replies
SUNRPC: Properly initialize sock_xprt.srcaddr in all cases
SUNRPC: Use conventional switch statement when reclassifying sockets
sunrpc/xprtrdma: clean up workqueue usage
sunrpc: Turn list_for_each-s into the ..._entry-s
...
Fix up trivial conflicts (two different deprecation notices added in
separate branches) in Documentation/feature-removal-schedule.txt
* 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
net/sunrpc: Use static const char arrays
nfs4: fix channel attribute sanity-checks
NFSv4.1: Use more sensible names for 'initialize_mountpoint'
NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure
NFSv4.1: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure
NFS: client needs to maintain list of inodes with active layouts
NFS: create and destroy inode's layout cache
NFSv4.1: pnfs: filelayout: introduce minimal file layout driver
NFSv4.1: pnfs: full mount/umount infrastructure
NFS: set layout driver
NFS: ask for layouttypes during v4 fsinfo call
NFS: change stateid to be a union
NFSv4.1: pnfsd, pnfs: protocol level pnfs constants
SUNRPC: define xdr_decode_opaque_fixed
NFSD: remove duplicate NFS4_STATEID_SIZE
Enable built-in RTC IP in Kconfig and modify comments and help messages.
RTT as RTC is still available but should not be selected in common case.
Reported-by: Yegor Yefremov <yegor_sub1@visionsystems.de>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
This adds the support of atmel-mci sd/mmc driver in at91sam9g45 devices and
board files. This also configures the DMA controller slave interface for
at_hdmac dmaengine driver.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
When using CP15 cache operations (c7), we make sure that Rd (r0)
is actually 0 as ARM 926 TRM is saying.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
For power management at91_pm_enter() routine, use the cpu_do_idle() for a
rock solid "wait for interrupt" implementation.
For AT91SAM9 ARM 926 based chips, we can exceed the cache line length as
we can access RAM even while in self-refresh mode.
We keep plain access to CP15 for at91rm9200 as this feature is not
available: instructions have to be in a single cache line.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Several at91sam9 chips need the alternate reset procedure to be sure to halt
SDRAM smoothly before resetting the chip.
This is an extension of previous patch "Fix AT91SAM9G20 reset" to all chips
affected.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Generalize assembler reset routine to allow use on several at91sam9 chips.
This patch replace double definitions of SDRAM controller registers and RSTC
registers with use of classical header files.
For this rework, we remove the not needed icache flush as it is already
done in the calling function: arm_machine_restart().
Rename at91sam9g20_reset.S to generalize to several chips.
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
If the SDRAM is not cleanly shutdown before reset it can be left driving
the bus, which then stops the bootloader booting from NAND.
Signed-off-by: Peter Horton <phorton@bitbox.co.uk>
[nicolas.ferre@atmel.com: change file header line order]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Board is a carrier board for Stamp9G20, with additional peripherals
for a building automation system
Signed-off-by: Peter Gsellmann <pgsellmann@portner-elektronik.at>
[nicolas.ferre@atmel.com: remove machine_desc.io_pg_offst and .phys_io]
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>