Stashing version 4 under version 3 and removing version 4, because both
version changes were within single patchset.
Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/20120822083540.GB1003@krava.brq.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
. Fix include order for bison/flex-generated C files, from Ben Hutchings
. Build fixes and documentation corrections from David Ahern
. Group parsing support, from Jiri Olsa
. UI/gtk refactorings and improvements from Namhyung Kim
. NULL deref fix for perf script, from Namhyung Kim
. Assorted cleanups from Robert Richter
. Let O= makes handle relative paths, from Steven Rostedt
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJQMkGhAAoJENZQFvNTUqpAqjsQAJE5iD1LFogC8o/WjvRHz0TY
Y0x+sR/XfW61KYpeq5g+UaKuFU3P44ijCoyks3y5sza97DkYgUwMpEHlLXFSM8Pp
sNOapqY57s24nq3MLrhH1V9w+cSE+m2u/Gi5fGLCQekio9gkOBwYxNGk7vpKri/n
LBRsMozBu/mZjMy20uWOb7Uk8xsAToh+TFaAtjyQ9Snn9nNJj49NUAp37uN888H/
ducMLq32HN5v/6Zd3q6IWdDWgZsHLkIa3R5FIs/GNe3Dih07gtYLmDol4ktPbTFm
yoaWpP5wbtu/62EZlJwE393vMuoeqN/96394ZZQGFafhHVxN4+rcBhXbejBs0T2b
wk/0CzntW8bbUAI/cl3SB9aui//FWOxcjG9aDQ7PsmHzPw1Q4VD0F9Mcod4p+dRX
PsA9q/tST1eAiwzWYthDtj81U7iChINcXKhoZn2xn6+0+aMH+6FFNBmCH8MR5aCU
BvrXhTJjvau/Ym/sILl4Tf4wfssTq49yMsn/YKCwLJ0hg0XlTObWfQRy2MOayXH9
NJvUE+9GSXoTEKhmr1AfTYEG9vObaXZyFwAI74xvPPwUYojCb4ZjEKmG0egW+VGk
IJKFCaJZwwVsGau4aIbFAMP12/L8Qs/Ox91ddCJ0j5TIlSGMaqW5lbV1N1crzlTT
a0GsN49NvhbFttBXrcNX
=0a2X
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
* Fix include order for bison/flex-generated C files, from Ben Hutchings
* Build fixes and documentation corrections from David Ahern
* Group parsing support, from Jiri Olsa
* UI/gtk refactorings and improvements from Namhyung Kim
* NULL deref fix for perf script, from Namhyung Kim
* Assorted cleanups from Robert Richter
* Let O= makes handle relative paths, from Steven Rostedt
* perf script python fixes, from Feng Tang.
* Improve 'perf lock' error message when the needed tracepoints
are not present, from David Ahern.
* Initial bash completion support, from Frederic Weisbecker
* Allow building without libelf, from Namhyung Kim.
* Support DWARF CFI based unwind to have callchains when %bp
based unwinding is not possible, from Jiri Olsa.
* Symbol resolution fixes, while fixing support PPC64 files with an .opt ELF
section was the end goal, several fixes for code that handles all
architectures and cleanups are included, from Cody Schafer.
* Add a description for the JIT interface, from Andi Kleen.
* Assorted fixes for Documentation and build in 32 bit, from Robert Richter
* Add support for non-tracepoint events in perf script python, from Feng Tang
* Cache the libtraceevent event_format associated to each evsel early, so that we
avoid relookups, i.e. calling pevent_find_event repeatedly when processing
tracepoint events.
[ This is to reduce the surface contact with libtraceevents and make clear what
is that the perf tools needs from that lib: so far parsing the common and per
event fields. ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull ftrace updates from Steve Rostedt:
" This patch series extends ftrace function tracing utility to be
more dynamic for its users. It allows for data passing to the callback
functions, as well as reading regs as if a breakpoint were to trigger
at function entry.
The main goal of this patch series was to allow kprobes to use ftrace
as an optimized probe point when a probe is placed on an ftrace nop.
With lots of help from Masami Hiramatsu, and going through lots of
iterations, we finally came up with a good solution. "
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Introducing following bits to the the perf_event_attr struct:
- exclude_callchain_kernel to filter out kernel callchain
from the sample dump
- exclude_callchain_user to filter out user callchain
from the sample dump
We need to be able to disable standard user callchain dump when we use
the dwarf cfi callchain mode, because frame pointer based user
callchains are useless in this mode.
Implementing also exclude_callchain_kernel to have complete set of
options.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
[ Added kernel callchains filtering ]
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1344345647-11536-7-git-send-email-jolsa@redhat.com
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Introducing PERF_SAMPLE_STACK_USER sample type bit to trigger the dump
of the user level stack on sample. The size of the dump is specified by
sample_stack_user value.
Being able to dump parts of the user stack, starting from the stack
pointer, will be useful to make a post mortem dwarf CFI based stack
unwinding.
Added HAVE_PERF_USER_STACK_DUMP config option to determine if the
architecture provides user stack dump on perf event samples. This needs
access to the user stack pointer which is not unified across
architectures. Enabling this for x86 architecture.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Original-patch-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1344345647-11536-6-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Introducing perf_output_skip function to be able to skip data within the
perf ring buffer.
When writing data into perf ring buffer we first reserve needed place in
ring buffer and then copy the actual data.
There's a possibility we won't be able to fill all the reserved size
with data, so we need a way to skip the remaining bytes.
This is going to be useful when storing the user stack dump, where we
might end up with less data than we originally requested.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1344345647-11536-5-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Adding a generic way to use __output_copy function with specific copy
function via DEFINE_PERF_OUTPUT_COPY macro.
Using this to add new __output_copy_user function, that provides output
copy from user pointers. For x86 the copy_from_user_nmi function is used
and __copy_from_user_inatomic for the rest of the architectures.
This new function will be used in user stack dump on sample, coming in
next patches.
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1344345647-11536-4-git-send-email-jolsa@redhat.com
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Introducing PERF_SAMPLE_REGS_USER sample type bit to trigger the dump of
user level registers on sample. Registers we want to dump are specified
by sample_regs_user bitmask.
Only user level registers are dumped at the moment. Meaning the register
values of the user space context as it was before the user entered the
kernel for whatever reason (syscall, irq, exception, or a PMI happening
in userspace).
The layout of the sample_regs_user bitmap is described in
asm/perf_regs.h for archs that support register dump.
This is going to be useful to bring Dwarf CFI based stack unwinding on
top of samples.
Original-patch-by: Frederic Weisbecker <fweisbec@gmail.com>
[ Dump registers ABI specification. ]
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Suggested-by: Stephane Eranian <eranian@google.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1344345647-11536-3-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This brings a new API to help the selective dump of registers on event
sampling, and its implementation for x86 arch.
Added HAVE_PERF_REGS config option to determine if the architecture
provides perf registers ABI.
The information about desired registers will be passed in u64 mask.
It's up to the architecture to map the registers into the mask bits.
For the x86 arch implementation, both 32 and 64 bit registers bits are
defined within single enum to ensure 64 bit system can provide register
dump for compat task if needed in the future.
Original-patch-by: Frederic Weisbecker <fweisbec@gmail.com>
[ Added missing linux/errno.h include ]
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Arun Sharma <asharma@fb.com>
Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Ulrich Drepper <drepper@gmail.com>
Link: http://lkml.kernel.org/r/1344345647-11536-2-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull VFS fixes and cleanups from Al Viro.
Most of this is the series to remove sync_supers() and the
->write_supers VFS callback from Artem Bityutskiy. One commit to do the
actual removal work, a whole series of commits to fix up stale comments
etc all over the tree.
There's also a regression fix for an incorrect use of mnt_drop_write()
in do_dentry_open().
* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
missed mnt_drop_write() in do_dentry_open()
UBIFS: nuke pdflush from comments
gfs2: nuke pdflush from comments
drbd: nuke pdflush from comments
nilfs2: nuke write_super from comments
hfs: nuke write_super from comments
vfs: nuke pdflush from comments
jbd/jbd2: nuke write_super from comments
btrfs: nuke pdflush from comments
btrfs: nuke write_super from comments
ext4: nuke pdflush from comments
ext4: nuke write_super from comments
ext3: nuke write_super from comments
Documentation: fix the VM knobs descritpion WRT pdflush
Documentation: get rid of write_super
vfs: kill write_super and sync_supers
The pdflush thread is long gone, so this patch removes references to pdflush
from vfs comments.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Finally we can kill the 'sync_supers' kernel thread along with the
'->write_super()' superblock operation because all the users are gone.
Now every file-system is supposed to self-manage own superblock and
its dirty state.
The nice thing about killing this thread is that it improves power management.
Indeed, 'sync_supers' is a source of monotonic system wake-ups - it woke up
every 5 seconds no matter what - even if there were no dirty superblocks and
even if there were no file-systems using this service (e.g., btrfs and
journalled ext4 do not need it). So it was wasting power most of the time. And
because the thread was in the core of the kernel, all systems had to have it.
So I am quite happy to make it go away.
Interestingly, this thread is a left-over from the pdflush kernel thread which
was a self-forking kernel thread responsible for all the write-back in old
Linux kernels. It was turned into per-block device BDI threads, and
'sync_supers' was a left-over. Thus, R.I.P, pdflush as well.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull ACPI and power management fixes from Len Brown:
"A 3.3 sleep regression fixed, numa bugfix, plus some minor cleanups"
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
ACPI processor: Fix tick_broadcast_mask online/offline regression
ACPI: Only count valid srat memory structures
ACPI: Untangle a return statement for better readability
ACPI / PCI: Do not try to acquire _OSC control if that is hopeless
ACPI: delete _GTS/_BFS support
ACPI/x86: revert 'x86, acpi: Call acpi_enter_sleep_state via an asmlinkage C function from assembler'
ACPI: replace strlen("string") with sizeof("string") -1
ACPI / PM: Fix build warning in sleep.c for CONFIG_ACPI_SLEEP unset
Pull timer fixes from Ingo Molnar:
"One regression fix, and a couple of cleanups that clean up the code
flow in areas that had high-profile bugs recently."
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
time: Remove all direct references to timekeeper
time: Clean up offs_real/wall_to_mono and offs_boot/total_sleep_time updates
time: Clean up stray newlines
time/jiffies: Rename ACTHZ to SHIFTED_HZ
time/jiffies: Allow CLOCK_TICK_RATE to be undefined
time: Fix casting issue in tk_set_xtime and tk_xtime_add
Pull scheduler fixes from Ingo Molnar:
"Fixes and two late cleanups"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/cleanups: Add load balance cpumask pointer to 'struct lb_env'
sched: Fix comment about PREEMPT_ACTIVE bit location
sched: Fix minor code style issues
sched: Use task_rq_unlock() in __sched_setscheduler()
sched/numa: Add SD_PERFER_SIBLING to CPU domain
Pull perf fixes from Ingo Molnar:
"Fix merge window fallout and fix sleep profiling (this was always
broken, so it's not a fix for the merge window - we can skip this one
from the head of the tree)."
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/trace: Add ability to set a target task for events
perf/x86: Fix USER/KERNEL tagging of samples properly
perf/x86/intel/uncore: Make UNCORE_PMU_HRTIMER_INTERVAL 64-bit
Pull irq fix from Ingo Molnar.
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Allow irq chips to mark themself oneshot safe
usb-dbgp - increase the controller wait time to come out of halt.
kdb - Remove unused KDB_FLAG_ONLY_DO_DUMP code and cpu in more prompt
debug core - pass NMI type on archs that provide NMI types
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJQG8wyAAoJEIciOldedpOjN2oP/ipaQSLnvoKUhutFl/qL2239
mMsxh9ga9rfKuCujpSkHZUwjo3VX7put7cnhVwETd4y2gN2YWPYg4OIKt+Y0AhNe
4NzHwB+lm6iGE33Q1x4uEHBH5aWLzWcOM/9n4avwY2DjtDfpecki5ChP/CVHK8qU
VVF2PfY8nbxcEonCbP1b/0KaD3xrPqwgZ70HdFi5eUuXiBajAyp9c9zqVUWJ6j+H
r+2PVkzn9NxRCkyq3tzK5gYk5SzoJPClkpB67CWugG35MiFLkz2csJNztFxtaInZ
t8HLkMTVLdCgLZqnw/ZEVWfqQA+q6N5NS6zs9j0siLg5HbEb+UtCebPwpChdQrmh
Sol+0vmT9Hi4Jm6onhDnQYchaDI7gMhynUC9sWAPhtSHS9e7D9c5IBLHQd3YbOHK
c8ELzxduszw8+jaiDJStkWM+tbQzJXD9bT1KpLVJd8t9BKAmuBX7ETSD+eKtjynJ
SgywSfVOdxXzEMvRWqeK3qgkLJYCWCfFsc+75hzJl18dRoM3NDyuOxKyOLWF9tFV
QUjaCvndIFz7CgM7FTToJZbACqxFHRGh4UUXHiXPUBPXvE5Zt34n4VsxAXYpJc95
1by/TBcYKWPd3hsOJOh1qMeKXD6TEwN/eEmsjBfnHTp8EBcgRtvue8ZHPMcfYnn/
Iauy66b/nHJLRsi1gWrn
=LAWq
-----END PGP SIGNATURE-----
Merge tag 'for_linux-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb
Pull KGDB/KDB/usb-dbgp fixes and cleanups from Jason Wessel:
"There are no new features, those will be delayed to the 3.7 window.
There are only fixes/cleanup against the usual kernel churn and we are
removing more lines than we add:
- usb-dbgp - increase the controller wait time to come out of halt.
- kdb - Remove unused KDB_FLAG_ONLY_DO_DUMP code and cpu in more prompt
- debug core - pass NMI type on archs that provide NMI types"
* tag 'for_linux-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
USB: echi-dbgp: increase the controller wait time to come out of halt.
kernel/debug: Make use of KGDB_REASON_NMI
kdb: Remove cpu from the more prompt
kdb: Remove unused KDB_FLAG_ONLY_DO_DUMP
Otherwise you could run into:
WARN_ON in numa_register_memblks(), because node_possible_map is zero
References: https://bugzilla.novell.com/show_bug.cgi?id=757888
On this machine (ProLiant ML570 G3) the SRAT table contains:
- No processor affinities
- One memory affinity structure (which is set disabled)
CC: Per Jessen <per@opensuse.org>
CC: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
Pull OLPC platform updates from Andres Salomon:
"These move the OLPC Embedded Controller driver out of
arch/x86/platform and into drivers/platform/olpc.
OLPC machines are now ARM-based (which means lots of x86 and ARM
changes), but are typically pretty self-contained.. so it makes more
sense to go through a separate OLPC tree after getting the appropriate
review/ACKs."
* 'for-linus-3.6' of git://dev.laptop.org/users/dilinger/linux-olpc:
x86: OLPC: move s/r-related EC cmds to EC driver
Platform: OLPC: move global variables into priv struct
Platform: OLPC: move debugfs support from x86 EC driver
x86: OLPC: switch over to using new EC driver on x86
Platform: OLPC: add a suspended flag to the EC driver
Platform: OLPC: turn EC driver into a platform_driver
Platform: OLPC: allow EC cmd to be overridden, and create a workqueue to call it
drivers: OLPC: update various drivers to include olpc-ec.h
Platform: OLPC: add a stub to drivers/platform/ for the OLPC EC driver
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iEYEABECAAYFAlAaIN8ACgkQGkmNcg7/o7gpMgCgtmCJeUrSCc/eVKGy5ik1BPnC
3DAAoLdqCJflwx4mrN48kkXQj9SA4Xko
=Y6yl
-----END PGP SIGNATURE-----
Merge tag 'sh-for-linus' of git://github.com/pmundt/linux-sh
Pull SuperH fixes from Paul Mundt.
* tag 'sh-for-linus' of git://github.com/pmundt/linux-sh: (24 commits)
sh: explicitly include sh_dma.h in setup-sh7722.c
sh: ecovec: care CN5 VBUS if USB host mode
sh: sh7724: fixup renesas_usbhs clock settings
sh: intc: initial irqdomain support.
sh: pfc: Fix up init ordering mess.
serial: sh-sci: fix compilation breakage, when DMA is enabled
dmaengine: shdma: restore partial transfer calculation
sh: modify the sh_dmae_slave_config for RSPI in setup-sh7757
sh: Fix up recursive fault in oops with unset TTB.
sh: pfc: Build fix for pinctrl_remove_gpio_range() changes.
sh: select the fixed regulator driver on several boards
sh: ecovec: switch MMC power control to regulators
sh: add fixed voltage regulators to se7724
sh: add fixed voltage regulators to sdk7786
sh: add fixed voltage regulators to rsk
sh: add fixed voltage regulators to migor
sh: add fixed voltage regulators to kfr2r09
sh: add fixed voltage regulators to ap325rxa
sh: add fixed voltage regulators to sh7757lcr
sh: add fixed voltage regulators to sh2007
...
Pull ARM DMA engine updates from Russell King:
"This looks scary at first glance, but what it is is:
- a rework of the sa11x0 DMA engine driver merged during the previous
cycle, to extract a common set of helper functions for DMA engine
implementations.
- conversion of amba-pl08x.c to use these helper functions.
- addition of OMAP DMA engine driver (using these helper functions),
and conversion of some of the OMAP DMA users to use DMA engine.
Nothing in the helper functions is ARM specific, so I hope that other
implementations can consolidate some of their code by making use of
these helpers.
This has been sitting in linux-next most of the merge cycle, and has
been tested by several OMAP folk. I've tested it on sa11x0 platforms,
and given it my best shot on my broken platforms which have the
amba-pl08x controller.
The last point is the addition to feature-removal-schedule.txt, which
will have a merge conflict. Between myself and TI, we're planning to
remove the old TI DMA implementation next year."
Fix up trivial add/add conflicts in Documentation/feature-removal-schedule.txt
and drivers/dma/{Kconfig,Makefile}
* 'dmaengine' of git://git.linaro.org/people/rmk/linux-arm: (53 commits)
ARM: 7481/1: OMAP2+: omap2plus_defconfig: enable OMAP DMA engine
ARM: 7464/1: mmc: omap_hsmmc: ensure probe returns error if DMA channel request fails
Add feature removal of old OMAP private DMA implementation
mtd: omap2: remove private DMA API implementation
mtd: omap2: add DMA engine support
spi: omap2-mcspi: remove private DMA API implementation
spi: omap2-mcspi: add DMA engine support
ARM: omap: remove mmc platform data dma_mask and initialization
mmc: omap: remove private DMA API implementation
mmc: omap: add DMA engine support
mmc: omap_hsmmc: remove private DMA API implementation
mmc: omap_hsmmc: add DMA engine support
dmaengine: omap: add support for cyclic DMA
dmaengine: omap: add support for setting fi
dmaengine: omap: add support for returning residue in tx_state method
dmaengine: add OMAP DMA engine driver
dmaengine: sa11x0-dma: add cyclic DMA support
dmaengine: sa11x0-dma: fix DMA residue support
dmaengine: PL08x: ensure all descriptors are freed when channel is released
dmaengine: PL08x: get rid of write only pool_ctr and free_txd locking
...
Pull second vfs pile from Al Viro:
"The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
deadlock reproduced by xfstests 068), symlink and hardlink restriction
patches, plus assorted cleanups and fixes.
Note that another fsfreeze deadlock (emergency thaw one) is *not*
dealt with - the series by Fernando conflicts a lot with Jan's, breaks
userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
for massive vfsmount leak; this is going to be handled next cycle.
There probably will be another pull request, but that stuff won't be
in it."
Fix up trivial conflicts due to unrelated changes next to each other in
drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
delousing target_core_file a bit
Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
fs: Remove old freezing mechanism
ext2: Implement freezing
btrfs: Convert to new freezing mechanism
nilfs2: Convert to new freezing mechanism
ntfs: Convert to new freezing mechanism
fuse: Convert to new freezing mechanism
gfs2: Convert to new freezing mechanism
ocfs2: Convert to new freezing mechanism
xfs: Convert to new freezing code
ext4: Convert to new freezing mechanism
fs: Protect write paths by sb_start_write - sb_end_write
fs: Skip atime update on frozen filesystem
fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
fs: Improve filesystem freezing handling
switch the protection of percpu_counter list to spinlock
nfsd: Push mnt_want_write() outside of i_mutex
btrfs: Push mnt_want_write() outside of i_mutex
fat: Push mnt_want_write() outside of i_mutex
...
Pull block driver changes from Jens Axboe:
- Making the plugging support for drivers a bit more sane from Neil.
This supersedes the plugging change from Shaohua as well.
- The usual round of drbd updates.
- Using a tail add instead of a head add in the request completion for
ndb, making us find the most completed request more quickly.
- A few floppy changes, getting rid of a duplicated flag and also
running the floppy init async (since it takes forever in boot terms)
from Andi.
* 'for-3.6/drivers' of git://git.kernel.dk/linux-block:
floppy: remove duplicated flag FD_RAW_NEED_DISK
blk: pass from_schedule to non-request unplug functions.
block: stack unplug
blk: centralize non-request unplug handling.
md: remove plug_cnt feature of plugging.
block/nbd: micro-optimization in nbd request completion
drbd: announce FLUSH/FUA capability to upper layers
drbd: fix max_bio_size to be unsigned
drbd: flush drbd work queue before invalidate/invalidate remote
drbd: fix potential access after free
drbd: call local-io-error handler early
drbd: do not reset rs_pending_cnt too early
drbd: reset congestion information before reporting it in /proc/drbd
drbd: report congestion if we are waiting for some userland callback
drbd: differentiate between normal and forced detach
drbd: cleanup, remove two unused global flags
floppy: Run floppy initialization asynchronous
Pull core block IO bits from Jens Axboe:
"The most complicated part if this is the request allocation rework by
Tejun, which has been queued up for a long time and has been in
for-next ditto as well.
There are a few commits from yesterday and today, mostly trivial and
obvious fixes. So I'm pretty confident that it is sound. It's also
smaller than usual."
* 'for-3.6/core' of git://git.kernel.dk/linux-block:
block: remove dead func declaration
block: add partition resize function to blkpg ioctl
block: uninitialized ioc->nr_tasks triggers WARN_ON
block: do not artificially constrain max_sectors for stacking drivers
blkcg: implement per-blkg request allocation
block: prepare for multiple request_lists
block: add q->nr_rqs[] and move q->rq.elvpriv to q->nr_rqs_elvpriv
blkcg: inline bio_blkcg() and friends
block: allocate io_context upfront
block: refactor get_request[_wait]()
block: drop custom queue draining used by scsi_transport_{iscsi|fc}
mempool: add @gfp_mask to mempool_create_node()
blkcg: make root blkcg allocation use %GFP_KERNEL
blkcg: __blkg_lookup_create() doesn't need radix preload
In commit 3b6e2723f3 ("locks: prevent side-effects of
locks_release_private before file_lock is initialized") we removed the
last user of lm_release_private without removing the field itself.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a new operation code (BLKPG_RESIZE_PARTITION) to the BLKPG ioctl that
allows altering the size of an existing partition, even if it is currently
in use.
This patch converts hd_struct->nr_sects into sequence counter because
One might extend a partition while IO is happening to it and update of
nr_sects can be non-atomic on 32bit machines with 64bit sector_t. This
can lead to issues like reading inconsistent size of a partition. Sequence
counter have been used so that readers don't have to take bdev mutex lock
as we call sector_in_part() very frequently.
Now all the access to hd_struct->nr_sects should happen using sequence
counter read/update helper functions part_nr_sects_read/part_nr_sects_write.
There is one exception though, set_capacity()/get_capacity(). I think
theoritically race should exist there too but this patch does not
modify set_capacity()/get_capacity() due to sheer number of call sites
and I am afraid that change might break something. I have left that as a
TODO item. We can handle it later if need be. This patch does not introduce
any new races as such w.r.t set_capacity()/get_capacity().
v2: Add CONFIG_LBDAF test to UP preempt case as suggested by Phillip.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Phillip Susi <psusi@ubuntu.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The recent shdma driver split has mistakenly removed support for partial
DMA transfer size calculation on forced termination. This patch restores
it.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Vinod Koul <vinod.koul@linux.intel.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Round of refactoring and enhancements to irq_domain infrastructure. This
series starts the process of simplifying irqdomain. The ultimate goal is
to merge LEGACY, LINEAR and TREE mappings into a single system, but had
to back off from that after some last minute bugs. Instead it mainly
reorganizes the code and ensures that the reverse map gets populated
when the irq is mapped instead of the first time it is looked up.
Merging of the irq_domain types is deferred to v3.7
In other news, this series adds helpers for creating static mappings on
a linear or tree mapping.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJQGKKkAAoJEEFnBt12D9kB59gQAJnTjrihej1tr0OEkffIthGK
RyVI/DMo0jMgLs4K/rIo3Y+PdTSsNYd8x4R7ln8O7rNRQn8W6jE6NQgoMh51EvNc
FAltmTsBldq6hUNuz2FEnbmojBP4QklTzL8bAiXtX5EufWQsgMsP4guOuHXLCjEV
CkWYVk/slXEWJ8yYJc6GKVRvL+CNeiXVCTcOsYA0CI3ofN7O0rd+YAL314CRllIc
e5uARbWM+s9FJ/eXwCZP4+3jCmdI/CHJb284WldMc/mBD8Rbiqpb4kH6AZI+TH2O
CyiNEPWs6FG5eJPTID7HrOarXGzwYq/pvv8iG7Mh8NiKSae1C1HdkHelCjbLQ+pU
POya0fWF1Gvzlmw0gHik86dqaKjwb29btjj7SFg8KnQExWn2ifhsY70mM9wCTo3s
cwcQlssDIsARE83nttTFCoV/iAWh9AvTxafrXu/+9OKTjpsYlC8kgzdVjq5aAxON
JaAUK1OduTWRsd1TabKlh6naRXr9nRcLKikwKri2oYVKkj97wahBuib4ffzAcNqz
VklRBxTH6M+dz/t5NpcVyLXJpqzTN++QNdTAmeQG6LOnHJL4tpFTsx5sMa7ghmzX
LNpmp/AkVfP0MT7Drf0FUUx6iFA7sjANYzcepUVDrPGKHx0E3LyqbG5JKcC5LgM6
+UIoKAktF3vY7pdZJL9z
=ZUF/
-----END PGP SIGNATURE-----
Merge tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6
Pull irqdomain changes from Grant Likely:
"Round of refactoring and enhancements to irq_domain infrastructure.
This series starts the process of simplifying irqdomain. The ultimate
goal is to merge LEGACY, LINEAR and TREE mappings into a single
system, but had to back off from that after some last minute bugs.
Instead it mainly reorganizes the code and ensures that the reverse
map gets populated when the irq is mapped instead of the first time it
is looked up.
Merging of the irq_domain types is deferred to v3.7
In other news, this series adds helpers for creating static mappings
on a linear or tree mapping."
* tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6:
irqdomain: Improve diagnostics when a domain mapping fails
irqdomain: eliminate slow-path revmap lookups
irqdomain: Fix irq_create_direct_mapping() to test irq_domain type.
irqdomain: Eliminate dedicated radix lookup functions
irqdomain: Support for static IRQ mapping and association.
irqdomain: Always update revmap when setting up a virq
irqdomain: Split disassociating code into separate function
irq_domain: correct a minor wrong comment for linear revmap
irq_domain: Standardise legacy/linear domain selection
irqdomain: Make ops->map hook optional
irqdomain: Remove unnecessary test for IRQ_DOMAIN_MAP_LEGACY
irqdomain: Simple NUMA awareness.
devicetree: add helper inline for retrieving a node's full name
The 1.75-based OLPC EC driver already does this; let's do it for all EC
drivers. This gives us nice suspend/resume hooks, amongst other things.
We want to run the EC's suspend hooks later than other drivers (which may
be setting wakeup masks or be running EC commands). We also want to run
the EC's resume hooks earlier than other drivers (which may want to run EC
commands).
Signed-off-by: Andres Salomon <dilinger@queued.net>
Acked-by: Paul Fox <pgf@laptop.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
This provides a new API allows different OLPC architectures to override the
EC driver. x86 and ARM OLPC machines use completely different EC backends.
The olpc_ec_cmd is synchronous, and waits for the workqueue to send the
command to the EC. Multiple callers can run olpc_ec_cmd() at once, and
they will by serialized and sleep while only one executes on the EC at a time.
We don't provide an unregister function, as that doesn't make sense within
the context of OLPC machines - there's only ever 1 EC, it's critical to
functionality, and it certainly not hotpluggable.
Signed-off-by: Andres Salomon <dilinger@queued.net>
Acked-by: Paul Fox <pgf@laptop.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
The OLPC EC driver has outgrown arch/x86/platform/. It's time to both
share common code amongst different architectures, as well as move it out
of arch/x86/. The XO-1.75 is ARM-based, and the EC driver shares a lot of
code with the x86 code.
Signed-off-by: Andres Salomon <dilinger@queued.net>
Acked-by: Paul Fox <pgf@laptop.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Merge Andrew's second set of patches:
- MM
- a few random fixes
- a couple of RTC leftovers
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits)
rtc/rtc-88pm80x: remove unneed devm_kfree
rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
tmpfs: distribute interleave better across nodes
mm: remove redundant initialization
mm: warn if pg_data_t isn't initialized with zero
mips: zero out pg_data_t when it's allocated
memcg: gix memory accounting scalability in shrink_page_list
mm/sparse: remove index_init_lock
mm/sparse: more checks on mem_section number
mm/sparse: optimize sparse_index_alloc
memcg: add mem_cgroup_from_css() helper
memcg: further prevent OOM with too many dirty pages
memcg: prevent OOM with too many dirty pages
mm: mmu_notifier: fix freed page still mapped in secondary MMU
mm: memcg: only check anon swapin page charges for swap cache
mm: memcg: only check swap cache pages for repeated charging
mm: memcg: split swapin charge function into private and public part
mm: memcg: remove needless !mm fixup to init_mm when charging
mm: memcg: remove unneeded shmem charge type
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQF+koAAoJECObm247sIsiNjQP/Rra8MHYEtUYlIDlx/MU5aPV
dVOQ1rbHqLH6vnqBESuzwWYd1Eg9pI0yjMoQKuOrxg0Ri/cAhf3WE8xTzb/HTZ6h
npBvVmYIrqAFvE8Tqd0L9XZXNES5dy47F7a3Sp7vXAflQTQ7NAgG3idhzcmom5Zp
6Z7ETrtcaLnu0wr67eb96ICpq9QgPd9rWy5Fn9RAo7NzOb3SJp9CXT02vShytQLk
4sCcez4DTXf3IQjvjgVwV1OfvmcvG1lhI8R1Wvj1TqfqJVC0ZmKQYpmR9rlnPkLB
SB51h0v6a64YaKBpD7R7wuKUGWn2CEg4zAPuFxeACZ99pyx+eulPqZOTX2mAl+gQ
aW5Ec7Au76zvqnKgPVt2nzFMCXYZpHlWRQHqQNon5xXVdqJRS4cJAev7o9udyD6q
SQZIKBoZrwQtA3YbukXd91JKnxliH23MxCaL2u5C5ix6BSKsRs7djHboe210GAcQ
bEjQnhHujETJcFzm1O9HbypJhfg3Klw5o/SKYi3pylURDRMmKVhR3vP9cxdloxAC
QGh3klwQncKN9RxSACsfcNAh0V7RqkufcLXPpEcCwAuyUIrYniwWxiroco/bYE24
DK8g6bi+cY9At2iMALTd2J7RzZcrertQVmRtmXaOrLwSZvWrdizc6qNBpwTIozIT
+V/e1WcSau1C4zU4T0Z0
=fx5A
-----END PGP SIGNATURE-----
Merge tag 'vfio-for-v3.6' of git://github.com/awilliam/linux-vfio
Pull VFIO core from Alex Williamson:
"This series includes the VFIO userspace driver interface for the 3.6
kernel merge window. This driver is intended to provide a secure
interface for device access using IOMMU protection for applications
like assignment of physical devices to virtual machines.
Qemu will be the first user of this interface, enabling assignment of
PCI devices to Qemu guests. This interface is intended to eventually
replace the x86-specific assignment mechanism currently available in
KVM.
This interface has the advantage of being more secure, by working with
IOMMU groups to ensure device isolation and providing it's own
filtered resource access mechanism, and also more flexible, in not
being x86 or KVM specific (extensions to enable POWER are already
working).
This driver is originally the work of Tom Lyon, but has since been
handed over to me and gone through a complete overhaul thanks to the
input from David Gibson, Ben Herrenschmidt, Chris Wright, Joerg
Roedel, and others. This driver has been available in linux-next for
the last month."
Paul Mackerras says:
"I would be glad to see it go in since we want to use it with KVM on
PowerPC. If possible we'd like the PowerPC bits for it to go in as
well."
* tag 'vfio-for-v3.6' of git://github.com/awilliam/linux-vfio:
vfio: Add PCI device driver
vfio: Type1 IOMMU implementation
vfio: Add documentation
vfio: VFIO core
from interrupts for /dev/random and /dev/urandom. The goal is to
addresses weaknesses discussed in the paper "Mining your Ps and Qs:
Detection of Widespread Weak Keys in Network Devices", by Nadia
Heninger, Zakir Durumeric, Eric Wustrow, J. Alex Halderman, which will
be published in the Proceedings of the 21st Usenix Security Symposium,
August 2012. (See https://factorable.net for more information and an
extended version of the paper.)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJQF/0DAAoJENNvdpvBGATwIowQAOep9QKtLrBvb2lwIRVmeiy8
lRf7V/tYZnz4FePbR0W92JQfKYkCV8yyOO0bmeRzWL3v4m+lRwDTSyA1DDyQMoH+
LOMzvDKSLJMSXTXdSOIr1WYACphViCR/9CrbMBCKSkYfZLJ1MdaEDxT3rcpTGD0T
6iknUweiSkHHhkerU5yQL7FKzD5kYUe0hsF47w7QVlHRHJsW2fsZqkFoh+RpnhNw
03u+djxNGBo9qV81vZ9D1b0vA9uRlEjoWOOEG2XE4M2iq6TUySueA72dQnCwunfi
3kG/u1Swv2dgq6aRrP3H7zdwhYSourGxziu3jNhEKwKEohrxYY7xjNX3RVeTqP67
AzlKsOTWpRLIDrzjSLlb8VxRQiZewu8Unex3e1G+eo20sbcIObHGrxNp7K00zZvd
QZiMHhOwItwFTe4lBO+XbqH2JKbL9/uJmwh5EipMpQTraKO9E6N3CJiUHjzBLo2K
iGDZxRMKf4gVJRwDxbbP6D70JPVu8ZJ09XVIpsXQ3Z1xNqaMF0QdCmP3ty56q1o0
NvkSXxPKrijZs8Sk0rVDqnJ3ll8PuDnXMv5eDtL42VT818I5WxESn9djjwEanGv0
TYxbFub/NRxmPEE5B2Js5FBpqsLf5f282OSMeS/5WLBbnHJR1OoPoAhGVpHvxntC
bi5FC1OolqhvzVIdsqgt
=u7KM
-----END PGP SIGNATURE-----
Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull random subsystem patches from Ted Ts'o:
"This patch series contains a major revamp of how we collect entropy
from interrupts for /dev/random and /dev/urandom.
The goal is to addresses weaknesses discussed in the paper "Mining
your Ps and Qs: Detection of Widespread Weak Keys in Network Devices",
by Nadia Heninger, Zakir Durumeric, Eric Wustrow, J. Alex Halderman,
which will be published in the Proceedings of the 21st Usenix Security
Symposium, August 2012. (See https://factorable.net for more
information and an extended version of the paper.)"
Fix up trivial conflicts due to nearby changes in
drivers/{mfd/ab3100-core.c, usb/gadget/omap_udc.c}
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: (33 commits)
random: mix in architectural randomness in extract_buf()
dmi: Feed DMI table to /dev/random driver
random: Add comment to random_initialize()
random: final removal of IRQF_SAMPLE_RANDOM
um: remove IRQF_SAMPLE_RANDOM which is now a no-op
sparc/ldc: remove IRQF_SAMPLE_RANDOM which is now a no-op
[ARM] pxa: remove IRQF_SAMPLE_RANDOM which is now a no-op
board-palmz71: remove IRQF_SAMPLE_RANDOM which is now a no-op
isp1301_omap: remove IRQF_SAMPLE_RANDOM which is now a no-op
pxa25x_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
omap_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
goku_udc: remove IRQF_SAMPLE_RANDOM which was commented out
uartlite: remove IRQF_SAMPLE_RANDOM which is now a no-op
drivers: hv: remove IRQF_SAMPLE_RANDOM which is now a no-op
xen-blkfront: remove IRQF_SAMPLE_RANDOM which is now a no-op
n2_crypto: remove IRQF_SAMPLE_RANDOM which is now a no-op
pda_power: remove IRQF_SAMPLE_RANDOM which is now a no-op
i2c-pmcmsp: remove IRQF_SAMPLE_RANDOM which is now a no-op
input/serio/hp_sdc.c: remove IRQF_SAMPLE_RANDOM which is now a no-op
mfd: remove IRQF_SAMPLE_RANDOM which is now a no-op
...
Pull second set of media updates from Mauro Carvalho Chehab:
- radio API: add support to work with radio frequency bands
- new AM/FM radio drivers: radio-shark, radio-shark2
- new Remote Controller USB driver: iguanair
- conversion of several drivers to the v4l2 core control framework
- new board additions at existing drivers
- the remaining (and vast majority of the patches) are due to
drivers/DocBook fixes/cleanups.
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (154 commits)
[media] radio-tea5777: use library for 64bits div
[media] tlg2300: Declare MODULE_FIRMWARE usage
[media] lgs8gxx: Declare MODULE_FIRMWARE usage
[media] xc5000: Add MODULE_FIRMWARE statements
[media] s2255drv: Add MODULE_FIRMWARE statement
[media] dib8000: move dereference after check for NULL
[media] Documentation: Update cardlists
[media] bttv: add support for Aposonic W-DVR
[media] cx25821: Remove bad strcpy to read-only char*
[media] pms.c: remove duplicated include
[media] smiapp-core.c: remove duplicated include
[media] via-camera: pass correct format settings to sensor
[media] rtl2832.c: minor cleanup
[media] Add support for the IguanaWorks USB IR Transceiver
[media] Minor cleanups for MCE USB
[media] drivers/media/dvb/siano/smscoreapi.c: use list_for_each_entry
[media] Use a named union in struct v4l2_ioctl_info
[media] mceusb: Add Twisted Melon USB IDs
[media] staging/media/solo6x10: use module_pci_driver macro
[media] staging/media/dt3155v4l: use module_pci_driver macro
...
Conflicts:
Documentation/feature-removal-schedule.txt
Features include:
- Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into
separate modules.
- Fix Oopses in the NFSv4 idmapper
- Fix a deadlock whereby rpciod tries to allocate a new socket and
ends up recursing into the NFS code due to memory reclaim.
- Increase the number of permitted callback connections.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQGF+vAAoJEGcL54qWCgDy6ncP/jKVl/Yjz6i1b/8QtC0EmYFb
XNwbjZicNupVM98XPLm+sfdWxvoGiHSMbwG8t/hx5Z6CteM18adeGNO1KqP9xQyg
scPvFmqj5VJNHsHztcHIFHFYdpsuJqRKcW3TA2l8AkNOm6uBcnXArRosYN6LrXik
hI/jWcyv8wZuooSQrLf463JK37t/s6LuUQo6jKP4582sbANHvPdLkHFCYg3yt1Sk
2IIo2CLLs2cJUswGJnsHT76Jxfvh21NdGtvydjVjpr4H0/LY5GykBc23AAL9TWj6
KlegDO902hLzEE93xazF59hQZrPuwBi3quEMJ3X0mjdNQWb4G96s/t2hmwpTjWyc
mjpRsuaGlCXDxcrI5dnU52hqKa0Ju1ipbLpghxSVfilRy998xvGp5Yc6ADU+pYnt
uP09lamCyjFEa+gDSqGt7lv4dFmdPJjWZdkXLPv0ah5sXVMYAT8NRLUfiE1+hLLu
zKaM84PHSEvykFeJ2WvjDTgF3L55y3x6L3LN4UkKmfO5nqQf7csPcquU1NfHh25S
jjKGq2SKGoYG1l6FwK3hemup5cnFUEX78E2QeLrmaHg92fJDGrz587i6MPc5jrSe
sOSX/CpuCJd4VhodIo8T00me0GZSHEU7RP2KEc518ZvrPmz13avXeLeG9nYhjuxM
cV9Ex8zh2Y4aiUXCy3uA
=KSix
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull second wave of NFS client updates from Trond Myklebust:
- Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into
separate modules.
- Fix Oopses in the NFSv4 idmapper
- Fix a deadlock whereby rpciod tries to allocate a new socket and ends
up recursing into the NFS code due to memory reclaim.
- Increase the number of permitted callback connections.
* tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: explicitly reject LOCK_MAND flock() requests
nfs: increase number of permitted callback connections.
SUNRPC: return negative value in case rpcbind client creation error
NFS: Convert v4 into a module
NFS: Convert v3 into a module
NFS: Convert v2 into a module
NFS: Keep module parameters in the generic NFS client
NFS: Split out remaining NFS v4 inode functions
NFS: Pass super operations and xattr handlers in the nfs_subversion
NFS: Only initialize the ACL client in the v3 case
NFS: Create a try_mount rpc op
NFS: Remove the NFS v4 xdev mount function
NFS: Add version registering framework
NFS: Fix a number of bugs in the idmapper
nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
sunrpc: clarify comments on rpc_make_runnable
pnfsblock: bail out partial page IO
If a process creates a large hugetlbfs mapping that is eligible for page
table sharing and forks heavily with children some of whom fault and
others which destroy the mapping then it is possible for page tables to
get corrupted. Some teardowns of the mapping encounter a "bad pmd" and
output a message to the kernel log. The final teardown will trigger a
BUG_ON in mm/filemap.c.
This was reproduced in 3.4 but is known to have existed for a long time
and goes back at least as far as 2.6.37. It was probably was introduced
in 2.6.20 by [39dde65c: shared page table for hugetlb page]. The messages
look like this;
[ ..........] Lots of bad pmd messages followed by this
[ 127.164256] mm/memory.c:391: bad pmd ffff880412e04fe8(80000003de4000e7).
[ 127.164257] mm/memory.c:391: bad pmd ffff880412e04ff0(80000003de6000e7).
[ 127.164258] mm/memory.c:391: bad pmd ffff880412e04ff8(80000003de0000e7).
[ 127.186778] ------------[ cut here ]------------
[ 127.186781] kernel BUG at mm/filemap.c:134!
[ 127.186782] invalid opcode: 0000 [#1] SMP
[ 127.186783] CPU 7
[ 127.186784] Modules linked in: af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf ext3 jbd dm_mod coretemp crc32c_intel usb_storage ghash_clmulni_intel aesni_intel i2c_i801 r8169 mii uas sr_mod cdrom sg iTCO_wdt iTCO_vendor_support shpchp serio_raw cryptd aes_x86_64 e1000e pci_hotplug dcdbas aes_generic container microcode ext4 mbcache jbd2 crc16 sd_mod crc_t10dif i915 drm_kms_helper drm i2c_algo_bit ehci_hcd ahci libahci usbcore rtc_cmos usb_common button i2c_core intel_agp video intel_gtt fan processor thermal thermal_sys hwmon ata_generic pata_atiixp libata scsi_mod
[ 127.186801]
[ 127.186802] Pid: 9017, comm: hugetlbfs-test Not tainted 3.4.0-autobuild #53 Dell Inc. OptiPlex 990/06D7TR
[ 127.186804] RIP: 0010:[<ffffffff810ed6ce>] [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
[ 127.186809] RSP: 0000:ffff8804144b5c08 EFLAGS: 00010002
[ 127.186810] RAX: 0000000000000001 RBX: ffffea000a5c9000 RCX: 00000000ffffffc0
[ 127.186811] RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff88042dfdad00
[ 127.186812] RBP: ffff8804144b5c18 R08: 0000000000000009 R09: 0000000000000003
[ 127.186813] R10: 0000000000000000 R11: 000000000000002d R12: ffff880412ff83d8
[ 127.186814] R13: ffff880412ff83d8 R14: 0000000000000000 R15: ffff880412ff83d8
[ 127.186815] FS: 00007fe18ed2c700(0000) GS:ffff88042dce0000(0000) knlGS:0000000000000000
[ 127.186816] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 127.186817] CR2: 00007fe340000503 CR3: 0000000417a14000 CR4: 00000000000407e0
[ 127.186818] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 127.186819] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 127.186820] Process hugetlbfs-test (pid: 9017, threadinfo ffff8804144b4000, task ffff880417f803c0)
[ 127.186821] Stack:
[ 127.186822] ffffea000a5c9000 0000000000000000 ffff8804144b5c48 ffffffff810ed83b
[ 127.186824] ffff8804144b5c48 000000000000138a 0000000000001387 ffff8804144b5c98
[ 127.186825] ffff8804144b5d48 ffffffff811bc925 ffff8804144b5cb8 0000000000000000
[ 127.186827] Call Trace:
[ 127.186829] [<ffffffff810ed83b>] delete_from_page_cache+0x3b/0x80
[ 127.186832] [<ffffffff811bc925>] truncate_hugepages+0x115/0x220
[ 127.186834] [<ffffffff811bca43>] hugetlbfs_evict_inode+0x13/0x30
[ 127.186837] [<ffffffff811655c7>] evict+0xa7/0x1b0
[ 127.186839] [<ffffffff811657a3>] iput_final+0xd3/0x1f0
[ 127.186840] [<ffffffff811658f9>] iput+0x39/0x50
[ 127.186842] [<ffffffff81162708>] d_kill+0xf8/0x130
[ 127.186843] [<ffffffff81162812>] dput+0xd2/0x1a0
[ 127.186845] [<ffffffff8114e2d0>] __fput+0x170/0x230
[ 127.186848] [<ffffffff81236e0e>] ? rb_erase+0xce/0x150
[ 127.186849] [<ffffffff8114e3ad>] fput+0x1d/0x30
[ 127.186851] [<ffffffff81117db7>] remove_vma+0x37/0x80
[ 127.186853] [<ffffffff81119182>] do_munmap+0x2d2/0x360
[ 127.186855] [<ffffffff811cc639>] sys_shmdt+0xc9/0x170
[ 127.186857] [<ffffffff81410a39>] system_call_fastpath+0x16/0x1b
[ 127.186858] Code: 0f 1f 44 00 00 48 8b 43 08 48 8b 00 48 8b 40 28 8b b0 40 03 00 00 85 f6 0f 88 df fe ff ff 48 89 df e8 e7 cb 05 00 e9 d2 fe ff ff <0f> 0b 55 83 e2 fd 48 89 e5 48 83 ec 30 48 89 5d d8 4c 89 65 e0
[ 127.186868] RIP [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160
[ 127.186870] RSP <ffff8804144b5c08>
[ 127.186871] ---[ end trace 7cbac5d1db69f426 ]---
The bug is a race and not always easy to reproduce. To reproduce it I was
doing the following on a single socket I7-based machine with 16G of RAM.
$ hugeadm --pool-pages-max DEFAULT:13G
$ echo $((18*1048576*1024)) > /proc/sys/kernel/shmmax
$ echo $((18*1048576*1024)) > /proc/sys/kernel/shmall
$ for i in `seq 1 9000`; do ./hugetlbfs-test; done
On my particular machine, it usually triggers within 10 minutes but
enabling debug options can change the timing such that it never hits.
Once the bug is triggered, the machine is in trouble and needs to be
rebooted. The machine will respond but processes accessing proc like "ps
aux" will hang due to the BUG_ON. shutdown will also hang and needs a
hard reset or a sysrq-b.
The basic problem is a race between page table sharing and teardown. For
the most part page table sharing depends on i_mmap_mutex. In some cases,
it is also taking the mm->page_table_lock for the PTE updates but with
shared page tables, it is the i_mmap_mutex that is more important.
Unfortunately it appears to be also insufficient. Consider the following
situation
Process A Process B
--------- ---------
hugetlb_fault shmdt
LockWrite(mmap_sem)
do_munmap
unmap_region
unmap_vmas
unmap_single_vma
unmap_hugepage_range
Lock(i_mmap_mutex)
Lock(mm->page_table_lock)
huge_pmd_unshare/unmap tables <--- (1)
Unlock(mm->page_table_lock)
Unlock(i_mmap_mutex)
huge_pte_alloc ...
Lock(i_mmap_mutex) ...
vma_prio_walk, find svma, spte ...
Lock(mm->page_table_lock) ...
share spte ...
Unlock(mm->page_table_lock) ...
Unlock(i_mmap_mutex) ...
hugetlb_no_page <--- (2)
free_pgtables
unlink_file_vma
hugetlb_free_pgd_range
remove_vma_list
In this scenario, it is possible for Process A to share page tables with
Process B that is trying to tear them down. The i_mmap_mutex on its own
does not prevent Process A walking Process B's page tables. At (1) above,
the page tables are not shared yet so it unmaps the PMDs. Process A sets
up page table sharing and at (2) faults a new entry. Process B then trips
up on it in free_pgtables.
This patch fixes the problem by adding a new function
__unmap_hugepage_range_final that is only called when the VMA is about to
be destroyed. This function clears VM_MAYSHARE during
unmap_hugepage_range() under the i_mmap_mutex. This makes the VMA
ineligible for sharing and avoids the race. Superficially this looks like
it would then be vunerable to truncate and madvise issues but hugetlbfs
has its own truncate handlers so does not use unmap_mapping_range() and
does not support madvise(DONTNEED).
This should be treated as a -stable candidate if it is merged.
Test program is as follows. The test case was mostly written by Michal
Hocko with a few minor changes to reproduce this bug.
==== CUT HERE ====
static size_t huge_page_size = (2UL << 20);
static size_t nr_huge_page_A = 512;
static size_t nr_huge_page_B = 5632;
unsigned int get_random(unsigned int max)
{
struct timeval tv;
gettimeofday(&tv, NULL);
srandom(tv.tv_usec);
return random() % max;
}
static void play(void *addr, size_t size)
{
unsigned char *start = addr,
*end = start + size,
*a;
start += get_random(size/2);
/* we could itterate on huge pages but let's give it more time. */
for (a = start; a < end; a += 4096)
*a = 0;
}
int main(int argc, char **argv)
{
key_t key = IPC_PRIVATE;
size_t sizeA = nr_huge_page_A * huge_page_size;
size_t sizeB = nr_huge_page_B * huge_page_size;
int shmidA, shmidB;
void *addrA = NULL, *addrB = NULL;
int nr_children = 300, n = 0;
if ((shmidA = shmget(key, sizeA, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
perror("shmget:");
return 1;
}
if ((addrA = shmat(shmidA, addrA, SHM_R|SHM_W)) == (void *)-1UL) {
perror("shmat");
return 1;
}
if ((shmidB = shmget(key, sizeB, IPC_CREAT|SHM_HUGETLB|0660)) == -1) {
perror("shmget:");
return 1;
}
if ((addrB = shmat(shmidB, addrB, SHM_R|SHM_W)) == (void *)-1UL) {
perror("shmat");
return 1;
}
fork_child:
switch(fork()) {
case 0:
switch (n%3) {
case 0:
play(addrA, sizeA);
break;
case 1:
play(addrB, sizeB);
break;
case 2:
break;
}
break;
case -1:
perror("fork:");
break;
default:
if (++n < nr_children)
goto fork_child;
play(addrA, sizeA);
break;
}
shmdt(addrA);
shmdt(addrB);
do {
wait(NULL);
} while (--n > 0);
shmctl(shmidA, IPC_RMID, NULL);
shmctl(shmidB, IPC_RMID, NULL);
return 0;
}
[akpm@linux-foundation.org: name the declaration's args, fix CONFIG_HUGETLBFS=n build]
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pg_data_t is zeroed before reaching free_area_init_core(), so remove the
now unnecessary initializations.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Compaction (and page migration in general) can currently be hindered
through pages being owned by memory cgroups that are at their limits and
unreclaimable.
The reason is that the replacement page is being charged against the limit
while the page being replaced is also still charged. But this seems
unnecessary, given that only one of the two pages will still be in use
after migration finishes.
This patch changes the memcg migration sequence so that the replacement
page is not charged. Whatever page is still in use after successful or
failed migration gets to keep the charge of the page that was going to be
replaced.
The replacement page will still show up temporarily in the rss/cache
statistics, this can be fixed in a later patch as it's less urgent.
Reported-by: David Rientjes <rientjes@google.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Wanpeng Li <liwp.linux@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement the new swapfile a_ops for NFS and hook up ->direct_IO. This
will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under
PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol
->connect() method.
PF_MEMALLOC should allow the allocation of struct socket and related
objects and the early (re)setting of SOCK_MEMALLOC should allow us to
receive the packets required for the TCP connection buildup.
[jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases]
[dfeng@redhat.com: Fix handling of multiple swap files]
[a.p.zijlstra@chello.nl: Original patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The patch "mm: add support for a filesystem to activate swap files and use
direct_IO for writing swap pages" added support for using direct_IO to
write swap pages but it is insufficient for highmem pages.
To support highmem pages, this patch kmaps() the page before calling the
direct_IO() handler. As direct_IO deals with virtual addresses an
additional helper is necessary for get_kernel_pages() to lookup the struct
page for a kmap virtual address.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The version of swap_activate introduced is sufficient for swap-over-NFS
but would not provide enough information to implement a generic handler.
This patch shuffles things slightly to ensure the same information is
available for aops->swap_activate() as is available to the core.
No functionality change.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently swapfiles are managed entirely by the core VM by using ->bmap to
allocate space and write to the blocks directly. This effectively ensures
that the underlying blocks are allocated and avoids the need for the swap
subsystem to locate what physical blocks store offsets within a file.
If the swap subsystem is to use the filesystem information to locate the
blocks, it is critical that information such as block groups, block
bitmaps and the block descriptor table that map the swap file were
resident in memory. This patch adds address_space_operations that the VM
can call when activating or deactivating swap backed by a file.
int swap_activate(struct file *);
int swap_deactivate(struct file *);
The ->swap_activate() method is used to communicate to the file that the
VM relies on it, and the address_space should take adequate measures such
as reserving space in the underlying device, reserving memory for mempools
and pinning information such as the block descriptor table in memory. The
->swap_deactivate() method is called on sys_swapoff() if ->swap_activate()
returned success.
After a successful swapfile ->swap_activate, the swapfile is marked
SWP_FILE and swapper_space.a_ops will proxy to
sis->swap_file->f_mappings->a_ops using ->direct_io to write swapcache
pages and ->readpage to read.
It is perfectly possible that direct_IO be used to read the swap pages but
it is an unnecessary complication. Similarly, it is possible that
->writepage be used instead of direct_io to write the pages but filesystem
developers have stated that calling writepage from the VM is undesirable
for a variety of reasons and using direct_IO opens up the possibility of
writing back batches of swap pages in the future.
[a.p.zijlstra@chello.nl: Original patch]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds two new APIs get_kernel_pages() and get_kernel_page() that
may be used to pin a vector of kernel addresses for IO. The initial user
is expected to be NFS for allowing pages to be written to swap using
aops->direct_IO(). Strictly speaking, swap-over-NFS only needs to pin one
page for IO but it makes sense to express the API in terms of a vector and
add a helper for pinning single pages.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Cc: Mark Salter <msalter@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In order to teach filesystems to handle swap cache pages, three new page
functions are introduced:
pgoff_t page_file_index(struct page *);
loff_t page_file_offset(struct page *);
struct address_space *page_file_mapping(struct page *);
page_file_index() - gives the offset of this page in the file in
PAGE_CACHE_SIZE blocks. Like page->index is for mapped pages, this
function also gives the correct index for PG_swapcache pages.
page_file_offset() - uses page_file_index(), so that it will give the
expected result, even for PG_swapcache pages.
page_file_mapping() - gives the mapping backing the actual page; that is
for swap cache pages it will give swap_file->f_mapping.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Neil Brown <neilb@suse.de>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Under significant pressure when writing back to network-backed storage,
direct reclaimers may get throttled. This is expected to be a short-lived
event and the processes get woken up again but processes do get stalled.
This patch counts how many times such stalling occurs. It's up to the
administrator whether to reduce these stalls by increasing
min_free_kbytes.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If swap is backed by network storage such as NBD, there is a risk that a
large number of reclaimers can hang the system by consuming all
PF_MEMALLOC reserves. To avoid these hangs, the administrator must tune
min_free_kbytes in advance which is a bit fragile.
This patch throttles direct reclaimers if half the PF_MEMALLOC reserves
are in use. If the system is routinely getting throttled the system
administrator can increase min_free_kbytes so degradation is smoother but
the system will keep running.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: David Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The skb->pfmemalloc flag gets set to true iff during the slab allocation
of data in __alloc_skb that the the PFMEMALLOC reserves were used. If
page splitting is used, it is possible that pages will be allocated from
the PFMEMALLOC reserve without propagating this information to the skb.
This patch propagates page->pfmemalloc from pages allocated for fragments
to the skb.
It works by reintroducing and expanding the skb_alloc_page() API to take
an skb. If the page was allocated from pfmemalloc reserves, it is
automatically copied. If the driver allocates the page before the skb, it
should call skb_propagate_pfmemalloc() after the skb is allocated to
ensure the flag is copied properly.
Failure to do so is not critical. The resulting driver may perform slower
if it is used for swap-over-NBD or swap-over-NFS but it should not result
in failure.
[davem@davemloft.net: API rename and consistency]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>