An rmdir pushes css's ref count to zero. However, if the associated
directory is open at the time, the dentry ref count is non-zero. If
the fd for this directory is then passed into perf_event_open, it
does a css_get(). This bounces the ref count back up from zero. This
is a problem by itself. But what makes it turn into a crash is the
fact that we end up doing an extra dput, since we perform a dput
when css_put sees the ref count go down to zero.
css_tryget() does not fall into that trap. So, we use that instead.
Reproduction test-case for the bug:
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <linux/unistd.h>
#include <linux/perf_event.h>
#include <string.h>
#include <errno.h>
#include <stdio.h>
#define PERF_FLAG_PID_CGROUP (1U << 2)
int perf_event_open(struct perf_event_attr *hw_event_uptr,
pid_t pid, int cpu, int group_fd, unsigned long flags) {
return syscall(__NR_perf_event_open,hw_event_uptr, pid, cpu,
group_fd, flags);
}
/*
* Directly poke at the perf_event bug, since it's proving hard to repro
* depending on where in the kernel tree. what moved?
*/
int main(int argc, char **argv)
{
int fd;
struct perf_event_attr attr;
memset(&attr, 0, sizeof(attr));
attr.exclude_kernel = 1;
attr.size = sizeof(attr);
mkdir("/dev/cgroup/perf_event/blah", 0777);
fd = open("/dev/cgroup/perf_event/blah", O_RDONLY);
perror("open");
rmdir("/dev/cgroup/perf_event/blah");
sleep(2);
perf_event_open(&attr, fd, 0, -1, PERF_FLAG_PID_CGROUP);
perror("perf_event_open");
close(fd);
return 0;
}
Signed-off-by: Salman Qazi <sqazi@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20120614223108.1025.2503.stgit@dungbeetle.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The RDPMC index calculation is wrong for AMD family 15h
(X86_FEATURE_ PERFCTR_CORE set). This leads to a #GP when
accessing the counter:
Pid: 2237, comm: syslog-ng Not tainted 3.5.0-rc1-perf-x86_64-standard-g130ff90 #135 AMD Pike/Pike
RIP: 0010:[<ffffffff8100dc33>] [<ffffffff8100dc33>] x86_perf_event_update+0x27/0x66
While the msr address offset is (index << 1) we must use index to
select the correct rdpmc.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Vince Weaver <vweaver1@eecs.utk.edu>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
1. __copy_insn() needs "loff_t offset", not "unsigned long",
to read the file.
2. use pgoff_t for "idx" and remove the unnecessary typecast.
3. fix the typo, "&=" is not what we want
4. can't resist, rename off1 to off.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120615154359.GA9625@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
loff_t looks confusing when it is used for the virtual address.
Change map_info and install_breakpoint/remove_breakpoint paths
to use "unsigned long".
The patch doesn't change vma_address(), it can't return "long"
because it is used to verify the mapping. But probably this
needs some cleanups too.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120615154355.GA9622@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
uprobe->pending_list is only used to create the temporary list,
it has no meaning after we drop uprobes_mmap_hash(inode).
No need to initialize this node or remove it from tmp_list, and
we can use list_for_each_entry().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120615154353.GA9614@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
write_opcode() ensures that UPROBE_SWBP_INSN doesn't cross the
page boundary. This looks a bit confusing, the check does not
depend on vaddr and it is enough to do it only once right after
install_breakpoint()->arch_uprobe_analyze_insn().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120615154350.GA9611@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
write_opcode() is called by register_for_each_vma() and
uprobe_mmap() paths. In both cases the caller has already
verified this vaddr under mmap_sem, no need to re-check.
Note also that this check is wrong anyway, we should not
truncate loff_t returned by vma_address() if we do not trust
this mapping.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120615154347.GA9604@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
copy_insn() returns -ENOMEM if the first __copy_insn() fails,
it should return the correct error code.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120615154344.GA9601@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
1. copy_insn() doesn't need "addr", it can use uprobe->offset.
Remove this argument.
2. Change copy_insn/__copy_insn to accept "struct file*" instead
of vma.
copy_insn() is called only once and mm/vma/vaddr are random, it
shouldn't depend on them.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120615154342.GA9598@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Because the mind is treacherous and makes us forget we need to
write stuff down.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20120615154339.GA9591@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
build_map_info() doesn't allocate the memory under i_mmap_mutex
to avoid the deadlock with page reclaim. But it can try
GFP_NOWAIT first, it should work in the likely case and thus we
almost never need the pre-alloc-and-retry path.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Link: http://lkml.kernel.org/r/20120615154336.GA9588@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently register_for_each_vma() is O(n ** 2) + O(n ** 3),
every time find_next_vma_info() "restarts" the
vma_prio_tree_foreach() loop and each iteration rechecks the
whole try_list. This also means that try_list can grow
"indefinitely" if register/unregister races with munmap/mmap
activity even if the number of mapping is bounded at any time.
With this patch register_for_each_vma() builds the list of
mm/vaddr structures only once and does install_breakpoint() for
each entry.
We do not care about the new mappings which can be created after
build_map_info() drops mapping->i_mmap_mutex, uprobe_mmap()
should do its work.
Note that we do not allocate map_info under i_mmap_mutex, this
can deadlock with page reclaim (but see the next patch). So we
use 2 lists, "curr" which we are going to return, and "prev"
which holds the already allocated memory. The main loop deques
the entry from "prev" (initially it is empty), and if "prev"
becomes empty again it counts the number of entries we need to
pre-allocate outside of i_mmap_mutex.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Link: http://lkml.kernel.org/r/20120615154333.GA9581@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
install_breakpoint() returns -EEXIST if is_swbp_insn(orig_insn)
== T, the caller treats this code as success.
This is doubly wrong. The successful return should set
UPROBE_COPY_INSN, but the real problem is that it shouldn't
succeed. If the probed insn is int3 the application should get
SIGTRAP, this won't happen with uprobe.
Probably we can fix this, we can add the UPROBE_SHARED_BP flag
and teach handle_swbp/set_orig_insn to handle this case
correctly. But this needs some complications and we have other
insns which can't be probed, lets make a simple fix for now.
I think this needs a cleanup. UPROBE_COPY_INSN should die,
copy_insn() should be called by alloc_uprobe().
arch_uprobe_analyze_insn() depends on ->mm (ia32_compat) but it
is called only once.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120615154331.GA9578@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
write_opcode() gets old_page via get_user_pages() and then calls
__replace_page() which assumes that this old_page is still
mapped after pte_offset_map_lock().
This is not true if this old_page was already try_to_unmap()'ed,
and in this case everything __replace_page() does with old_page
is wrong. Just for example, put_page() is not balanced.
I think it is possible to teach __replace_page() to handle this
unlikely case correctly, but this patch simply changes it to use
page_check_address() and return -EAGAIN if it fails. The caller
should notice this error code and retry.
Note: write_opcode() asks for the cleanups, I'll try to do this
in a separate patch.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120615154328.GA9571@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
__copy_insn() blindly calls read_mapping_page(), this will crash
the kernel if ->readpage == NULL, add the necessary check. For
example, hugetlbfs_aops->readpage is NULL. Perhaps we should
change read_mapping_page() instead.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120615154325.GA9568@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
__replace_page() obviously can't work with the hugetlbfs
mappings, uprobe_register() will likely crash the kernel. Change
valid_vma() to check VM_HUGETLB as well.
As for PageTransHuge() no need to worry, vma->vm_file != NULL.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anton Arapov <anton@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120615154322.GA9561@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
. Set name of tracepoints when reading the perf.data headers, so that
we don't end up using the local ones, from /sys.
. Fix default output file for perf stat, from Stephane Eranian.
. Fix endian handling of features bitmask in perf.data header, from David Ahern
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJP2K//AAoJENZQFvNTUqpAMDMP/3/SbjZSr4YQZx6CNXBjak3D
DiJc7R0h6VjxuStQknn9x+pL1Mwd0Avy6c1BUx4DSJntdTvjrFCFxo8ryc837Ry0
pJ+tWYGHu3RhGUAMYL1kgG/5lL907Zv1jrSMkuY9bMTTwao7wacuH79JYb8OPbIp
Fcth00OU5O1az++U36rTKGzIP5/vaJPyGL8l3VblzMQNpADWYQ5KWSa0hfwwbd/r
w1oARIh9WBmKEMO6xVgAWzgyfcsjJMFoIhk77A+YNHwrcb/3XQRxeIIy9I8yg9s5
1cwb3KlfdLi+V1j2x0O2Elnta0KsVeCdurVrdp8Kc4b3yeDeCZnQ7j5N9B2afuE3
eIbWVMsIqfA5uGanSzPEz36qgb+JcPQmlEqrIwPIiK7QZXDSSTMgthq8RHiMv+g1
sbRgxpRSo9HjRGz65LL0C4umkbRXyLB30RSW2yvqyzQMxVkhlieDQxGet2ICkcus
tdVHhfGSk7aGVAKwMX1FnRsAgzPrVwUOvj1WoUo5MQpE4gPC/nvx3/LnSF86lPop
hF+WqPUybCn/j3+rR/SfuQEC5YygDwLK4DB1MfdzWVfAXBunrVSsZTIEGnR9Eqrt
8wCFRAftks/xmwVHeXE8XjQcrF8xAe4nOrJUDcC42uIIBa7JwkKUS2TRUrU8Oyau
i3+2THoljqfnWPEP5zY7
=nEew
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf fixes from Arnaldo Carvalho de Melo:
* Set name of tracepoints when reading the perf.data headers, so that
we don't end up using the local ones, from /sys.
* Fix default output file for perf stat, from Stephane Eranian.
* Fix endian handling of features bitmask in perf.data header, from David Ahern.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
All trace events including ftrace internel events (like trace_printk
and function tracing), register functions that describe how to print
their output. The events may be recorded as soon as the ring buffer
is allocated, but they are just raw binary in the buffer. The mapping
of event ids to how to print them are held within a structure that
is registered on system boot.
If a crash happens in boot up before these functions are registered
then their output (via ftrace_dump_on_oops) will be useless:
Dumping ftrace buffer:
---------------------------------
<...>-1 0.... 319705us : Unknown type 6
---------------------------------
This can be quite frustrating for a kernel developer trying to see
what is going wrong.
There's no reason to register them so late in the boot up process.
They can be registered by early_initcall().
Reported-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
TRACE_EVENT_FL_ENABLED_BIT,
TRACE_EVENT_FL_FILTERED_BIT,
TRACE_EVENT_FL_RECORDED_CMD_BIT,
Have comments about what they are, but:
TRACE_EVENT_FL_CAP_ANY_BIT,
TRACE_EVENT_FL_NO_SET_FILTER_BIT,
TRACE_EVENT_FL_IGNORE_ENABLE_BIT,
do not, making them second class citizens. To prevent another
class warfare, these bits have protested for their right to be
commented. And By Golly! I'll give them what they want!
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
register_ftrace_function() checks ftrace_disabled and calls
__register_ftrace_function which does it again.
Drop the first check and add the unlikely hint to the second one. Also,
drop the label as John correctly notices.
No functional change.
Link: http://lkml.kernel.org/r/20120329171140.GE6409@aftab
Cc: Borislav Petkov <bp@amd64.org>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
A bunch of bugzillas have complained how noisy the nmi_watchdog
is during boot-up especially with its expected failure cases
(like virt and bios resource contention).
This is my attempt to quiet them down and keep it less confusing
for the end user. What I did is print the message for cpu0 and
save it for future comparisons. If future cpus have an
identical message as cpu0, then don't print the redundant info.
However, if a future cpu has a different message, happily print
that loudly.
Before the change, you would see something like:
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz stepping 0a
Performance Events: PEBS fmt0+, Core2 events, Intel PMU driver.
... version: 2
... bit width: 40
... generic registers: 2
... value mask: 000000ffffffffff
... max period: 000000007fffffff
... fixed-purpose events: 3
... event mask: 0000000700000003
NMI watchdog enabled, takes one hw-pmu counter.
Booting Node 0, Processors #1
NMI watchdog enabled, takes one hw-pmu counter.
#2
NMI watchdog enabled, takes one hw-pmu counter.
#3 Ok.
NMI watchdog enabled, takes one hw-pmu counter.
Brought up 4 CPUs
Total of 4 processors activated (22607.24 BogoMIPS).
After the change, it is simplified to:
..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
CPU0: Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz stepping 0a
Performance Events: PEBS fmt0+, Core2 events, Intel PMU driver.
... version: 2
... bit width: 40
... generic registers: 2
... value mask: 000000ffffffffff
... max period: 000000007fffffff
... fixed-purpose events: 3
... event mask: 0000000700000003
NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
Booting Node 0, Processors #1#2#3 Ok.
Brought up 4 CPUs
V2: little changes based on Joe Perches' feedback
V3: printk cleanup based on Ingo's feedback; checkpatch fix
V4: keep printk as one long line
V5: Ingo fix ups
Reported-and-tested-by: Nathan Zimmer <nzimmer@sgi.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: nzimmer@sgi.com
Cc: joe@perches.com
Link: http://lkml.kernel.org/r/1339594548-17227-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
I noticed that the LBR fixups were not working anymore
on programs where they used to. I tracked this down to
a recent change to copy_from_user_nmi():
db0dc75d64 ("perf/x86: Check user address explicitly in copy_from_user_nmi()")
This commit added a call to __range_not_ok() to the
copy_from_user_nmi() routine. The problem is that the logic
of the test must be reversed. __range_not_ok() returns 0 if the
range is VALID. We want to return early from copy_from_user_nmi()
if the range is NOT valid.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Arun Sharma <asharma@fb.com>
Link: http://lkml.kernel.org/r/20120611134426.GA7542@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We need to use the per event info snapshoted at record time to
synthesize the events name, so do it just after reading the perf.data
headers, when we already processed the /sys events data, otherwise we'll
end up using the local /sys that only by sheer luck will have the same
tracepoint ID -> real event association.
Example:
# uname -a
Linux felicio.ghostprotocols.net 3.4.0-rc5+ #1 SMP Sat May 19 15:27:11 BRT 2012 x86_64 x86_64 x86_64 GNU/Linux
# perf record -e sched:sched_switch usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.015 MB perf.data (~648 samples) ]
# cat /t/events/sched/sched_switch/id
279
# perf evlist -v
sched:sched_switch: sample_freq=1, type: 2, config: 279, size: 80, sample_type: 1159, read_format: 7, disabled: 1, inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1
#
So on the above machine the sched:sched_switch has tracepoint id 279, but on
the machine were we'll analyse it it has a different id:
$ cat /t/events/sched/sched_switch/id
56
$ perf evlist -i /tmp/perf.data
kmem:mm_balancedirty_writeout
$ cat /t/events/kmem/mm_balancedirty_writeout/id
279
With this fix:
$ perf evlist -i /tmp/perf.data
sched:sched_switch
Reported-by: Dmitry Antipov <dmitry.antipov@linaro.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-auwks8fpuhmrdpiefs55o5oz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The following commit:
commit 56f3bae706
Author: Jim Cromie <jim.cromie@gmail.com>
Date: Wed Sep 7 17:14:00 2011 -0600
perf stat: Add --log-fd <N> option to redirect stderr elsewhere
introduced a bug in the way perf stat outputs the results by default,
i.e., without the --log-fd or --output option. It would default to
writing to file descriptor 0, i.e., stdin. Writing to stdin is allowed
and is equivalent to writing to stdout. However, there is a major
difference for any script that was already capturing the output of perf
stat via redirection:
perf stat >/tmp/log .... or perf stat 2>/tmp/log ....
They would not capture anything anymore. They would have to do:
perf stat 0>/tmp/log ...
This breaks compatibility with existing scripts and does not look very
natural.
This patch fixes the problem by looking at output_fd only when it was
modified by user (> 0). It also checks that the value if positive.
Passing --log-fd 0 is ignored.
I would also argue that defaulting to stderr for the results is not the
right thing to do, though this patch does not address this specific
issue.
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jim Cromie <jim.cromie@gmail.com>
Link: http://lkml.kernel.org/r/20120515111111.GA9870@quad
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Based on Jiri's latest attempt:
https://lkml.org/lkml/2012/5/16/61
Basically, adds_features should be byte swapped assuming unsigned
longs are either 8-bytes (u64) or 4-bytes (u32).
Fixes 32-bit ppc dumping 64-bit x86 feature data:
========
captured on: Sun May 20 19:23:23 2012
hostname : nxos-vdc-dev3
os release : 3.4.0-rc7+
perf version : 3.4.rc4.137.g978da3
arch : x86_64
nrcpus online : 16
nrcpus avail : 16
cpudesc : Intel(R) Xeon(R) CPU E5540 @ 2.53GHz
cpuid : GenuineIntel,6,26,5
total memory : 24680324 kB
...
Verified 64-bit x86 can still dump feature data for 32-bit ppc.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Jiri Olsa <jolsa@redhat.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/4FBBB539.5010805@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If the privileges given to root threads (3% of allowable memory) or a
negative value of /proc/pid/oom_score_adj happen to exceed the amount of
rss of a thread, its badness score overflows as a result of commit
a7f638f999 ("mm, oom: normalize oom scores to oom_score_adj scale only
for userspace").
Fix this by making the type signed and return 1, meaning the thread is
still eligible for kill, if the value is negative.
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix lots of new kernel-doc warnings in kernel/sched/fair.c:
Warning(kernel/sched/fair.c:3625): No description found for parameter 'env'
Warning(kernel/sched/fair.c:3625): Excess function parameter 'sd' description in 'update_sg_lb_stats'
Warning(kernel/sched/fair.c:3735): No description found for parameter 'env'
Warning(kernel/sched/fair.c:3735): Excess function parameter 'sd' description in 'update_sd_pick_busiest'
Warning(kernel/sched/fair.c:3735): Excess function parameter 'this_cpu' description in 'update_sd_pick_busiest'
.. more warnings
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 9e612a008f.
It incorrectly finds VGA connectors where none are attached, apparently
not noticing that nothing replied to the EDID queries, and happily using
the default EDID modes that have nothing to do with actual hardware.
That in turn then causes X to fall down to the lowest common
denominator, which is usually the default 1024x768 mode that is in the
default EDID and pretty much anything supports).
I'd suggest that if not relying on the HDP pin, the code should at least
check whether it gets valid EDID data back, rather than just assume
there's something on the VGA connector.
Cc: Dave Airlie <airlied@linux.ie>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This update contains two bug fixes, both destined for the stable tree.
Perhaps the most important is one which fixes ext4 when used with file
systems originally formatted for use with ext3, but then later
converted to take advantage of ext4.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABCAAGBQJP0jyAAAoJENNvdpvBGATwGSIP/R0+4j/SqSncjLIEx6EvrCeY
eyR9JYb0F44P1bKlZ6WErrupysmhPqHubnbxMg9bVMTxefN/m6tEyG+EG4MwMxFP
mRtc5EVT0i/9k7U6s4Ey0kSnBnZqx6IBdFUR5lHrvAf7Ud920eY12Cx4oscI7Fj6
E81lWASOkDTjzsJpzFIqcW7b257Ne1oxPAu1p290W5riSkDzle7qqT5xT+lBAbr0
YM8ZFhznUdyv2aeuchf/dfys5YdVgSporplIjvP69bRyNr4x5B6QOUGxuX3RNhV3
Hqf83GW8HycyAEdl4AtakNtxiMPhrIy0S/RamWqBTE2+nws3Wnkd1wN4c5HjZZad
w5u2E/uSxedZfgGDZZriRwHCgR+2I2CEtMy45gL96o+2Cel6unN+vjVPTVbMpMUy
3PsPoPjPVmNPLyn4vIfHlEQ5ZLhBzVpXFEsca9s+6JeHQ4dKP77fa8iq5AjzCjzA
FYvmf1fNbDW1da81fRUOSnwuaC0JuUvOu5eQLLQ3jzz5gj+DvZvQqvleMHTXkQ3X
hLpSXNTtXUBYeXIw5aJzh5VPJqqGzPkpJhAEbMRgCobRA1HtuAEJzZf4J7+jiWjV
IrQpw0HIeZiK1oZW8iC8YYbIz8AakU+MnIVhvcPc3rIr5eeEtWPM/9ETx/mrO9s2
/y0bwu84eQLIctFjzR2s
=9oyU
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 bug fixes from Theodore Ts'o:
"This update contains two bug fixes, both destined for the stable tree.
Perhaps the most important is one which fixes ext4 when used with file
systems originally formatted for use with ext3, but then later
converted to take advantage of ext4."
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: don't set i_flags in EXT4_IOC_SETFLAGS
ext4: fix the free blocks calculation for ext3 file systems w/ uninit_bg
Pull powerpc fixes from Paul Mackerras:
"Two small fixes for powerpc:
- a fix for a regression since 3.2 that causes 4-second (or longer)
pauses
- a fix for a potential oops when loading kernel modules on 32-bit
embedded systems."
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
powerpc: Fix kernel panic during kernel module load
powerpc/time: Sanity check of decrementer expiration is necessary
broken by the 3.5-rc1 UBI/UBIFS changes when we removed the debugging
Kconfig switches.
Also, correct locking in 'ubi_wl_flush()' - it was extended to support
flushing a specific LEB in 3.5-rc1, and the locking was sub-optimal.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJP0jAHAAoJECmIfjd9wqK0qN8P/1JRuky3xsX8o5S5tqVLKkzO
4naM8bH1AuUemg8f+K+2IUyHp1Mp0lsBj7zA46+3xWXvRcZBJaWgqZFkLOuWSZn0
ChRGKKhdwRkXokq7JHgUJCrdTLyXsBV+uuBbJ+oTPQa2+7f1mxgZuLc+MJzpoKAs
sNdE3A2pH3hhXDJKuT6hrM2wqD/2rW5bGurW4sYBxVUJEZ793HeXu30lTffZuWHo
ClzDtC0iWIS5NmdN0sbQL6Os3p+uiR5mjJek0eABTHAmgXvfBUWpf9ewTXYS2ulK
zY8brtIy2N8qmXFa8ZyguZKpCdulDqRLPz/2bmX5rOJSsyogoNxHI+UlZFCfq0u/
txA+Wm4dQZtvSM3Stws6gxuIDCX83wOgHq/gUitMsVkxo+McwevjXpZb4Ha80OVl
Iw7ERGrbxZn+B+lTh+9jgD6kstqpefO7fBP5gkLRvHToxDKY+6sdH6b3UuE6pDJI
SA2D1uDyk33A/dyvYZThcvsq7KMdF0mhfLuxkScSfP17hc0hnh/FSzkj3fHrADfy
oA4Oo/jKFbOFnmByNHjzvOaqeYHVsKwzq+feDcZ7sYA5GNrVMVViLZQljQBdJWo5
E9howusrb81AGvjCIFobntBLE8DZF5OGfgt/n+60+jo3/qern/4jeuWfGL/Q9Wcw
x27oVmOBLjuHuG3hXlGt
=X7QS
-----END PGP SIGNATURE-----
Merge tag 'upstream-3.5-rc2' of git://git.infradead.org/linux-ubifs
Pull UBI/UBIFS fixes from Artem Bityutskiy:
"Fix UBI and UBIFS - they refuse to work without debugfs. This was
broken by the 3.5-rc1 UBI/UBIFS changes when we removed the debugging
Kconfig switches.
Also, correct locking in 'ubi_wl_flush()' - it was extended to support
flushing a specific LEB in 3.5-rc1, and the locking was sub-optimal."
* tag 'upstream-3.5-rc2' of git://git.infradead.org/linux-ubifs:
UBI: correct ubi_wl_flush locking
UBIFS: fix debugfs-less systems support
UBI: fix debugfs-less systems support
This reverts commit 7732a557b1 (and commit
3f50fff4da, which was a follow-up
cleanup).
We're chasing an elusive bug that Dave Jones can apparently reproduce
using his system call fuzzer tool, and that looks like some kind of
locking ordering problem on the directory i_mutex chain. Our i_mutex
locking is rather complex, and depends on the topological ordering of
the directories, which is why we have been very wary of splicing
directory entries around.
Of course, we really don't want to ever see aliased unconnected
directories anyway, so none of this should ever happen, but this revert
aims to basically get us back to a known older state.
Bruce points to some of the previous discussion at
http://marc.info/?i=<20110310105821.GE22723@ZenIV.linux.org.uk>
and in particular a long post from Neil:
http://marc.info/?i=<20110311150749.2fa2be66@notabene.brown>
It should be noted that it's possible that Dave's problems come from
other changes altohgether, including possibly just the fact that Dave
constantly is teachning his fuzzer new tricks. So what appears to be a
new bug could in fact be an old one that just gets newly triggered, but
reverting these patches as "still under heavy discussion" is the right
thing regardless.
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 fixes from Ingo Molnar.
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/nmi: Fix section mismatch warnings on 32-bit
x86/uv: Fix UV2 BAU legacy mode
x86/mm: Only add extra pages count for the first memory range during pre-allocation early page table space
x86, efi stub: Add .reloc section back into image
x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqs
x86/reboot: Fix a warning message triggered by stop_other_cpus()
x86/intel/moorestown: Change intel_scu_devices_create() to __devinit
x86/numa: Set numa_nodes_parsed at acpi_numa_memory_affinity_init()
x86/gart: Fix kmemleak warning
x86: mce: Add the dropped timer interval init back
x86/mce: Fix the MCE poll timer logic
Pull perf fixes from Ingo Molnar:
"A bit larger than what I'd wish for - half of it is due to hw driver
updates to Intel Ivy-Bridge which info got recently released,
cycles:pp should work there now too, amongst other things. (but we
are generally making exceptions for hardware enablement of this type.)
There are also callchain fixes in it - responding to mostly
theoretical (but valid) concerns. The tooling side sports perf.data
endianness/portability fixes which did not make it for the merge
window - and various other fixes as well."
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (26 commits)
perf/x86: Check user address explicitly in copy_from_user_nmi()
perf/x86: Check if user fp is valid
perf: Limit callchains to 127
perf/x86: Allow multiple stacks
perf/x86: Update SNB PEBS constraints
perf/x86: Enable/Add IvyBridge hardware support
perf/x86: Implement cycles:p for SNB/IVB
perf/x86: Fix Intel shared extra MSR allocation
x86/decoder: Fix bsr/bsf/jmpe decoding with operand-size prefix
perf: Remove duplicate invocation on perf_event_for_each
perf uprobes: Remove unnecessary check before strlist__delete
perf symbols: Check for valid dso before creating map
perf evsel: Fix 32 bit values endianity swap for sample_id_all header
perf session: Handle endianity swap on sample_id_all header data
perf symbols: Handle different endians properly during symbol load
perf evlist: Pass third argument to ioctl explicitly
perf tools: Update ioctl documentation for PERF_IOC_FLAG_GROUP
perf tools: Make --version show kernel version instead of pull req tag
perf tools: Check if callchain is corrupted
perf callchain: Make callchain cursors TLS
...
Pull drm intel and exynos fixes from Dave Airlie:
"A bunch of fixes for Intel and exynos, nothing too major, a new intel
PCI ID, and a fix for CRT detection."
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/i915: pch_irq_handler -> {ibx, cpt}_irq_handler
char/agp: add another Ironlake host bridge
drm/i915: fix up ivb plane 3 pageflips
drm/exynos: fixed blending for hdmi graphic layer
drm/exynos: Remove dummy encoder get_crtc operation implementation
drm/exynos: Keep a reference to frame buffer GEM objects
drm/exynos: Don't cast GEM object to Exynos GEM object when not needed
drm/exynos: DRIVER_BUS_PLATFORM is not a driver feature
drm/exynos: fixed size type.
drm/exynos: Use DRM_FORMAT_{NV12, YUV420} instead of DRM_FORMAT_{NV12M, YUV420M}
drm/i915: hold forcewake around ring hw init
drm/i915: Mark the ringbuffers as being in the GTT domain
drm/i915/crt: Do not rely upon the HPD presence pin
drm/i915: Reset last_retired_head when resetting ring
Pull leap second timer fix from Thomas Gleixner.
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecond
It was reported that compiling for 32-bit caused a bunch of
section mismatch warnings:
VDSOSYM arch/x86/vdso/vdso32-syms.lds
LD arch/x86/vdso/built-in.o
LD arch/x86/built-in.o
WARNING: arch/x86/built-in.o(.data+0x5af0): Section mismatch in
reference from the variable test_nmi_ipi_callback_na.10451 to
the function .init.text:test_nmi_ipi_callback() [...]
WARNING: arch/x86/built-in.o(.data+0x5b04): Section mismatch in
reference from the variable nmi_unk_cb_na.10399 to the function
.init.text:nmi_unk_cb() The variable nmi_unk_cb_na.10399
references the function __init nmi_unk_cb() [...]
Both of these are attributed to the internal representation of
the nmiaction struct created during register_nmi_handler. The
reason for this is that those structs are not defined in the
init section whereas the rest of the code in nmi_selftest.c is.
To resolve this, I created a new #define,
register_nmi_handler_initonly, that tags the struct as
__initdata to resolve the mismatch. This #define should only be
used in rare situations where the register/unregister is called
during init of the kernel.
Big thanks to Jan Beulich for decoding this for me as I didn't
have a clue what was going on.
Reported-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl>
Tested-by: Witold Baryluk <baryluk@smp.if.uj.edu.pl>
Cc: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/r/1338991542-23000-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This fixes a problem which can causes kernel oopses while loading
a kernel module.
According to the PowerPC EABI specification, GPR r11 is assigned
the dedicated function to point to the previous stack frame.
In the powerpc-specific kernel module loader, do_plt_call()
(in arch/powerpc/kernel/module_32.c), GPR r11 is also used
to generate trampoline code.
This combination crashes the kernel, in the case where the compiler
chooses to use a helper function for saving GPRs on entry, and the
module loader has placed the .init.text section far away from the
.text section, meaning that it has to generate a trampoline for
functions in the .init.text section to call the GPR save helper.
Because the trampoline trashes r11, references to the stack frame
using r11 can cause an oops.
The fix just uses GPR r12 instead of GPR r11 for generating the
trampoline code. According to the statements from Freescale, this is
safe from an EABI perspective.
I've tested the fix for kernel 2.6.33 on MPC8541.
Cc: stable@vger.kernel.org
Signed-off-by: Steffen Rumler <steffen.rumler.ext@nsn.com>
[paulus@samba.org: reworded the description]
Signed-off-by: Paul Mackerras <paulus@samba.org>
The SGI Altix UV2 BAU (Broadcast Assist Unit) as used for
tlb-shootdown (selective broadcast mode) always uses UV2
broadcast descriptor format. There is no need to clear the
'legacy' (UV1) mode, because the hardware always uses UV2 mode
for selective broadcast.
But the BIOS uses general broadcast and legacy mode, and the
hardware pays attention to the legacy mode bit for general
broadcast. So the kernel must not clear that mode bit.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/E1SccoO-0002Lh-Cb@eag09.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Robin found this regression:
| I just tried to boot an 8TB system. It fails very early in boot with:
| Kernel panic - not syncing: Cannot find space for the kernel page tables
git bisect commit 722bc6b167.
A git revert of that commit does boot past that point on the 8TB
configuration.
That commit will add up extra pages for all memory range even
above 4g.
Try to limit that extra page count adding to first entry only.
Bisected-by: Robin Holt <holt@sgi.com>
Tested-by: Robin Holt <holt@sgi.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/CAE9FiQUj3wyzQxtq9yzBNc9u220p8JZ1FYHG7t%3DMOzJ%3D9BZMYA@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
* 'exynos-drm-fixes' of git://git.infradead.org/users/kmpark/linux-samsung:
drm/exynos: fixed blending for hdmi graphic layer
drm/exynos: Remove dummy encoder get_crtc operation implementation
drm/exynos: Keep a reference to frame buffer GEM objects
drm/exynos: Don't cast GEM object to Exynos GEM object when not needed
drm/exynos: DRIVER_BUS_PLATFORM is not a driver feature
drm/exynos: fixed size type.
drm/exynos: Use DRM_FORMAT_{NV12, YUV420} instead of DRM_FORMAT_{NV12M, YUV420M}