utask->xol_vaddr is either zero or valid, remove the bogus
IS_ERR_VALUE() check in xol_free_insn_slot().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
handle_swbp() does get_utask() before can_skip_sstep() for no reason,
we do not need ->utask if can_skip_sstep() succeeds.
Move get_utask() to pre_ssout(), which actually starts to use it. Move
the initialization of utask->active_uprobe/state as well. This way
the whole initialization is consolidated in pre_ssout().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
pre_ssout() should do xol_free_insn_slot() if arch_uprobe_pre_xol()
fails, otherwise nobody will free the allocated slot.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
The pre_ssout()->xol_get_insn_slot() path is confusing and buggy. This patch
cleans up the code, the next one fixes the bug.
Change xol_get_insn_slot() to only allocate the slot and do nothing more,
move the initialization of utask->xol_vaddr/vaddr into pre_ssout().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Rename add_utask() into get_utask() and change it to allocate on
demand to simplify the caller. Like get_xol_area() it will have
more users.
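For illustration, the allocate-on-demand shape described here presumably looks something like this sketch (a kzalloc-based fragment, not necessarily the exact patch):
    static struct uprobe_task *get_utask(void)
    {
        if (!current->utask)
            current->utask = kzalloc(sizeof(struct uprobe_task), GFP_KERNEL);
        return current->utask;    /* NULL only if the allocation failed */
    }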
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Currently only xol_get_insn_slot() does get_xol_area() + xol_alloc_area(),
but this will have more users and we do not want to copy-and-paste this
code. This patch simply moves xol_alloc_area() into get_xol_area() to
simplify the current and future code.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Move alloc_page() from xol_add_vma() to xol_alloc_area() to cleanup
the code. This separates the memory allocations and consolidates the
-EALREADY cleanups and the error handling.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Change handle_swbp() to set regs->ip = bp_vaddr in advance, this is
what consumer->handler() needs but uprobe_get_swbp_addr() is not
exported.
This also simplifies the code and makes it more consistent across
the supported architectures. handle_swbp() becomes the only caller
of uprobe_get_swbp_addr().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Currently there are 2 problems with pre-filtering:
1. It is not possible to add/remove a task (mm) after uprobe_register()
2. A forked child inherits all breakpoints and uprobe_consumer can not
control this.
This patch does the first step to improve the filtering. handler_chain()
removes the breakpoints installed by this uprobe from current->mm if all
handlers return UPROBE_HANDLER_REMOVE.
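For illustration only, here is a self-contained user-space model of the "remove only if every handler agrees" decision (the types and helpers are stand-ins, not the kernel's code):
    #include <stdio.h>

    #define UPROBE_HANDLER_REMOVE 1

    struct consumer {
        int (*handler)(struct consumer *);
        struct consumer *next;
    };

    static int keep_handler(struct consumer *c)   { return 0; }
    static int remove_handler(struct consumer *c) { return UPROBE_HANDLER_REMOVE; }

    static int handler_chain(struct consumer *head)
    {
        struct consumer *c;
        int remove = UPROBE_HANDLER_REMOVE;

        for (c = head; c; c = c->next)
            remove &= c->handler(c);    /* any "keep" vote clears the flag */

        return remove;    /* non-zero: unapply the breakpoints from current->mm */
    }

    int main(void)
    {
        struct consumer b = { remove_handler, NULL };
        struct consumer a = { remove_handler, &b };

        printf("all remove: %d\n", handler_chain(&a));    /* prints 1 */
        a.handler = keep_handler;
        printf("one keeps:  %d\n", handler_chain(&a));    /* prints 0 */
        return 0;
    }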
Note that handler_chain() relies on ->register_rwsem to avoid the race
with uprobe_register/unregister which can add/del a consumer, or even
remove and then insert the new uprobe at the same address.
Perhaps we will add uprobe_apply_mm(uprobe, mm, is_register) and teach
copy_mm() to do filter(UPROBE_FILTER_FORK), but I think this change makes
sense anyway.
Note: instead of checking the retcode from uc->handler, we could add
uc->filter(UPROBE_FILTER_BPHIT). But I think it is not optimal to
call 2 hooks in a row. This buys nothing, and if handler/filter do
something nontrivial they will probably do the same work twice.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Finally add uprobe_consumer->filter() and change consumer_filter()
to actually call this method.
Note that ->filter() accepts mm_struct, not task_struct. Because:
1. We do not have for_each_mm_user(mm, task).
2. Even if we implement for_each_mm_user(), ->filter() can
use it itself.
3. It is not clear who will actually need this interface to
do the "nontrivial" filtering.
Another argument is "enum uprobe_filter_ctx", consumer->filter() can
use it to figure out why/where it was called. For example, perhaps
we can add UPROBE_FILTER_PRE_REGISTER used by build_map_info() to
quickly "nack" the unwanted mm's. In this case consumer should know
that it is called under ->i_mmap_mutex.
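As a purely illustrative user-space model of such a filter (the enum values and the signature are sketched from this description, not copied from the kernel headers):
    #include <stdbool.h>
    #include <stdio.h>

    enum uprobe_filter_ctx { UPROBE_FILTER_REGISTER, UPROBE_FILTER_UNREGISTER, UPROBE_FILTER_MMAP };

    struct mm_struct { int id; };    /* stand-in */

    struct consumer { struct mm_struct *target_mm; };

    /* say "yes" only for the one mm this consumer cares about */
    static bool my_filter(struct consumer *uc, enum uprobe_filter_ctx ctx,
                          struct mm_struct *mm)
    {
        (void)ctx;    /* could be used to nack cheaply at register time */
        return mm == uc->target_mm;
    }

    int main(void)
    {
        struct mm_struct a = { 1 }, b = { 2 };
        struct consumer uc = { .target_mm = &a };

        printf("%d %d\n", my_filter(&uc, UPROBE_FILTER_MMAP, &a),
                          my_filter(&uc, UPROBE_FILTER_MMAP, &b));
        return 0;
    }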
See the previous discussion at http://marc.info/?t=135214229700002
Perhaps we should pass more arguments, vma/vaddr?
Note: this patch obviously can't help to filter out the child created
by fork(), this will be addressed later.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
filter_chain() was added into install_breakpoint/remove_breakpoint to
simplify the initial changes but this is sub-optimal.
This patch shifts the callsite to the callers, register_for_each_vma()
and uprobe_mmap(). This way:
- It will be easier to add the new arguments. This is the main reason,
we can do more optimizations later.
- register_for_each_vma(is_register => true) can be optimized, we only
need to consult the new consumer. The previous consumers were already
asked when they called uprobe_register().
This patch also moves the MMF_HAS_UPROBES check from remove_breakpoint(),
this allows us to avoid the potentially costly filter_chain(). Note that
register_for_each_vma(is_register => false) doesn't really need to take
->consumer_rwsem, but I don't think it makes sense to optimize this and
introduce filter_chain_lockless().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
uprobe_register() and uprobe_unregister() are the only users of
mutex_lock(uprobes_hash(inode)), and the only reason why we can't
simply remove it is that we need to ensure that delete_uprobe() is
not possible after alloc_uprobe() and before consumer_add().
IOW, we need to ensure that when we take uprobe->register_rwsem
this uprobe is still valid and we didn't race with _unregister()
which called delete_uprobe() in between.
With this patch uprobe_register() simply checks uprobe_is_active()
and retries if it hits this very unlikely race. uprobes_mutex[] is
no longer needed and can be removed.
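The retry then has roughly this shape (a sketch pieced together from the description above, not the exact patch):
     retry:
        uprobe = alloc_uprobe(inode, offset);
        if (!uprobe)
            return -ENOMEM;
        down_write(&uprobe->register_rwsem);
        ret = -EAGAIN;
        if (likely(uprobe_is_active(uprobe)))
            ret = __uprobe_register(uprobe, uc);    /* consumer_add() + register_for_each_vma() */
        up_write(&uprobe->register_rwsem);
        put_uprobe(uprobe);
        if (unlikely(ret == -EAGAIN))
            goto retry;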
There is another reason for this change: prepare_uprobe() should be
folded into alloc_uprobe() and we do not want to hold the extra locks
around read_mapping_page/etc.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
The lifetime of uprobe->rb_node and uprobe->inode is not refcounted,
delete_uprobe() is called when we detect that uprobe has no consumers,
and it would be deadly wrong to do this twice.
Change delete_uprobe() to WARN() if it was already called. We use
RB_CLEAR_NODE() to mark uprobe "inactive", then RB_EMPTY_NODE() can
be used to detect this case.
RB_EMPTY_NODE() is not used directly, we add the trivial helper for
the next change.
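Presumably the trivial helper is just a wrapper around RB_EMPTY_NODE(), something like:
    static bool uprobe_is_active(struct uprobe *uprobe)
    {
        return !RB_EMPTY_NODE(&uprobe->rb_node);
    }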
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
uprobe_events counts the number of uprobes in uprobes_tree but
it is used as a boolean. We can use RB_EMPTY_ROOT() instead.
The no_uprobe_events() helper added by this patch can probably gain more
callers, say, mmf_recalc_uprobes().
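For reference, such a helper is presumably a one-liner built on RB_EMPTY_ROOT():
    static bool no_uprobe_events(void)
    {
        return RB_EMPTY_ROOT(&uprobes_tree);
    }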
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Anton Arapov <anton@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Now that ->register_rwsem is safe under ->mmap_sem we can kill
->copy_mutex and abuse down_write(&uprobe->consumer_rwsem).
This makes prepare_uprobe() even more ugly, but we should kill
it anyway.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Simply remove UPROBE_RUN_HANDLER and the corresponding code.
It can only help if uprobe has a single consumer, and in fact
it is no longer needed after handler_chain() was changed to use
->register_rwsem, we simply can not race with uprobe_register().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Now that it is safe to use ->consumer_rwsem under ->mmap_sem we can
almost finish the implementation of filter_chain(). It still lacks
the actual uc->filter(...) call but otherwise it is ready; it just
pretends that ->filter() always returns true.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Introduce uprobe->register_rwsem. It is taken for writing around
__uprobe_register/unregister.
Change handler_chain() to use this sem rather than consumer_rwsem.
The main reason for this change is that we have the nasty problem
with mmap_sem/consumer_rwsem dependency. filter_chain() needs to
protect uprobe->consumers like handler_chain(), but they can not
use the same lock. filter_chain() can be called under ->mmap_sem
(currently this is always true), but we want to allow ->handler()
to play with the probed task's memory, and this needs ->mmap_sem.
Alternatively we could use srcu, but synchronize_srcu() is very
slow and ->register_rwsem allows us to do more. In particular, we
can teach handler_chain() to do remove_breakpoint() if this bp is
"nacked" by all consumers, we know that we can't race with the
new consumer which does uprobe_register().
See also the next patches. uprobes_mutex[] is almost ready to die.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To support the filtering, uprobe_register() should do
register_for_each_vma(true) every time a new consumer comes;
we need to install the previously nacked breakpoints.
Note:
- uprobes_mutex[] should die, what it actually protects is
alloc_uprobe().
- UPROBE_RUN_HANDLER should die too, obviously it can't work
unless uprobe has a single consumer. The consumer should
serialize with _register/_unregister itself. Or this flag
should live in uprobe_consumer->state.
- Perhaps we can do some optimizations later. For example, if
filter_chain() never returns false uprobe can record this
fact and avoid the unnecessary register_for_each_vma().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
uprobe_unregister() removes the breakpoints only if the last consumer
goes away. To support the filtering it should do this every time; we
want to remove the breakpoints which nobody else wants to keep.
Note: given that filter_chain() is not actually implemented, this patch
itself doesn't change the behaviour yet, register_for_each_vma(false)
is a heavy "nop" unless there are no more consumers.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Add the new helper filter_chain(). Currently it is only a placeholder;
the comment explains what it should do. We will change it later to
consult every consumer to decide whether we need to install the swbp.
Until then it works as if any consumer returns true, this matches the
current behavior.
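The placeholder presumably amounts to the old check, something like this sketch:
    static bool filter_chain(struct uprobe *uprobe)
    {
        /* TODO: consult every consumer; for now behave as if one said yes */
        return uprobe->consumers != NULL;
    }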
Change install_breakpoint() to call filter_chain() instead of checking
uprobe->consumers != NULL. We obviously need this, and this equally
closes the race with _unregister().
Change remove_breakpoint() to call this helper too. Currently this is
pointless because remove_breakpoint() is only called when the last
consumer goes away, but we will change this.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
uprobe_consumer->filter() is pointless in its current form, kill it.
We will add it back, but with the different signature/semantics. Perhaps
we will even re-introduce the callsite in handler_chain(), but not to
just skip uc->handler().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
register/unregister verifies that inode/uc != NULL. For what?
This really looks like "hide the potential problem", the caller
should pass the valid data.
register() also checks uc->next == NULL, probably to prevent the
double-register but the caller can do other stupid/wrong things.
If we do this check, then we should document that uc->next should
be cleared before register() and add BUG_ON().
Also add the small comment about the i_size_read() check.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cosmetic. __set_bit(UPROBE_SKIP_SSTEP) is part of the initialization,
it is not clear why it is set in insert_uprobe().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
On early boot up, when the ftrace ring buffer is initialized, the
static variable current_trace is initialized to &nop_trace.
Before this initialization, current_trace is NULL; after it, current_trace
never becomes NULL again, as it is only ever reassigned to another ftrace
tracer.
Several places check if current_trace is NULL before using it, but these
checks are frivolous: at the point in time when the checks are made, the
only way current_trace could be NULL is if ftrace failed its allocations
at boot up, and the paths to these locations would probably not be
possible anyway.
By initializing current_trace to &nop_trace where it is declared,
current_trace will never be NULL, and we can remove all these NULL
checks, which were never needed in the first place.
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Ftrace has a snapshot feature available from kernel space and
latency tracers (e.g. irqsoff) are using it. This patch enables
user applications to take a snapshot via debugfs.
Add "snapshot" debugfs file in "tracing" directory.
snapshot:
This is used to take a snapshot and to read the output of the
snapshot.
# echo 1 > snapshot
This will allocate the spare buffer for snapshot (if it is
not allocated), and take a snapshot.
# cat snapshot
This will show contents of the snapshot.
# echo 0 > snapshot
This will free the snapshot if it is allocated.
Any other positive values will clear the snapshot contents if
the snapshot is allocated, or return EINVAL if it is not allocated.
Link: http://lkml.kernel.org/r/20121226025300.3252.86850.stgit@liselsia
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: David Sharp <dhsharp@google.com>
Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
[
Fixed irqsoff selftest and also a conflict with a change
that fixes the update_max_tr.
]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently the trace buffer read functions use a static variable
"old_tracer" for detecting if the current tracer changes. This
was suitable for a single trace file ("trace"), but to add a
snapshot feature that will use the same function for its file,
a check against a static variable is not sufficient.
To use the output functions for two different files, instead of
storing the current tracer in a static variable, rely on the pointer
to the original current tracer's name that the trace iterator
descriptor already contains; checking that pointer tells us if the
current tracer has changed between different reads of the trace
file.
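The check then presumably takes a shape along these lines (sketch only; the variable names are the conventional ones, not necessarily the patch's):
    /* compare name pointers instead of a static old_tracer copy */
    if (iter->trace->name != current_trace->name) {
        /* the tracer changed since this iterator was set up: resync */
        *iter->trace = *current_trace;
    }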
Link: http://lkml.kernel.org/r/20121226025252.3252.9276.stgit@liselsia
Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
For systems with an unstable sched_clock, all cpu_clock() does is enable/
disable local irq during the call to sched_clock_cpu(). And for stable
systems the two are the same.
trace_clock_global() already disables interrupts, so it can call
sched_clock_cpu() directly.
Link: http://lkml.kernel.org/r/1356576585-28782-2-git-send-email-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add a stat about the number of events read from the ring buffer:
# cat /debug/tracing/per_cpu/cpu0/stats
entries: 39869
overrun: 870512
commit overrun: 0
bytes: 1449912
oldest event ts: 6561.368690
now ts: 6565.246426
dropped events: 0
read events: 112 <-- Added
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
While debugging the virtual cputime with the function graph tracer
with a max_depth of 1 (most common use of the max_depth so far),
I found that I was missing kernel execution because of a race condition.
The code for the return side of the function has a slight race:
ftrace_pop_return_trace(&trace, &ret, frame_pointer);
trace.rettime = trace_clock_local();
ftrace_graph_return(&trace);
barrier();
current->curr_ret_stack--;
The ftrace_pop_return_trace() initializes the trace structure for
the callback. The ftrace_graph_return() uses the trace structure
for its own use as that structure is on the stack and is local
to this function. Then the curr_ret_stack is decremented which
is what the trace.depth is set to.
If an interrupt comes in after the ftrace_graph_return() but
before the curr_ret_stack is decremented, then the called function will get
a depth of 2. If max_depth is set to 1 this function will be
ignored.
The problem is that the trace has already been called, and the
timestamp for that trace will not reflect the time the function
was about to re-enter userspace. Calls to the interrupt will not
be traced because the max_depth has prevented this.
To solve this issue, the ftrace_graph_return() can safely be
moved after the current->curr_ret_stack has been updated.
This way the timestamp for the return callback will reflect
the actual time.
If an interrupt comes in after the curr_ret_stack update and before
ftrace_graph_return(), it will be traced. It may look a little
confusing to see it within the other function, but at least
it will not be lost.
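Paraphrasing the reordering described above as a fragment (illustrative; the barrier placement and exact details are in the actual patch):
    ftrace_pop_return_trace(&trace, &ret, frame_pointer);
    trace.rettime = trace_clock_local();
    barrier();
    current->curr_ret_stack--;
    /* an interrupt here now sees the already-decremented depth */
    ftrace_graph_return(&trace);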
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
__this_cpu_inc_return() or __this_cpu_dec() generates a single instruction,
which is faster than the __get_cpu_var() operation.
Link: http://lkml.kernel.org/r/50A9C1BD.1060308@gmail.com
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Shan Wei <davidshan@tencent.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
We checked for uprobe==NULL earlier, no need to redo that.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1356030701-16284-22-git-send-email-sasha.levin@oracle.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The last remaining user was oprofile, and its use was
removed a while ago in commit bc078e4eab
("oprofile: convert oprofile from timer_hook to hrtimer").
There doesn't seem to be any upstream user of this hook
for about two years now. And I'm not even aware of any out of
tree user.
Let's remove it.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Alessio Igor Bogani <abogani@kernel.org>
Cc: Avi Kivity <avi@redhat.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Gilad Ben Yossef <gilad@benyossef.com>
Cc: Hakan Akkan <hakanakkan@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1356191991-2251-1-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Dan's smatch found a compare bug: the result of
trace_test_and_set_recursion() was compared to less than
zero. If the function fails, it returns -1, but the result was saved in
an unsigned int, which can never be less than zero, so the check
ignores the result of the test if a recursion did happen.
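A self-contained illustration of this class of bug (the names are made up, not the ftrace code):
    #include <stdio.h>

    static int test_and_set_recursion(void)
    {
        return -1;    /* pretend a recursion was detected */
    }

    int main(void)
    {
        unsigned int bit = test_and_set_recursion();

        if (bit < 0)    /* always false: bit is unsigned */
            printf("recursion detected\n");
        else
            printf("failure silently ignored, bit = %u\n", bit);
        return 0;
    }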
Luckily this is the last of the recursion tests, as the
infrastructure of ftrace would catch recursions before it
got here, except for a few exceptions.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Pull tracing updates from Steve Rostedt.
This commit:
tracing: Remove the extra 4 bytes of padding in events
changes the ABI. All involved parties seem to agree that it's safe to
do now, but the devil is in the details ...
Signed-off-by: Ingo Molnar <mingo@kernel.org>
ring_buffer.c used to require declarations from trace.h, but
these have moved to the generic header files. There's nothing
in trace.h that ring_buffer.c requires.
There are some headers that trace.h included that ring_buffer.c
needs, but it's best that it includes them directly, and not
include trace.h.
Also, some things may use ring_buffer.c without having tracing
configured. This removes the dependency that may come in the
future.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Using context bit recursion checking, we can help increase the
performance of the ring buffer.
Before this patch:
# echo function > /debug/tracing/current_tracer
# for i in `seq 10`; do ./hackbench 50; done
Time: 10.285
Time: 10.407
Time: 10.243
Time: 10.372
Time: 10.380
Time: 10.198
Time: 10.272
Time: 10.354
Time: 10.248
Time: 10.253
(average: 10.3012)
Now we have:
# echo function > /debug/tracing/current_tracer
# for i in `seq 10`; do ./hackbench 50; done
Time: 9.712
Time: 9.824
Time: 9.861
Time: 9.827
Time: 9.962
Time: 9.905
Time: 9.886
Time: 10.088
Time: 9.861
Time: 9.834
(average: 9.876)
a 4% savings!
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The function tracer had two different versions of function tracing:
the disabling of irqs version and the preempt disable version.
As function tracing is very intrusive and can cause nasty recursion
issues, it has its own recursion protection. But the old method to
do this was a flat layer. If it detected that a recursion was happening
then it would just return without recording.
This made the preempt version (much faster than the irq disabling one)
not very useful, because if an interrupt were to occur after the
recursion flag was set, the interrupt would not be traced at all,
because every function that was traced would think it recursed on
itself (due to the context it preempted setting the recursive flag).
Now that we have a recursion flag for every context level, we
no longer need to worry about that. We can disable preemption,
set the current context recursion check bit, and go on. If an
interrupt were to come along, it would check its own context bit
and happily continue to trace.
As the preempt version is faster than the irq disable version,
there's no more reason to keep the preempt version around.
And the irq disable version still had an issue with missing
out on tracing NMI code.
Remove the irq disable function tracer version and have the
preempt disable version be the default (and only version).
Before this patch we had from running:
# echo function > /debug/tracing/current_tracer
# for i in `seq 10`; do ./hackbench 50; done
Time: 12.028
Time: 11.945
Time: 11.925
Time: 11.964
Time: 12.002
Time: 11.910
Time: 11.944
Time: 11.929
Time: 11.941
Time: 11.924
(average: 11.9512)
Now we have:
# echo function > /debug/tracing/current_tracer
# for i in `seq 10`; do ./hackbench 50; done
Time: 10.285
Time: 10.407
Time: 10.243
Time: 10.372
Time: 10.380
Time: 10.198
Time: 10.272
Time: 10.354
Time: 10.248
Time: 10.253
(average: 10.3012)
a 13.8% savings!
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
When function tracing occurs, the following steps are made:
If arch does not support a ftrace feature:
    call internal function (uses INTERNAL bits) which calls...
If callback is registered to the "global" list, the list
    function is called and recursion checks the GLOBAL bits.
    then this function calls...
The function callback, which can use the FTRACE bits to
    check for recursion.
Now if the arch does not support a feature, and it calls
the global list function which calls the ftrace callback
all three of these steps will do a recursion protection.
There's no reason to do one if the previous caller already
did. The recursion that we are protecting against will
go through the same steps again.
To prevent the multiple recursion checks, if a recursion
bit is set that is higher than the MAX bit of the current
check, then we know that the check was made by the previous
caller, and we can skip the current check.
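A user-space model of the "skip redundant checks" rule described above (the bit layout and names are illustrative, not the kernel's):
    #include <stdio.h>

    /* three layers of recursion bits, lowest to highest (bits 1..12) */
    enum { FTRACE_MAX = 4, GLOBAL_MAX = 8, INTERNAL_MAX = 12 };

    static unsigned long recursion;

    /* returns the bit taken, 0 if an outer check already protects us,
     * or -1 on genuine recursion */
    static int test_and_set_recursion(int start, int max)
    {
        int bit;

        if (recursion & (~0UL << (max + 1)))
            return 0;    /* a higher layer already made the check */

        for (bit = start; bit <= max; bit++) {
            if (!(recursion & (1UL << bit))) {
                recursion |= 1UL << bit;
                return bit;
            }
        }
        return -1;
    }

    int main(void)
    {
        int outer = test_and_set_recursion(GLOBAL_MAX + 1, INTERNAL_MAX);
        int inner = test_and_set_recursion(1, FTRACE_MAX);

        /* the inner check is skipped because a higher bit is already set */
        printf("outer=%d inner=%d\n", outer, inner);
        return 0;
    }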
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Currently for recursion checking in the function tracer, ftrace
tests a task_struct bit to determine if the function tracer had
recursed or not. If it has, then it will return without going
further.
But this leads to races. If an interrupt came in after the bit
was set, the functions being traced would see that bit set and
think that the function tracer recursed on itself, and would return.
Instead add a bit for each context (normal, softirq, irq and nmi).
A check of which context the task is in is made before testing the
associated bit. Now if an interrupt preempts the function tracer
after the previous context has been set, the interrupt functions
can still be traced.
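For illustration, a user-space model of the per-context bits (the kernel derives the context from preempt_count(); this sketch just takes it as a parameter):
    #include <stdio.h>

    enum ctx { CTX_NORMAL, CTX_SOFTIRQ, CTX_IRQ, CTX_NMI };

    static unsigned long trace_recursion;    /* one bit per context */

    static int recursion_try_enter(enum ctx c)
    {
        unsigned long bit = 1UL << c;

        if (trace_recursion & bit)
            return -1;    /* recursed within the same context */
        trace_recursion |= bit;
        return c;
    }

    static void recursion_exit(int bit)
    {
        trace_recursion &= ~(1UL << bit);
    }

    int main(void)
    {
        int task = recursion_try_enter(CTX_NORMAL);    /* tracer runs in task context */
        int irq  = recursion_try_enter(CTX_IRQ);       /* an interrupt is still traced */
        int bad  = recursion_try_enter(CTX_NORMAL);    /* genuine recursion: -1        */

        printf("task=%d irq=%d bad=%d\n", task, irq, bad);
        recursion_exit(irq);
        recursion_exit(task);
        return 0;
    }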
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
There are lots of places that perform:
op = rcu_dereference_raw(ftrace_control_list);
while (op != &ftrace_list_end) {
Add a helper macro to do this, and also optimize for a single
entity. That is, gcc will optimize a loop for either no iterations
or more than one iteration. But usually only a single callback
is registered to the function tracer, thus the optimized case
should be a single pass. To do this we now do:
op = rcu_dereference_raw(list);
do {
[...]
} while (likely(op = rcu_dereference_raw((op)->next)) &&
unlikely((op) != &ftrace_list_end));
An op is always registered (ftrace_list_end when no callbacks are
registered), thus when a single callback is registered, the link
list looks like:
top => callback => ftrace_list_end => NULL.
The likely(op = op->next) still must be performed due to the race
of removing the callback, where the first op assignment could
equal ftrace_list_end. In that case, the op->next would be NULL.
But this is unlikely (only happens in a race condition when
removing the callback).
But it is very likely that the next op would be ftrace_list_end,
unless more than one callback has been registered. This tells
gcc what the most common case is and makes the fast path with
the least amount of branches.
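A compilable sketch of such a helper pair (the macro names are illustrative and the rcu/likely annotations are dropped; this is not the kernel macro):
    #include <stdio.h>

    struct ftrace_ops {
        struct ftrace_ops *next;
        const char *name;
    };

    static struct ftrace_ops ftrace_list_end = { NULL, "stub" };
    static struct ftrace_ops *ops_list = &ftrace_list_end;

    #define do_for_each_op(op, list)    \
        op = (list);                    \
        do {

    #define while_for_each_op(op)       \
        } while ((op = (op)->next) && (op) != &ftrace_list_end)

    int main(void)
    {
        struct ftrace_ops cb = { &ftrace_list_end, "callback" };
        struct ftrace_ops *op;

        ops_list = &cb;    /* a single registered callback: the common case */

        do_for_each_op(op, ops_list)
            printf("calling %s\n", op->name);
        while_for_each_op(op);
        return 0;
    }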
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The function tracing recursion self test should not crash
the machine if the recursion test fails. If it detects that
the function tracing is recursing when it should not be, then
bail, don't go into an infinite recursive loop.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
If one of the function tracers set by the global ops is not recursion
safe, it can still be called directly without the added recursion
supplied by the ftrace infrastructure.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The test that checks function recursion does things differently
if the arch does not support all ftrace features. But that really
doesn't make a difference with how the test runs, and either way
the count variable should be 2 at the end.
Currently the test wrongly fails for archs that don't support all
the ftrace features.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
There's a race condition between the setting of a new tracer and
the update of the max trace buffers (the swap). When a new tracer
is added, it sets current_trace to nop_trace before disabling
the old tracer. At this moment, if the old tracer uses update_max_tr(),
the update may trigger the warning against !current_trace->use_max_tr,
as nop_trace doesn't have that set.
As update_max_tr() requires that interrupts be disabled, we can
add a check to see if current_trace == nop_trace and bail if it
does. Then when disabling the current_trace, set it to nop_trace
and run synchronize_sched(). This will make sure all calls to
update_max_tr() have completed (it was called with interrupts disabled).
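The two sides of the fix then look roughly like this (fragment, not the exact patch):
    /* in update_max_tr(), called with interrupts disabled: */
    if (current_trace == &nop_trace)
        return;    /* tracer switch in progress, nothing to update */

    /* and when switching tracers, while tearing down the old one: */
    current_trace = &nop_trace;
    synchronize_sched();    /* wait for in-flight update_max_tr() calls */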
As a clean up, this commit also removes shrinking and recreating
the max_tr buffer if the old and new tracers both have use_max_tr set.
The old way used to always shrink the buffer, and then expand it
for the next tracer. This is a waste of time.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>