Instrument the scheduler activity (sched_switch, migration, wakeups,
wait for a task, signal delivery) and process/thread
creation/destruction (fork, exit, kthread stop). Actually, kthread
creation is not instrumented in this patch because it is architecture
dependent. It allows to connect tracers such as ftrace which detects
scheduling latencies, good/bad scheduler decisions. Tools like LTTng can
export this scheduler information along with instrumentation of the rest
of the kernel activity to perform post-mortem analysis on the scheduler
activity.
About the performance impact of tracepoints (which is comparable to
markers), even without immediate values optimizations, tests done by
Hideo Aoki on ia64 show no regression. His test case was using hackbench
on a kernel where scheduler instrumentation (about 5 events in code
scheduler code) was added. See the "Tracepoints" patch header for
performance result detail.
Changelog :
- Change instrumentation location and parameter to match ftrace
instrumentation, previously done with kernel markers.
[ mingo@elte.hu: conflict resolutions ]
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Implementation of kernel tracepoints. Inspired from the Linux Kernel
Markers. Allows complete typing verification by declaring both tracing
statement inline functions and probe registration/unregistration static
inline functions within the same macro "DEFINE_TRACE". No format string
is required. See the tracepoint Documentation and Samples patches for
usage examples.
Taken from the documentation patch :
"A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for adding a tiny time penalty (checking
a condition for a branch) and space penalty (adding a few bytes for the
function call at the end of the instrumented function and adds a data
structure in a separate section). When a tracepoint is "on", the
function you provide is called each time the tracepoint is executed, in
the execution context of the caller. When the function provided ends its
execution, it returns to the caller (continuing from the tracepoint
site).
You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters, which
prototypes are described in a tracepoint declaration placed in a header
file."
Addition and removal of tracepoints is synchronized by RCU using the
scheduler (and preempt_disable) as guarantees to find a quiescent state
(this is really RCU "classic"). The update side uses rcu_barrier_sched()
with call_rcu_sched() and the read/execute side uses
"preempt_disable()/preempt_enable()".
We make sure the previous array containing probes, which has been
scheduled for deletion by the rcu callback, is indeed freed before we
proceed to the next update. It therefore limits the rate of modification
of a single tracepoint to one update per RCU period. The objective here
is to permit fast batch add/removal of probes on _different_
tracepoints.
Changelog :
- Use #name ":" #proto as string to identify the tracepoint in the
tracepoint table. This will make sure not type mismatch happens due to
connexion of a probe with the wrong type to a tracepoint declared with
the same name in a different header.
- Add tracepoint_entry_free_old.
- Change __TO_TRACE to get rid of the 'i' iterator.
Masami Hiramatsu <mhiramat@redhat.com> :
Tested on x86-64.
Performance impact of a tracepoint : same as markers, except that it
adds about 70 bytes of instructions in an unlikely branch of each
instrumented function (the for loop, the stack setup and the function
call). It currently adds a memory read, a test and a conditional branch
at the instrumentation site (in the hot path). Immediate values will
eventually change this into a load immediate, test and branch, which
removes the memory read which will make the i-cache impact smaller
(changing the memory read for a load immediate removes 3-4 bytes per
site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it
also saves the d-cache hit).
About the performance impact of tracepoints (which is comparable to
markers), even without immediate values optimizations, tests done by
Hideo Aoki on ia64 show no regression. His test case was using hackbench
on a kernel where scheduler instrumentation (about 5 events in code
scheduler code) was added.
Quoting Hideo Aoki about Markers :
I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
tree, which includes several markers for LTTng, using an ia64 server.
While the immediate trace mark feature isn't implemented on ia64, there
is no major performance regression. So, I think that we don't have any
issues to propose merging marker point patches into Linus's tree from
the viewpoint of performance impact.
I prepared two kernels to evaluate. The first one was compiled without
CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS.
I downloaded the original hackbench from the following URL:
http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c
I ran hackbench 5 times in each condition and calculated the average and
difference between the kernels.
The parameter of hackbench: every 50 from 50 to 800
The number of CPUs of the server: 2, 4, and 8
Below is the results. As you can see, major performance regression
wasn't found in any case. Even if number of processes increases,
differences between marker-enabled kernel and marker- disabled kernel
doesn't increase. Moreover, if number of CPUs increases, the differences
doesn't increase either.
Curiously, marker-enabled kernel is better than marker-disabled kernel
in more than half cases, although I guess it comes from the difference
of memory access pattern.
* 2 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 4.811 | 4.872 | +0.061 | +1.27 |
100 | 9.854 | 10.309 | +0.454 | +4.61 |
150 | 15.602 | 15.040 | -0.562 | -3.6 |
200 | 20.489 | 20.380 | -0.109 | -0.53 |
250 | 25.798 | 25.652 | -0.146 | -0.56 |
300 | 31.260 | 30.797 | -0.463 | -1.48 |
350 | 36.121 | 35.770 | -0.351 | -0.97 |
400 | 42.288 | 42.102 | -0.186 | -0.44 |
450 | 47.778 | 47.253 | -0.526 | -1.1 |
500 | 51.953 | 52.278 | +0.325 | +0.63 |
550 | 58.401 | 57.700 | -0.701 | -1.2 |
600 | 63.334 | 63.222 | -0.112 | -0.18 |
650 | 68.816 | 68.511 | -0.306 | -0.44 |
700 | 74.667 | 74.088 | -0.579 | -0.78 |
750 | 78.612 | 79.582 | +0.970 | +1.23 |
800 | 85.431 | 85.263 | -0.168 | -0.2 |
--------------------------------------------------------------
* 4 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.586 | 2.584 | -0.003 | -0.1 |
100 | 5.254 | 5.283 | +0.030 | +0.56 |
150 | 8.012 | 8.074 | +0.061 | +0.76 |
200 | 11.172 | 11.000 | -0.172 | -1.54 |
250 | 13.917 | 14.036 | +0.119 | +0.86 |
300 | 16.905 | 16.543 | -0.362 | -2.14 |
350 | 19.901 | 20.036 | +0.135 | +0.68 |
400 | 22.908 | 23.094 | +0.186 | +0.81 |
450 | 26.273 | 26.101 | -0.172 | -0.66 |
500 | 29.554 | 29.092 | -0.461 | -1.56 |
550 | 32.377 | 32.274 | -0.103 | -0.32 |
600 | 35.855 | 35.322 | -0.533 | -1.49 |
650 | 39.192 | 38.388 | -0.804 | -2.05 |
700 | 41.744 | 41.719 | -0.025 | -0.06 |
750 | 45.016 | 44.496 | -0.520 | -1.16 |
800 | 48.212 | 47.603 | -0.609 | -1.26 |
--------------------------------------------------------------
* 8 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.094 | 2.072 | -0.022 | -1.07 |
100 | 4.162 | 4.273 | +0.111 | +2.66 |
150 | 6.485 | 6.540 | +0.055 | +0.84 |
200 | 8.556 | 8.478 | -0.078 | -0.91 |
250 | 10.458 | 10.258 | -0.200 | -1.91 |
300 | 12.425 | 12.750 | +0.325 | +2.62 |
350 | 14.807 | 14.839 | +0.032 | +0.22 |
400 | 16.801 | 16.959 | +0.158 | +0.94 |
450 | 19.478 | 19.009 | -0.470 | -2.41 |
500 | 21.296 | 21.504 | +0.208 | +0.98 |
550 | 23.842 | 23.979 | +0.137 | +0.57 |
600 | 26.309 | 26.111 | -0.198 | -0.75 |
650 | 28.705 | 28.446 | -0.259 | -0.9 |
700 | 31.233 | 31.394 | +0.161 | +0.52 |
750 | 34.064 | 33.720 | -0.344 | -1.01 |
800 | 36.320 | 36.114 | -0.206 | -0.57 |
--------------------------------------------------------------
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Merges oprofile, timers/hpet, x86/traps, x86/time, and x86/core misc items.
* 'x86-core-v4-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (132 commits)
x86: change early_ioremap to use slots instead of nesting
x86: adjust dependencies for CONFIG_X86_CMOV
dumpstack: x86: various small unification steps, fix
x86: remove additional_cpus
x86: remove additional_cpus configurability
x86: improve UP kernel when CPU-hotplug and SMP is enabled
dumpstack: x86: various small unification steps
dumpstack: i386: make kstack= an early boot-param and add oops=panic
dumpstack: x86: use log_lvl and unify trace formatting
dumptrace: x86: consistently include loglevel, print stack switch
dumpstack: x86: add "end" parameter to valid_stack_ptr and print_context_stack
dumpstack: x86: make printk_address equal
dumpstack: x86: move die_nmi to dumpstack_32.c
traps: x86: finalize unification of traps.c
traps: x86: make traps_32.c and traps_64.c equal
traps: x86: various noop-changes preparing for unification of traps_xx.c
traps: x86_64: use task_pid_nr(tsk) instead of tsk->pid in do_general_protection
traps: i386: expand clear_mem_error and remove from mach_traps.h
traps: x86_64: make io_check_error equal to the one on i386
traps: i386: use preempt_conditional_sti/cli in do_int3
...
Original idea for this from a patch by Rodolfo Giometti which merges various
bits of PPS support
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Drop the kernel lock further and also correct cases where we set rc to an
error code, and then return 0
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
get_current_tty now does internal locking and returns a referenced object,
thus our use of tty_mutex here can go away.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
drivers/char/tty_io.c:1413:17: warning: symbol 'buf' shadows an earlier one
drivers/char/tty_io.c:1379:20: originally declared here
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The multiple drivers share the minor space occupied by a particular major
number, the actual index within the device name's space is indicated by
the tty_driver->name_base + uart_port->line
Another usable formula is (uart_driver->minor - MINOR_BASE) + port->line
Use those to print the device names properly in such situations in
serial_core.c and 8250.c
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Closes bug #11408 by checking the card index range for command 0
Fixes the ioctl to return ENOTTY which is correct for unknown ioctls
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Many tty drivers contain 'can't happen' checks against NULL pointers passed
in by the tty layer. These have never been possible to occur. Even more
importantly if they ever do occur we want to know as it would be a serious
bug.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen's fixes reminded me that gigaset is still rather broken so fix it up
a bit
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Today's linux-next build (x86_64 allmodconfig) failed like this:
/drivers/char/tty_ioctl.c: In function 'change_termios':
drivers/isdn/capi/capi.c🔢 error: implicit declaration of function 'n_tty_ioctl'
drivers/isdn/gigaset/ser-gigaset.c: In function 'gigaset_tty_ioctl':
drivers/isdn/gigaset/ser-gigaset.c:648: error: implicit declaration of function 'n_tty_ioctl'
Introduced by commit 686b5e4aea05a80e370dc931b7f4a8d03c80da54
("tty-move-canon-specials"). I added the following patch (which may not
be correct).
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the set up on ldisc change into the ldisc
Move the INQ/OUTQ cases into the driver not in shared ioctl code where it
gives bogus answers for other ldisc values
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Copy the simplification from the pty unix98 special case to the generic one.
This allows us to kill off driver->termios_locked entirely which is nice. We
have to whack bits of the cris driver as it meddles in places it shouldn't
providing its own arrays that were never used anyway.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need both termios and termios_locked so allocate them as one
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The updating and moving around of the pty code added a bug where both the
helper and caller free the main tty struct (the pty driver must free the
o_tty pair itself however).
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We've done the heavy lifting now its time to mop up a bit
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When creating a new pty, save the pty's inode in the tty->driver_data.
Use this inode in pty_kill() to identify the devpts instance. Since
we now have the inode for the pty, we can skip get_node() lookup and
remove the unused get_node().
TODO:
- check if the mutex_lock is needed in pty_kill().
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
devpts_pty_new() is called when setting up a new pty and would not
will not have an existing dentry or inode for the pty. So don't bother
looking for an existing dentry - just create a new one.
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As pointed out by H. Peter Anvin, since the inode for the pty is known,
we don't need to look it up.
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pass-in 'inode' or 'tty' parameter to devpts interfaces. With multiple
devpts instances, these parameters will be used in subsequent patches
to identify the instance of devpts mounted. The parameters also help
simplify devpts implementation.
Changelog[v3]:
- minor changes due to merge with ttydev updates
- rename parameters to emphasize they are ptmx or pts inodes
- pass-in tty_struct * to devpts_pty_kill() (this will help
cleanup the get_node() call in a subsequent patch)
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move tty_driver_lookup_tty() and tty_reopen() from tty_init_dev()
into tty_open() (one of the two callers of tty_init_dev()). These
calls are not really required in ptmx_open(), the other caller,
since ptmx_open() would be setting up a new tty.
Changelog[v2]:
- remove the lookup and reopen calls from ptmx_open
- merge with recent changes to ttydev tree
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The majority of the remaining init_dev code is pty special cases. We
refactor this code into the driver->install method.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Original suggestion and proposal from Sukadev Bhattiprolu.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have the lookup operation abstracted which is nice for pty cleanup but
we really want to abstract the add/remove entries as well so that we can
pull the pty code out of the tty core and create a clear defined interface
for the tty driver table.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix up the naming, style and extract some bits of code into the driver
specific code
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Move the 'find-tty' and 'fast-track-open' parts of init_dev() to
separate functions.
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Carry on pushing code out of tty_io when it belongs to other drivers. I'm
not 100% happy with some of this and it will be worth revisiting some of the
exports later when the restructuring work is done.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Right now there are various drivers that try to use tty->count to know when
they get the final close. Aristeau Rozanski showed while debugging the vt
sysfs race that this isn't entirely safe.
Instead of driver side tricks to work around this introduce a shutdown which
is called when the tty is being destructed. This also means that the shutdown
method is tied into the refcounting.
Use this to rework the console close/sysfs logic.
Remove lots of special case code from the tty core code. The pty code can now
have a shutdown() method that replaces the special case hackery in the tree
free up paths.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For hysterical raisins the vt layer drops and retakes locks in the write
method. This is a left over from the days when user/kernel data was passed
directly to the tty not pre-buffered.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The open path for ptmx slaves is via the ptmx device. Opening them any
other way is not allowed. Vegard Nossum found that previously this was not
the case and mknod foo c 128 42; cat foo would produce nasty diagnostics
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Various people outside the tty layer still stick their noses in behind the
scenes. We need to make sure they also obey the locking and referencing rules.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently it is sometimes locked by the tty mutex and sometimes by the
sighand lock. The latter is in fact correct and now we can hand back referenced
objects we can fix this up without problems around sleeping functions.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We now have the infrastructure to sort this out but rather than teaching
the syscall tty lock rules we move the hard work into a tty helper
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use tty_port_init and krefs in the stallion drivers to protect us from devices
going away underneath us. As with the other drives some rearranging is done to
pass the tty structure down properly on the user side.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rather than blindly keep taking krefs we reorder the code in a few places
to pass the tty down to the right place (which is important as from the user
side it is not the case that tty == port->tty in all situations). For the irq
and related paths use the krefs to stop the tty being freed under us.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use kref in the USB serial drivers so that we don't free tty structures
from under the URB receive handlers as has historically been the case if
you were unlucky. This also gives us a framework for general tty drivers to
use tty_port objects and refcount.
Contains two err->dev_err changes merged together to fix clashes in the
-next tree.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is pure tty code so put it in the tty layer where it can be with the
locking relevant material it uses
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We now return a kref covered tty reference. That ensures the tty structure
doesn't go away when you have a return from get_current_tty. This is not
enough to protect you from most of the resources being freed behind your
back - yet.
[Updated to include fixes for SELinux problems found by Andrew Morton and
an s390 leak found while debugging the former]
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We always use the real tty one for stuff so the pty one should not be
compared. As we propagate window changes to both it doesn't currently
matter but will when we tidy up the pty termios logic a bit more
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This moves us towards sanity and should mean our termios locking is now
complete and comprehensive.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need a way to describe the various additional modes and flow control
features that random weird hardware shows up and software such as wine
wants to emulate as Windows supports them.
TCGETX/TCSETX and the termiox ioctl are a SYS5 extension that we might as
well adopt. This patches adds the structures and the basic ioctl interfaces
when the TCGETX etc defines are added for an architecture. Drivers wishing
to use this stuff need to add new methods.
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>