With the planned proper hierarchy support, a bio will climb up the
tree before actually being dispatched. This makes sure bio is also
subjected to parent's throttling limits, if any.
It might happen that parent is idle and when bio is transferred to
parent, a new slice starts fresh. But that is incorrect as parents
wait time should have started when bio was queued in child group and
causes IOs to be throttled more than configured as they climb the
hierarchy.
Given the fact that we have not written hierarchical algorithm in a
way where child's and parents time slices are synchronized, we
transfer the child's start time to parent if parent was idling. If
parent was busy doing dispatch of other bios all this while, this is
not an issue.
Child's slice start time is passed to parent. Parent looks at its
last expired slice start time. If child's start time is after parents
old start time, that means parent had been idle and after parent
went idle, child had an IO queued. So use child's start time as
parent start time.
If parent's start time is after child's start time, that means,
when IO got queued in child group, parent was not idle. But later
it dispatched some IO, its slice got trimmed and then it went idle.
After a while child's request got shifted in parent group. In this
case use parent's old start time as new start time as that's the
duration of slice we did not use.
This logic is far from perfect as if there are multiple childs
then first child transferring the bio decides the start time while
a bio might have queued up even earlier in other child, which is
yet to be transferred up to parent. In that case we will lose
time and bandwidth in parent. This patch is just an approximation
to make situation somewhat better.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
With flat hierarchy, there's only single level of dispatching
happening and fairness beyond that point is the responsibility of the
rest of the block layer and driver, which usually works out okay;
however, with the planned hierarchy support,
service_queue->bio_lists[] can be filled up by bios from a single
source. While the limits would still be honored, it'd be very easy to
starve IOs from siblings or children.
To avoid such starvation, this patch implements throtl_qnode and
converts service_queue->bio_lists[] to lists of per-source qnodes
which in turn contains the bio's. For example, when a bio is
dispatched from a child group, the bio doesn't get queued on
->bio_lists[] directly but it first gets queued on the group's qnode
which in turn gets queued on service_queue->queued[]. When
dispatching for the upper level, the ->queued[] list is consumed in
round-robing order so that the dispatch windows is consumed fairly by
all IO sources.
There are two ways a bio can come to a throtl_grp - directly queued to
the group or dispatched from a child. For the former
throtl_grp->qnode_on_self[rw] is used. For the latter, the child's
->qnode_on_parent[rw].
Note that this means that the child which is contributing a bio to its
parent should stay pinned until all its bios are dispatched to its
grand-parent. This patch moves blkg refcnting from bio add/remove
spots to qnode activation/deactivation so that the blkg containing an
active qnode is always pinned. As child pins the parent, this is
sufficient for keeping the relevant sub-tree pinned while bios are in
flight.
The starvation issue was spotted by Vivek Goyal.
v2: The original patch used the same throtl_grp->qnode_on_self/parent
for reads and writes causing RWs to be queued incorrectly if there
already are outstanding IOs in the other direction. They should
be throtl_grp->qnode_on_self/parent[2] so that READs and WRITEs
can use different qnodes. Spotted by Vivek Goyal.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
throtl_pending_timer_fn() currently assumes that the parent_sq is the
top level one and the bio's dispatched are ready to be issued;
however, this assumption will be wrong with proper hierarchy support.
This patch makes the following changes to make
throtl_pending_timer_fn() ready for hiearchy.
* If the parent_sq isn't the top-level one, update the parent
throtl_grp's dispatch time and schedule the next dispatch as
necessary. If the parent's dispatch time is now, repeat the
function for the parent throtl_grp.
* If the parent_sq is the top-level one, kick issue work_item as
before.
* The debug message printed by throtl_log() now prints out the
service_queue's nr_queued[] instead of the total nr_queued as the
latter becomes uninteresting and misleading with hierarchical
dispatch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
tg_dispatch_one_bio() currently assumes that the parent_sq is the top
level one and the bio being dispatched is ready to be issued; however,
this assumption will be wrong with proper hierarchy support. This
patch makes the following changes to make tg_dispatch_on_bio() ready
for hiearchy.
* throtl_data->nr_queued[] is incremented in blk_throtl_bio() instead
of throtl_add_bio_tg() so that throtl_add_bio_tg() can be used to
transfer a bio from a child tg to its parent.
* tg_dispatch_one_bio() is updated to distinguish whether its parent
is another throtl_grp or the throtl_data. If former, the bio is
transferred to the parent throtl_grp using throtl_add_bio_tg(). If
latter, the bio is ready to be issued and put on the top-level
service_queue's bio_lists[] and throtl_data->nr_queued is
decremented.
As all throtl_grps currently have the top level service_queue as their
->parent_sq, this patch in itself doesn't make any behavior
difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Currently, blk_throtl_bio() issues the passed in bio directly if it's
within limits of its associated tg (throtl_grp). This behavior
becomes incorrect with hierarchy support as the bio should be
accounted to and throttled by the ancestor throtl_grps too.
This patch makes the direct issue path of blk_throtl_bio() to loop
until it reaches the top-level service_queue or gets throttled. If
the former, the bio can be issued directly; otherwise, it gets queued
at the first layer it was above limits.
As tg->parent_sq is always the top-level service queue currently, this
patch in itself doesn't make any behavior differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
The current blk_throtl_drain() assumes that all active throtl_grps are
queued on throtl_data->service_queue, which won't be true once
hierarchy support is implemented.
This patch makes blk_throtl_drain() perform post-order walk of the
blkg hierarchy draining each associated throtl_grp, which guarantees
that all bios will eventually be pushed to the top-level service_queue
in throtl_data.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Currently, blk_throtl_dispatch_work_fn() is responsible for both
dispatching bio's from throtl_grp's according to their limits and then
issuing the dispatched bios.
This patch moves the dispatch part to throtl_pending_timer_fn() so
that the work item is kicked iff there are bio's to issue. This is to
avoid work item execution at each step when hierarchy support is
enabled. bio's will be dispatched towards the top-level service_queue
from the timers at each layer and the work item will only be used to
issue the bio's which reached the top-level service_queue.
While fetching bio's to issue from bio_lists[],
blk_throtl_dispatch_work_fn() fetches all READs before WRITEs. While
the original code also dispatched READs first, if multiple throtl_grps
are dispatched on the same run, WRITEs from throtl_grp which is
dispatched first would precede READs from throtl_grps which are
dispatched later. While this is a behavior change, given that the
previous code already prioritized READs and block layer generally
prioritizes and segregates READs from WRITEs, this isn't likely to
make any noticeable differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
throtl_select_dispatch() only dispatches throtl_quantum bios on each
invocation. blk_throtl_dispatch_work_fn() in turn depends on
throtl_schedule_next_dispatch() scheduling the next dispatch window
immediately so that undue delays aren't incurred. This effectively
chains multiple dispatch work item executions back-to-back when there
are more than throtl_quantum bios to dispatch on a given tick.
There is no reason to finish the current work item just to repeat it
immediately. This patch makes throtl_schedule_next_dispatch() return
%false without doing anything if the current dispatch window is still
open and updates blk_throtl_dispatch_work_fn() repeat dispatching
after cpu_relax() on %false return.
This change will help implementing hierarchy support as dispatching
will be done from pending_timer and immediate reschedule of timer
function isn't supported and doesn't make much sense.
While this patch changes how dispatch behaves when there are more than
throtl_quantum bios to dispatch on a single tick, the behavior change
is immaterial.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Currently, throtl_data->dispatch_work is a delayed_work item which
handles both delayed dispatch and issuing bios. The two tasks will be
separated to support proper hierarchy. To prepare for that, this
patch separates out the timer into throtl_service_queue->pending_timer
from throtl_data->dispatch_work and make the latter a work_struct.
* As the timer is now per-service_queue, it's initialized and
del_sync'd as its corresponding service_queue is created and
destroyed. The timer, when triggered, simply schedules
throtl_data->dispathc_work for execution.
* throtl_schedule_delayed_work() is renamed to
throtl_schedule_pending_timer() and takes @sq and @expires now.
* Simiarly, throtl_schedule_next_dispatch() now takes @sq, which
should be the parent_sq of the service_queue which just got a new
bio or updated. As the parent_sq is always the top-level
service_queue now, this doesn't change anything at this point.
This patch doesn't introduce any behavior differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
With proper hierarchy support, a bio can be dispatched multiple times
until it reaches the top-level service_queue and we don't want to
update dispatch stats at each step. They are local stats and will be
kept local. If recursive stats are necessary, they should be
implemented separately and definitely not by updating counters
recursively on each dispatch.
This patch moves REQ_THROTTLED setting to throtl_charge_bio() and gate
stats update with it so that dispatch stats are updated only on the
first time the bio is charged to a throtl_grp, which will always be
the throtl_grp the bio was originally queued to.
This means that REQ_THROTTLED would be set even for bios which don't
get throttled. As we don't want bios to leave blk-throtl with the
flag set, move REQ_THROTLLED clearing to the end of blk_throtl_bio()
and clear if the bio is being issued directly.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Now that both throtl_data and throtl_grp embed throtl_service_queue,
we can unify throtl_log() and throtl_log_tg().
* sq_to_tg() is added. This returns the throtl_grp a service_queue is
embedded in. If the service_queue is the top-level one embedded in
throtl_data, NULL is returned.
* sq_to_td() is added. A service_queue is always associated with a
throtl_data. This function finds the associated td and returns it.
* throtl_log() is updated to take throtl_service_queue instead of
throtl_data. If the service_queue is one embedded in throtl_grp, it
prints the same header as throtl_log_tg() did. If it's one embedded
in throtl_data, it behaves the same as before. This renders
throtl_log_tg() unnecessary. Removed.
This change is necessary for hierarchy support as we're gonna be using
the same code paths to dispatch bios to intermediate service_queues
embedded in throtl_grps and the top-level service_queue embedded in
throtl_data.
This patch doesn't make any behavior changes.
v2: throtl_log() didn't print a space after blkg path. Updated so
that it prints a space after throtl_grp path. Spotted by Vivek.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
To prepare for hierarchy support, this patch adds
throtl_service_queue->service_sq which points to the arent
service_queue. Currently, for all service_queues embedded in
throtl_grps, it points to throtl_data->service_queue. As
throtl_data->service_queue doesn't have a parent its parent_sq is set
to NULL.
There are a number of functions which take both throtl_grp *tg and
throtl_service_queue *parent_sq. With this patch, the parent
service_queue can be determined from @tg and the @parent_sq arguments
are removed.
This patch doesn't make any behavior differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
When blk_throtl_bio() wants to queue a bio to a tg (throtl_grp), it
avoids invoking tg_update_disptime() and
throtl_schedule_next_dispatch() if the tg already has bios queued in
that direction. As a new bio is appeneded after the existing ones, it
can't change the tg's next dispatch time or the parent's dispatch
schedule.
This optimization is currently open coded in blk_throtl_bio().
Whether the target biolist was occupied was recorded in a local
variable and later used to skip disptime update. This patch moves
generalizes it so that throtl_add_bio_tg() sets a new flag
THROTL_TG_WAS_EMPTY if the biolist was empty before the new bio was
added. tg_update_disptime() clears the flag automatically.
blk_throtl_bio() is updated to simply test the flag before updating
disptime.
This patch doesn't make any functional differences now but will enable
using the same optimization for recursive dispatch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
throtl_service_queues will eventually form a tree which is anchored at
throtl_data->service_queue and queue bios will climb the tree to the
top service_queue to be executed.
This patch makes the dispatch paths in blk_throtl_dispatch_work_fn()
and blk_throtl_drain() to dispatch bios to
throtl_data->service_queue.bio_lists[] instead of the on-stack
bio_lists. This will keep the final dispatch to the top level
service_queue share the same mechanism as dispatches through the rest
of the hierarchy.
As bio's should be issued in a sleepable context,
blk_throtl_dispatch_work_fn() transfers all dispatched bio's from the
service_queue bio_lists[] into an onstack one before dropping
queue_lock and issuing the bio's.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
throtl_service_queues will eventually form a tree which is anchored at
throtl_data->service_queue and queue bios will climb the tree to the
top service_queue to be executed.
This patch moves bio_lists[] and nr_queued[] from throtl_grp to its
service_queue to prepare for that. As currently only the
throtl_data->service_queue is in use, this patch just ends up moving
throtl_grp->bio_lists[] and ->nr_queued[] to
throtl_grp->service_queue.bio_lists[] and ->nr_queued[] without making
any functional differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Currently, there's single service_queue per queue -
throtl_data->service_queue. All active throtl_grp's are queued on the
queue and dispatched according to their limits. To support hierarchy,
this will be expanded such that active throtl_grp's form a tree
anchored at throtl_data->service_queue and chained through each
intermediate throtl_grp's service_queue.
This patch adds throtl_grp->service_queue to prepare for hierarchy
support. The initialization function - throtl_service_queue_init() -
is added and replaces the macro initializer. The newly added
tg->service_queue isn't used yet. Following patches will do.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
throtl_service_queue will be the building block of hierarchy support
and will form a tree. This patch updates its usages as arguments to
reduce confusion.
* When a service queue is used as the parent role - the host of the
rbtree - use @parent_sq instead of @sq.
* For functions taking both @tg and @parent_sq, reorder them so that
the order is (@tg, @parent_sq) not the other way around. This makes
the code follow the usual convention of specifying the primary
target of the operation as the first argument.
This patch doesn't make any functional differences.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
throtl_service_queue will be used as the basic block to implement
hierarchy support. Pass around throtl_service_queue *sq instead of
throtl_data *td in the following functions which will be used across
multiple levels of hierarchy.
* [__]throtl_enqueue/dequeue_tg()
* throtl_add_bio_tg()
* tg_update_disptime()
* throtl_select_dispatch()
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Add throtl_grp->td so that the td (throtl_data) a given tg
(throtl_grp) belongs to can be determined, and remove @td argument
from functions which take both @td and @tg as the former now can be
determined from the latter.
This generally simplifies the code and removes a number of cases where
@td is passed as an argument without being actually used. This will
also help hierarchy support implementation.
While at it, in multi-line conditions, move the logical operators
leading broken lines to the end of the previous line.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
blk-throttle is still using function-defining macros to define flag
handling functions, which went out style at least a decade ago.
Just define the flag as bitmask and use direct bit operations.
This patch doesn't make any functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
throtl_rb_root will be expanded to cover more roles for hierarchy
support. Rename it to throtl_service_queue and make its fields more
descriptive.
* rb -> pending_tree
* left -> first_pending
* count -> nr_pending
* min_disptime -> first_pending_disptime
This patch is purely cosmetic.
Signed-off-by: Tejun Heo <tj@kernel.org
Acked-by: Vivek Goyal <vgoyal@redhat.com>
throtl_nr_queued() is used in several places to avoid performing
certain operations when the throtl_data is empty. This usually is
useless as those paths usually aren't traveled if there's no bio
queued.
* throtl_schedule_delayed_work() skips scheduling dispatch work item
if @td doesn't have any bios queued; however, the only case it can
be called when @td is empty is from tg_set_conf() which isn't
something we should be optimizing for.
* throtl_schedule_next_dispatch() takes a quick exit if @td is empty;
however, right after that it triggers BUG if the service tree is
empty. The two conditions are equivalent and it can just test
@st->count for the quick exit.
* blk_throtl_dispatch_work_fn() skips dispatch if @td is empty. This
work function isn't usually invoked when @td is empty. The only
possibility is from tg_set_conf() and when it happens the normal
dispatching path can handle empty @td fine. No need to add special
skip path.
This patch removes the above three unnecessary optimizations, which
leave throtl_log() call in blk_throtl_dispatch_work_fn() the only user
of throtl_nr_queued(). Remove throtl_nr_queued() and open code it in
throtl_log(). I don't think we need td->nr_queued[] at all. Maybe we
can remove it later.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Move throtl_schedule_delayed_work() above its first user so that the
forward declaration can be removed.
This patch is pure relocaiton.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
blk-throttle is about to go through major restructuring to support
hierarchy. Do cosmetic updates in preparation.
* s/throtl_data->throtl_work/throtl_data->dispatch_work/
* s/blk_throtl_work()/blk_throtl_dispatch_work_fn()/
* Collapse throtl_dispatch() into blk_throtl_dispatch_work_fn()
This patch is purely cosmetic.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
When bps or iops configuration changes, blk-throttle records the new
configuration and sets a flag indicating that the config has changed.
The flag is checked in the bio dispatch path and applied. This
deferred config application was necessary due to limitations in blkcg
framework, which haven't existed for quite a while now.
This patch removes the deferred config application mechanism and
applies new configurations directly from tg_set_conf(), which is
simpler.
v2: Dropped unnecessary throtl_schedule_delayed_work() call from
tg_set_conf() as suggested by Vivek Goyal.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
throtl_select_dispatch() calls throtl_enqueue_tg() right after
tg_update_disptime(), which always calls the function anyway. The
call is, while harmless, unnecessary. Remove it.
This patch doesn't introduce any behavior difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Currently, when the last reference of a blkcg_gq is put, all then
release operations sans the actual freeing happen directly in
blkg_put(). As blkg_put() may be called under queue_lock, all
pd_exit_fn()s may be too. This makes it impossible for pd_exit_fn()s
to use del_timer_sync() on timers which grab the queue_lock which is
an irq-safe lock due to the deadlock possibility described in the
comment on top of del_timer_sync().
This can be easily avoided by perfoming the release operations in the
RCU callback instead of directly from blkg_put(). This patch moves
the blkcg_gq release operations to the RCU callback.
As this leaves __blkg_release() with only call_rcu() invocation,
blkg_rcu_free() is renamed to __blkg_release_rcu(), exported and
call_rcu() invocation is now done directly from blkg_put() instead of
going through __blkg_release() which is removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Currently, when creating a new blkcg_gq, each policy's pd_init_fn() is
invoked in blkg_alloc() before the parent is linked. This makes it
difficult for policies to perform initializations which are dependent
on the parent.
This patch moves pd_init_fn() invocations to blkg_create() after the
parent blkg is linked where the new blkg is fully initialized. As
this means that blkg_free() can't assume that pd's are initialized,
pd_exit_fn() invocations are moved to __blkg_release(). This
guarantees that pd_exit_fn() is also invoked with fully initialized
blkgs with valid parent pointers.
This will help implementing hierarchy support in blk-throttle.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
blk-throttle hierarchy support will make use of it. Move
blkg_for_each_descendant_pre() from block/blk-cgroup.c to
block/blk-cgroup.h.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
In blkg_create(), after lookup of parent fails, the control jumps to
error path with the error code encoded into @blkg. The error path
doesn't use @blkg for the return value. It returns ERR_PTR(ret).
Make lookup fail path set @ret instead of @blkg.
Note that the parent lookup is guaranteed to succeed at that point and
the condition check is purely for sanity and triggers WARN when fails.
As such, I don't think it's necessary to mark it for -stable.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
kprobes up to par with the latest changes to ftrace (multi buffering
and the new function probes).
He also discovered and fixed some bugs in doing so. When pulling in his
patches, I also found a few minor bugs as well and fixed them.
This also includes a compile fix for some archs that select the ring buffer
but not tracing.
I based this off of the last patch you took from me that fixed the merge
conflict error, as that was the commit that had all the changes I needed
for this set of changes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJRjYnJAAoJEOdOSU1xswtMg9EH/iFs438FgrNMk2ZdQftmqcqA
cqcactHo1mmoHjAoLZT/oDBjEThhVUuqzMXrFRutSYcTh4PsQEC3arX0mpsC+T12
UEEV/tZS3TXH+GXEyrOit/O3kzntQcDHwJDV4+0n80IrJmw4IDZbnV3R8DWjS6wp
so+dq0A1pwehcG/upgpw1oTKsGv1G/p6vyf968B6W44icHEClLiph4JE2kzE6D3r
fzSpOLaQoBEvwIRf6xRKxi240VqIItXwfG7pwNpPpSC37gRLzm74zGr+Sj93/k1y
pARbZ/5XO7/pcVYQYupErRAoV5in+QMZ67k5G1vQIvyOS9r039catbQf/7PkzcI=
=EZCE
-----END PGP SIGNATURE-----
Merge tag 'trace-fixes-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing/kprobes update from Steven Rostedt:
"The majority of these changes are from Masami Hiramatsu bringing
kprobes up to par with the latest changes to ftrace (multi buffering
and the new function probes).
He also discovered and fixed some bugs in doing so. When pulling in
his patches, I also found a few minor bugs as well and fixed them.
This also includes a compile fix for some archs that select the ring
buffer but not tracing.
I based this off of the last patch you took from me that fixed the
merge conflict error, as that was the commit that had all the changes
I needed for this set of changes."
* tag 'trace-fixes-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/kprobes: Support soft-mode disabling
tracing/kprobes: Support ftrace_event_file base multibuffer
tracing/kprobes: Pass trace_probe directly from dispatcher
tracing/kprobes: Increment probe hit-count even if it is used by perf
tracing/kprobes: Use bool for retprobe checker
ftrace: Fix function probe when more than one probe is added
ftrace: Fix the output of enabled_functions debug file
ftrace: Fix locking in register_ftrace_function_probe()
tracing: Add helper function trace_create_new_event() to remove duplicate code
tracing: Modify soft-mode only if there's no other referrer
tracing: Indicate enabled soft-mode in enable file
tracing/kprobes: Fix to increment return event probe hit-count
ftrace: Cleanup regex_lock and ftrace_lock around hash updating
ftrace, kprobes: Fix a deadlock on ftrace_regex_lock
ftrace: Have ftrace_regex_write() return either read or error
tracing: Return error if register_ftrace_function_probe() fails for event_enable_func()
tracing: Don't succeed if event_enable_func did not register anything
ring-buffer: Select IRQ_WORK
- More fixes in the vCPU PVHVM hotplug path.
- Add more documentation.
- Fix various ARM related issues in the Xen generic drivers.
- Updates in the xen-pciback driver per Bjorn's updates.
- Mask the x2APIC feature for PV guests.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQEcBAABAgAGBQJRjPL5AAoJEFjIrFwIi8fJdlIIANXawH+B+aFbqsFSKOOh76XN
smgICU78SVzKpW9WAPYK7YFqSdNN4AleGC2Mn2lSkiaqgciRyDb9Yt+OSMMts2Xn
ZVbFkGhEKR+DtZfTKo9YgsGatul/McTiVEkuuli+aN5dql3WXDLAaA+/b9bO3ohh
TCWtWNuSCGmlfDoJET2je+J6CgKvCErH3fvzKNxgYxytcGhAvxoVK/lC4d3pnq/m
wQUAIcF8XYENqC2m1WDR0OGveAB0Me0j9g+UkQS+TzqA8GPmxC4aptjkroFYhOz6
8nZp+LanimmTI6olVioWEXkCr5+dxb058jQKwncQfonFpl58RS0qUrz5zoe3etU=
=h9SX
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.10-rc0-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull Xen bug-fixes from Konrad Rzeszutek Wilk:
- More fixes in the vCPU PVHVM hotplug path.
- Add more documentation.
- Fix various ARM related issues in the Xen generic drivers.
- Updates in the xen-pciback driver per Bjorn's updates.
- Mask the x2APIC feature for PV guests.
* tag 'stable/for-linus-3.10-rc0-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/pci: Used cached MSI-X capability offset
xen/pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK
xen: clear IRQ_NOAUTOEN and IRQ_NOREQUEST
xen: mask x2APIC feature in PV
xen: SWIOTLB is only used on x86
xen/spinlock: Fix check from greater than to be also be greater or equal to.
xen/smp/pvhvm: Don't point per_cpu(xen_vpcu, 33 and larger) to shared_info
xen/vcpu: Document the xen_vcpu_info and xen_vcpu
xen/vcpu/pvhvm: Fix vcpu hotplugging hanging.
This is the final round of SCSI patches for the merge window. It consists
mostly of driver updates (bnx2fc, ibmfc, fnic, lpfc, be2iscsi, pm80xx, qla4x
and ipr). There's also the power management updates that complete the patches
in Jens' tree, an iscsi refcounting problem fix from the last pull, some dif
handling in scsi_debug fixes, a few nice code cleanups and an error handling
busy bug fix.
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQEcBAABAgAGBQJRjQsqAAoJEDeqqVYsXL0MayMH/iDncN0swy/Y2Pfh31YHRcvG
pPTsx/WgQogg5WDeG1XRPmgDJBmJKeSO7OcIZ/AWz+8BJkZbDLfdSp9ugg8ZrrsN
yAa3lZoOfxhf43AEaUpgORHGj39O+QfoLq3mdo44A57ro30YotIaMvCiqv+baPSL
Z8K7O/TmmHmOeir3Gl05qrO6PKvQkNnHYZoF1lBiHypUJk51cuTmHrfY6mb2ktXc
7aElSXSbpUcqQQ8QHnXO/3r6VPkYjhznvo8q8rcLvqQqRnTdLF5Su8vDuyMbbJcU
svu30wRaBik3XvIfppGhukrYeU5ooj13zZAUY9NIf2vc2wjpMUlRNNZ0uSsHzOw=
=6cOg
-----END PGP SIGNATURE-----
Merge tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull second SCSI update from James "Jaj B" Bottomley:
"This is the final round of SCSI patches for the merge window. It
consists mostly of driver updates (bnx2fc, ibmfc, fnic, lpfc,
be2iscsi, pm80xx, qla4x and ipr).
There's also the power management updates that complete the patches in
Jens' tree, an iscsi refcounting problem fix from the last pull, some
dif handling in scsi_debug fixes, a few nice code cleanups and an
error handling busy bug fix."
* tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (92 commits)
[SCSI] qla2xxx: Update firmware link in Kconfig file.
[SCSI] iscsi class, qla4xxx: fix sess/conn refcounting when find fns are used
[SCSI] sas: unify the pointlessly separated enums sas_dev_type and sas_device_type
[SCSI] pm80xx: thermal, sas controller config and error handling update
[SCSI] pm80xx: NCQ error handling changes
[SCSI] pm80xx: WWN Modification for PM8081/88/89 controllers
[SCSI] pm80xx: Changed module name and debug messages update
[SCSI] pm80xx: Firmware flash memory free fix, with addition of new memory region for it
[SCSI] pm80xx: SPC new firmware changes for device id 0x8081 alone
[SCSI] pm80xx: Added SPCv/ve specific hardware functionalities and relevant changes in common files
[SCSI] pm80xx: MSI-X implementation for using 64 interrupts
[SCSI] pm80xx: Updated common functions common for SPC and SPCv/ve
[SCSI] pm80xx: Multiple inbound/outbound queue configuration
[SCSI] pm80xx: Added SPCv/ve specific ids, variables and modify for SPC
[SCSI] lpfc: fix up Kconfig dependencies
[SCSI] Handle MLQUEUE busy response in scsi_send_eh_cmnd
[SCSI] sd: change to auto suspend mode
[SCSI] sd: use REQ_PM in sd's runtime suspend operation
[SCSI] qla4xxx: Fix iocb_cnt calculation in qla4xxx_send_mbox_iocb()
[SCSI] ufs: Correct the expected data transfersize
...
Pull idle update from Len Brown:
"Add support for new Haswell-ULT CPU idle power states"
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
intel_idle: initial C8, C9, C10 support
tools/power turbostat: display C8, C9, C10 residency
Pull audit changes from Eric Paris:
"Al used to send pull requests every couple of years but he told me to
just start pushing them to you directly.
Our touching outside of core audit code is pretty straight forward. A
couple of interface changes which hit net/. A simple argument bug
calling audit functions in namei.c and the removal of some assembly
branch prediction code on ppc"
* git://git.infradead.org/users/eparis/audit: (31 commits)
audit: fix message spacing printing auid
Revert "audit: move kaudit thread start from auditd registration to kaudit init"
audit: vfs: fix audit_inode call in O_CREAT case of do_last
audit: Make testing for a valid loginuid explicit.
audit: fix event coverage of AUDIT_ANOM_LINK
audit: use spin_lock in audit_receive_msg to process tty logging
audit: do not needlessly take a lock in tty_audit_exit
audit: do not needlessly take a spinlock in copy_signal
audit: add an option to control logging of passwords with pam_tty_audit
audit: use spin_lock_irqsave/restore in audit tty code
helper for some session id stuff
audit: use a consistent audit helper to log lsm information
audit: push loginuid and sessionid processing down
audit: stop pushing loginid, uid, sessionid as arguments
audit: remove the old depricated kernel interface
audit: make validity checking generic
audit: allow checking the type of audit message in the user filter
audit: fix build break when AUDIT_DEBUG == 2
audit: remove duplicate export of audit_enabled
Audit: do not print error when LSMs disabled
...
Pull nfsd fixes from Bruce Fields:
"Small fixes for two bugs and two warnings"
* 'for-3.10' of git://linux-nfs.org/~bfields/linux:
nfsd: fix oops when legacy_recdir_name_error is passed a -ENOENT error
SUNRPC: fix decoding of optional gss-proxy xdr fields
SUNRPC: Refactor gssx_dec_option_array() to kill uninitialized warning
nfsd4: don't allow owner override on 4.1 CLAIM_FH opens
Pull x86 platform drivers from Matthew Garrett:
"Small set of updates, mainly trivial bugfixes and some small updates
to deal with newer hardware.
There's also a new driver that allows qemu guests to notify the
hypervisor that they've just paniced, which seems useful."
* 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86:
Add support for fan button on Ideapad Z580
pvpanic: pvpanic device driver
asus-nb-wmi: set wapf=4 for ASUSTeK COMPUTER INC. X75A
drivers: platform: x86: Use PTR_RET function
sony-laptop: SVS151290S kbd backlight and gfx switch support
hp-wmi: add more definitions for new event_id's
dell-laptop: Fix krealloc() misuse in parse_da_table()
hp_accel: Ignore the error from lis3lv02d_poweron() at resume
dell: add new dell WMI format for the AIO machines
Pull stray syscall bits from Al Viro:
"Several syscall-related commits that were missing from the original"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
switch compat_sys_sysctl to COMPAT_SYSCALL_DEFINE
unicore32: just use mmap_pgoff()...
unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE
x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...)
available by moving to the ablkcipher crypto API. The improvement is more
apparent on faster storage devices. There's no noticeable change when hardware
crypto is not available.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCgAGBQJRjJFjAAoJENaSAD2qAscKuRgQAJkefyLPNBb4plXr3R+zJ1aM
jkcqkUJWamCxlmqHg/n9LmlxpSZiiWSWJM8iiq8zQhPE0PVXVOVhqvzogT1xwv75
xfT9xTuRi1v7UFaSGHtj3WoO2nscQ0pjZW/0CHgd/PZjz9y0iJ/l6ueoWCOz5L2i
3oJvx/W407qM+MSogWKx79i1B2jILdBdQH/7PZ+UJS3jWEo3rMWBbfCwbYhd4pUG
oVc+qFglNs+3HLdHVUmHPCerCL9qYAJIJmDrvupSOQ6DwdFaV8IysTgSEdFtLcfC
8Z6DUOPzXnvA/+y+NCCCUxg1CrkgYkNrefLKAq18atFu63zIZIHZyJBTJ5Q6vXVF
o1H8UcIOg/liGa6lXGf5b4ENNKvB0qMYQgiSrL0/FVVima4zGqUWkQLno4kQl1zx
FHB5imQ7F/EMcow/nTN3YQYC3N/iYIFAIRxf35SiGGsNhO2sEyIqYlRSyq2MNrRl
pLWNbhnRuhBUqcbqZDxq1oZ7624Ui4jnHHx7rl6Y3gfm8Xa1ZmQeY6rOadSZaRd7
+ZqFZi1jiHz1c0tVUO/3DsqABIhbr9Ee03vracN8bVTGV0ZmO0qneugFB9esoeDf
UnrU0Im8ilHu3OHAyc+UkRZuzThL9bLZEivwICQ+JDUJ1zsLYSQZEaGIt8hA0iDy
8Bu3gtfX2WxD4ak/AkFv
=3EhP
-----END PGP SIGNATURE-----
Merge tag 'ecryptfs-3.10-rc1-ablkcipher' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull eCryptfs update from Tyler Hicks:
"Improve performance when AES-NI (and most likely other crypto
accelerators) is available by moving to the ablkcipher crypto API.
The improvement is more apparent on faster storage devices.
There's no noticeable change when hardware crypto is not available"
* tag 'ecryptfs-3.10-rc1-ablkcipher' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: Use the ablkcipher crypto API
- Two krealloc() abuse fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iEYEABECAAYFAlGLydsACgkQdwG7hYl686O3ZwCfYIJUdE/VADo53j3SwDbCEa15
pMgAnA13t6dKgC74Xqk5dd65KKaw64WB
=nbL9
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20130509' of git://git.infradead.org/~dwmw2/random-2.6
Pull misc fixes from David Woodhouse:
"This is some miscellaneous cleanups that don't really belong anywhere
else (or were ignored), that have been sitting in linux-next for some
time. Two of them are fixes resulting from my audit of krealloc()
usage that don't seem to have elicited any response when I posted
them, and the other three are patches from Artem removing dead code."
* tag 'for-linus-20130509' of git://git.infradead.org/~dwmw2/random-2.6:
pcmcia: remove RPX board stuff
m68k: remove rpxlite stuff
pcmcia: remove Motorola MBX860 support
params: Fix potential memory leak in add_sysfs_param()
dell-laptop: Fix krealloc() misuse in parse_da_table()
Pull kvm fixes from Gleb Natapov:
"Most of the fixes are in the emulator since now we emulate more than
we did before for correctness sake we see more bugs there, but there
is also an OOPS fixed and corruption of xcr0 register."
* tag 'kvm-3.10-2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: emulator: emulate SALC
KVM: emulator: emulate XLAT
KVM: emulator: emulate AAM
KVM: VMX: fix halt emulation while emulating invalid guest sate
KVM: Fix kvm_irqfd_init initialization
KVM: x86: fix maintenance of guest/host xcr0 state
provisioning target to be extended easily; allow WRITE SAME on
multipath devices; an assortment of little fixes and clean-ups.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRjPh5AAoJEK2W1qbAHj1nznsP/idIXjHebZbyp0DHLIHcOyR9
R7wj641TCE72qtIvlxPkpqrFdPQF5alilerS3OpWfrOR5clswQhI8aZOfdXRtvCe
3QQ2yLpi1aZxm/TjJXo4vlfm+JJQZVsytG+5ou5cNPVt0/rAAhaMS3n2OvSNBiMO
krtksMZk8oD0LV33M6FLnieV5gZIsEpy1VrPu8eLfTT7cSE28k7h0wsN8MzRlRKq
uKThwPkA332N9adPZoxr3diRL7USSn6VYo9ygheJfAj5o6UGDxwZZu2BnO0vsaEj
6q9yt+W6wFa96knBI7pVb19OEtNYvF7fjsYuAu0DRvEZ8cPeU40Mi/fNnN8TAg+Q
tFZkZ+rlvQCBylIsGEhLZkOjTf9cl2H9I3BfVLQ2givwANLvUjunGtcxPENbsKSq
kKBwlXyDs2bgUIk1ltTTyenGZxVA7ADOPNbvYMOKWzeh8gsKgQN+34ggzhHYgNQr
jkljEt3ToPDQ+rqWmz5+NGNTzH6I0FuK/rF9C9TAaFnBgCAuXvfNEUYBR0VnXgp+
LxijQ+q1CFDNC7pnrlVrKz2UAlpn2dDVJsREMUXZQQr4r06x3TgmMaLcGpUgNK0Q
iZ1vHPkHMmFTeCyJYJpe62wKRDQotevhjcgyQAEJBiPFOyVcfAn/WqPRz6U/IP00
A1QR9eBvCWKrcZnYOet9
=YXMj
-----END PGP SIGNATURE-----
Merge tag 'dm-3.10-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm
Pull device-mapper updates from Alasdair Kergon:
"Allow devices that hold metadata for the device-mapper thin
provisioning target to be extended easily; allow WRITE SAME on
multipath devices; an assortment of little fixes and clean-ups."
* tag 'dm-3.10-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: (21 commits)
dm cache: set config value
dm cache: move config fns
dm thin: generate event when metadata threshold passed
dm persistent metadata: add space map threshold callback
dm persistent data: add threshold callback to space map
dm thin: detect metadata device resizing
dm persistent data: support space map resizing
dm thin: open dev read only when possible
dm thin: refactor data dev resize
dm cache: replace memcpy with struct assignment
dm cache: fix typos in comments
dm cache policy: fix description of lookup fn
dm: document iterate_devices
dm persistent data: fix error message typos
dm cache: tune migration throttling
dm mpath: enable WRITE SAME support
dm table: fix write same support
dm bufio: avoid a possible __vmalloc deadlock
dm snapshot: fix error return code in snapshot_ctr
dm cache: fix error return code in cache_create
...
Pull HID fixes from Jiri Kosina:
- fix usage of sleeping lock in atomic context from Jiri Kosina
- build fix for hid-steelseries under certain .config setups by Simon Wood
- simple mismerge fix from Fernando Luis Vázquez Cao
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: debug: fix RCU preemption issue
HID: hid-steelseries fix led class build issue
HID: reintroduce fix-up for certain Sony RF receivers
This contains small fixes since the previous pull request:
- A few regression fixes and small updates of HD-audio
- Yet another fix for Haswell HDMI audio
- A copule of trivial fixes in ASoC McASP, DPAM and WM8994
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJRjOOfAAoJEGwxgFQ9KSmkzE8P/iX7Tz8F7DCTJnbE6W617fwI
z9weFqy7D/f6pTlMZrfzVCFJQBcACYLxN5OxfPDNu4zMao1Cki6ngococ6QBRMl/
bSu02pM3N2EGQQU4emQYfgR6+ZelUlVDS441jmIz6JOQYQql+eZZnO1XxWb0fHQC
MtHcxWLMhuXIcgSDeYeg+wQZjM/XxeN/AYA8Lnn8EEwoNV6vrZw4slOm8eC9qQnb
uqLjrivhcJpARetl/n5aPdIbtplkUVUAeyZnK6O4NHsN7AqBQ2RXSpPTTj4DV+fN
pN0Ah39eDNcF/zM0JqcDheSXP7MkB7s7kRcZOEmPwNSgCXfhjdwPDd4Si2y5tTbI
NMIZUawEdx47NkZDmyGRHyOQLixkMC/+qPQcD7cAof5WJAygpBAyU9WlOEVJ9MOZ
ytA0S+RWW05+jh5tiYHI+pjVl1TcN/ltgMsyBu+3owI4jQQs9LyIYR+IM4QkhpfE
gNDeDV6Do0xL0LSnPfYwgxV+H0oSWrRrUOlgEEeuyXBLcnIqJfUl+Y4n7afwO+xz
04izx1SUdp4dQ9Fo2/jInVn/EhQwwpw361yNUHtozFCh4ETNatBGiXIvWyLSW4FS
j9d+aNIZR4CtZ28+wymFSw6yiqLkJArNoUNgcJcKATgeqg6CUa+ZdYi6gkirp+gk
sU+bSxrxp4dv5hOj+rIF
=DMsb
-----END PGP SIGNATURE-----
Merge tag 'sound-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This contains small fixes since the previous pull request:
- A few regression fixes and small updates of HD-audio
- Yet another fix for Haswell HDMI audio
- A copule of trivial fixes in ASoC McASP, DPAM and WM8994"
* tag 'sound-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
Revert "ALSA: hda - Don't set up active streams twice"
ALSA: Add comment for control TLV API
ALSA: hda - Apply pin-enablement workaround to all Haswell HDMI codecs
ALSA: HDA: Fix Oops caused by dereference NULL pointer
ALSA: mips/sgio2audio: Remove redundant platform_set_drvdata()
ALSA: mips/hal2: Remove redundant platform_set_drvdata()
ALSA: hda - Fix 3.9 regression of EAPD init on Conexant codecs
sound: Fix make allmodconfig on MIPS
ALSA: hda - Fix system panic when DMA > 40 bits for Nvidia audio controllers
ALSA: atmel: Remove redundant platform_set_drvdata()
ASoC: McASP: Fix receive clock polarity in DAIFMT_NB_NF mode.
ASoC: wm8994: missing break in wm8994_aif3_hw_params()
ASoC: McASP: Add pins output direction for rx clocks when configured in CBS_CFS format
ASoC: dapm: use clk_prepare_enable and clk_disable_unprepare
Pull MIPS updates from Ralf Baechle:
- More work on DT support for various platforms
- Various fixes that were to late to make it straight into 3.9
- Improved platform support, in particular the Netlogic XLR and
BCM63xx, and the SEAD3 and Malta eval boards.
- Support for several Ralink SOC families.
- Complete support for the microMIPS ASE which basically reencodes the
existing MIPS32/MIPS64 ISA to use non-constant size instructions.
- Some fallout from LTO work which remove old cruft and will generally
make the MIPS kernel easier to maintain and resistant to compiler
optimization, even in absence of LTO.
- KVM support. While MIPS has announced hardware virtualization
extensions this KVM extension uses trap and emulate mode for
virtualization of MIPS32. More KVM work to add support for VZ
hardware virtualizaiton extensions and MIPS64 will probably already
be merged for 3.11.
Most of this has been sitting in -next for a long time. All defconfigs
have been build or run time tested except three for which fixes are being
sent by other maintainers.
Semantic conflict with kvm updates done as per Ralf
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (118 commits)
MIPS: Add new GIC clockevent driver.
MIPS: Formatting clean-ups for clocksources.
MIPS: Refactor GIC clocksource code.
MIPS: Move 'gic_frequency' to common location.
MIPS: Move 'gic_present' to common location.
MIPS: MIPS16e: Add unaligned access support.
MIPS: MIPS16e: Support handling of delay slots.
MIPS: MIPS16e: Add instruction formats.
MIPS: microMIPS: Optimise 'strnlen' core library function.
MIPS: microMIPS: Optimise 'strlen' core library function.
MIPS: microMIPS: Optimise 'strncpy' core library function.
MIPS: microMIPS: Optimise 'memset' core library function.
MIPS: microMIPS: Add configuration option for microMIPS kernel.
MIPS: microMIPS: Disable LL/SC and fix linker bug.
MIPS: microMIPS: Add vdso support.
MIPS: microMIPS: Add unaligned access support.
MIPS: microMIPS: Support handling of delay slots.
MIPS: microMIPS: Add support for exception handling.
MIPS: microMIPS: Floating point support.
MIPS: microMIPS: Fix macro naming in micro-assembler.
...