Impact: refactor code, cleanup
When we enable swiotlb for platforms that support HIGHMEM, we
can no longer store the virtual address of the original dma
buffer, because that buffer might not have a permament mapping.
Change the swiotlb code to instead store the physical address of
the original buffer.
Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: extend functions with a (yet unused) parameter, update callsites
Some architectures need it - in preparation for highmem swiotlb.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Selecting CRYPTO_CRC32C is not enough as CRYPTO which CRYPTO_CRC32C
depends on may be disabled. This patch adds the select on CRYPTO.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch swaps the role of libcrc32c and crc32c. Previously
the implementation was in libcrc32c and crc32c was a wrapper.
Now the code is in crc32c and libcrc32c just calls the crypto
layer.
The reason for the change is to tap into the algorithm selection
capability of the crypto API so that optimised implementations
such as the one utilising Intel's CRC32C instruction can be
used where available.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Impact: New kerneldoc comments
Additional documentation added to all the alloc_cpumask and free_cpumask
functions.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (minor additions)
Impact: New API
This will be needed in x86 code to allocate the domain and old_domain
cpumasks on the same node as where the containing irq_cfg struct is
allocated.
(Also fixes double-dump_stack on rare CONFIG_DEBUG_PER_CPU_MAPS case)
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (re-impl alloc_cpumask_var)
This patch fixes a long-standing performance bug in classic RCU that
results in massive internal-to-RCU lock contention on systems with
more than a few hundred CPUs. Although this patch creates a separate
flavor of RCU for ease of review and patch maintenance, it is intended
to replace classic RCU.
This patch still handles stress better than does mainline, so I am still
calling it ready for inclusion. This patch is against the -tip tree.
Nevertheless, experience on an actual 1000+ CPU machine would still be
most welcome.
Most of the changes noted below were found while creating an rcutiny
(which should permit ejecting the current rcuclassic) and while doing
detailed line-by-line documentation.
Updates from v9 (http://lkml.org/lkml/2008/12/2/334):
o Fixes from remainder of line-by-line code walkthrough,
including comment spelling, initialization, undesirable
narrowing due to type conversion, removing redundant memory
barriers, removing redundant local-variable initialization,
and removing redundant local variables.
I do not believe that any of these fixes address the CPU-hotplug
issues that Andi Kleen was seeing, but please do give it a whirl
in case the machine is smarter than I am.
A writeup from the walkthrough may be found at the following
URL, in case you are suffering from terminal insomnia or
masochism:
http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf
o Made rcutree tracing use seq_file, as suggested some time
ago by Lai Jiangshan.
o Added a .csv variant of the rcudata debugfs trace file, to allow
people having thousands of CPUs to drop the data into
a spreadsheet. Tested with oocalc and gnumeric. Updated
documentation to suit.
Updates from v8 (http://lkml.org/lkml/2008/11/15/139):
o Fix a theoretical race between grace-period initialization and
force_quiescent_state() that could occur if more than three
jiffies were required to carry out the grace-period
initialization. Which it might, if you had enough CPUs.
o Apply Ingo's printk-standardization patch.
o Substitute local variables for repeated accesses to global
variables.
o Fix comment misspellings and redundant (but harmless) increments
of ->n_rcu_pending (this latter after having explicitly added it).
o Apply checkpatch fixes.
Updates from v7 (http://lkml.org/lkml/2008/10/10/291):
o Fixed a number of problems noted by Gautham Shenoy, including
the cpu-stall-detection bug that he was having difficulty
convincing me was real. ;-)
o Changed cpu-stall detection to wait for ten seconds rather than
three in order to reduce false positive, as suggested by Ingo
Molnar.
o Produced a design document (http://lwn.net/Articles/305782/).
The act of writing this document uncovered a number of both
theoretical and "here and now" bugs as noted below.
o Fix dynticks_nesting accounting confusion, simplify WARN_ON()
condition, fix kerneldoc comments, and add memory barriers
in dynticks interface functions.
o Add more data to tracing.
o Remove unused "rcu_barrier" field from rcu_data structure.
o Count calls to rcu_pending() from scheduling-clock interrupt
to use as a surrogate timebase should jiffies stop counting.
o Fix a theoretical race between force_quiescent_state() and
grace-period initialization. Yes, initialization does have to
go on for some jiffies for this race to occur, but given enough
CPUs...
Updates from v6 (http://lkml.org/lkml/2008/9/23/448):
o Fix a number of checkpatch.pl complaints.
o Apply review comments from Ingo Molnar and Lai Jiangshan
on the stall-detection code.
o Fix several bugs in !CONFIG_SMP builds.
o Fix a misspelled config-parameter name so that RCU now announces
at boot time if stall detection is configured.
o Run tests on numerous combinations of configurations parameters,
which after the fixes above, now build and run correctly.
Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line):
o Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a
changeset some time ago, and finally got around to retesting
this option).
o Fix some tracing bugs in rcupreempt that caused incorrect
totals to be printed.
o I now test with a more brutal random-selection online/offline
script (attached). Probably more brutal than it needs to be
on the people reading it as well, but so it goes.
o A number of optimizations and usability improvements:
o Make rcu_pending() ignore the grace-period timeout when
there is no grace period in progress.
o Make force_quiescent_state() avoid going for a global
lock in the case where there is no grace period in
progress.
o Rearrange struct fields to improve struct layout.
o Make call_rcu() initiate a grace period if RCU was
idle, rather than waiting for the next scheduling
clock interrupt.
o Invoke rcu_irq_enter() and rcu_irq_exit() only when
idle, as suggested by Andi Kleen. I still don't
completely trust this change, and might back it out.
o Make CONFIG_RCU_TRACE be the single config variable
manipulated for all forms of RCU, instead of the prior
confusion.
o Document tracing files and formats for both rcupreempt
and rcutree.
Updates from v4 for those missing v5 given its bad subject line:
o Separated dynticks interface so that NMIs and irqs call separate
functions, greatly simplifying it. In particular, this code
no longer requires a proof of correctness. ;-)
o Separated dynticks state out into its own per-CPU structure,
avoiding the duplicated accounting.
o The case where a dynticks-idle CPU runs an irq handler that
invokes call_rcu() is now correctly handled, forcing that CPU
out of dynticks-idle mode.
o Review comments have been applied (thank you all!!!).
For but one example, fixed the dynticks-ordering issue that
Manfred pointed out, saving me much debugging. ;-)
o Adjusted rcuclassic and rcupreempt to handle dynticks changes.
Attached is an updated patch to Classic RCU that applies a hierarchy,
greatly reducing the contention on the top-level lock for large machines.
This passes 10-hour concurrent rcutorture and online-offline testing on
128-CPU ppc64 without dynticks enabled, and exposes some timekeeping
bugs in presence of dynticks (exciting working on a system where
"sleep 1" hangs until interrupted...), which were fixed in the
2.6.27 kernel. It is getting more reliable than mainline by some
measures, so the next version will be against -tip for inclusion.
See also Manfred Spraul's recent patches (or his earlier work from
2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2).
We will converge onto a common patch in the fullness of time, but are
currently exploring different regions of the design space. That said,
I have already gratefully stolen quite a few of Manfred's ideas.
This patch provides CONFIG_RCU_FANOUT, which controls the bushiness
of the RCU hierarchy. Defaults to 32 on 32-bit machines and 64 on
64-bit machines. If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT,
there is no hierarchy. By default, the RCU initialization code will
adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA
architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable
this balancing, allowing the hierarchy to be exactly aligned to the
underlying hardware. Up to two levels of hierarchy are permitted
(in addition to the root node), allowing up to 16,384 CPUs on 32-bit
systems and up to 262,144 CPUs on 64-bit systems. I just know that I
am going to regret saying this, but this seems more than sufficient
for the foreseeable future. (Some architectures might wish to set
CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs.
If this becomes a real problem, additional levels can be added, but I
doubt that it will make a significant difference on real hardware.)
In the common case, a given CPU will manipulate its private rcu_data
structure and the rcu_node structure that it shares with its immediate
neighbors. This can reduce both lock and memory contention by multiple
orders of magnitude, which should eliminate the need for the strange
manipulations that are reported to be required when running Linux on
very large systems.
Some shortcomings:
o More bugs will probably surface as a result of an ongoing
line-by-line code inspection.
Patches will be provided as required.
o There are probably hangs, rcutorture failures, &c. Seems
quite stable on a 128-CPU machine, but that is kind of small
compared to 4096 CPUs. However, seems to do better than
mainline.
Patches will be provided as required.
o The memory footprint of this version is several KB larger
than rcuclassic.
A separate UP-only rcutiny patch will be provided, which will
reduce the memory footprint significantly, even compared
to the old rcuclassic. One such patch passes light testing,
and has a memory footprint smaller even than rcuclassic.
Initial reaction from various embedded guys was "it is not
worth it", so am putting it aside.
Credits:
o Manfred Spraul for ideas, review comments, and bugs spotted,
as well as some good friendly competition. ;-)
o Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers,
Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton
for reviews and comments.
o Thomas Gleixner for much-needed help with some timer issues
(see patches below).
o Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos,
Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton
Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines
alive despite my heavy abuse^Wtesting.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Both messages are missing the newline and thus dmesg output gets
scrambled.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The 'ret' variable is assigned, but not used in the return statement. Fix this.
Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net>
Acked-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Impact: clean up swiotlb printks
Remove duplicated swiotlb info printing, and make it more detailed.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: prepare the swiotlb code for HighMem struct pages
This requires us to treat DMA regions in terms of page+offset rather
than virtual addressing since a HighMem page may not have a mapping.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: generalize the sw-IOTLB range checks
Some architectures require special rules to determine whether a range
needs mapping or not. This adds a weak function for architectures to
override.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: generalize phys<->bus<->phys conversions in the swiotlb code
Architectures may need to override these conversions. Implement a
__weak hook point containing the default implementation.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: generalize swiotlb allocation code
Architectures may need to allocate memory specially for use with
the swiotlb. Create the weak function swiotlb_alloc_boot() and
swiotlb_alloc() defaulting to the current behaviour.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: reduce bug table size
This allows reducing the bug table size by half. Perhaps there are
other 64-bit architectures that could also make use of this.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Add config option to enable code in cpumask.h
Currently it can be set if DEBUG_PER_CPU_MAPS, or set specifically by
an arch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The last patch to lib/idr.c caused a bug if idr_get_new_above() was
called on an empty idr.
Usually, nodes stay on the same layer. New layers are added to the top
of the tree.
The exception is idr_get_new_above() on an empty tree: In this case, the
new root node is first added on layer 0, then moved upwards. p->layer
was not updated.
As usual: You shall never rely on the source code comments, they will
only mislead you.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert
commit e8ced39d5e
Author: Mingming Cao <cmm@us.ibm.com>
Date: Fri Jul 11 19:27:31 2008 -0400
percpu_counter: new function percpu_counter_sum_and_set
As described in
revert "percpu counter: clean up percpu_counter_sum_and_set()"
the new percpu_counter_sum_and_set() is racy against updates to the
cpu-local accumulators on other CPUs. Revert that change.
This means that ext4 will be slow again. But correct.
Reported-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: <stable@kernel.org> [2.6.27.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert
commit 1f7c14c62c
Author: Mingming Cao <cmm@us.ibm.com>
Date: Thu Oct 9 12:50:59 2008 -0400
percpu counter: clean up percpu_counter_sum_and_set()
Before this patch we had the following:
percpu_counter_sum(): return the percpu_counter's value
percpu_counter_sum_and_set(): return the percpu_counter's value, copying
that value into the central value and zeroing the per-cpu counters before
returning.
After this patch, percpu_counter_sum_and_set() has gone, and
percpu_counter_sum() gets the old percpu_counter_sum_and_set()
functionality.
Problem is, as Eric points out, the old percpu_counter_sum_and_set()
functionality was racy and wrong. It zeroes out counters on "other" cpus,
without holding any locks which will prevent races agaist updates from
those other CPUS.
This patch reverts 1f7c14c62c. This means
that percpu_counter_sum_and_set() still has the race, but
percpu_counter_sum() does not.
Note that this is not a simple revert - ext4 has since started using
percpu_counter_sum() for its dirty_blocks counter as well.
Note that this revert patch changes percpu_counter_sum() semantics.
Before the patch, a call to percpu_counter_sum() will bring the counter's
central counter mostly up-to-date, so a following percpu_counter_read()
will return a close value.
After this patch, a call to percpu_counter_sum() will leave the counter's
central accumulator unaltered, so a subsequent call to
percpu_counter_read() can now return a significantly inaccurate result.
If there is any code in the tree which was introduced after
e8ced39d5e was merged, and which depends
upon the new percpu_counter_sum() semantics, that code will break.
Reported-by: Eric Dumazet <dada1@cosmosbay.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mingming Cao <cmm@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We should first delete the counter from percpu_counters list
before freeing memory, or a percpu_counter_hotcpu_callback()
could dereference a NULL pointer.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
fs/nfsd/nfs4recover.c
Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().
Signed-off-by: James Morris <jmorris@namei.org>
2nd part of the fixes needed for
http://bugzilla.kernel.org/show_bug.cgi?id=11796.
When the idr tree is either grown or shrunk, then the update to the number
of layers and the top pointer were not atomic. This race caused crashes.
The attached patch fixes that by replicating the layers counter in each
layer, thus idr_find doesn't need idp->layers anymore.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Clement Calmels <cboulte@gmail.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: add .config driven boot parameter default value
Right now debugobjects can only be activated if the debug_objects
boot parameter is passed in via the boot command line.
Make this more convenient (and randomizable) by also providing
a .config method. Enable it by default. (DEBUG_OBJECTS itself
is default-off)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add %pm to omit the colons when printing a mac address.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kunmap() takes as argument the struct page that orginally got kmap()'d,
however the sg_miter_stop() function passed it the kernel virtual address
instead, resulting in weird stuff.
Somehow I ended up fixing this bug by accident while looking for a bug in
the same area.
Reported-by: kerneloops.org
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: <stable@kernel.org> [2.6.27.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Impact: fix DMA buffer allocation coherency bug in certain configs
This patch fixes swiotlb to use dev->coherent_dma_mask in
swiotlb_alloc_coherent().
coherent_dma_mask is a subset of dma_mask (equal to it most of
the time), enumerating the address range that a given device
is able to DMA to/from in a cache-coherent way.
But currently, swiotlb uses dev->dma_mask in alloc_coherent()
implicitly via address_needs_mapping(), but alloc_coherent is really
supposed to use coherent_dma_mask.
This bug could break drivers that uses smaller coherent_dma_mask than
dma_mask (though the current code works for the majority that use the
same mask for coherent_dma_mask and dma_mask).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
security/keys/internal.h
security/keys/process_keys.c
security/keys/request_key.c
Fixed conflicts above by using the non 'tsk' versions.
Signed-off-by: James Morris <jmorris@namei.org>
Inaugurate copy-on-write credentials management. This uses RCU to manage the
credentials pointer in the task_struct with respect to accesses by other tasks.
A process may only modify its own credentials, and so does not need locking to
access or modify its own credentials.
A mutex (cred_replace_mutex) is added to the task_struct to control the effect
of PTRACE_ATTACHED on credential calculations, particularly with respect to
execve().
With this patch, the contents of an active credentials struct may not be
changed directly; rather a new set of credentials must be prepared, modified
and committed using something like the following sequence of events:
struct cred *new = prepare_creds();
int ret = blah(new);
if (ret < 0) {
abort_creds(new);
return ret;
}
return commit_creds(new);
There are some exceptions to this rule: the keyrings pointed to by the active
credentials may be instantiated - keyrings violate the COW rule as managing
COW keyrings is tricky, given that it is possible for a task to directly alter
the keys in a keyring in use by another task.
To help enforce this, various pointers to sets of credentials, such as those in
the task_struct, are declared const. The purpose of this is compile-time
discouragement of altering credentials through those pointers. Once a set of
credentials has been made public through one of these pointers, it may not be
modified, except under special circumstances:
(1) Its reference count may incremented and decremented.
(2) The keyrings to which it points may be modified, but not replaced.
The only safe way to modify anything else is to create a replacement and commit
using the functions described in Documentation/credentials.txt (which will be
added by a later patch).
This patch and the preceding patches have been tested with the LTP SELinux
testsuite.
This patch makes several logical sets of alteration:
(1) execve().
This now prepares and commits credentials in various places in the
security code rather than altering the current creds directly.
(2) Temporary credential overrides.
do_coredump() and sys_faccessat() now prepare their own credentials and
temporarily override the ones currently on the acting thread, whilst
preventing interference from other threads by holding cred_replace_mutex
on the thread being dumped.
This will be replaced in a future patch by something that hands down the
credentials directly to the functions being called, rather than altering
the task's objective credentials.
(3) LSM interface.
A number of functions have been changed, added or removed:
(*) security_capset_check(), ->capset_check()
(*) security_capset_set(), ->capset_set()
Removed in favour of security_capset().
(*) security_capset(), ->capset()
New. This is passed a pointer to the new creds, a pointer to the old
creds and the proposed capability sets. It should fill in the new
creds or return an error. All pointers, barring the pointer to the
new creds, are now const.
(*) security_bprm_apply_creds(), ->bprm_apply_creds()
Changed; now returns a value, which will cause the process to be
killed if it's an error.
(*) security_task_alloc(), ->task_alloc_security()
Removed in favour of security_prepare_creds().
(*) security_cred_free(), ->cred_free()
New. Free security data attached to cred->security.
(*) security_prepare_creds(), ->cred_prepare()
New. Duplicate any security data attached to cred->security.
(*) security_commit_creds(), ->cred_commit()
New. Apply any security effects for the upcoming installation of new
security by commit_creds().
(*) security_task_post_setuid(), ->task_post_setuid()
Removed in favour of security_task_fix_setuid().
(*) security_task_fix_setuid(), ->task_fix_setuid()
Fix up the proposed new credentials for setuid(). This is used by
cap_set_fix_setuid() to implicitly adjust capabilities in line with
setuid() changes. Changes are made to the new credentials, rather
than the task itself as in security_task_post_setuid().
(*) security_task_reparent_to_init(), ->task_reparent_to_init()
Removed. Instead the task being reparented to init is referred
directly to init's credentials.
NOTE! This results in the loss of some state: SELinux's osid no
longer records the sid of the thread that forked it.
(*) security_key_alloc(), ->key_alloc()
(*) security_key_permission(), ->key_permission()
Changed. These now take cred pointers rather than task pointers to
refer to the security context.
(4) sys_capset().
This has been simplified and uses less locking. The LSM functions it
calls have been merged.
(5) reparent_to_kthreadd().
This gives the current thread the same credentials as init by simply using
commit_thread() to point that way.
(6) __sigqueue_alloc() and switch_uid()
__sigqueue_alloc() can't stop the target task from changing its creds
beneath it, so this function gets a reference to the currently applicable
user_struct which it then passes into the sigqueue struct it returns if
successful.
switch_uid() is now called from commit_creds(), and possibly should be
folded into that. commit_creds() should take care of protecting
__sigqueue_alloc().
(7) [sg]et[ug]id() and co and [sg]et_current_groups.
The set functions now all use prepare_creds(), commit_creds() and
abort_creds() to build and check a new set of credentials before applying
it.
security_task_set[ug]id() is called inside the prepared section. This
guarantees that nothing else will affect the creds until we've finished.
The calling of set_dumpable() has been moved into commit_creds().
Much of the functionality of set_user() has been moved into
commit_creds().
The get functions all simply access the data directly.
(8) security_task_prctl() and cap_task_prctl().
security_task_prctl() has been modified to return -ENOSYS if it doesn't
want to handle a function, or otherwise return the return value directly
rather than through an argument.
Additionally, cap_task_prctl() now prepares a new set of credentials, even
if it doesn't end up using it.
(9) Keyrings.
A number of changes have been made to the keyrings code:
(a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have
all been dropped and built in to the credentials functions directly.
They may want separating out again later.
(b) key_alloc() and search_process_keyrings() now take a cred pointer
rather than a task pointer to specify the security context.
(c) copy_creds() gives a new thread within the same thread group a new
thread keyring if its parent had one, otherwise it discards the thread
keyring.
(d) The authorisation key now points directly to the credentials to extend
the search into rather pointing to the task that carries them.
(e) Installing thread, process or session keyrings causes a new set of
credentials to be created, even though it's not strictly necessary for
process or session keyrings (they're shared).
(10) Usermode helper.
The usermode helper code now carries a cred struct pointer in its
subprocess_info struct instead of a new session keyring pointer. This set
of credentials is derived from init_cred and installed on the new process
after it has been cloned.
call_usermodehelper_setup() allocates the new credentials and
call_usermodehelper_freeinfo() discards them if they haven't been used. A
special cred function (prepare_usermodeinfo_creds()) is provided
specifically for call_usermodehelper_setup() to call.
call_usermodehelper_setkeys() adjusts the credentials to sport the
supplied keyring as the new session keyring.
(11) SELinux.
SELinux has a number of changes, in addition to those to support the LSM
interface changes mentioned above:
(a) selinux_setprocattr() no longer does its check for whether the
current ptracer can access processes with the new SID inside the lock
that covers getting the ptracer's SID. Whilst this lock ensures that
the check is done with the ptracer pinned, the result is only valid
until the lock is released, so there's no point doing it inside the
lock.
(12) is_single_threaded().
This function has been extracted from selinux_setprocattr() and put into
a file of its own in the lib/ directory as join_session_keyring() now
wants to use it too.
The code in SELinux just checked to see whether a task shared mm_structs
with other tasks (CLONE_VM), but that isn't good enough. We really want
to know if they're part of the same thread group (CLONE_THREAD).
(13) nfsd.
The NFS server daemon now has to use the COW credentials to set the
credentials it is going to use. It really needs to pass the credentials
down to the functions it calls, but it can't do that until other patches
in this series have been applied.
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: James Morris <jmorris@namei.org>
Rename is_single_threaded() to is_wq_single_threaded() so that a new
is_single_threaded() can be created that refers to tasks rather than
waitqueues.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: James Morris <jmorris@namei.org>
Impact: cleanup
Clean up based on feedback from Andrew Morton and others:
- change to inline functions instead of macros
- add __init to bootmem method
- add a missing debug check
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: introduce new APIs
We want to deprecate cpumasks on the stack, as we are headed for
gynormous numbers of CPUs. Eventually, we want to head towards an
undefined 'struct cpumask' so they can never be declared on stack.
1) New cpumask functions which take pointers instead of copies.
(cpus_* -> cpumask_*)
2) Several new helpers to reduce requirements for temporary cpumasks
(cpumask_first_and, cpumask_next_and, cpumask_any_and)
3) Helpers for declaring cpumasks on or offstack for large NR_CPUS
(cpumask_var_t, alloc_cpumask_var and free_cpumask_var)
4) 'struct cpumask' for explicitness and to mark new-style code.
5) Make iterator functions stop at nr_cpu_ids (a runtime constant),
not NR_CPUS for time efficiency and for smaller dynamic allocations
in future.
6) cpumask_copy() so we can allocate less than a full cpumask eventually
(for alloc_cpumask_var), and so we can eliminate the 'struct cpumask'
definition eventually.
7) work_on_cpu() helper for doing task on a CPU, rather than saving old
cpumask for current thread and manipulating it.
8) smp_call_function_many() which is smp_call_function_mask() except
taking a cpumask pointer.
Note that this patch simply introduces the new functions and leaves
the obsolescent ones in place. This is to simplify the transition
patches.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
put_dec_trunc prints the digits in reverse order and is reversed
inside number(). Continue using put_dec_trunc, but reverse each quad
in ip4_addr_string.
[Noticed by Julius Volz]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In testing 2.6.28-rc1, I found that passing 'dynamic_printk' on the command
line didn't activate the debug code. The problem is that dynamic_printk_setup()
(which activates the debugging) is being called before dynamic_printk_init() is
called (which initializes infrastructure). Fix this by setting setting the
state to 'DYNAMIC_ENABLED_ALL' in dynamic_printk_setup(), which will also
cause all subsequent modules to have debugging automatically started, which is
probably the behavior we want.
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
For use in printing IPv4, or IPv6 addresses in the usual way:
%i4 and %I4 are currently equivalent and print the address in
dot-separated decimal x.x.x.x
%I6 prints 16-bit network order hex with colon separators:
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx
%i6 omits the colons.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Takes a pointer to a IPv6 address and formats it in the usual
colon-separated hex format:
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx
Each 16 bit word is printed in network-endian byteorder.
%#p6 is also supported and will omit the colons.
%p6 is a replacement for NIP6_FMT and NIP6()
%#p6 is a replacement for NIP6_SEQFMT and NIP6()
Note that NIP6() took a struct in6_addr whereas this takes a pointer
to a struct in6_addr.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
ftrace: fix current_tracer error return
tracing: fix a build error on alpha
ftrace: use a real variable for ftrace_nop in x86
tracing/ftrace: make boot tracer select the sched_switch tracer
tracepoint: check if the probe has been registered
asm-generic: define DIE_OOPS in asm-generic
trace: fix printk warning for u64
ftrace: warning in kernel/trace/ftrace.c
ftrace: fix build failure
ftrace, powerpc, sparc64, x86: remove notrace from arch ftrace file
ftrace: remove ftrace hash
ftrace: remove mcount set
ftrace: remove daemon
ftrace: disable dynamic ftrace for all archs that use daemon
ftrace: add ftrace warn on to disable ftrace
ftrace: only have ftrace_kill atomic
ftrace: use probe_kernel
ftrace: comment arch ftrace code
ftrace: return error on failed modified text.
ftrace: dynamic ftrace process only text section
...
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
lockdep: fix irqs on/off ip tracing
lockdep: minor fix for debug_show_all_locks()
x86: restore the old swiotlb alloc_coherent behavior
x86: use GFP_DMA for 24bit coherent_dma_mask
swiotlb: remove panic for alloc_coherent failure
xen: compilation fix of drivers/xen/events.c on IA64
xen: portability clean up and some minor clean up for xencomm.c
xen: don't reload cr3 on suspend
kernel/resource: fix reserve_region_with_split() section mismatch
printk: remove unused code from kernel/printk.c
Add format specifiers for printing out six colon-separated bytes:
MAC addresses (%pM):
xx:xx:xx:xx:xx:xx
%#pM is also supported and omits the colon separators.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (21 commits)
[SCSI] sd: fix computation of the full size of the device
[SCSI] lib: string_get_size(): don't hang on zero; no decimals on exact
[SCSI] sun3x_esp: Convert && to ||
[SCSI] sd: remove command-size switching code
[SCSI] 3w-9xxx: remove unnecessary local_irq_save/restore for scsi sg copy API
[SCSI] 3w-xxxx: remove unnecessary local_irq_save/restore for scsi sg copy API
[SCSI] fix netlink kernel-doc
[SCSI] sd: Fix handling of NO_SENSE check condition
[SCSI] export busy state via q->lld_busy_fn()
[SCSI] refactor sdev/starget/shost busy checking
[SCSI] mptfusion: Increase scsi-timeouts, similariy to the LSI 4.x driver.
[SCSI] aic7xxx: Take the LED out of diagnostic mode on PM resume
[SCSI] aic79xx: user visible misuse wrong SI units (not disk size!)
[SCSI] ipr: use memory_read_from_buffer()
[SCSI] aic79xx: fix shadowed variables
[SCSI] aic79xx: fix shadowed variables, add statics
[SCSI] aic7xxx: update *_shipped files
[SCSI] aic7xxx: update .reg files
[SCSI] aic7xxx: introduce "dont_generate_debug_code" keyword in aicasm parser
[SCSI] scsi_dh: Initialize path state to be passive when path is not owned
...
swiotlb_alloc_coherent calls panic() when allocated swiotlb pages is
not fit for a device's dma mask. However, alloc_coherent failure is
not a disaster at all. AFAIK, none of other x86 and IA64 IOMMU
implementations don't crash in case of alloc_coherent failure.
There are some drivers that don't check alloc_coherent failure but not
many (about ten and I've already started to fix some of
them). alloc_coherent returns NULL in case of failure so it's likely
that these guilty drivers crash immediately. So swiotlb doesn't need
to call panic() just for them.
Reported-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Tested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We would hang forever when passing a zero to string_get_size().
Furthermore, string_get_size() would produce decimals on a value small
enough to be exact. Finally, a few formatting issues are inconsistent
with standard SI style guidelines.
- If the value is less than the divisor, skip the entire rounding
step. This prints out all small values including zero as integers,
without decimals.
- Add a space between the value and the symbol for the unit,
consistent with standard SI practice.
- Lower case k in kB since we are talking about powers of 10.
- Finally, change "int" to "unsigned int" in one place to shut up a
gcc warning when compiling the code out-of-kernel for testing.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
* 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/dvrabel/uwb: (47 commits)
uwb: wrong sizeof argument in mac address compare
uwb: don't use printk_ratelimit() so often
uwb: use kcalloc where appropriate
uwb: use time_after() when purging stale beacons
uwb: add credits for the original developers of the UWB/WUSB/WLP subsystems
uwb: add entries in the MAINTAINERS file
uwb: depend on EXPERIMENTAL
wusb: wusb-cbaf (CBA driver) sysfs ABI simplification
uwb: document UWB and WUSB sysfs files
uwb: add symlinks in sysfs between radio controllers and PALs
uwb: dont tranmit identification IEs
uwb: i1480/GUWA100U: fix firmware download issues
uwb: i1480: remove MAC/PHY information checking function
uwb: add Intel i1480 HWA to the UWB RC quirk table
uwb: disable command/event filtering for D-Link DUB-1210
uwb: initialize the debug sub-system
uwb: Fix handling IEs with empty IE data in uwb_est_get_size()
wusb: fix bmRequestType for Abort RPipe request
wusb: fix error path for wusb_set_dev_addr()
wusb: add HWA host controller driver
...
Due to confusion between the ftrace infrastructure and the gcc profiling
tracer "ftrace", this patch renames the config options from FTRACE to
FUNCTION_TRACER. The other two names that are offspring from FTRACE
DYNAMIC_FTRACE and FTRACE_MCOUNT_RECORD will stay the same.
This patch was generated mostly by script, and partially by hand.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add a %pR option to the kernel vsnprintf that prints the range of
addresses inside a struct resource passed by pointer.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
bitmap_scnprintf_len() is not used now, so we remove it.
Otherwise we have to maintain it and make its return
value always equal to bitmap_scnprintf()'s return value.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Paul Menage <menage@google.com>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CONFIG_DEBUG_BLOCK_EXT_DEVT can break booting even on some modern
distros. Add BIG FAT WARNING to keep people at a safe distance.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Open-code them rather than using defining macros. The function bodies are now
next to their kerneldoc comments as a bonus.
Add casts to the signed cases as they call into the unsigned versions.
Avoids the sparse warnings:
lib/vsprintf.c:249:1: warning: incorrect type in argument 3 (different signedness)
lib/vsprintf.c:249:1: expected unsigned long *res
lib/vsprintf.c:249:1: got long *res
lib/vsprintf.c:249:1: warning: incorrect type in argument 3 (different signedness)
lib/vsprintf.c:249:1: expected unsigned long *res
lib/vsprintf.c:249:1: got long *res
lib/vsprintf.c:251:1: warning: incorrect type in argument 3 (different signedness)
lib/vsprintf.c:251:1: expected unsigned long long *res
lib/vsprintf.c:251:1: got long long *res
lib/vsprintf.c:251:1: warning: incorrect type in argument 3 (different signedness)
lib/vsprintf.c:251:1: expected unsigned long long *res
lib/vsprintf.c:251:1: got long long *res
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove extra lines before the EXPORT_SYMBOL()s
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The default base is 10 unless there is a leading zero, in which
case the base will be guessed as 8.
The base will only be guesed as 16 when the string starts with '0x'
the third character is a valid hex digit.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (46 commits)
UIO: Fix mapping of logical and virtual memory
UIO: add automata sercos3 pci card support
UIO: Change driver name of uio_pdrv
UIO: Add alignment warnings for uio-mem
Driver core: add bus_sort_breadthfirst() function
NET: convert the phy_device file to use bus_find_device_by_name
kobject: Cleanup kobject_rename and !CONFIG_SYSFS
kobject: Fix kobject_rename and !CONFIG_SYSFS
sysfs: Make dir and name args to sysfs_notify() const
platform: add new device registration helper
sysfs: use ilookup5() instead of ilookup5_nowait()
PNP: create device attributes via default device attributes
Driver core: make bus_find_device_by_name() more robust
usb: turn dev_warn+WARN_ON combos into dev_WARN
debug: use dev_WARN() rather than WARN_ON() in device_pm_add()
debug: Introduce a dev_WARN() function
sysfs: fix deadlock
device model: Do a quickcheck for driver binding before doing an expensive check
Driver core: Fix cleanup in device_create_vargs().
Driver core: Clarify device cleanup.
...
This patch introduces the generic iommu_num_pages function. It can be used by
a given memory area.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Dave Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add documentation in kerneldoc for new printk format extensions
This patch documents the new %pS/%pF options in printk in kernel doc.
Hope I didn't miss any other extension.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Using "def_bool n" is pointless, simply using bool here appears more
appropriate.
Further, retaining such options that don't have a prompt and aren't
selected by anything seems also at least questionable.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It finally dawned on me what the clean fix to sysfs_rename_dir
calling kobject_set_name is. Move the work into kobject_rename
where it belongs. The callers serialize us anyway so this is
safe.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
When looking at kobject_rename I found two bugs with
that exist when sysfs support is disabled in the kernel.
kobject_rename does not change the name on the kobject when
sysfs support is not compiled in.
kobject_rename without locking attempts to check the
validity of a rename operation, which the kobject layer
simply does not have the infrastructure to do.
This patch documents the previously unstated requirement of
kobject_rename that is the responsibility of the caller to
provide mutual exclusion and to be certain that the new_name
for the kobject is valid.
This patch modifies sysfs_rename_dir in !CONFIG_SYSFS case
to call kobject_set_name to actually change the kobject_name.
This patch removes the bogus and misleading check in kobject_rename
that attempts to see if a rename is valid. The check is bogus
because we do not have the proper locking. The check is misleading
because it looks like we can and do perform checking at the kobject
level that we don't.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Base infrastructure to enable per-module debug messages.
I've introduced CONFIG_DYNAMIC_PRINTK_DEBUG, which when enabled centralizes
control of debugging statements on a per-module basis in one /proc file,
currently, <debugfs>/dynamic_printk/modules. When, CONFIG_DYNAMIC_PRINTK_DEBUG,
is not set, debugging statements can still be enabled as before, often by
defining 'DEBUG' for the proper compilation unit. Thus, this patch set has no
affect when CONFIG_DYNAMIC_PRINTK_DEBUG is not set.
The infrastructure currently ties into all pr_debug() and dev_dbg() calls. That
is, if CONFIG_DYNAMIC_PRINTK_DEBUG is set, all pr_debug() and dev_dbg() calls
can be dynamically enabled/disabled on a per-module basis.
Future plans include extending this functionality to subsystems, that define
their own debug levels and flags.
Usage:
Dynamic debugging is controlled by the debugfs file,
<debugfs>/dynamic_printk/modules. This file contains a list of the modules that
can be enabled. The format of the file is as follows:
<module_name> <enabled=0/1>
.
.
.
<module_name> : Name of the module in which the debug call resides
<enabled=0/1> : whether the messages are enabled or not
For example:
snd_hda_intel enabled=0
fixup enabled=1
driver enabled=0
Enable a module:
$echo "set enabled=1 <module_name>" > dynamic_printk/modules
Disable a module:
$echo "set enabled=0 <module_name>" > dynamic_printk/modules
Enable all modules:
$echo "set enabled=1 all" > dynamic_printk/modules
Disable all modules:
$echo "set enabled=0 all" > dynamic_printk/modules
Finally, passing "dynamic_printk" at the command line enables
debugging for all modules. This mode can be turned off via the above
disable command.
[gkh: minor cleanups and tweaks to make the build work quietly]
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
This is a much better version of a previous patch to make the parser
tables constant. Rather than changing the typedef, we put the "const" in
all the various places where its required, allowing the __initconst
exception for nfsroot which was the cause of the previous trouble.
This was posted for review some time ago and I believe its been in -mm
since then.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (43 commits)
ext4: Rename ext4dev to ext4
ext4: Avoid double dirtying of super block in ext4_put_super()
Update ext4 MAINTAINERS file
Hook ext4 to the vfs fiemap interface.
generic block based fiemap implementation
ocfs2: fiemap support
vfs: vfs-level fiemap interface
ext4: fix xattr deadlock
jbd2: Fix buffer head leak when writing the commit block
ext4: Add debugging markers that can be used by systemtap
jbd2: abort instead of waiting for nonexistent transaction
ext4: fix initialization of UNINIT bitmap blocks
ext4: Remove old legacy block allocator
ext4: Use readahead when reading an inode from the inode table
ext4: Improve the documentation for ext4's /proc tunables
ext4: Combine proc file handling into a single set of functions
ext4: move /proc setup and teardown out of mballoc.c
ext4: Don't use 'struct dentry' for internal lookups
ext4/jbd2: Avoid WARN() messages when failing to write to the superblock
ext4: use percpu data structures for lg_prealloc_list
...
* 'x86-v28-for-linus-phase3-B' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (74 commits)
AMD IOMMU: use iommu_device_max_index, fix
AMD IOMMU: use iommu_device_max_index
x86: add PCI IDs for AMD Barcelona PCI devices
x86/iommu: use __GFP_ZERO instead of memset for GART
x86/iommu: convert GART need_flush to bool
x86/iommu: make GART driver checkpatch clean
x86 gart: remove unnecessary initialization
x86: restore old GART alloc_coherent behavior
revert "x86: make GART to respect device's dma_mask about virtual mappings"
x86: export pci-nommu's alloc_coherent
iommu: remove fullflush and nofullflush in IOMMU generic option
x86: remove set_bit_string()
iommu: export iommu_area_reserve helper function
AMD IOMMU: use coherent_dma_mask in alloc_coherent
add AMD IOMMU tree to MAINTAINERS file
AMD IOMMU: use cmd_buf_size when freeing the command buffer
AMD IOMMU: calculate IVHD size with a function
AMD IOMMU: remove unnecessary cast to u64 in the init code
AMD IOMMU: free domain bitmap with its allocation order
AMD IOMMU: simplify dma_mask_to_pages
...
* 'rcu-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (21 commits)
rcu: RCU-based detection of stalled CPUs for Classic RCU, fix
rcu: RCU-based detection of stalled CPUs for Classic RCU
rcu: add rcu_read_lock_sched() / rcu_read_unlock_sched()
rcu: fix sparse shadowed variable warning
doc/RCU: fix pseudocode in rcuref.txt
rcuclassic: fix compiler warning
rcu: use irq-safe locks
rcuclassic: fix compilation NG
rcu: fix locking cleanup fallout
rcu: remove redundant ACCESS_ONCE definition from rcupreempt.c
rcu: fix classic RCU locking cleanup lockdep problem
rcu: trace fix possible mem-leak
rcu: just rename call_rcu_bh instead of making it a macro
rcu: remove list_for_each_rcu()
rcu: fixes to include/linux/rcupreempt.h
rcu: classic RCU locking and memory-barrier cleanups
rcu: prevent console flood when one CPU sees another AWOL via RCU
rcu, debug: detect stalled grace periods, cleanups
rcu, debug: detect stalled grace periods
rcu classic: new algorithm for callbacks-processing(v2)
...
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (37 commits)
[SCSI] zfcp: fix double dbf id usage
[SCSI] zfcp: wait on SCSI work to be finished before proceeding with init dev
[SCSI] zfcp: fix erp list usage without using locks
[SCSI] zfcp: prevent fc_remote_port_delete calls for unregistered rport
[SCSI] zfcp: fix deadlock caused by shared work queue tasks
[SCSI] zfcp: put threshold data in hba trace
[SCSI] zfcp: Simplify zfcp data structures
[SCSI] zfcp: Simplify get_adapter_by_busid
[SCSI] zfcp: remove all typedefs and replace them with standards
[SCSI] zfcp: attach and release SAN nameserver port on demand
[SCSI] zfcp: remove unused references, declarations and flags
[SCSI] zfcp: Update message with input from review
[SCSI] zfcp: add queue_full sysfs attribute
[SCSI] scsi_dh: suppress comparison warning
[SCSI] scsi_dh: add Dell product information into rdac device handler
[SCSI] qla2xxx: remove the unused SCSI_QLOGIC_FC_FIRMWARE option
[SCSI] qla2xxx: fix printk format warnings
[SCSI] qla2xxx: Update version number to 8.02.01-k8.
[SCSI] qla2xxx: Ignore payload reserved-bits during RSCN processing.
[SCSI] qla2xxx: Additional residual-count corrections during UNDERRUN handling.
...
Only works for the generic request timer handling. Allows one to
sporadically ignore request completions, thus exercising the timeout
handling.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
DEBUG_BLOCK_EXT_DEVT shuffles SCSI and IDE device numbers and root
device number set using rdev become meaningless. Root devices should
be explicitly specified using textual names. Warn about it if root
can't be found and DEBUG_BLOCK_EXT_DEVT is enabled. Also, add warning
to the help text.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It's a debug option that you would explicitly enable to test this
feature, we should default it to 'n' to prevent accidental surprises
for now.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Extended devt introduces non-contiguos device numbers. This patch
implements a debug option which forces most devt allocations to be
from the extended area and spreads them out. This is enabled by
default if DEBUG_KERNEL is set and achieves...
1. Detects code paths in kernel or userland which expect predetermined
consecutive device numbers.
2. When something goes wrong, avoid corruption as adding to the minor
of earlier partition won't lead to the wrong but valid device.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
A klist entry is kept on the list till all its current iterations are
finished; however, a new iteration after deletion also iterates over
deleted entries as long as their reference count stays above zero.
This causes problems for cases where there are users which iterate
over the list while synchronized against list manipulations and
natuarally expect already deleted entries to not show up during
iteration.
This patch implements dead flag which gets set on deletion so that
iteration can skip already deleted entries. The dead flag piggy backs
on the lowest bit of knode->n_klist and only visible to klist
implementation proper.
While at it, drop klist_iter->i_head as it's redundant and doesn't
offer anything in semantics or performance wise as klist_iter->i_klist
is dereferenced on every iteration anyway.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This patch adds the ability to print sizes in either units of 10^3 (SI)
or 2^10 (Binary) units. It rounds up to three significant figures and
can be used for either memory or storage capacities.
Oh, and I'm fully aware that 64 bits is only 16EiB ... the Zetta and
Yotta units are added for future proofing against the day we have 128
bit computers ...
[fujita.tomonori@lab.ntt.co.jp: fix missed unsigned long long cast]
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This patch adds stalled-CPU detection to Classic RCU. This capability
is enabled by a new config variable CONFIG_RCU_CPU_STALL_DETECTOR, which
defaults disabled.
This is a debugging feature to detect infinite loops in kernel code, not
something that non-kernel-hackers would be expected to care about.
This feature can detect looping CPUs in !PREEMPT builds and looping CPUs
with preemption disabled in PREEMPT builds. This is essentially a port of
this functionality from the treercu patch, replacing the stall debug patch
that is already in tip/core/rcu (commit 67182ae1c4).
The changes from the patch in tip/core/rcu include making the config
variable name match that in treercu, changing from seconds to jiffies to
avoid spurious warnings, and printing a boot message when this feature
is enabled.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86 has set_bit_string() that does the exact same thing that
set_bit_area() in lib/iommu-helper.c does.
This patch exports set_bit_area() in lib/iommu-helper.c as
iommu_area_reserve(), converts GART, Calgary, and AMD IOMMU to use it.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
swiotlb can use dma_get_mask() instead of the homegrown function.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
bitmap_copy_le() copies a bitmap, putting the bits into little-endian
order (i.e., each unsigned long word in the bitmap is put into
little-endian order).
The UWB stack used bitmaps to manage Medium Access Slot availability,
and these bitmaps need to be written to the hardware in LE order.
Signed-off-by: David Vrabel <david.vrabel@csr.com>
The callers of sg_copy_buffer must disable interrupts before calling
it (since it uses kmap_atomic). Some callers use it on
interrupt-disabled code but some need to take the trouble to disable
interrupts just for this. No wonder they forget about it and we hit a
bug like:
http://bugzilla.kernel.org/show_bug.cgi?id=11529
James said that it might be better to disable interrupts inside the
function rather than risk the callers getting it wrong.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
- unbreak ia64 (and powerpc) where function pointers dont
point at code but at data (reported by Tony Luck)
[ mingo@elte.hu: various cleanups ]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
during some development we suspected a case where we left something
in a notifier chain that was from a module that was unloaded already...
and that sort of thing is rather hard to track down.
This patch adds a very simple sanity check (which isn't all that
expensive) to make sure the notifier we're about to call is
actually from either the kernel itself of from a still-loaded
module, avoiding a hard-to-chase-down crash.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
It was introduced by "vsprintf: add support for '%pS' and '%pF' pointer
formats" in commit 0fe1ef24f7. However,
the current way its coded doesn't work on parisc64. For two reasons: 1)
parisc isn't in the #ifdef and 2) parisc has a different format for
function descriptors
Make dereference_function_descriptor() more accommodating by allowing
architecture overrides. I put the three overrides (for parisc64, ppc64
and ia64) in arch/kernel/module.c because that's where the kernel
internal linker which knows how to deal with function descriptors sits.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Tony Luck <tony.luck@intel.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds is_swiotlb_buffer() helper function to see whether a buffer
belongs to the swiotlb buffer or not.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We don't need any check in swiotlb_unmap_single here. unmap_single is
appropriate.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We always need swiotlb memory here so address_needs_mapping and
swiotlb_force testings are irrelevant. map_single should be used here
instead of swiotlb_map_single.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The callers are supposed to set up the gfp flags appropriately.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This bug is causing random crashes
(http://bugzilla.kernel.org/show_bug.cgi?id=11414).
-fno-omit-frame-pointer is only needed on powerpc when -pg is also
supplied, and there is a gcc bug that causes incorrect code generation
on 32-bit powerpc when -fno-omit-frame-pointer is used---it uses stack
locations below the stack pointer, which is not allowed by the ABI
because those locations can and sometimes do get corrupted by an
interrupt.
This ensures that CONFIG_FRAME_POINTER is only selected by ftrace.
When CONFIG_FTRACE is enabled we also pass -mno-sched-epilog to work
around the gcc codegen bug.
Patch based on work by:
Andreas Schwab <schwab@suse.de>
Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Daniel J. Blueman reported:
> =======================================================
> [ INFO: possible circular locking dependency detected ]
> 2.6.27-rc4-224c #1
> -------------------------------------------------------
> hald/4680 is trying to acquire lock:
> (&n->list_lock){++..}, at: [<ffffffff802bfa26>] add_partial+0x26/0x80
>
> but task is already holding lock:
> (&obj_hash[i].lock){++..}, at: [<ffffffff8041cfdc>]
> debug_object_free+0x5c/0x120
We fix it by moving the actual freeing to outside the lock (the lock
now only protects the list).
The pool lock is also promoted to irq-safe (suggested by Dan). It's
necessary because free_pool is now called outside the irq disabled
region. So we need to protect against an interrupt handler which calls
debug_object_init().
[tglx@linutronix.de: added hlist_move_list helper to avoid looping
through the list twice]
Reported-by: Daniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
A recent patch from Kay Sievers <kay.sievers@vrfy.org>
replaced the first occurrence of '/' with '!' as needed for block devices.
Now do some cheap defensive coding and replace all of them to avoid future
issues in this area.
Signed-off-by: Ingo Oeser <ioe-lkml@rameria.de>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
percpu_counter_sum_and_set() and percpu_counter_sum() is the same except
the former updates the global counter after accounting. Since we are
taking the fbc->lock to calculate the precise value of the counter in
percpu_counter_sum() anyway, it should simply set fbc->count too, as the
percpu_counter_sum_and_set() does.
This patch merges these two interfaces into one.
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
I noticed that sysctl_check.o was the largest object file in
a allnoconfig build in kernel/*.
36243 0 0 36243 8d93 kernel/sysctl_check.o
This is because it was default y and && EMBEDDED. But I don't
really see a need for a non kernel developer to have their
sysctls checked all the time.
So move the Kconfig into the kernel debugging section and
also drop the default y and the EMBEDDED check.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The idea of the implementation of this fix is from Michael Ellerman.
This function has two loops, but they each interpret the memory_limit
value differently. The first loop interprets it as a "size limit"
whereas the second loop interprets it as an "address limit".
Before the second loop runs, reset memory_limit to lmb_end_of_DRAM()
so that it all works out.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Currently source files in the Documentation/ sub-dir can easily bit-rot
since they are not generally buildable, either because they are hidden in
text files or because there are no Makefile rules for them. This needs to
be fixed so that the source files remain usable and good examples of code
instead of bad examples.
Add the ability to build source files that are in the Documentation/ dir.
Add to Kconfig as "BUILD_DOCSRC" config symbol.
Use "CONFIG_BUILD_DOCSRC=1 make ..." to build objects from the
Documentation/ sources. Or enable BUILD_DOCSRC in the *config system.
However, this symbol depends on HEADERS_CHECK since the header files need
to be installed (for userspace builds).
Built (using cross-tools) for x86-64, i386, alpha, ia64, sparc32,
sparc64, powerpc, sh, m68k, & mips.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Reviewed-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Short enough reads from /proc/irq/*/smp_affinity return -EINVAL for no
good reason.
This became noticed with NR_CPUS=4096 patches, when length of printed
representation of cpumask becase 1152, but cat(1) continued to read with
1024-byte chunks. bitmap_scnprintf() in good faith fills buffer, returns
1023, check returns -EINVAL.
Fix it by switching to seq_file, so handler will just fill buffer and
doesn't care about offsets, length, filling EOF and all this crap.
For that add seq_bitmap(), and wrappers around it -- seq_cpumask() and
seq_nodemask().
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Paul Jackson <pj@sgi.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix wrong conversion function used by strict_strtou*
Signed-off-by: Yi Yang <yi.y.yang@intel.com>
Reported-by: Swen Schillig <swen@vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
this is a diagnostic patch for Classic RCU.
The approach is to record a timestamp at the beginning
of the grace period (in rcu_start_batch()), then have
rcu_check_callbacks() complain if:
1. it is running on a CPU that has holding up grace periods for
a long time (say one second). This will identify the culprit
assuming that the culprit has not disabled hardware irqs,
instruction execution, or some such.
2. it is running on a CPU that is not holding up grace periods,
but grace periods have been held up for an even longer time
(say two seconds).
It is enabled via the default-off CONFIG_DEBUG_RCU_STALL kernel parameter.
Rather than exponential backoff, it backs off to once per 30 seconds.
My feeling upon thinking on it was that if you have stalled RCU grace
periods for that long, a few extra printk() messages are probably the
least of your worries...
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: David Witbrodt <dawitbro@sbcglobal.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
kgdb: fix gdb serial thread queries
kgdb: fix kgdb_validate_break_address to perform a mem write
kgdb: remove the requirement for CONFIG_FRAME_POINTER
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (46 commits)
tcp: MD5: Fix IPv6 signatures
skbuff: add missing kernel-doc for do_not_encrypt
net/ipv4/route.c: fix build error
tcp: MD5: Fix MD5 signatures on certain ACK packets
ipv6: Fix ip6_xmit to send fragments if ipfragok is true
ipvs: Move userspace definitions to include/linux/ip_vs.h
netdev: Fix lockdep warnings in multiqueue configurations.
netfilter: xt_hashlimit: fix race between htable_destroy and htable_gc
netfilter: ipt_recent: fix race between recent_mt_destroy and proc manipulations
netfilter: nf_conntrack_tcp: decrease timeouts while data in unacknowledged
irda: replace __FUNCTION__ with __func__
nsc-ircc: default to dongle type 9 on IBM hardware
bluetooth: add quirks for a few hci_usb devices
hysdn: remove the packed attribute from PofTimStamp_tag
isdn: use the common ascii hex helpers
tg3: adapt tg3 to use reworked PCI PM code
atm: fix direct casts of pointers to u32 in the InterPhase driver
atm: fix const assignment/discard warnings in the ATM networking driver
net: use the common ascii hex helpers
random32: seeding improvement
...
There is no technical reason that the kgdb core requires frame
pointers. It is up to the end user of KGDB to decide if they need
them or not.
[ anemo@mba.ocn.ne.jp: removed frame pointers on mips ]
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Otherwise lock debugging messages on runqueue locks can deadlock the
system due to the wakeups performed by printk().
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The rationale is:
* use u32 consistently
* no need to do LCG on values from (better) get_random_bytes
* use more data from get_random_bytes for secondary seeding
* don't reduce state space on srandom32()
* enforce state variable initialization restrictions
Note: the second paper has a version of random32() with even longer period
and a version of random64() if needed.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This IOMMU helper function doesn't work for some architectures:
http://marc.info/?l=linux-kernel&m=121699304403202&w=2
It also breaks POWER and SPARC builds:
http://marc.info/?l=linux-kernel&m=121730388001890&w=2
Currently, only x86 IOMMUs use this so let's move it to x86 for
now.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (21 commits)
x86/PCI: use dev_printk when possible
PCI: add D3 power state avoidance quirk
PCI: fix bogus "'device' may be used uninitialized" warning in pci_slot
PCI: add an option to allow ASPM enabled forcibly
PCI: disable ASPM on pre-1.1 PCIe devices
PCI: disable ASPM per ACPI FADT setting
PCI MSI: Don't disable MSIs if the mask bit isn't supported
PCI: handle 64-bit resources better on 32-bit machines
PCI: rewrite PCI BAR reading code
PCI: document pci_target_state
PCI hotplug: fix typo in pcie hotplug output
x86 gart: replace to_pages macro with iommu_num_pages
x86, AMD IOMMU: replace to_pages macro with iommu_num_pages
iommu: add iommu_num_pages helper function
dma-coherent: add documentation to new interfaces
Cris: convert to using generic dma-coherent mem allocator
Sh: use generic per-device coherent dma allocator
ARM: support generic per-device coherent dma mem
Generic dma-coherent: fix DMA_MEMORY_EXCLUSIVE
x86: use generic per-device dma coherent allocator
...
memparse()'s first argument can be const, so it should be.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This implements a platform-independent version of show_mem().
Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Bryan Wu <cooloney@kernel.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Cc: Mikael Starvik <starvik@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds the new function task_current_syscall() on machines where the
asm/syscall.h interface is supported (CONFIG_HAVE_ARCH_TRACEHOOK). It's
exported for modules to use in the future. This function safely samples
the state of a blocked thread to collect what system call it is blocked
in, and the six system call argument registers.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use WARN() instead of a printk+WARN_ON() pair; this way the message becomes
part of the warning section for better reporting/collection. In addition, one
of the if() clauses collapes into the WARN() entirely now.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexeres. Nobody uses this "feature", nor does anybody uses
passed kmem cache in non-trivial way, so pass only pointer to object.
Non-trivial places are:
arch/powerpc/mm/init_64.c
arch/powerpc/mm/hugetlbpage.c
This is flag day, yes.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Matt Mackall <mpm@selenic.com>
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce gang_lookup_slot() and gang_lookup_slot_tag() functions, which
are used by lockless pagecache.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
architecture does:
This enables us to cleanly fix the Calgary IOMMU issue that some devices
are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).
I think that per-device dma_mapping_ops support would be also helpful for
KVM people to support PCI passthrough but Andi thinks that this makes it
difficult to support the PCI passthrough (see the above thread). So I
CC'ed this to KVM camp. Comments are appreciated.
A pointer to dma_mapping_ops to struct dev_archdata is added. If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's
NULL, the system-wide dma_ops pointer is used as before.
If it's useful for KVM people, I plan to implement a mechanism to register
a hook called when a new pci (or dma capable) device is created (it works
with hot plugging). It enables IOMMUs to set up an appropriate
dma_mapping_ops per device.
The major obstacle is that dma_mapping_error doesn't take a pointer to the
device unlike other DMA operations. So x86 can't have dma_mapping_ops per
device. Note all the POWER IOMMUs use the same dma_mapping_error function
so this is not a problem for POWER but x86 IOMMUs use different
dma_mapping_error functions.
The first patch adds the device argument to dma_mapping_error. The patch
is trivial but large since it touches lots of drivers and dma-mapping.h in
all the architecture.
This patch:
dma_mapping_error() doesn't take a pointer to the device unlike other DMA
operations. So we can't have dma_mapping_ops per device.
Note that POWER already has dma_mapping_ops per device but all the POWER
IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device
argument.
[akpm@linux-foundation.org: fix sge]
[akpm@linux-foundation.org: fix svc_rdma]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix bnx2x]
[akpm@linux-foundation.org: fix s2io]
[akpm@linux-foundation.org: fix pasemi_mac]
[akpm@linux-foundation.org: fix sdhci]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc]
[akpm@linux-foundation.org: fix ibmvscsi]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Replace previous instances of the cpumask_of_cpu_ptr* macros
with a the new (lvalue capable) generic cpumask_of_cpu().
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Calculating the number of pages from given address and length numbers is a task
required in multiple IOMMU implementations. So implement this as a generic
function into the IOMMU helper code.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Cc: iommu@lists.linux-foundation.org
Cc: bhavna.sarathy@amd.com
Cc: robert.richter@amd.com
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Introduce the free_layer() routine: it is the one that actually frees memory
after a grace period has elapsed.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make idr_find rcu-safe: it can now be called inside an rcu_read critical
section.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Do some code factorization in the return code analysis.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a trivial patch that renames:
. alloc_layer to get_from_free_list since it idr_pre_get that actually
allocates memory.
. free_layer to move_to_free_list since memory is not actually freed there.
This makes things more clear for the next patches.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Reviewed-by: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Cc: Pierre Peiffer <peifferp@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All ratelimit user use same jiffies and burst params, so some messages
(callbacks) will be lost.
For example:
a call printk_ratelimit(5 * HZ, 1)
b call printk_ratelimit(5 * HZ, 1) before the 5*HZ timeout of a, then b will
will be supressed.
- rewrite __ratelimit, and use a ratelimit_state as parameter. Thanks for
hints from andrew.
- Add WARN_ON_RATELIMIT, update rcupreempt.h
- remove __printk_ratelimit
- use __ratelimit in net_ratelimit
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Dave Young <hidave.darkstar@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arjan noted that the list_head debugging is BUG'ing when it detects
corruption. By causing the box to panic immediately, we're possibly
losing some bug reports. Changing this to a WARN() should mean we at the
least start seeing reports collected at kerneloops.org
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that WARN() exists, we can fold some of the printk's into it.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Inflate requires some dynamic memory allocation very early in the boot
process and this is provided with a set of four functions:
malloc/free/gzip_mark/gzip_release.
The old inflate code used a mark/release strategy rather than implement
free. This new version instead keeps a count on the number of outstanding
allocations and when it hits zero, it resets the malloc arena.
This allows removing all the mark and release implementations and unifying
all the malloc/free implementations.
The architecture-dependent code must define two addresses:
- free_mem_ptr, the address of the beginning of the area in which
allocations should be made
- free_mem_end_ptr, the address of the end of the area in which
allocations should be made. If set to 0, then no check is made on
the number of allocations, it just grows as much as needed
The architecture-dependent code can also provide an arch_decomp_wdog()
function call. This function will be called several times during the
decompression process, and allow to notify the watchdog that the system is
still running. If an architecture provides such a call, then it must
define ARCH_HAS_DECOMP_WDOG so that the generic inflate code calls
arch_decomp_wdog().
Work initially done by Matt Mackall, updated to a recent version of the
kernel and improved by me.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Mikael Starvik <mikael.starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the conditional surrounding the definition of list_add() from list.h
since, if you define CONFIG_DEBUG_LIST, the definition you will subsequently
pick up from lib/list_debug.c will be absolutely identical, at which point you
can remove that redundant definition from list_debug.c as well.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Extend memparse() to allow the caller to use a NULL second parameter, which
would represent no interest in returning the address of the end of the parsed
string.
In numerous cases, callers invoke memparse() to parse a possibly-suffixed
string (such as "64K" or "2G" or whatever) and define a character pointer to
accept the end pointer being returned by memparse() even though they have no
interest in it and promptly throw it away.
This (backward-compatible) enhancement allows callers to use NULL in the cases
where they just don't care about getting back that end pointer.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This updates <linux/bcd.h> to define the key routines as constant
functions, which the macros will then call. Newer code can now call
bcd2bin() instead of SCREAMING BCD2BIN() TO THE FOUR WINDS.
This lets each driver shrink their codespace by using N function calls to
a single (global) copy of those routines, instead of N inlined copies of
these functions per driver.
These routines aren't used in speed-critical code. Almost all callers are
in the RTC framework. Typical per-driver savings is near 300 bytes.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Adrian Bunk <bunk@kernel.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
lib/debugobjects.c has a function to test if an object is on the stack.
The block layer and ide needs it (they need to avoid DMA from/to stack
buffers). This patch moves the function to include/linux/sched.h so that
everyone can use it.
lib/debugobjects.c uses current->stack but this patch uses a
task_stack_page() accessor, which is a preferable way to access the stack.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Boot initialisation is very complex, with significant numbers of
architecture-specific routines, hooks and code ordering. While significant
amounts of the initialisation is architecture-independent, it trusts the data
received from the architecture layer. This is a mistake, and has resulted in
a number of difficult-to-diagnose bugs.
This patchset adds some validation and tracing to memory initialisation. It
also introduces a few basic defensive measures. The validation code can be
explicitly disabled for embedded systems.
This patch:
Add additional debugging and verification code for memory initialisation.
Once enabled, the verification checks are always run and when required
additional debugging information may be outputted via a mminit_loglevel=
command-line parameter.
The verification code is placed in a new file mm/mm_init.c. Ideally other mm
initialisation code will be moved here over time.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'cpus4096-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
NR_CPUS: Replace NR_CPUS in speedstep-centrino.c
cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUP
NR_CPUS: Replace NR_CPUS in cpufreq userspace routines
NR_CPUS: Replace per_cpu(..., smp_processor_id()) with __get_cpu_var
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genapic_flat_64.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/genx2apic_uv_x.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/proc.c
NR_CPUS: Replace NR_CPUS in arch/x86/kernel/cpu/mcheck/mce_64.c
cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c, fix
cpumask: Use optimized CPUMASK_ALLOC macros in the centrino_target
cpumask: Provide a generic set of CPUMASK_ALLOC macros
cpumask: Optimize cpumask_of_cpu in lib/smp_processor_id.c
cpumask: Optimize cpumask_of_cpu in kernel/time/tick-common.c
cpumask: Optimize cpumask_of_cpu in drivers/misc/sgi-xp/xpc_main.c
cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/ldt.c
cpumask: Optimize cpumask_of_cpu in arch/x86/kernel/io_apic_64.c
cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptr
Revert "cpumask: introduce new APIs"
cpumask: make for_each_cpu_mask a bit smaller
net: Pass reference to cpumask variable in net/sunrpc/svc.c
...
Fix up trivial conflicts in drivers/cpufreq/cpufreq.c manually
* 'core/softlockup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
softlockup: fix invalid proc_handler for softlockup_panic
softlockup: fix watchdog task wakeup frequency
softlockup: fix watchdog task wakeup frequency
softlockup: show irqtrace
softlockup: print a module list on being stuck
softlockup: fix NMI hangs due to lock race - 2.6.26-rc regression
softlockup: fix false positives on nohz if CPU is 100% idle for more than 60 seconds
softlockup: fix softlockup_thresh fix
softlockup: fix softlockup_thresh unaligned access and disable detection at runtime
softlockup: allow panic on lockup
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
sdhci: highmem capable PIO routines
sg: reimplement sg mapping iterator
mmc_test: print message when attaching to card
mmc: Remove Russell as primecell mci maintainer
mmc_block: bounce buffer highmem support
sdhci: fix bad warning from commit c8b3e02
sdhci: add warnings for bad buffers in ADMA path
mmc_test: test oversized sg lists
mmc_test: highmem tests
s3cmci: ensure host stopped on machine shutdown
au1xmmc: suspend/resume implementation
s3cmci: fixes for section mismatch warnings
pxamci: trivial fix of DMA alignment register bit clearing
Remove HAVE_ARCH_KGDB_SHADOW_INFO because it does not
exist anywhere in the kernel mainline sources
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
This is alternative implementation of sg content iterator introduced
by commit 83e7d317... from Pierre Ossman in next-20080716. As there's
already an sg iterator which iterates over sg entries themselves, name
this sg_mapping_iterator.
Slightly edited description from the original implementation follows.
Iteration over a sg list is not that trivial when you take into
account that memory pages might have to be mapped before being used.
Unfortunately, that means that some parts of the kernel restrict
themselves to directly accesible memory just to not have to deal with
the mess.
This patch adds a simple iterator system that allows any code to
easily traverse an sg list and not have to deal with all the details.
The user can decide to consume part of the iteration. Also, iteration
can be stopped and resumed later if releasing the kmap between
iteration steps is necessary. These features are useful to implement
piecemeal sg copying for interrupt drive PIO for example.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
remove CONFIG_KMOD from core kernel code
remove CONFIG_KMOD from lib
remove CONFIG_KMOD from sparc64
rework try_then_request_module to do less in non-modular kernels
remove mention of CONFIG_KMOD from documentation
make CONFIG_KMOD invisible
modules: Take a shortcut for checking if an address is in a module
module: turn longs into ints for module sizes
Shrink struct module: CONFIG_UNUSED_SYMBOLS ifdefs
module: reorder struct module to save space on 64 bit builds
module: generic each_symbol iterator function
module: don't use stop_machine for waiting rmmod
textsearch algorithms can be loaded, make the code depend
on CONFIG_MODULES instead of CONFIG_KMOD.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
kobject_uevent_env() drops the return value of call_usermodehelper().
It will make upper caller, such as dm_send_uevents(), to lose error
information.
BTW, Previously kobject_uevent_env() transmitted return of
call_usermodehelper() to callers, but
commit 5f123fbd80
"[PATCH] merge kobject_uevent and kobject_hotplug" removed it.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Cc: Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Some (block) devices have a '/' in the name, and need special
handling. Let's have that rule to the core, so we can remove it
from the block class.
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* Optimize various places where a pointer to the cpumask_of_cpu value
will result in reducing stack pressure.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ftrace: do not trace library functions
ftrace: do not trace scheduler functions
ftrace: fix lockup with MAXSMP
ftrace: fix merge buglet
make function tracing more robust: do not trace library functions.
We've already got a sizable list of exceptions:
ifdef CONFIG_FTRACE
# Do not profile string.o, since it may be used in early boot or vdso
CFLAGS_REMOVE_string.o = -pg
# Also do not profile any debug utilities
CFLAGS_REMOVE_spinlock_debug.o = -pg
CFLAGS_REMOVE_list_debug.o = -pg
CFLAGS_REMOVE_debugobjects.o = -pg
CFLAGS_REMOVE_find_next_bit.o = -pg
CFLAGS_REMOVE_cpumask.o = -pg
CFLAGS_REMOVE_bitmap.o = -pg
endif
... and the pattern has been that random library functionality showed
up in ftrace's critical path (outside of its recursion check), causing
hard to debug lockups.
So be a bit defensive about it and exclude all lib/*.o functions by
default. It's not that they are overly interesting for tracing purposes
anyway. Specific ones can still be traced, in an opt-in manner.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
MAXSMP brings in lots of use of various bitops in smp_processor_id()
and friends - causing ftrace to lock up during bootup:
calling anon_inode_init+0x0/0x130
initcall anon_inode_init+0x0/0x130 returned 0 after 0 msecs
calling acpi_event_init+0x0/0x57
[ hard hang ]
So exclude the bitops facilities from tracing.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (72 commits)
Revert "x86/PCI: ACPI based PCI gap calculation"
PCI: remove unnecessary volatile in PCIe hotplug struct controller
x86/PCI: ACPI based PCI gap calculation
PCI: include linux/pm_wakeup.h for device_set_wakeup_capable
PCI PM: Fix pci_prepare_to_sleep
x86/PCI: Fix PCI config space for domains > 0
Fix acpi_pm_device_sleep_wake() by providing a stub for CONFIG_PM_SLEEP=n
PCI: Simplify PCI device PM code
PCI PM: Introduce pci_prepare_to_sleep and pci_back_from_sleep
PCI ACPI: Rework PCI handling of wake-up
ACPI: Introduce new device wakeup flag 'prepared'
ACPI: Introduce acpi_device_sleep_wake function
PCI: rework pci_set_power_state function to call platform first
PCI: Introduce platform_pci_power_manageable function
ACPI: Introduce acpi_bus_power_manageable function
PCI: make pci_name use dev_name
PCI: handle pci_name() being const
PCI: add stub for pci_set_consistent_dma_mask()
PCI: remove unused arch pcibios_update_resource() functions
PCI: fix pci_setup_device()'s sprinting into a const buffer
...
Fixed up conflicts in various files (arch/x86/kernel/setup_64.c,
arch/x86/pci/irq.c, arch/x86/pci/pci.h, drivers/acpi/sleep/main.c,
drivers/pci/pci.c, drivers/pci/pci.h, include/acpi/acpi_bus.h) from x86
and ACPI updates manually.
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (102 commits)
[SCSI] scsi_dh: fix kconfig related build errors
[SCSI] sym53c8xx: Fix bogus sym_que_entry re-implementation of container_of
[SCSI] scsi_cmnd.h: remove double inclusion of linux/blkdev.h
[SCSI] make struct scsi_{host,target}_type static
[SCSI] fix locking in host use of blk_plug_device()
[SCSI] zfcp: Cleanup external header file
[SCSI] zfcp: Cleanup code in zfcp_erp.c
[SCSI] zfcp: zfcp_fsf cleanup.
[SCSI] zfcp: consolidate sysfs things into one file.
[SCSI] zfcp: Cleanup of code in zfcp_aux.c
[SCSI] zfcp: Cleanup of code in zfcp_scsi.c
[SCSI] zfcp: Move status accessors from zfcp to SCSI include file.
[SCSI] zfcp: Small QDIO cleanups
[SCSI] zfcp: Adapter reopen for large number of unsolicited status
[SCSI] zfcp: Fix error checking for ELS ADISC requests
[SCSI] zfcp: wait until adapter is finished with ERP during auto-port
[SCSI] ibmvfc: IBM Power Virtual Fibre Channel Adapter Client Driver
[SCSI] sg: Add target reset support
[SCSI] lib: Add support for the T10 (SCSI) Data Integrity Field CRC
[SCSI] sd: Move scsi_disk() accessor function to sd.h
...
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (61 commits)
ext4: Documention update for new ordered mode and delayed allocation
ext4: do not set extents feature from the kernel
ext4: Don't allow nonextenst mount option for large filesystem
ext4: Enable delalloc by default.
ext4: delayed allocation i_blocks fix for stat
ext4: fix delalloc i_disksize early update issue
ext4: Handle page without buffers in ext4_*_writepage()
ext4: Add ordered mode support for delalloc
ext4: Invert lock ordering of page_lock and transaction start in delalloc
mm: Add range_cont mode for writeback
ext4: delayed allocation ENOSPC handling
percpu_counter: new function percpu_counter_sum_and_set
ext4: Add delayed allocation support in data=writeback mode
vfs: add hooks for ext4's delayed allocation support
jbd2: Remove data=ordered mode support using jbd buffer heads
ext4: Use new framework for data=ordered mode in JBD2
jbd2: Implement data=ordered mode handling via inodes
vfs: export filemap_fdatawrite_range()
ext4: Fix lock inversion in ext4_ext_truncate()
ext4: Invert the locking order of page_lock and transaction start
...
The SCSI Block Protocol uses this 16-bit CRC to verify the integrity
of each data sector. crc_t10dif() is used by sd_dif.c when performing
I/O to or from disks formatted with protection information.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Delayed allocation need to check free blocks at every write time.
percpu_counter_read_positive() is not quit accurate. delayed
allocation need a more accurate accounting, but using
percpu_counter_sum_positive() is frequently is quite expensive.
This patch added a new function to update center counter when sum
per-cpu counter, to increase the accurate rate for next
percpu_counter_read() and require less calling expensive
percpu_counter_sum().
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
For fsm text search, handle case insensitive parameter as -EINVAL.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for case insensitive search to Knuth-Morris-Pratt algorithm.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for case insensitive search to Boyer-Moore algorithm.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function textsearch_prepare has a new flag to support case
insensitive searching.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
They print out a pointer in symbolic format, if possible (ie using
symbolic KALLSYMS information). The '%pS' format is for regular direct
pointers (which can point to data or code and that you find on the stack
during backtraces etc), while '%pF' is for C function pointer types.
On most architectures, the two mean exactly the same thing, but some
architectures use an indirect pointer for C function pointers, where the
function pointer points to a function descriptor (which in turn contains
the actual pointer to the code). The '%pF' code automatically does the
appropriate function descriptor dereference on such architectures.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This expands the kernel '%p' handling with an arbitrary alphanumberic
specifier extension string immediately following the '%p'. Right now
it's just being ignored, but the next commit will start adding some
specific pointer type extensions.
NOTE! The reason the extension is appended to the '%p' is to allow
minimal gcc type checking: gcc will still see the '%p' and will check
that the argument passed in is indeed a pointer, and yet will not
complain about the extended information that gcc doesn't understand
about (on the other hand, it also won't actually check that the pointer
type and the extension are compatible).
Alphanumeric characters were chosen because there is no sane existing
use for a string format with a hex pointer representation immediately
followed by alphanumerics (which is what such a format string would have
traditionally resulted in).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The actual code is the same, just split out into a helper function.
This makes it easier to read, and allows for simple future extension
of %p handling.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The actual code is the same, just split out into a helper function.
This makes it easier to read, and allows for future sharing of the
string code.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 95b570c9ce ("Taint kernel after
WARN_ON(condition)") introduced a TAINT_WARN that was implemented for
all architectures using the generic warn_on_slowpath(), which excluded
any architecture that set HAVE_ARCH_WARN_ON.
As all of the architectures that implement their own WARN_ON() all go
through the report_bug() path (specifically handling BUG_TRAP_TYPE_WARN),
taint the kernel there as well for consistency.
Tested on avr32 and sh. Also relevant for s390, parisc, and powerpc.
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove all clameter@sgi.com addresses from the kernel tree since they will
become invalid on June 27th. Change my maintainer email address for the
slab allocators to cl@linux-foundation.org (which will be the new email
address for the future).
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (55 commits)
net: fib_rules: fix error code for unsupported families
netdevice: Fix wrong string handle in kernel command line parsing
net: Tyop of sk_filter() comment
netlink: Unneeded local variable
net-sched: fix filter destruction in atm/hfsc qdisc destruction
net-sched: change tcf_destroy_chain() to clear start of filter list
ipv4: fix sysctl documentation of time related values
mac80211: don't accept WEP keys other than WEP40 and WEP104
hostap: fix sparse warnings
hostap: don't report useless WDS frames by default
textsearch: fix Boyer-Moore text search bug
netfilter: nf_conntrack_tcp: fixing to check the lower bound of valid ACK
ipv6 route: Convert rt6_device_match() to use RT6_LOOKUP_F_xxx flags.
netlabel: Fix a problem when dumping the default IPv6 static labels
net/inet_lro: remove setting skb->ip_summed when not LRO-able
inet fragments: fix race between inet_frag_find and inet_frag_secret_rebuild
CONNECTOR: add a proc entry to list connectors
netlink: Fix some doc comments in net/netlink/attr.c
tcp: /proc/net/tcp rto,ato values not scaled properly (v2)
include/linux/netdevice.h: don't export MAX_HEADER to userspace
...
The current logic has a bug which cannot find matching pattern, if the
pattern is matched from the first character of target string.
for example:
pattern=abc, string=abcdefg
pattern=a, string=abcdefg
Searching algorithm should return 0 for those things.
Signed-off-by: Joonwoo Park <joonwpark81@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds saved stack-traces to the backtrace suite of self-tests.
Note that we don't depend on or unconditionally enable CONFIG_STACKTRACE
because not all architectures may have it (and we still want to enable the
other tests for those architectures).
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add some (configurable) expensive sanity checking to catch wrong address
translations on x86.
- create linux/mmdebug.h file to be able include this file in
asm headers to not get unsolvable loops in header files
- __phys_addr on x86_32 became a function in ioremap.c since
PAGE_OFFSET, is_vmalloc_addr and VMALLOC_* non-constasts are undefined
if declared in page_32.h
- add __phys_addr_const for initializing doublefault_tss.__cr3
Tested on 386, 386pae, x86_64 and x86_64 numa=fake=2.
Contains Andi's enable numa virtual address debug patch.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch re-institutes the ability to build rcutorture directly into
the Linux kernel. The reason that this capability was removed was that
this could result in your kernel being pretty much useless, as rcutorture
would be running starting from early boot. This problem has been avoided
by (1) making rcutorture run only three seconds of every six by default,
(2) adding a CONFIG_RCU_TORTURE_TEST_RUNNABLE that permits rcutorture
to be quiesced at boot time, and (3) adding a sysctl in /proc named
/proc/sys/kernel/rcutorture_runnable that permits rcutorture to be
quiesced and unquiesced when built into the kernel.
Please note that this /proc file is -not- available when rcutorture
is built as a module. Please also note that to get the earlier
take-no-prisoners behavior, you must use the boot command line to set
rcutorture's "stutter" parameter to zero.
The rcutorture quiescing mechanism is currently quite crude: loops
in each rcutorture process that poll a global variable once per tick.
Suggestions for improvement are welcome. The default action will
be to reduce the polling rate to a few times per second.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Suggested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Daniel J Blueman reported:
| =======================================================
| [ INFO: possible circular locking dependency detected ]
| 2.6.26-rc5-201c #1
| -------------------------------------------------------
| nscd/3669 is trying to acquire lock:
| (&n->list_lock){.+..}, at: [<ffffffff802bab03>] deactivate_slab+0x173/0x1e0
|
| but task is already holding lock:
| (&obj_hash[i].lock){++..}, at: [<ffffffff803fa56f>]
| __debug_object_init+0x2f/0x350
|
| which lock already depends on the new lock.
There are two locks involved here; the first is a SLUB-local lock, and
the second is a debugobjects-local lock. They are basically taken in two
different orders:
1. SLUB { debugobjects { ... } }
2. debugobjects { SLUB { ... } }
This patch changes pattern #2 by trying to fill the memory pool (e.g.
the call into SLUB/kmalloc()) outside the debugobjects lock, so now the
two patterns look like this:
1. SLUB { debugobjects { ... } }
2. SLUB { } debugobjects { ... }
[ daniel.blueman@gmail.com: pool_lock needs to be taken irq safe in fill_pool ]
Reported-by: Daniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This reverts commit 9aaffc898f.
That commit was a very bad idea. RCU_TORTURE found many boot timing
bugs and other sorts of bugs in the past, so excluding it from
boot images is very silly.
The option already depends on DEBUG_KERNEL and is disabled by default.
Even when it runs, the test threads are reniced. If it annoys people
we could add a runtime sysctl.
We shrink a radix tree when its root node has only one child, in the left
most slot. The child becomes the new root node. To perform this
operation in a manner compatible with concurrent lockless lookups, we
atomically switch the root pointer from the parent to its child.
However a concurrent lockless lookup may now have loaded a pointer to the
parent (and is presently deciding what to do next). For this reason, we
also have to keep the parent node in a valid state after shrinking the
tree, until the next RCU grace period -- otherwise this lookup with the
parent pointer may not do the right thing. Notably, we need to keep the
child in the left most slot there in case that is requested by the lookup.
This is all pretty standard RCU stuff. It is worth repeating because in
my eagerness to obey the radix tree node constructor scheme, I had broken
it by zeroing the radix tree node before the grace period.
What could happen is that a lookup can load the parent pointer, then
decide it wants to follow the left most child slot, only to find the slot
contained NULL due to the concurrent shrinker having zeroed the parent
node before waiting for a grace period. The lookup would return a false
negative as a result.
Fix it by doing that clearing in the RCU callback. I would normally want
to rip out the constructor entirely, but radix tree nodes are one of those
places where they make sense (only few cachelines will be touched soon
after allocation).
This was never actually found in any lockless pagecache testing or by the
test harness, but by seeing the odd problem with my scalable vmap rewrite.
I have not tickled the test harness into reproducing it yet, but I'll
keep working at it.
Fortunately, it is not a problem anywhere lockless pagecache is used in
mainline kernels (pagecache probe is not a guarantee, and brd does not
have concurrent lookups and deletes).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
iter_div_u64_rem is used in the x86-64 vdso, which cannot call other
kernel code. For this case, provide the always_inlined version,
__iter_div_u64_rem.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
We have a few instances of the open-coded iterative div/mod loop, used
when we don't expcet the dividend to be much bigger than the divisor.
Unfortunately modern gcc's have the tendency to strength "reduce" this
into a full mod operation, which isn't necessarily any faster, and
even if it were, doesn't exist if gcc implements it in libgcc.
The workaround is to put a dummy asm statement in the loop to prevent
gcc from performing the transformation.
This patch creates a single implementation of this loop, and uses it
to replace the open-coded versions I know about.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Christian Kujau <lists@nerdbynature.de>
Cc: Robert Hancock <hancockr@shaw.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Detect all physical PCI slots as described by ACPI, and create entries in
/sys/bus/pci/slots/.
Not all physical slots are hotpluggable, and the acpiphp module does not
detect them. Now we know the physical PCI geography of our system, without
caring about hotplug.
[kaneshige.kenji@jp.fujitsu.com: export-kobject_rename-for-pci_hotplug_core]
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Acked-by: Greg KH <greg@kroah.com>
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix build with CONFIG_DMI=n]
Signed-off-by: Alex Chiang <achiang@hp.com>
Cc: Greg KH <greg@kroah.com>
Cc: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Cc: Len Brown <lenb@kernel.org>
Acked-by: Len Brown <len.brown@intel.com>
Acked-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Bluetooth will be able to use this.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Dave Young <hidave.darkstar@gmail.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
allow users to configure the softlockup detector to generate a panic
instead of a warning message.
high-availability systems might opt for this strict method (combined
with panic_timeout= boot option/sysctl), instead of generating
softlockup warnings ad infinitum.
also, automated tests work better if the system reboots reliably (into
a safe kernel) in case of a lockup.
The full spectrum of configurability is supported: boot option, sysctl
option and Kconfig option.
it's default-disabled.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch removes the Makefile turd and uses the nice CFLAGS_REMOVE macro
in the lib directory.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The debug functions in spin_lock debugging pollute the output of the
function tracer. This patch adds the debug files in the lib director
to those that should not be compiled with mcount tracing.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Most archs define the string and memory compare functions in assembly.
Some do not. But these functions may be used in some archs at early
boot up.
Since most archs define this code in assembly and they are not usually
traced, there's no need to trace them when they are not defined in
assembly.
This patch removes the -pg from the CFLAGS for lib/string.o.
This prevents the string functions use in either vdso or early bootup
from crashing the system.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The debug smp_processor_id caused a recursive fault in debugging
the irqsoff tracer. The tracer used a smp_processor_id in the
ftrace callback, and this function called preempt_disable which
also is traced. This caused a recursive fault (stack overload).
Since using smp_processor_id without debugging on does not cause
faults with the tracer (even when the tracer is wrong), the
debug version should not cause a system reboot.
This changes the debug_smp_processor_id to use the notrace versions
of preempt_disable and enable.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
If CONFIG_FTRACE is selected and /proc/sys/kernel/ftrace_enabled is
set to a non-zero value the ftrace routine will be called everytime
we enter a kernel function that is not marked with the "notrace"
attribute.
The ftrace routine will then call a registered function if a function
happens to be registered.
[ This code has been highly hacked by Steven Rostedt and Ingo Molnar,
so don't blame Arnaldo for all of this ;-) ]
Update:
It is now possible to register more than one ftrace function.
If only one ftrace function is registered, that will be the
function that ftrace calls directly. If more than one function
is registered, then ftrace will call a function that will loop
through the functions to call.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Mark with "notrace" functions in core code that should not be
traced. The "notrace" attribute will prevent gcc from adding
a call to ftrace on the annotated funtions.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Increase performance for systems with large count NR_CPUS by limiting
the range of the cpumask operators that loop over the bits in a cpumask_t
variable. This removes a large amount of wasted cpu cycles.
* Add performance variants of the cpumask operators:
int cpus_weight_nr(mask) Same using nr_cpu_ids instead of NR_CPUS
int first_cpu_nr(mask) Number lowest set bit, or nr_cpu_ids
int next_cpu_nr(cpu, mask) Next cpu past 'cpu', or nr_cpu_ids
for_each_cpu_mask_nr(cpu, mask) for-loop cpu over mask using nr_cpu_ids
* Modify following to use performance variants:
#define num_online_cpus() cpus_weight_nr(cpu_online_map)
#define num_possible_cpus() cpus_weight_nr(cpu_possible_map)
#define num_present_cpus() cpus_weight_nr(cpu_present_map)
#define for_each_possible_cpu(cpu) for_each_cpu_mask_nr((cpu), ...)
#define for_each_online_cpu(cpu) for_each_cpu_mask_nr((cpu), ...)
#define for_each_present_cpu(cpu) for_each_cpu_mask_nr((cpu), ...)
* Comment added to include/linux/cpumask.h:
Note: The alternate operations with the suffix "_nr" are used
to limit the range of the loop to nr_cpu_ids instead of
NR_CPUS when NR_CPUS > 64 for performance reasons.
If NR_CPUS is <= 64 then most assembler bitmask
operators execute faster with a constant range, so
the operator will continue to use NR_CPUS.
Another consideration is that nr_cpu_ids is initialized
to NR_CPUS and isn't lowered until the possible cpus are
discovered (including any disabled cpus). So early uses
will span the entire range of NR_CPUS.
(The net effect is that for systems with 64 or less CPU's there are no
functional changes.)
For inclusion into sched-devel/latest tree.
Based on:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
+ sched-devel/latest .../mingo/linux-2.6-sched-devel.git
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Reviewed-by: Paul Jackson <pj@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Move rcu-protected lists from list.h into a new header file rculist.h.
This is done because list are a very used primitive structure all over the
kernel and it's currently impossible to include other header files in this
list.h without creating some circular dependencies.
For example, list.h implements rcu-protected list and uses rcu_dereference()
without including rcupdate.h. It actually compiles because users of
rcu_dereference() are macros. Others RCU functions could be used too but
aren't probably because of this.
Therefore this patch creates rculist.h which includes rcupdates without to
many changes/troubles.
Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
lib/lmb.c: In function 'lmb_dump_all':
lib/lmb.c:51: warning: format '%lx' expects type 'long unsigned int', but argument 2 has type 'u64'
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
* 'for-linus' of ssh://master.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: fix error path during early mount
9p: make cryptic unknown error from server less scary
9p: fix flags length in net
9p: Correct fidpool creation failure in p9_client_create
9p: use struct mutex instead of struct semaphore
9p: propagate parse_option changes to client and transports
fs/9p/v9fs.c (v9fs_parse_options): Handle kstrdup and match_strdup failure.
9p: Documentation updates
add match_strlcpy() us it to make v9fs make uname and remotename parsing more robust
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc64: Use a TS_RESTORE_SIGMASK
lmb: Make lmb debugging more useful.
lmb: Fix inconsistent alignment of size argument.
sparc: Fix mremap address range validation.
Add a common hex array in hexdump.c so everyone can use it.
Add a common hi/lo helper to avoid the shifting masking that is
done to get the upper and lower nibbles of a byte value.
Pull the pack_hex_byte helper from kgdb as it is opencoded many
places in the tree that will be consolidated.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
match_strcpy() is a somewhat creepy function: the caller needs to make sure
that the destination buffer is big enough, and when he screws up or
forgets, match_strcpy() happily overruns the buffer.
There's exactly one customer: v9fs_parse_options(). I believe it currently
can't overflow its buffer, but that's not exactly obvious.
The source string is a substing of the mount options. The kernel silently
truncates those to PAGE_SIZE bytes, including the terminating zero. See
compat_sys_mount() and do_mount().
The destination buffer is obtained from __getname(), which allocates from
name_cachep, which is initialized by vfs_caches_init() for size PATH_MAX.
We're safe as long as PATH_MAX <= PAGE_SIZE. PATH_MAX is 4096. As far as
I know, the smallest PAGE_SIZE is also 4096.
Here's a patch that makes the code a bit more obviously correct. It
doesn't depend on PATH_MAX <= PAGE_SIZE.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Cc: Latchesar Ionkov <lucho@ionkov.net>
Cc: Jim Meyering <meyering@redhat.com>
Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
They aren't used. They were briefly used as part of some other patches to
provide an alternative format for displaying some /proc and /sys cpumasks.
They probably should have been removed when those other patches were dropped,
in favor of a different solution.
Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: "Mike Travis" <travis@sgi.com>
Cc: "Bert Wesarg" <bert.wesarg@googlemail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Having to muck with the build and set DEBUG just to
get lmb_dump_all() to print things isn't very useful.
So use pr_info() and use an early boot param
"lmb=debug" so we can simply ask users to reboot
with this option when we need some debugging from
them.
Signed-off-by: David S. Miller <davem@davemloft.net>
When allocating, if we will align up the size when making
the reservation, we should also align the size for the
check that the space is actually available.
The simplest thing is to just aling the size up from
the beginning, then we can use plain 'size' throughout.
Signed-off-by: David S. Miller <davem@davemloft.net>
The generic semaphore rewrite had a huge performance regression on AIM7
(and potentially other BKL-heavy benchmarks) because the generic
semaphores had been rewritten to be simple to understand and fair. The
latter, in particular, turns a semaphore-based BKL implementation into a
mess of scheduling.
The attempt to fix the performance regression failed miserably (see the
previous commit 00b41ec261 'Revert
"semaphore: fix"'), and so for now the simple and sane approach is to
instead just go back to the old spinlock-based BKL implementation that
never had any issues like this.
This patch also has the advantage of being reported to fix the
regression completely according to Yanmin Zhang, unlike the semaphore
hack which still left a couple percentage point regression.
As a spinlock, the BKL obviously has the potential to be a latency
issue, but it's not really any different from any other spinlock in that
respect. We do want to get rid of the BKL asap, but that has been the
plan for several years.
These days, the biggest users are in the tty layer (open/release in
particular) and Alan holds out some hope:
"tty release is probably a few months away from getting cured - I'm
afraid it will almost certainly be the very last user of the BKL in
tty to get fixed as it depends on everything else being sanely locked."
so while we're not there yet, we do have a plan of action.
Tested-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Alexander Viro <viro@ftp.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We provide an ioremap_flags, so this provides a corresponding
devm_ioremap_prot. The slight name difference is at Ben
Herrenschmidt's request as he plans on changing ioremap_flags to
ioremap_prot in the future.
Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Tejun Heo <htejun@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The return inside the loop makes us free only a single layer.
Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Jim Houston <jim.houston@comcast.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a new sysfs_streq() string comparison function, which ignores
the trailing newlines found in sysfs inputs. By example:
sysfs_streq("a", "b") ==> false
sysfs_streq("a", "a") ==> true
sysfs_streq("a", "a\n") ==> true
sysfs_streq("a\n", "a") ==> true
This is intended to simplify parsing of sysfs inputs, letting them
avoid the need to manually strip off newlines from inputs.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rename div64_64 to div64_u64 to make it consistent with the other divide
functions, so it clearly includes the type of the divide. Move its definition
to math64.h as currently no architecture overrides the generic implementation.
They can still override it of course, but the duplicated declarations are
avoided.
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current do_div doesn't explicitly say that it's unsigned and the signed
counterpart is missing, which is e.g. needed when dealing with time values.
This introduces 64bit signed/unsigned divide functions which also attempts to
cleanup the somewhat awkward calling API, which often requires the use of
temporary variables for the dividend. To avoid the need for temporary
variables everywhere for the remainder, each divide variant also provides a
version which doesn't return the remainder.
Each architecture can now provide optimized versions of these function,
otherwise generic fallback implementations will be used.
As an example I provided an alternative for the current x86 divide, which
avoids the asm casts and using an union allows gcc to generate better code.
It also avoids the upper divde in a few more cases, where the result is known
(i.e. upper quotient is zero).
Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use a resource_size_t instead of unsigned long since some arch's are
capable of having ioremap deal with addresses greater than the size of a
unsigned long.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add klist_add_after() and klist_add_before() which puts a new node
after and before an existing node, respectively. This is useful for
callers which need to keep klist ordered. Note that synchronizing
between simultaneous additions for ordering is the caller's
responsibility.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
__FUNCTION__ is gcc specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add calls to the generic object debugging infrastructure and provide fixup
functions which allow to keep the system alive when recoverable problems have
been detected by the object debugging core code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We can see an ever repeating problem pattern with objects of any kind in the
kernel:
1) freeing of active objects
2) reinitialization of active objects
Both problems can be hard to debug because the crash happens at a point where
we have no chance to decode the root cause anymore. One problem spot are
kernel timers, where the detection of the problem often happens in interrupt
context and usually causes the machine to panic.
While working on a timer related bug report I had to hack specialized code
into the timer subsystem to get a reasonable hint for the root cause. This
debug hack was fine for temporary use, but far from a mergeable solution due
to the intrusiveness into the timer code.
The code further lacked the ability to detect and report the root cause
instantly and keep the system operational.
Keeping the system operational is important to get hold of the debug
information without special debugging aids like serial consoles and special
knowledge of the bug reporter.
The problems described above are not restricted to timers, but timers tend to
expose it usually in a full system crash. Other objects are less explosive,
but the symptoms caused by such mistakes can be even harder to debug.
Instead of creating specialized debugging code for the timer subsystem a
generic infrastructure is created which allows developers to verify their code
and provides an easy to enable debug facility for users in case of trouble.
The debugobjects core code keeps track of operations on static and dynamic
objects by inserting them into a hashed list and sanity checking them on
object operations and provides additional checks whenever kernel memory is
freed.
The tracked object operations are:
- initializing an object
- adding an object to a subsystem list
- deleting an object from a subsystem list
Each operation is sanity checked before the operation is executed and the
subsystem specific code can provide a fixup function which allows to prevent
the damage of the operation. When the sanity check triggers a warning message
and a stack trace is printed.
The list of operations can be extended if the need arises. For now it's
limited to the requirements of the first user (timers).
The core code enqueues the objects into hash buckets. The hash index is
generated from the address of the object to simplify the lookup for the check
on kfree/vfree. Each bucket has it's own spinlock to avoid contention on a
global lock.
The debug code can be compiled in without being active. The runtime overhead
is minimal and could be optimized by asm alternatives. A kernel command line
option enables the debugging code.
Thanks to Ingo Molnar for review, suggestions and cleanup patches.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add "max_ratio" to /sys/class/bdi. This indicates the maximum percentage of
the global dirty threshold allocated to this bdi.
[mszeredi@suse.cz]
- fix parsing in max_ratio_store().
- export bdi_set_max_ratio() to modules
- limit bdi_dirty with bdi->max_ratio
- document new sysfs attribute
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide a place in sysfs (/sys/class/bdi) for the backing_dev_info object.
This allows us to see and set the various BDI specific variables.
In particular this properly exposes the read-ahead window for all relevant
users and /sys/block/<block>/queue/read_ahead_kb should be deprecated.
With patient help from Kay Sievers and Greg KH
[mszeredi@suse.cz]
- split off NFS and FUSE changes into separate patches
- document new sysfs attributes under Documentation/ABI
- do bdi_class_init as a core_initcall, otherwise the "default" BDI
won't be initialized
- remove bdi_init_fmt macro, it's not used very much
[akpm@linux-foundation.org: fix ia64 warning]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Acked-by: Greg KH <greg@kroah.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[RAPIDIO] Change RapidIO doorbell source and target ID field to 16-bit
[RAPIDIO] Add RapidIO connection info print out and re-training for broken connections
[RAPIDIO] Add serial RapidIO controller support, which includes MPC8548, MPC8641
[RAPIDIO] Add RapidIO node probing into MPC86xx_HPCN board id table
[RAPIDIO] Add RapidIO node into MPC8641HPCN dts file
[RAPIDIO] Auto-probe the RapidIO system size
[RAPIDIO] Add OF-tree support to RapidIO controller driver
[RAPIDIO] Add RapidIO multi mport support
[RAPIDIO] Move include/asm-ppc/rio.h to asm-powerpc
[RAPIDIO] Add RapidIO option to kernel configuration
[RAPIDIO] Change RIO function mpc85xx_ to fsl_
[POWERPC] Provide walk_memory_resource() for powerpc
[POWERPC] Update lmb data structures for hotplug memory add/remove
[POWERPC] Hotplug memory remove notifications for powerpc
[POWERPC] windfarm: Add PowerMac 12,1 support
[POWERPC] Fix building of pmac32 when CONFIG_NVRAM=m
[POWERPC] Add IRQSTACKS support on ppc32
[POWERPC] Use __always_inline for xchg* and cmpxchg*
[POWERPC] Add fast little-endian switch system call
The mapsize optimizations which were moved from x86 to the generic
code in commit 64970b68d2 increased the
binary size on non x86 architectures.
Looking into the real effects of the "optimizations" it turned out
that they are not used in find_next_bit() and find_next_zero_bit().
The ones in find_first_bit() and find_first_zero_bit() are used in a
couple of places but none of them is a real hot path.
Remove the "optimizations" all together and call the library functions
unconditionally.
Boot-tested on x86 and compile tested on every cross compiler I have.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoid a possible kmem_cache_create() failure by creating idr_layer_cache
unconditionary at boot time rather than creating it on-demand when idr_init()
is called the first time.
This change also enables us to eliminate the check every time idr_init() is
called.
[akpm@linux-foundation.org: rename init_id_cache() to idr_init_cache()]
[akpm@linux-foundation.org: fix alpha build]
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change all ia64 machvecs to use the new dma_*map*_attrs() interfaces.
Implement the old dma_*map_*() interfaces in terms of the corresponding new
interfaces. For ia64/sn, make use of one dma attribute,
DMA_ATTR_WRITE_BARRIER. Introduce swiotlb_*map*_attrs() functions.
Signed-off-by: Arthur Kepner <akepner@sgi.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Jes Sorensen <jes@sgi.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: David Miller <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Due to the rcupreempt.h WARN_ON trigged, I got 2G syslog file. For some
serious complaining of kernel, we need repeat the warnings, so here I isolate
the ratelimit part of printk.c to a standalone file.
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
iommu_is_span_boundary in lib/iommu-helper.c was exported for PARISC IOMMUs
(commit 3715863aa1). SWIOTLB can use it instead
of the homegrown function.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There's a pointlessly braced block of code in there. Remove the braces and
save a tabstop.
Cc: Andi Kleen <ak@suse.de>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jan Beulich <jbeulich@novell.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Almost all implementations of pci_iomap() in the kernel, including the generic
lib/iomap.c one, copies the content of a struct resource into unsigned long's
which will break on 32 bits platforms with 64 bits resources.
This fixes all definitions of pci_iomap() to use resource_size_t. I also
"fixed" the 64bits arch for consistency.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide walk_memory_resource() for 64-bit powerpc. PowerPC maintains
logical memory region mapping in the lmb.memory structure. Walk
through these structures and do the callbacks for the contiguous
chunks.
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The powerpc kernel maintains information about logical memory blocks
in the lmb.memory structure, which is initialized and updated at boot
time, but not when memory is added or removed while the kernel is
running.
This adds a hotplug memory notifier which updates lmb.memory when
memory is added or removed. This information is useful for eHEA
driver to find out the memory layout and holes.
NOTE: No special locking is needed for lmb_add() and lmb_remove().
Calls to these are serialized by caller. (pSeries_reconfig_chain).
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
The following adds two more bitmap operators, bitmap_onto() and bitmap_fold(),
with the usual cpumask and nodemask wrappers.
The bitmap_onto() operator computes one bitmap relative to another. If the
n-th bit in the origin mask is set, then the m-th bit of the destination mask
will be set, where m is the position of the n-th set bit in the relative mask.
The bitmap_fold() operator folds a bitmap into a second that has bit m set iff
the input bitmap has some bit n set, where m == n mod sz, for the specified sz
value.
There are two substantive changes between this patch and its
predecessor bitmap_relative:
1) Renamed bitmap_relative() to be bitmap_onto().
2) Added bitmap_fold().
The essential motivation for bitmap_onto() is to provide a mechanism for
converting a cpuset-relative CPU or Node mask to an absolute mask. Cpuset
relative masks are written as if the current task were in a cpuset whose CPUs
or Nodes were just the consecutive ones numbered 0..N-1, for some N. The
bitmap_onto() operator is provided in anticipation of adding support for the
first such cpuset relative mask, by the mbind() and set_mempolicy() system
calls, using a planned flag of MPOL_F_RELATIVE_NODES. These bitmap operators
(and their nodemask wrappers, in particular) will be used in code that
converts the user specified cpuset relative memory policy to a specific system
node numbered policy, given the current mems_allowed of the tasks cpuset.
Such cpuset relative mempolicies will address two deficiencies
of the existing interface between cpusets and mempolicies:
1) A task cannot at present reliably establish a cpuset
relative mempolicy because there is an essential race
condition, in that the tasks cpuset may be changed in
between the time the task can query its cpuset placement,
and the time the task can issue the applicable mbind or
set_memplicy system call.
2) A task cannot at present establish what cpuset relative
mempolicy it would like to have, if it is in a smaller
cpuset than it might have mempolicy preferences for,
because the existing interface only allows specifying
mempolicies for nodes currently allowed by the cpuset.
Cpuset relative mempolicies are useful for tasks that don't distinguish
particularly between one CPU or Node and another, but only between how many of
each are allowed, and the proper placement of threads and memory pages on the
various CPUs and Nodes available.
The motivation for the added bitmap_fold() can be seen in the following
example.
Let's say an application has specified some mempolicies that presume 16 memory
nodes, including say a mempolicy that specified MPOL_F_RELATIVE_NODES (cpuset
relative) nodes 12-15. Then lets say that application is crammed into a
cpuset that only has 8 memory nodes, 0-7. If one just uses bitmap_onto(),
this mempolicy, mapped to that cpuset, would ignore the requested relative
nodes above 7, leaving it empty of nodes. That's not good; better to fold the
higher nodes down, so that some nodes are included in the resulting mapped
mempolicy. In this case, the mempolicy nodes 12-15 are taken modulo 8 (the
weight of the mems_allowed of the confining cpuset), resulting in a mempolicy
specifying nodes 4-7.
Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: <kosaki.motohiro@jp.fujitsu.com>
Cc: <ray-lk@madrabbit.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Migrate flags must be set on slab creation as agreed upon when the antifrag
logic was reviewed. Otherwise some slabs of a slabcache will end up in the
unmovable and others in the reclaimable section depending on which flag was
active when a new slab page was allocated.
This likely slid in somehow when antifrag was merged. Remove it.
The buffer_heads are always allocated with __GFP_RECLAIMABLE because the
SLAB_RECLAIM_ACCOUNT option is set. The set_migrateflags() never had any
effect there.
Radix tree allocations are not directly reclaimable but they are allocated
with __GFP_RECLAIMABLE set on each allocation. We now set
SLAB_RECLAIM_ACCOUNT on radix tree slab creation making sure that radix
tree slabs are consistently placed in the reclaimable section. Radix tree
slabs will also be accounted as such.
There is then no user left of set_migratepages. So remove it.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce GENERIC_FIND_FIRST_BIT and GENERIC_FIND_NEXT_BIT in
lib/Kconfig, defaulting to off. An arch that wants to use the
generic implementation now only has to use a select statement
to include them.
I added an always-y option (X86_CPU) to arch/x86/Kconfig.cpu
and used that to select the generic search functions. This
way ARCH=um SUBARCH=i386 automatically picks up the change
too, and arch/um/Kconfig.i386 can therefore be simplified a
bit. ARCH=um SUBARCH=x86_64 does things differently, but
still compiles fine. It seems that a "def_bool y" always
wins over a "def_bool n"?
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Generic versions of __find_first_bit and __find_first_zero_bit
are introduced as simplified versions of __find_next_bit and
__find_next_zero_bit. Their compilation and use are guarded by
a new config variable GENERIC_FIND_FIRST_BIT.
The generic versions of find_first_bit and find_first_zero_bit
are implemented in terms of the newly introduced __find_first_bit
and __find_first_zero_bit.
This patch does not remove the i386-specific implementation,
but it does switch i386 to use the generic functions by setting
GENERIC_FIND_FIRST_BIT=y for X86_32.
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This moves an optimization for searching constant-sized small
bitmaps form x86_64-specific to generic code.
On an i386 defconfig (the x86#testing one), the size of vmlinux hardly
changes with this applied. I have observed only four places where this
optimization avoids a call into find_next_bit:
In the functions return_unused_surplus_pages, alloc_fresh_huge_page,
and adjust_pool_surplus, this patch avoids a call for a 1-bit bitmap.
In __next_cpu a call is avoided for a 32-bit bitmap. That's it.
On x86_64, 52 locations are optimized with a minimal increase in
code size:
Current #testing defconfig:
146 x bsf, 27 x find_next_*bit
text data bss dec hex filename
5392637 846592 724424 6963653 6a41c5 vmlinux
After removing the x86_64 specific optimization for find_next_*bit:
94 x bsf, 79 x find_next_*bit
text data bss dec hex filename
5392358 846592 724424 6963374 6a40ae vmlinux
After this patch (making the optimization generic):
146 x bsf, 27 x find_next_*bit
text data bss dec hex filename
5392396 846592 724424 6963412 6a40d4 vmlinux
[ tglx@linutronix.de: build fixes ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The versions with inline assembly are in fact slower on the machines I
tested them on (in userspace) (Athlon XP 2800+, p4-like Xeon 2.8GHz, AMD
Opteron 270). The i386-version needed a fix similar to 06024f21 to avoid
crashing the benchmark.
Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size
1...512, for each possible bitmap with one bit set, for each possible
offset: find the position of the first bit starting at offset. If you
follow ;). Times include setup of the bitmap and checking of the
results.
Athlon Xeon Opteron 32/64bit
x86-specific: 0m3.692s 0m2.820s 0m3.196s / 0m2.480s
generic: 0m2.622s 0m1.662s 0m2.100s / 0m1.572s
If the bitmap size is not a multiple of BITS_PER_LONG, and no set
(cleared) bit is found, find_next_bit (find_next_zero_bit) returns a
value outside of the range [0, size]. The generic version always returns
exactly size. The generic version also uses unsigned long everywhere,
while the x86 versions use a mishmash of int, unsigned (int), long and
unsigned long.
Using the generic version does give a slightly bigger kernel, though.
defconfig: text data bss dec hex filename
x86-specific: 4738555 481232 626688 5846475 5935cb vmlinux (32 bit)
generic: 4738621 481232 626688 5846541 59360d vmlinux (32 bit)
x86-specific: 5392395 846568 724424 6963387 6a40bb vmlinux (64 bit)
generic: 5392458 846568 724424 6963450 6a40fa vmlinux (64 bit)
Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add option to enable -Wframe-larger-than= on gcc 4.4
gcc mainline (upcoming 4.4) added a new -Wframe-larger-than=...
option to warn at build time about too large stack frames. Add a config
option to enable this warning, since this very useful for the kernel.
I choose (somewhat arbitarily) 2048 as default warning threshold for 64bit
and 1024 as default for 32bit architectures. With some research and
fixing all the code for smaller values these defaults should be probably
lowered.
With the default allyesconfigs have some new warnings, but I think
that is all code that should be just fixed.
At some point (when gcc 4.4 is released and widely used) this should
obsolete make checkstack
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Changeset d9024df02f ("[LMB] Restructure
allocation loops to avoid unsigned underflow") removed the alignment
of the 'size' argument to call lmb_add_region() done by __lmb_alloc_base().
In doing so it reintroduced the bug fixed by changeset
eea89e13a9 ("[LMB]: Fix bug in
__lmb_alloc_base().").
This puts it back.
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (202 commits)
[POWERPC] Fix compile breakage for 64-bit UP configs
[POWERPC] Define copy_siginfo_from_user32
[POWERPC] Add compat handler for PTRACE_GETSIGINFO
[POWERPC] i2c: Fix build breakage introduced by OF helpers
[POWERPC] Optimize fls64() on 64-bit processors
[POWERPC] irqtrace support for 64-bit powerpc
[POWERPC] Stacktrace support for lockdep
[POWERPC] Move stackframe definitions to common header
[POWERPC] Fix device-tree locking vs. interrupts
[POWERPC] Make pci_bus_to_host()'s struct pci_bus * argument const
[POWERPC] Remove unused __max_memory variable
[POWERPC] Simplify xics direct/lpar irq_host setup
[POWERPC] Use pseries_setup_i8259_cascade() in pseries_mpic_init_IRQ()
[POWERPC] Turn xics_setup_8259_cascade() into a generic pseries_setup_i8259_cascade()
[POWERPC] Move xics_setup_8259_cascade() into platforms/pseries/setup.c
[POWERPC] Use asm-generic/bitops/find.h in bitops.h
[POWERPC] 83xx: mpc8315 - fix USB UTMI Host setup
[POWERPC] 85xx: Fix the size of qe muram for MPC8568E
[POWERPC] 86xx: mpc86xx_hpcn - Temporarily accept old dts node identifier.
[POWERPC] 86xx: mark functions static, other minor cleanups
...
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-2.6: (36 commits)
SCSI: convert struct class_device to struct device
DRM: remove unused dev_class
IB: rename "dev" to "srp_dev" in srp_host structure
IB: convert struct class_device to struct device
memstick: convert struct class_device to struct device
driver core: replace remaining __FUNCTION__ occurrences
sysfs: refill attribute buffer when reading from offset 0
PM: Remove destroy_suspended_device()
Firmware: add iSCSI iBFT Support
PM: Remove legacy PM (fix)
Kobject: Replace list_for_each() with list_for_each_entry().
SYSFS: Explicitly include required header file slab.h.
Driver core: make device_is_registered() work for class devices
PM: Convert wakeup flag accessors to inline functions
PM: Make wakeup flags available whenever CONFIG_PM is set
PM: Fix misuse of wakeup flag accessors in serial core
Driver core: Call device_pm_add() after bus_add_device() in device_add()
PM: Handle device registrations during suspend/resume
block: send disk "change" event for rescan_partitions()
sysdev: detect multiple driver registrations
...
Fixed trivial conflict in include/linux/memory.h due to semaphore header
file change (made irrelevant by the change to mutex).
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel: (62 commits)
sched: build fix
sched: better rt-group documentation
sched: features fix
sched: /debug/sched_features
sched: add SCHED_FEAT_DEADLINE
sched: debug: show a weight tree
sched: fair: weight calculations
sched: fair-group: de-couple load-balancing from the rb-trees
sched: fair-group scheduling vs latency
sched: rt-group: optimize dequeue_rt_stack
sched: debug: add some debug code to handle the full hierarchy
sched: fair-group: SMP-nice for group scheduling
sched, cpuset: customize sched domains, core
sched, cpuset: customize sched domains, docs
sched: prepatory code movement
sched: rt: multi level group constraints
sched: task_group hierarchy
sched: fix the task_group hierarchy for UID grouping
sched: allow the group scheduler to have multiple levels
sched: mix tasks and groups
...
Use the more concise list_for_each_entry(), which allows for the
deletion of the to_kobj() routine at the same time.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add warnings to kobject_put() to catch kobjects that are cleaned up but
were never initialized to begin with.
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Hannes Reinecke <hare@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Add a new function cpumask_scnprintf_len() to return the number of
characters needed to display "len" cpumask bits. The current method
of allocating NR_CPUS bytes is incorrect as what's really needed is
9 characters per 32-bit word of cpumask bits (8 hex digits plus the
seperator [','] or the terminating NULL.) This function provides the
caller the means to allocate the correct string length.
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There have been a few oopses caused by 'struct file's with NULL f_vfsmnts.
There was also a set of potentially missed mnt_want_write()s from
dentry_open() calls.
This patch provides a very simple debugging framework to catch these kinds of
bugs. It will WARN_ON() them, but should stop us from having any oopses or
mnt_writer count imbalances.
I'm quite convinced that this is a good thing because it found bugs in the
stuff I was working on as soon as I wrote it.
[hch: made it conditional on a debug option.
But it's still a little bit too ugly]
[hch: merged forced remount r/o fix from Dave and akpm's fix for the fix]
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.26: (1090 commits)
[NET]: Fix and allocate less memory for ->priv'less netdevices
[IPV6]: Fix dangling references on error in fib6_add().
[NETLABEL]: Fix NULL deref in netlbl_unlabel_staticlist_gen() if ifindex not found
[PKT_SCHED]: Fix datalen check in tcf_simp_init().
[INET]: Uninline the __inet_inherit_port call.
[INET]: Drop the inet_inherit_port() call.
SCTP: Initialize partial_bytes_acked to 0, when all of the data is acked.
[netdrvr] forcedeth: internal simplifications; changelog removal
phylib: factor out get_phy_id from within get_phy_device
PHY: add BCM5464 support to broadcom PHY driver
cxgb3: Fix __must_check warning with dev_dbg.
tc35815: Statistics cleanup
natsemi: fix MMIO for PPC 44x platforms
[TIPC]: Cleanup of TIPC reference table code
[TIPC]: Optimized initialization of TIPC reference table
[TIPC]: Remove inlining of reference table locking routines
e1000: convert uint16_t style integers to u16
ixgb: convert uint16_t style integers to u16
sb1000.c: make const arrays static
sb1000.c: stop inlining largish static functions
...
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (137 commits)
[SCSI] iscsi: bidi support for iscsi_tcp
[SCSI] iscsi: bidi support at the generic libiscsi level
[SCSI] iscsi: extended cdb support
[SCSI] zfcp: Fix error handling for blocked unit for send FCP command
[SCSI] zfcp: Remove zfcp_erp_wait from slave destory handler to fix deadlock
[SCSI] zfcp: fix 31 bit compile warnings
[SCSI] bsg: no need to set BSG_F_BLOCK bit in bsg_complete_all_commands
[SCSI] bsg: remove minor in struct bsg_device
[SCSI] bsg: use better helper list functions
[SCSI] bsg: replace kobject_get with blk_get_queue
[SCSI] bsg: takes a ref to struct device in fops->open
[SCSI] qla1280: remove version check
[SCSI] libsas: fix endianness bug in sas_ata
[SCSI] zfcp: fix compiler warning caused by poking inside new semaphore (linux-next)
[SCSI] aacraid: Do not describe check_reset parameter with its value
[SCSI] aacraid: Fix down_interruptible() to check the return value
[SCSI] sun3_scsi_vme: add MODULE_LICENSE
[SCSI] st: rename flush_write_buffer()
[SCSI] tgt: use KMEM_CACHE macro
[SCSI] initio: fix big endian problems for auto request sense
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: (43 commits)
firewire: cleanups
firewire: fix synchronization of gap counts
firewire: wait until PHY configuration packet was transmitted (fix bus reset loop)
firewire: remove unused struct member
firewire: use bitwise and to get reg in handle_registers
firewire: replace more hex values with defined csr constants
firewire: reread config ROM when device reset the bus
firewire: replace static ROM cache by allocated cache
firewire: fw-ohci: work around generation bug in TI controllers (fix AV/C and more)
firewire: fw-ohci: extend logging of bus generations and node ID
firewire: fw-ohci: conditionally log busReset interrupts
firewire: fw-ohci: don't append to AT context when it's not active
firewire: fw-ohci: log regAccessFail events
firewire: fw-ohci: make sure HCControl register LPS bit is set
firewire: fw-ohci: missing PPC PMac feature calls in failure path
firewire: fw-ohci: untangle a mixed unsigned/signed expression
firewire: debug interrupt events
firewire: fw-ohci: catch self_id_count == 0
firewire: fw-ohci: add self ID error check
firewire: fw-ohci: refactor probe, remove, suspend, resume
...
This way firewire-ohci can be used for remote debugging like ohci1394.
Version with amendment from Fri, 11 Apr 2008 00:08:08 +0200.
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Acked-by: Bernhard Kaindl <bk@suse.de>
* 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc:
Remove DEBUG_SEMAPHORE from Kconfig
Improve semaphore documentation
Simplify semaphore implementation
Add down_timeout and change ACPI to use it
Introduce down_killable()
Generic semaphore implementation
Add semaphore.h to kernel_lock.c
Fix quota.h includes
This patch adds in the ability to compile the kgdb internal test
string into the kernel so as to run the tests at boot without changing
the kernel boot arguments. This patch also changes all the error
paths to invoke WARN_ON(1) which will emit the line number of the file
and dump the kernel stack when an error occurs.
You can disable the tests in a kernel that is built this way
using "kgdbts="
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch adds regression tests for testing the kgdb core and arch
specific implementation.
The kgdb test suite is designed to be built into the kernel and not as
a module because it uses a number of low level kernel and kgdb
primitives which should not be exported externally.
The kgdb test suite is designed as a KGDB I/O module which
simulates the communications that a debugger would have with kgdb.
The tests are broken up in to a line by line and referenced here as
a "get" which is kgdb requesting input and "put" which is kgdb
sending a response.
The kgdb suite can be invoked from the kernel command line
arguments system or executed dynamically at run time. The test
suite uses the variable "kgdbts" to obtain the information about
which tests to run and to configure the verbosity level. The
following are the various characters you can use with the kgdbts=
line:
When using the "kgdbts=" you only choose one of the following core
test types:
A = Run all the core tests silently
V1 = Run all the core tests with minimal output
V2 = Run all the core tests in debug mode
You can also specify optional tests:
N## = Go to sleep with interrupts of for ## seconds
to test the HW NMI watchdog
F## = Break at do_fork for ## iterations
S## = Break at sys_open for ## iterations
NOTE: that the do_fork and sys_open tests are mutually exclusive.
To invoke the kgdb test suite from boot you use a kernel start
argument as follows:
kgdbts=V1 kgdbwait
Or if you wanted to perform the NMI test for 6 seconds and do_fork
test for 100 forks, you could use:
kgdbts=V1N6F100 kgdbwait
The test suite can also be invoked at run time with:
echo kgdbts=V1N6F100 > /sys/module/kgdbts/parameters/kgdbts
Or as another example:
echo kgdbts=V2 > /sys/module/kgdbts/parameters/kgdbts
When developing a new kgdb arch specific implementation or
using these tests for the purpose of regression testing,
several invocations are required.
1) Boot with the test suite enabled by using the kernel arguments
"kgdbts=V1F100 kgdbwait"
## If kgdb arch specific implementation has NMI use
"kgdbts=V1N6F100
2) After the system boot run the basic test.
echo kgdbts=V1 > /sys/module/kgdbts/parameters/kgdbts
3) Run the concurrency tests. It is best to use n+1
while loops where n is the number of cpus you have
in your system. The example below uses only two
loops.
## This tests break points on sys_open
while [ 1 ] ; do find / > /dev/null 2>&1 ; done &
while [ 1 ] ; do find / > /dev/null 2>&1 ; done &
echo kgdbts=V1S10000 > /sys/module/kgdbts/parameters/kgdbts
fg # and hit control-c
fg # and hit control-c
## This tests break points on do_fork
while [ 1 ] ; do date > /dev/null ; done &
while [ 1 ] ; do date > /dev/null ; done &
echo kgdbts=V1F1000 > /sys/module/kgdbts/parameters/kgdbts
fg # and hit control-c
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
kgdb core code. Handles the protocol and the arch details.
[ mingo@elte.hu: heavily modified, simplified and cleaned up. ]
[ xemul@openvz.org: use find_task_by_pid_ns ]
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jan Kiszka <jan.kiszka@web.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Alpha and FRV mutexes had an option to print lots of debugging messages
in their semaphore implementation. This feature has not been carried
over to the generic semaphores, so remove the stale Kconfig option.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Semaphores are no longer performance-critical, so a generic C
implementation is better for maintainability, debuggability and
extensibility. Thanks to Peter Zijlstra for fixing the lockdep
warning. Thanks to Harvey Harrison for pointing out that the
unlikely() was unnecessary.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
kernel_lock.c uses DECLARE_MUTEX, up() and down() without explicitly
including asm/semaphore.h. This is fragile and leaves it vulnerable
to breakage during header reorganisations.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
There is a potential bug in __lmb_alloc_base where we subtract `size'
from the base address of a reserved region without checking whether
the subtraction could wrap around and produce a very large unsigned
value. In fact it probably isn't possible to hit the bug in practice
since it would only occur in the situation where we can't satisfy the
allocation request and there is a reserved region starting at 0.
This fixes the potential bug by breaking out of the loop when we get
to the point where the base of the reserved region is less than the
size requested. This also restructures the loop to be a bit easier to
follow.
The same logic got copied into lmb_alloc_nid_unreserved, so this makes
a similar change there. Here the bug is more likely to be hit because
the outer loop (in lmb_alloc_nid) goes through the memory regions in
increasing order rather than decreasing order as __lmb_alloc_base
does, and we are therefore more likely to hit the case where we are
testing against a reserved region with a base address of 0.
Signed-off-by: Paul Mackerras <paulus@samba.org>