The MMC_DATA_MULTI flag never had a proper definition of what it
means, so remove it and let the drivers check the block count in
the request.
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
The write parameter in mmc_set_data_timeout() is redundant as the
data structure contains information about the direction of the
transfer.
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
device_suspend() calls ACPI suspend functions, which seems to have undesired
side effects on lower idle C-states. It took me some time to realize that
especially the VAIO BIOSes (both Andrews jinxed UP and my elfstruck SMP one)
show this effect. I'm quite sure that other bug reports against suspend/resume
about turning the system into a brick have the same root cause.
After fishing in the dark for quite some time, I realized that removing the ACPI
processor module before suspend (this removes the lower C-state functionality)
made the problem disappear. Interestingly enough the propability of having a
bricked box is influenced by various factors (interrupts, size of the ram image,
...). Even adding a bunch of printks in the wrong places made the problem go
away. The previous periodic tick implementation simply pampered over the
problem, which explains why the dyntick / clockevents changes made this more
prominent.
We avoid complex functionality during the boot process and we have to do the
same during suspend/resume. It is a similar scenario and equaly fragile.
Add suspend / resume functions to the ACPI processor code and disable the lower
idle C-states across suspend/resume. Fall back to the default idle
implementation (halt) instead.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 34feb2c83b.
Suresh Siddha points out that this one breaks the fundamental
requirement that you cannot free page table pages before the TLB caches
are flushed. The quicklists do not give the same kinds of guarantees
that the mmu_gather structure does, at least not in NUMA configurations.
Requested-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Andi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This simplifies signalfd code, by avoiding it to remain attached to the
sighand during its lifetime.
In this way, the signalfd remain attached to the sighand only during
poll(2) (and select and epoll) and read(2). This also allows to remove
all the custom "tsk == current" checks in kernel/signal.c, since
dequeue_signal() will only be called by "current".
I think this is also what Ben was suggesting time ago.
The external effect of this, is that a thread can extract only its own
private signals and the group ones. I think this is an acceptable
behaviour, in that those are the signals the thread would be able to
fetch w/out signalfd.
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
add /proc/sys/kernel/sched_compat_yield to make sys_sched_yield()
more agressive, by moving the yielding task to the last position
in the rbtree.
with sched_compat_yield=0:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2539 mingo 20 0 1576 252 204 R 50 0.0 0:02.03 loop_yield
2541 mingo 20 0 1576 244 196 R 50 0.0 0:02.05 loop
with sched_compat_yield=1:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2584 mingo 20 0 1576 248 196 R 99 0.0 0:52.45 loop
2582 mingo 20 0 1576 256 204 R 0 0.0 0:00.00 loop_yield
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Fix timekeeping on PowerPC 601
[POWERPC] Don't expose clock vDSO functions when CPU has no timebase
[POWERPC] spusched: Fix null pointer dereference in find_victim
Add a workaround to address warnings generated on the "n" constraint by
GCC 3.3 and below.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
This patch proposes fixes to the reference counting of memory policy in the
page allocation paths and in show_numa_map(). Extracted from my "Memory
Policy Cleanups and Enhancements" series as stand-alone.
Shared policy lookup [shmem] has always added a reference to the policy,
but this was never unrefed after page allocation or after formatting the
numa map data.
Default system policy should not require additional ref counting, nor
should the current task's task policy. However, show_numa_map() calls
get_vma_policy() to examine what may be [likely is] another task's policy.
The latter case needs protection against freeing of the policy.
This patch adds a reference count to a mempolicy returned by
get_vma_policy() when the policy is a vma policy or another task's
mempolicy. Again, shared policy is already reference counted on lookup. A
matching "unref" [__mpol_free()] is performed in alloc_page_vma() for
shared and vma policies, and in show_numa_map() for shared and another
task's mempolicy. We can call __mpol_free() directly, saving an admittedly
inexpensive inline NULL test, because we know we have a non-NULL policy.
Handling policy ref counts for hugepages is a bit trickier.
huge_zonelist() returns a zone list that might come from a shared or vma
'BIND policy. In this case, we should hold the reference until after the
huge page allocation in dequeue_hugepage(). The patch modifies
huge_zonelist() to return a pointer to the mempolicy if it needs to be
unref'd after allocation.
Kernel Build [16cpu, 32GB, ia64] - average of 10 runs:
w/o patch w/ refcount patch
Avg Std Devn Avg Std Devn
Real: 100.59 0.38 100.63 0.43
User: 1209.60 0.37 1209.91 0.31
System: 81.52 0.42 81.64 0.34
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It turned out, that the user namespace is released during the do_exit() in
exit_task_namespaces(), but the struct user_struct is released only during the
put_task_struct(), i.e. MUCH later.
On debug kernels with poisoned slabs this will cause the oops in
uid_hash_remove() because the head of the chain, which resides inside the
struct user_namespace, will be already freed and poisoned.
Since the uid hash itself is required only when someone can search it, i.e.
when the namespace is alive, we can safely unhash all the user_struct-s from
it during the namespace exiting. The subsequent free_uid() will complete the
user_struct destruction.
For example simple program
#include <sched.h>
char stack[2 * 1024 * 1024];
int f(void *foo)
{
return 0;
}
int main(void)
{
clone(f, stack + 1 * 1024 * 1024, 0x10000000, 0);
return 0;
}
run on kernel with CONFIG_USER_NS turned on will oops the
kernel immediately.
This was spotted during OpenVZ kernel testing.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Acked-by: "Serge E. Hallyn" <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Surprisingly, but (spotted by Alexey Dobriyan) the uid hash still uses
list_heads, thus occupying twice as much place as it could. Convert it to
hlist_heads.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Recent changes to the timekeeping code broke support for the PowerPC 601
processor which doesn't have the usual timebase facility but a slightly
different thing called (yuck) the RTC.
This fixes it, boot tested on an old 601 based PowerMac 7200.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC64]: Warn user if cpu is ignored.
[SPARC64]: Fix lockdep, particularly on SMP.
[SPARC64]: Update defconfig.
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[VLAN]: Fix net_device leak.
[PPP] generic: Fix receive path data clobbering & non-linear handling
[PPP] generic: Call skb_cow_head before scribbling over skb
[NET] skbuff: Add skb_cow_head
[BRIDGE]: Kill clone argument to br_flood_*
[PPP] pppoe: Fill in header directly in __pppoe_xmit
[PPP] pppoe: Fix data clobbering in __pppoe_xmit and return value
[PPP] pppoe: Fix skb_unshare_check call position
[SCTP]: Convert bind_addr_list locking to RCU
[SCTP]: Add RCU synchronization around sctp_localaddr_list
[PKT_SCHED]: sch_cbq.c: Shut up uninitialized variable warning
[PKTGEN]: srcmac fix
[IPV6]: Fix source address selection.
[IPV4]: Just increment OutDatagrams once per a datagram.
[IPV6]: Just increment OutDatagrams once per a datagram.
[IPV6]: Fix unbalanced socket reference with MSG_CONFIRM.
[NET_SCHED] protect action config/dump from irqs
[NET]: Fix two issues wrt. SO_BINDTODEVICE.
When CONFIG_ISA is disabled, the isa_driver support will not be compiled
in. Define stubs so that we don't get link-time errors.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adds an optimised version of skb_cow that avoids the copy if
the header can be modified even if the rest of the payload is cloned.
This can be used in encapsulating paths where we only need to modify the
header. As it is, this can be used in PPPOE and bridging.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the sctp_sockaddr_entry is now RCU enabled as part of
the patch to synchronize sctp_localaddr_list, it makes sense to
change all handling of these entries to RCU. This includes the
sctp_bind_addrs structure and it's list of bound addresses.
This list is currently protected by an external rw_lock and that
looks like an overkill. There are only 2 writers to the list:
bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
These are already seriealized via the socket lock, so they will
not step on each other. These are also relatively rare, so we
should be good with RCU.
The readers are varied and they are easily converted to RCU.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Sridhar Samdurala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sctp_localaddr_list is modified dynamically via NETDEV_UP
and NETDEV_DOWN events, but there is not synchronization
between writer (even handler) and readers. As a result,
the readers can access an entry that has been freed and
crash the sytem.
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Sridhar Samdurala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As noted by Al Viro, when we try to call prom_set_trap_table()
in the SMP trampoline code we try to take the PROM call spinlock
which doesn't work because the current thread pointer isn't
valid yet and lockdep depends upon that being correct.
Furthermore, we cannot set the current thread pointer register
because it can't be properly dereferenced until we return from
prom_set_trap_table(). Kernel TLB misses only work after that
call.
So do the PROM call to set the trap table directly instead of
going through the OBP library C code, and thus avoid the lock
altogether.
These calls are guarenteed to be serialized fully.
Since there are now no calls to the prom_set_trap_table{_sun4v}()
library functions, they can be deleted.
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.linux-xtensa.org/kernel/xtensa-feed:
[patch 1/2] Xtensa: enable arbitary tty speed setting ioctls
[patch 2/2] xtensa console.c: remove duplicate #include
[XTENSA] Add support for cache-aliasing
[XTENSA] Add kernel module support
[XTENSA] Add support for executable/non-executable feature in the mmu
[XTENSA] Use the generic version of get_order
[XTENSA] Initialize semaphore_wake_lock
[XTENSA] Add typecast macro for constants
[XTENSA] Fix timer instabilities.
[XTENSA] Fix fadvise64_64
[XTENSA] Remove extraneous include statement
[XTENSA] Move string-io functions to io.c from pci.c
[XTENSA] Move pre-initialized structures to init_task.c
[XTENSA] Add freestanding option to CFLAGS
[XTENSA] Add getpgrp system-call to unistd.h
[XTENSA] add missing system calls
[XTENSA] fix wrong usage of __init and __initdata in traps.c
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/cooloney/blackfin-2.6:
Blackfin arch: fix some bugs in lib/string.h functions found by our string testing modules
Blackfin arch: fix the aliased write macros
Blackfin arch: Update/Fix PM support add new pm_ops valid
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] 20Kc: Disable use of WAIT instruction.
[MIPS] Workaround for 4Kc machine check exception
[MIPS] Malta: Fix off by one bug in interrupt handler.
[MIPS] No ide_default_io_base() if PCI IDE was not found
[MIPS] Add #include <linux/profile.h> to arch/mips/kernel/time.c
[MIPS] N32 needs to use compat_sys_futimesat
[MIPS] rtlx: Fix build error.
[MIPS] rtlx: fix int vs. long bug.
Revert b543858209 and add
no_pci_devices() check to avoid crash due to early calling of
pci_get_class().
Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Commit f629307c85 introduced uses of
kernel_termios_to_user_termios_1 and user_termios_to_kernel_termios_1
on all architectures. However, powerpc, s390, avr32 and frv don't
currently define those functions since their termios struct didn't
need to be changed when the arbitrary baud rate stuff was added, and
thus the kernel won't currently build on those architectures.
This adds definitions of kernel_termios_to_user_termios_1 and
user_termios_to_kernel_termios_1 to include/asm-generic/termios.h
which are identical to kernel_termios_to_user_termios and
user_termios_to_kernel_termios respectively. The definitions are the
same because the "old" termios and "new" termios are in fact the same
on these architectures (which are the same ones that use
asm-generic/termios.h).
Signed-off-by: Paul Mackerras <paulus@samba.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alan Cox <alan@redhat.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Taneli Vähäkangas <vahakang@cs.helsinki.fi> reported that commit
786d7e1612 aka "Fix rmmod/read/write races
in /proc entries" broke SBCL + SLIME combo.
The old code in do_select() used DEFAULT_POLLMASK, if couldn't find
->poll handler. The new code makes ->poll always there and returns 0 by
default, which is not correct. Return DEFAULT_POLLMASK instead.
Steps to reproduce:
install emacs, SBCL, SLIME
emacs
M-x slime in *inferior-lisp* buffer
[watch it doing "Connecting to Swank on port X.."]
Please, apply before 2.6.23.
P.S.: why SBCL can't just read(2) /proc/cpuinfo is a mystery.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: T Taneli Vahakangas <vahakang@cs.helsinki.fi>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The AdvanSys driver wants to align some pointers, and the ALIGN macro
doesn't work for pointers. Rather than try to make it work, add a new
PTR_ALIGN macro which is typesafe.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch has added #include <linux/spinlock.h> to include/linux/leds.h
for rwlock_t.
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
Make the SATA drive detection code from eighty_ninty_three() into inline
ide_dev_is_sata() helper fixing it along the way to be more strict while
checking word 80 for the reserved values...
Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6:
PCI: irq and pci_ids patch for Intel Tolapai
PCI: unhide SMBus on Compaq Deskpro EP 401963-001 motherboard
PCI: Remove __devinit from pcibios_get_irq_routing_table
PCI: remove devinit from pci_read_bridge_bases
PCI AER: fix warnings when PCIEAER=n
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[INET_DIAG]: Fix oops in netlink_rcv_skb
[IPv6]: Fix NULL pointer dereference in ip6_flush_pending_frames
[NETFILTER]: Fix/improve deadlock condition on module removal netfilter
[NETFILTER]: nf_conntrack_ipv4: fix "Frag of proto ..." messages
[NET] DOC: Update networking/multiqueue.txt with correct information.
[IPV6]: Freeing alive inet6 address
[DECNET]: Fix interface address listing regression.
[IPV4] devinet: show all addresses assigned to interface
[NET]: Do not dereference iov if length is zero
[TG3]: Workaround MSI bug on 5714/5780.
[Bluetooth] Fix parameter list for event filter command
[Bluetooth] Update security filter for Bluetooth 2.1
[Bluetooth] Add compat handling for timestamp structure
[Bluetooth] Add missing stat.byte_rx counter modification
* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
[SCSI] libiscsi: sync up iscsi and scsi eh's access to the connection
[SCSI] libiscsi: fix null ptr regression when aborting a command with data to transfer
[SCSI] qla2xxx: Update version number to 8.02.00-k3.
[SCSI] qla2xxx: Correct mailbox register dump for FWI2 capable ISPs.
[SCSI] qla2xxx: Correct 8GB iIDMA support.
[SCSI] qla2xxx: Correct management-server login-state synchronization issue.
[SCSI] qla2xxx: Don't modify parity bits during ISP25XX restart.
[SCSI] qla2xxx: Allocate enough space for the full PCI descriptor.
[SCSI] zfcp: fix the data buffer accessor patch
[SCSI] zfcp: allocate gid_pn_data objects from gid_pn_cache
[SCSI] zfcp: fix memory leak
This patch adds the Intel Tolapai LPC and SMBus Controller DID's.
Signed-off-by: Jason Gaston <jason.d.gaston@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Fix warnings when CONFIG_PCIEAER=n:
drivers/pci/pcie/portdrv_pci.c:105: warning: statement with no effect
drivers/pci/pcie/portdrv_pci.c:226: warning: statement with no effect
drivers/scsi/arcmsr/arcmsr_hba.c:352: warning: statement with no effect
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
So I've had a deadlock reported to me. I've found that the sequence of
events goes like this:
1) process A (modprobe) runs to remove ip_tables.ko
2) process B (iptables-restore) runs and calls setsockopt on a netfilter socket,
increasing the ip_tables socket_ops use count
3) process A acquires a file lock on the file ip_tables.ko, calls remove_module
in the kernel, which in turn executes the ip_tables module cleanup routine,
which calls nf_unregister_sockopt
4) nf_unregister_sockopt, seeing that the use count is non-zero, puts the
calling process into uninterruptible sleep, expecting the process using the
socket option code to wake it up when it exits the kernel
4) the user of the socket option code (process B) in do_ipt_get_ctl, calls
ipt_find_table_lock, which in this case calls request_module to load
ip_tables_nat.ko
5) request_module forks a copy of modprobe (process C) to load the module and
blocks until modprobe exits.
6) Process C. forked by request_module process the dependencies of
ip_tables_nat.ko, of which ip_tables.ko is one.
7) Process C attempts to lock the request module and all its dependencies, it
blocks when it attempts to lock ip_tables.ko (which was previously locked in
step 3)
Theres not really any great permanent solution to this that I can see, but I've
developed a two part solution that corrects the problem
Part 1) Modifies the nf_sockopt registration code so that, instead of using a
use counter internal to the nf_sockopt_ops structure, we instead use a pointer
to the registering modules owner to do module reference counting when nf_sockopt
calls a modules set/get routine. This prevents the deadlock by preventing set 4
from happening.
Part 2) Enhances the modprobe utilty so that by default it preforms non-blocking
remove operations (the same way rmmod does), and add an option to explicity
request blocking operation. So if you select blocking operation in modprobe you
can still cause the above deadlock, but only if you explicity try (and since
root can do any old stupid thing it would like.... :) ).
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev:
libata clear horkage on ata_dev_init()
[libata, IDE] add new VIA bridge to VIA PATA drivers
pata_it821x: fix lost interrupt with atapi devices
Fix broken pata_via cable detection