WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Oleg Nesterov	f106eee100	pids: fix fork_idle() to setup ->pids correctly copy_process(pid => &init_struct_pid) doesn't do attach_pid/etc. It shouldn't, but this means that the idle threads run with the wrong pids copied from the caller's task_struct. In x86 case the caller is either kernel_init() thread or keventd. In particular, this means that after the series of cpu_up/cpu_down an idle thread (which never exits) can run with .pid pointing to nowhere. Change fork_idle() to initialize idle->pids[] correctly. We only set .pid = &init_struct_pid but do not add .node to list, INIT_TASK() does the same for the boot-cpu idle thread (swapper). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Mathias Krause <Mathias.Krause@secunet.com> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:52 -07:00
Oleg Nesterov	f20011457f	pids: init_struct_pid.tasks should never see the swapper process "statically initialize struct pid for swapper" commit `820e45db` says: Statically initialize a struct pid for the swapper process (pid_t == 0) and attach it to init_task. This is needed so task_pid(), task_pgrp() and task_session() interfaces work on the swapper process also. OK, but: - it doesn't make sense to add init_task.pids[].node into init_struct_pid.tasks[], and in fact this just wrong. idle threads are special, they shouldn't be visible on any global list. In particular do_each_pid_task(init_struct_pid) shouldn't see swapper. This is the actual reason why kill(0, SIGKILL) from /sbin/init (which starts with 0,0 special pids) crashes the kernel. The signal sent to pgid/sid == 0 must never see idle threads, even if the previous patch fixed the crash itself. - we have other idle threads running on the non-boot CPUs, see the next patch. Change INIT_STRUCT_PID/INIT_PID_LINK to create the empty/unhashed hlist_head/hlist_node. Like any other idle thread swapper can never exit, so detach_pid()->__hlist_del() is not possible, but we could change INIT_PID_LINK() to set pprev = &next if needed. All we need is the valid swapper->pids[].pid == &init_struct_pid. Reported-by: Mathias Krause <mathias.krause@secunet.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Mathias Krause <Mathias.Krause@secunet.com> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:52 -07:00
Oleg Nesterov	fa2755e20a	INIT_TASK() should initialize ->thread_group list The trivial /sbin/init doing int main(void) { kill(0, SIGKILL) } crashes the kernel. This happens because __kill_pgrp_info(init_struct_pid) also sends SIGKILL to the swapper process which runs with the uninitialized ->thread_group. Change INIT_TASK() to initialize ->thread_group properly. Note: the real problem is that the swapper process must not be visible to signals, see the next patch. But this change is right anyway and fixes the crash. Reported-and-tested-by: Mathias Krause <mathias.krause@secunet.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Mathias Krause <Mathias.Krause@secunet.com> Acked-by: Roland McGrath <roland@redhat.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:51 -07:00
Hedi Berriche	72680a191b	pids: increase pid_max based on num_possible_cpus On a system with a substantial number of processors, the early default pid_max of 32k will not be enough. A system with 1664 CPU's, there are 25163 processes started before the login prompt. It's estimated that with 2048 CPU's we will pass the 32k limit. With 4096, we'll reach that limit very early during the boot cycle, and processes would stall waiting for an available pid. This patch increases the early maximum number of pids available, and increases the minimum number of pids that can be set during runtime. [akpm@linux-foundation.org: fix warnings] Signed-off-by: Hedi Berriche <hedi@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Robin Holt <holt@sgi.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Pavel Machek <pavel@ucw.cz> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Greg KH <gregkh@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: John Stoffel <john@stoffel.org> Cc: Jack Steiner <steiner@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:51 -07:00
Thomas Moll	bd4fb654e3	rapidio: fix maintenance access to higher memory areas Fix the maintenance access functions to farend RapidIO devices. 1. Fixed shift of the given offset, to open the maintenance window 2. Mask offset to limit access to the opened maintenance window 3. Added extended destid part to rowtear register, required for 16bit mode This method is matching maintenance transactions generation described by Freescale in the appnote AN2932. With this modification full access to a 16MB maintenance window is possible, this patch is required for IDT cps switches. For easier handling of the access routines, the access was limited to aligned memory regions. This should be no problem because all registers are 32bit wide. Signed-off-by: Thomas Moll <thomas.moll@sysgo.com> Tested-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:51 -07:00
Alexandre Bounine	679395fdaa	rapidio: use default route value for CPS switches Fix to use correct default value for routing table entries. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Thomas Moll <thomas.moll@sysgo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:51 -07:00
Alexandre Bounine	7a88d62862	rapidio: add switch domain routines Add switch specific domain routines required for 16-bit routing support in switches with hierarchical implementation of routing tables. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Thomas Moll <thomas.moll@sysgo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:51 -07:00
Alexandre Bounine	058f88d672	rapidio: modify initialization of switch operations Modify the way how RapidIO switch operations are declared. Multiple assignments through the linker script replaced by single initialization call. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Thomas Moll <thomas.moll@sysgo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:51 -07:00
Alexandre Bounine	011507e49a	rapidio: add debug configuration option Add debug configuration option for RapidIO subsystem. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Thomas Moll <thomas.moll@sysgo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:51 -07:00
Alexandre Bounine	68b04cd38a	rapidio: fix typos and minor edits Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Thomas Moll <thomas.moll@sysgo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:51 -07:00
Thomas Moll	933af4a6c4	rapidio: add enabling SRIO port RX and TX Add the functionality to enable Input receiver and Output transmitter of every port, to allow non-maintenance traffic. Signed-off-by: Thomas Moll <thomas.moll@sysgo.com> Signed-off-by: Alexandre Bounine <abounine@tundra.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:51 -07:00
Alexandre Bounine	a52c8f521f	rapidio, powerpc/85xx: Add MChk handler for SRIO port Add Machine Check exception handling into RapidIO port driver for Freescale SoCs (MPC85xx). Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Tested-by: Thomas Moll <thomas.moll@sysgo.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:51 -07:00
Alexandre Bounine	5b2074ae4f	rapidio, powerpc/85xx: add Port-Write message handler for SRIO port Add RapidIO Port-Write message handler for Freescale SoCs with RapidIO port. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Tested-by: Thomas Moll <thomas.moll@sysgo.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:51 -07:00
Alexandre Bounine	e5cabeb3d6	rapidio: add Port-Write handling for EM Add RapidIO Port-Write message handling in the context of Error Management Extensions Specification Rev.1.3. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Tested-by: Thomas Moll <thomas.moll@sysgo.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:50 -07:00
Alexandre Bounine	818a04a0bb	rapidio: add switch locking during discovery Add switch access locking during RapidIO discovery. Access lock is required when reading switch routing table contents due to indexed mechanism of RT addressing. Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Tested-by: Thomas Moll <thomas.moll@sysgo.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:50 -07:00
Alexandre Bounine	07590ff039	rapidio: add IDT CPS/TSI switches Extentions to RapidIO switch support: 1. modify switch route operation declarations to allow using single switch-specific file for family of switches that share the same route table operations. 2. add standard route table operations for switches that that support route table manipulation registers as defined in the Rev.1.3 of RapidIO specification. 3. add clear-route-table operation for switches 4. add CPSxx and TSIxxx families of RapidIO switches Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com> Tested-by: Thomas Moll <thomas.moll@sysgo.com> Cc: Matt Porter <mporter@kernel.crashing.org> Cc: Li Yang <leoli@freescale.com> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:50 -07:00
Julia Lawall	f67231f801	drivers/char/applicom.c: use memdup_user Use memdup_user when user data is immediately copied into the allocated region. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression from,to,size,flag; position p; identifier l1,l2; @@ - to = $kmalloc@p\\|kzalloc@p$(size,flag); + to = memdup_user(from,size); if ( - to==NULL + IS_ERR(to) \|\| ...) { <+... when != goto l1; - -ENOMEM + PTR_ERR(to) ...+> } - if (copy_from_user(to, from, size) != 0) { - <+... when != goto l2; - -EFAULT - ...+> - } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:50 -07:00
Julia Lawall	87575437d8	drivers/char/ppdev.c: use kasprintf kasprintf combines kmalloc and sprintf, and takes care of the size calculation itself. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression a,flag; expression list args; statement S; @@ a = - $kmalloc\\|kzalloc$(...,flag) + kasprintf(flag,args) <... when != a if (a == NULL \|\| ...) S ...> - sprintf(a,args); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Michael Buesch <mb@bu3sch.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:50 -07:00
Marco Stornelli	56d611a04f	char drivers: RAM oops/panic logger Ramoops, like mtdoops, can log oops/panic information but in RAM. It can be used with persistent RAM for systems without flash support. In addition, for this systems, with this driver, it's no more needed add to the kernel the mtd subsystem with advantage in footprint. It can be used in a very easy way with persistent RAM for systems without flash support. For these systems, with this driver, it is no longer required to cinlude mtd subsystem with an advantage in footprint. In addition, you can save flash space and store this information only in RAM. Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com> Cc: Simon Kagstrom <simon.kagstrom@netinsight.net> Cc: David Woodhouse <David.Woodhouse@intel.com> Cc; Anders Grafstrom <anders.grafstrom@netinsight.net> Cc: Yuasa Yoichi <yuasa@linux-mips.org> Cc: Jamie Lokier <jamie@shareable.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:50 -07:00
Jiri Kosina	a747c5abc3	ipmi: handle run_to_completion properly in deliver_recv_msg() If run_to_completion flag is set, it means that we are running in a single-threaded mode, and thus no locks are held. This fixes a deadlock when IPMI notifier is being called during panic. Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Corey Minyard <minyard@acm.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:50 -07:00
Myron Stowe	279fbd0c5d	ipmi: update driver to use dev_printk and its constructs Update core IPMI driver printk()'s with dev_printk(), and its constructs, to provide additional device topology information. An example of the additional device topology for a PNP device - ipmi_si 00:02: probing via ACPI ipmi_si 00:02: [io 0x0ca2-0x0ca3] regsize 1 spacing 1 irq 0 ipmi_si 00:02: Found new BMC (man_id: 0x00000b, prod_id: 0x0000, ... ipmi_si 00:02: IPMI kcs interface initialized and for a PCI device - ipmi_si 0000:01:04.6: probing via PCI ipmi_si 0000:01:04.6: PCI INT A -> GSI 21 (level, low) -> IRQ 21 ipmi_si 0000:01:04.6: [mem 0xf1ef0000-0xf1ef00ff] regsize 1 spaci... ipmi_si 0000:01:04.6: IPMI kcs interface initialized [minyard@acm.org: rework to fix rejects, extended it a bit] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Myron Stowe <myron.stowe@hp.com> Signed-off-by: Corey Minyard <minyard@acm.org> Cc: Zhao Yakui <yakui.zhao@intel.com> Cc: Len Brown <lenb@kernel.org> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:50 -07:00
Myron Stowe	8c8eae2742	ipmi: convert tracking of the ACPI device pointer to a PNP device Convert PNP patch (git `9e368fa011`) to maintain a pointer to a PNP device, 'pnp_dev', instead of the ACPI device, 'acpi_dev', that is currently being tracked with PNP based IPMI device discovery. Signed-off-by: Myron Stowe <myron.stowe@hp.com> Acked-by: Zhao Yakui <yakui.zhao@intel.com> Acked-by: Corey Minyard <minyard@acm.org> Cc: Len Brown <lenb@kernel.org> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:50 -07:00
Corey Minyard	ddac44b7b2	ipmi: change timeout and event poll to one second The timeouts in IPMI are in the 1-5 second range in message handling, so a 1 second timeout is a reasonable thing to do. This should help with reducing power consumption on idle systems. Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:50 -07:00
Matthew Garrett	06ee459402	ipmi: attempt to register multiple SIs of the same type Some odd systems may have multiple BMCs, and we want to be able to support them. Let's make the assumption that if a system legitimately has multiple BMCs then each BMC's SI will be of the same type, and also that we won't see multiple SIs of the same type unless we have multiple BMCs. If these hold true then we should register all SIs of the same type. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:50 -07:00
Matthew Garrett	3326f4f227	ipmi: reduce polling We can reasonably alter the poll rate depending on whether we're performing a transaction or merely waiting for an event. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:49 -07:00
Matthew Garrett	ea4078ca1a	ipmi: reduce polling when interrupts are available If we're not currently in the middle of a transaction, and if we have interrupts, there's no real reason to poll the controller more frequently than the core IPMI code does. Set the interrupt_disabled flag appropriately as the interrupt state changes, and make the timeout code reset itself only if the transaction is incomplete or we have no interrupts. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:49 -07:00
Matthew Garrett	754d453185	ipmi: change device discovery order The ipmi spec provides an ordering for si discovery. Change the driver to match, with the exception of preferring smbios to SPMI as HPs (at least) contain accurate information in the former but not the latter. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:49 -07:00
Matthew Garrett	d8cc5267b8	ipmi: only register one si per bmc Only register one si per bmc. Use any user-provided devices first, followed by the first device with an irq, followed by the first device discovered. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:49 -07:00
Matthew Garrett	2407d77a1a	ipmi: split device discovery and registration The ipmi spec indicates that we should only make use of one si per bmc, so separate device discovery and registration to make that possible. [thenzl@redhat.com: fix mutex use] Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:49 -07:00
Matthew Garrett	5fedc4a282	ipmi: change addr_source to an enum rather than strings Switch from a char* to an enum to identify the address source of SIs, making it easier to handle them appropriately during registration. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:49 -07:00
Julia Lawall	4de85cd6d6	ipc/sem.c: use ERR_CAST Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes more clear what is the purpose of the operation, which otherwise looks like a no-op. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ type T; T x; identifier f; @@ T f (...) { <+... - ERR_PTR(PTR_ERR(x)) + x ...+> } @@ expression x; @@ - ERR_PTR(PTR_ERR(x)) + ERR_CAST(x) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:49 -07:00
Manfred Spraul	c5cf6359ad	ipc/sem.c: update description of the implementation ipc/sem.c begins with a 15 year old description about bugs in the initial implementation in Linux-1.0. The patch replaces that with a top level description of the current code. A TODO could be derived from this text: The opengroup man page for semop() does not mandate FIFO. Thus there is no need for a semaphore array list of pending operations. If - this list is removed - the per-semaphore array spinlock is removed (possible if there is no list to protect) - sem_otime is moved into the semaphores and calculated on demand during semctl() then the array would be read-mostly - which would significantly improve scaling for applications that use semaphore arrays with lots of entries. The price would be expensive semctl() calls: for(i=0;i<sma->sem_nsems;i++) spin_lock(sma->sem_lock); <do stuff> for(i=0;i<sma->sem_nsems;i++) spin_unlock(sma->sem_lock); I'm not sure if the complexity is worth the effort, thus here is the documentation of the current behavior first. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:49 -07:00
Manfred Spraul	31a7c4746e	ipc/sem.c: cacheline align the ipc spinlock for semaphores Cacheline align the spinlock for sysv semaphores. Without the patch, the spinlock and sem_otime [written by every semop that modified the array] and sem_base [read in the hot path of try_atomic_semop()] can be in the same cacheline. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:49 -07:00
Manfred Spraul	0a2b9d4c79	ipc/sem.c: move wake_up_process out of the spinlock section The wake-up part of semtimedop() consists out of two steps: - the right tasks must be identified. - they must be woken up. Right now, both steps run while the array spinlock is held. This patch reorders the code and moves the actual wake_up_process() behind the point where the spinlock is dropped. The code also moves setting sem->sem_otime to one place: It does not make sense to set the last modify time multiple times. [akpm@linux-foundation.org: repair kerneldoc] [akpm@linux-foundation.org: fix uninitialised retval] Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:49 -07:00
Manfred Spraul	fd5db42254	ipc/sem.c: optimize update_queue() for bulk wakeup calls The following series of patches tries to fix the spinlock contention reported by Chris Mason - his benchmark exposes problems of the current code: - In the worst case, the algorithm used by update_queue() is O(N^2). Bulk wake-up calls can enter this worst case. The patch series fix that. Note that the benchmark app doesn't expose the problem, it just should be fixed: Real world apps might do the wake-ups in another order than perfect FIFO. - The part of the code that runs within the semaphore array spinlock is significantly larger than necessary. The patch series fixes that. This change is responsible for the main improvement. - The cacheline with the spinlock is also used for a variable that is read in the hot path (sem_base) and for a variable that is unnecessarily written to multiple times (sem_otime). The last step of the series cacheline-aligns the spinlock. This patch: The SysV semaphore code allows to perform multiple operations on all semaphores in the array as atomic operations. After a modification, update_queue() checks which of the waiting tasks can complete. The algorithm that is used to identify the tasks is O(N^2) in the worst case. For some cases, it is simple to avoid the O(N^2). The patch adds a detection logic for some cases, especially for the case of an array where all sleeping tasks are single sembuf operations and a multi-sembuf operation is used to wake up multiple tasks. A big database application uses that approach. The patch fixes wakeup due to semctl(,,SETALL,) - the initial version of the patch breaks that. [akpm@linux-foundation.org: make do_smart_update() static] Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:49 -07:00
Imre Deak	2dcb22b346	idr: fix backtrack logic in idr_remove_all Currently idr_remove_all will fail with a use after free error if idr::layers is bigger than 2, which on 32 bit systems corresponds to items more than 1024. This is due to stepping back too many levels during backtracking. For simplicity let's assume that IDR_BITS=1 -> we have 2 nodes at each level below the root node and each leaf node stores two IDs. (In reality for 32 bit systems IDR_BITS=5, with 32 nodes at each sub-root level and 32 IDs in each leaf node). The sequence of freeing the nodes at the moment is as follows: layer 1 -> a(7) 2 -> b(3) c(5) 3 -> d(1) e(2) f(4) g(6) Until step 4 things go fine, but then node c is freed, whereas node g should be freed first. Since node c contains the pointer to node g we'll have a use after free error at step 6. How many levels we step back after visiting the leaf nodes is currently determined by the msb of the id we are currently visiting: Step 1. node d with IDs 0,1 is freed, current ID is advanced to 2. msb of the current ID bit 1. This means we need to step back 1 level to node b and take the next sibling, node e. 2-3. node e with IDs 2,3 is freed, current ID is 4, msb is bit 2. This means we need to step back 2 levels to node a, freeing node b on the way. 4-5. node f with IDs 4,5 is freed, current ID is 6, msb is still bit 2. This means we again need to step back 2 levels to node a and free c on the way. 6. We should visit node g, but its pointer is not available as node c was freed. The fix changes how we determine the number of levels to step back. Instead of deducting this merely from the msb of the current ID, we should really check if advancing the ID causes an overflow to a bit position corresponding to a given layer. In the above example overflow from bit 0 to bit 1 should mean stepping back 1 level. Overflow from bit 1 to bit 2 should mean stepping back 2 levels and so on. The fix was tested with IDs up to 1 << 20, which corresponds to 4 layers on 32 bit systems. Signed-off-by: Imre Deak <imre.deak@nokia.com> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: Eric Paris <eparis@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: <stable@kernel.org> [2.6.34.1] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:48 -07:00
Lai Jiangshan	79a6cdeb7e	cpuhotplug: do not need cpu_hotplug_begin() when CONFIG_HOTPLUG_CPU=n Since when CONFIG_HOTPLUG_CPU=n, get_online_cpus() do nothing, so we don't need cpu_hotplug_begin() either. This patch moves cpu_hotplug_begin()/cpu_hotplug_done() into the code block of CONFIG_HOTPLUG_CPU=y. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:48 -07:00
Akinobu Mita	c9d221f86e	fault-injection: add CPU notifier error injection module I used this module to test the series of modification to the cpu notifiers code. Example1: inject CPU offline error (-1 == -EPERM) # modprobe cpu-notifier-error-inject cpu_down_prepare_error=-1 # echo 0 > /sys/devices/system/cpu/cpu1/online bash: echo: write error: Operation not permitted Example2: inject CPU online error (-2 == -ENOENT) # modprobe cpu-notifier-error-inject cpu_up_prepare_error=-2 # echo 1 > /sys/devices/system/cpu/cpu1/online bash: echo: write error: No such file or directory [akpm@linux-foundation.org: fix Kconfig help text] Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:48 -07:00
Akinobu Mita	55af6bb509	md: convert cpu notifier to return encapsulate errno value By the previous modification, the cpu notifier can return encapsulate errno value. This converts the cpu notifiers for raid5. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:48 -07:00
Akinobu Mita	d882ba699d	s390: convert cpu notifier to return encapsulate errno value By the previous modification, the cpu notifier can return encapsulate errno value. This converts the cpu notifiers for s390. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:48 -07:00
Akinobu Mita	1dee31f74f	ehca: convert cpu notifier to return encapsulate errno value By the previous modification, the cpu notifier can return encapsulate errno value. This converts the cpu notifiers for ehca. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com> Cc: Christoph Raisch <raisch@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:48 -07:00
Akinobu Mita	92e99a98bb	iucv: convert cpu notifier to return encapsulate errno value By the previous modification, the cpu notifier can return encapsulate errno value. This converts the cpu notifiers for iucv. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:48 -07:00
Akinobu Mita	eac4068013	slab: convert cpu notifier to return encapsulate errno value By the previous modification, the cpu notifier can return encapsulate errno value. This converts the cpu notifiers for slab. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:48 -07:00
Akinobu Mita	80b5184cc5	kernel/: convert cpu notifier to return encapsulate errno value By the previous modification, the cpu notifier can return encapsulate errno value. This converts the cpu notifiers for kernel/*.c Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:48 -07:00
Akinobu Mita	ad84bb5b98	topology: convert cpu notifier to return encapsulate errno value By the previous modification, the cpu notifier can return encapsulate errno value. This converts the cpu notifiers for topology. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:48 -07:00
Akinobu Mita	a94247e7fb	x86: convert cpu notifier to return encapsulate errno value By the previous modification, the cpu notifier can return encapsulate errno value. This converts the cpu notifiers for msr, cpuid, and therm_throt. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:48 -07:00
Akinobu Mita	b957e043ee	notifier: change notifier_from_errno(0) to return NOTIFY_OK This changes notifier_from_errno(0) to be NOTIFY_OK instead of NOTIFY_STOP_MASK \| NOTIFY_OK. Currently, the notifiers which return encapsulated errno value have to do something like this: err = do_something(); // returns -errno if (err) return notifier_from_errno(err); else return NOTIFY_OK; This change makes the above code simple: err = do_something(); // returns -errno return return notifier_from_errno(err); Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:47 -07:00
Akinobu Mita	e6bde73b07	cpu-hotplug: return better errno on cpu hotplug failure Currently, onlining or offlining a CPU failure by one of the cpu notifiers error always cause -EINVAL error. (i.e. writing 0 or 1 to /sys/devices/system/cpu/cpuX/online gets EINVAL) To get better error reporting rather than always getting -EINVAL, This changes cpu_notify() to return -errno value with notifier_to_errno() and fix the callers. Now that cpu notifiers can return encapsulate errno value. Currently, all cpu hotplug notifiers return NOTIFY_OK, NOTIFY_BAD, or NOTIFY_DONE. So cpu_notify() can returns 0 or -EPERM with this change for now. (notifier_to_errno(NOTIFY_OK) == 0, notifier_to_errno(NOTIFY_DONE) == 0, notifier_to_errno(NOTIFY_BAD) == -EPERM) Forthcoming patches convert several cpu notifiers to return encapsulate errno value with notifier_from_errno(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:47 -07:00
Akinobu Mita	e9fb7631eb	cpu-hotplug: introduce cpu_notify(), __cpu_notify(), cpu_notify_nofail() No functional change. These are just wrappers of raw_cpu_notifier_call_chain. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:47 -07:00
Wu Fengguang	36e15263aa	kcore: add _text to KCORE_TEXT Extend KCORE_TEXT to cover the pages between _text and _stext, to allow examining some important page table pages. `readelf -a` output on x86_64 before and after patch: Type Offset VirtAddr PhysAddr before LOAD 0x00007fff8100c000 0xffffffff81009000 0x0000000000000000 after LOAD 0x00007fff81003000 0xffffffff81000000 0x0000000000000000 The newly covered pages are: 0xffffffff81000000 <startup_64> etc. 0xffffffff81001000 <init_level4_pgt> 0xffffffff81002000 <level3_ident_pgt> 0xffffffff81003000 <level3_kernel_pgt> 0xffffffff81004000 <level2_fixmap_pgt> 0xffffffff81005000 <level1_fixmap_pgt> 0xffffffff81006000 <level2_ident_pgt> 0xffffffff81007000 <level2_kernel_pgt> 0xffffffff81008000 <level2_spare_pgt> Before patch, /proc/kcore shows outdated contents for the above page table pages, for example: (gdb) p level3_ident_pgt $1 = {<text variable, no debug info>} 0xffffffff81002000 <level3_ident_pgt> (gdb) p/x ((pud_t )&level3_ident_pgt)@512 $2 = {{pud = 0x1006063}, {pud = 0x0} <repeats 511 times>} while the real content is: root@hp /home/wfg# hexdump -s 0x1002000 -n 4096 /dev/mem 1002000 6063 0100 0000 0000 8067 0000 0000 0000 1002010 0000 0000 0000 0000 0000 0000 0000 0000 * 1003000 That is, on a x86_64 box with 2GB memory, we can see first-1GB / full-2GB identity mapping before/after patch: (gdb) p/x ((pud_t )&level3_ident_pgt)@512 before $1 = {{pud = 0x1006063}, {pud = 0x0} <repeats 511 times>} after $1 = {{pud = 0x1006063}, {pud = 0x8067}, {pud = 0x0} <repeats 510 times>} Obviously the content before patch is wrong. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2010-05-27 09:12:47 -07:00

1 2 3 4 5 ...

199159 Коммитов Все ветки Поиск

199159 Коммитов

Все ветки