WSL2-Linux-Kernel

Граф коммитов

Автор	SHA1	Сообщение	Дата
Jens Axboe	18dabf473e	Change table chaining layout Change the page member of the scatterlist structure to be an unsigned long, and encode more stuff in the lower bits: - Bits 0 and 1 zero: this is a normal sg entry. Next sg entry is located at sg + 1. - Bit 0 set: this is a chain entry, the next real entry is at ->page_link with the two low bits masked off. - Bit 1 set: this is the final entry in the sg entry. sg_next() will return NULL when passed such an entry. It's thus important that sg table users use the proper accessors to get and set the page member. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-22 21:20:01 +02:00
Christoph Hellwig	e38f981758	exportfs: update documentation Update documentation to the current state of affairs. Remove duplicated method descruptions in exportfs.h and point to Documentation/filesystems/ Exporting instead. Add a little file header comment in expfs.c describing what's going on and mentioning Neils and my copyright [1]. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: <linux-ext4@vger.kernel.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: David Chinner <dgc@sgi.com> Cc: Timothy Shimmin <tes@sgi.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Hugh Dickins <hugh@veritas.com> Cc: Chris Mason <mason@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-22 08:13:21 -07:00
Christoph Hellwig	3965516440	exportfs: make struct export_operations const Now that nfsd has stopped writing to the find_exported_dentry member we an mark the export_operations const Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: <linux-ext4@vger.kernel.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: David Chinner <dgc@sgi.com> Cc: Timothy Shimmin <tes@sgi.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Hugh Dickins <hugh@veritas.com> Cc: Chris Mason <mason@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-22 08:13:21 -07:00
Christoph Hellwig	cfaea787c0	exportfs: remove old methods Now that all filesystems are converted remove support for the old methods. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: <linux-ext4@vger.kernel.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: David Chinner <dgc@sgi.com> Cc: Timothy Shimmin <tes@sgi.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Hugh Dickins <hugh@veritas.com> Cc: Chris Mason <mason@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-22 08:13:21 -07:00
Christoph Hellwig	be55caf177	reiserfs: new export ops Another nice little cleanup by using the new methods. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Chris Mason <mason@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-22 08:13:20 -07:00
Christoph Hellwig	05da080482	efs: new export ops Trivial switch over to the new generic helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-22 08:13:20 -07:00
Christoph Hellwig	2596110a39	exportfs: add new methods Add the guts for the new filesystem API to exportfs. There's now a fh_to_dentry method that returns a dentry for the object looked for given a filehandle fragment, and a fh_to_parent operation that returns the dentry for the encoded parent directory in case the file handle contains it. There are default implementations for these methods that only take a callback for an nfs-enhanced iget variant and implement the rest of the semantics. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: <linux-ext4@vger.kernel.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: David Chinner <dgc@sgi.com> Cc: Timothy Shimmin <tes@sgi.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Hugh Dickins <hugh@veritas.com> Cc: Chris Mason <mason@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-22 08:13:19 -07:00
Christoph Hellwig	6e91ea2bb0	exportfs: add fid type This patchset is a medium scale rewrite of the export operations interface. The goal is to make the interface less complex, and easier to understand from the filesystem side, aswell as preparing generic support for exporting of 64bit inode numbers. This touches all nfs exporting filesystems, and I've done testing on all of the filesystems I have here locally (xfs, ext2, ext3, reiserfs, jfs) This patch: Add a structured fid type so that we don't have to pass an array of u32 values around everywhere. It's a union of possible layouts. As a start there's only the u32 array and the traditional 32bit inode format, but there will be more in one of my next patchset when I start to document the various filehandle formats we have in lowlevel filesystems better. Also add an enum that gives the various filehandle types human- readable names. Note: Some people might think the struct containing an anonymous union is ugly, but I didn't want to pass around a raw union type. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: <linux-ext4@vger.kernel.org> Cc: Dave Kleikamp <shaggy@austin.ibm.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: David Chinner <dgc@sgi.com> Cc: Timothy Shimmin <tes@sgi.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Hugh Dickins <hugh@veritas.com> Cc: Chris Mason <mason@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mark.fasheh@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-22 08:13:19 -07:00
Bernhard Walle	00bf4098be	kexec: add BSS to resource tree Add the BSS to the resource tree just as kernel text and kernel data are in the resource tree. The main reason behind this is to avoid crashkernel reservation in that area. While it's not strictly necessary to have the BSS in the resource tree (the actual collision detection is done in the reserve_bootmem() function before), the usage of the BSS resource should be presented to the user in /proc/iomem just as Kernel data and Kernel code. Note: The patch currently is only implemented for x86 and ia64 (because efi_initialize_iomem_resources() has the same signature on i386 and ia64). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Bernhard Walle <bwalle@suse.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@in.ibm.com> Cc: <linux-arch@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-22 08:13:19 -07:00
Keshavamurthy, Anil S	3460a6d9ce	Intel IOMMU: DMAR fault handling support MSI interrupt handler registrations and fault handling support for Intel-IOMMU hadrware. This patch enables the MSI interrupts for the DMA remapping units and in the interrupt handler read the fault cause and outputs the same on to the console. Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Christoph Lameter <clameter@sgi.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-22 08:13:19 -07:00
Keshavamurthy, Anil S	ba39592764	Intel IOMMU: Intel IOMMU driver Actual intel IOMMU driver. Hardware spec can be found at: http://www.intel.com/technology/virtualization This driver sets X86_64 'dma_ops', so hook into standard DMA APIs. In this way, PCI driver will get virtual DMA address. This change is transparent to PCI drivers. [akpm@linux-foundation.org: remove unneeded cast] [akpm@linux-foundation.org: build fix] [bunk@stusta.de: fix duplicate CONFIG_DMAR Makefile line] Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Christoph Lameter <clameter@sgi.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-22 08:13:18 -07:00
Keshavamurthy, Anil S	994a65e25d	Intel IOMMU: PCI generic helper function When devices are under a p2p bridge, upstream transactions get replaced by the device id of the bridge as it owns the PCIE transaction. Hence its necessary to setup translations on behalf of the bridge as well. Due to this limitation all devices under a p2p share the same domain in a DMAR. We just cache the type of device, if its a native PCIe device or not for later use. [akpm@linux-foundation.org: BUG_ON -> WARN_ON+recover] Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Christoph Lameter <clameter@sgi.com> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-22 08:13:18 -07:00
Keshavamurthy, Anil S	10e5247f40	Intel IOMMU: DMAR detection and parsing logic This patch supports the upcomming Intel IOMMU hardware a.k.a. Intel(R) Virtualization Technology for Directed I/O Architecture and the hardware spec for the same can be found here http://www.intel.com/technology/virtualization/index.htm FAQ! (questions from akpm, answers from ak) > So... what's all this code for? > > I assume that the intent here is to speed things up under Xen, etc? Yes in some cases, but not this code. That would be the Xen version of this code that could potentially assign whole devices to guests. I expect this to be only useful in some special cases though because most hardware is not virtualizable and you typically want an own instance for each guest. Ok at some point KVM might implement this too; i likely would use this code for this. > Do we > have any benchmark results to help us to decide whether a merge would be > justified? The main advantage for doing it in the normal kernel is not performance, but more safety. Broken devices won't be able to corrupt memory by doing random DMA. Unfortunately that doesn't work for graphics yet, for that need user space interfaces for the X server are needed. There are some potential performance benefits too: - When you have a device that cannot address the complete address range an IOMMU can remap its memory instead of bounce buffering. Remapping is likely cheaper than copying. - The IOMMU can merge sg lists into a single virtual block. This could potentially speed up SG IO when the device is slow walking SG lists. [I long ago benchmarked 5% on some block benchmark with an old MPT Fusion; but it probably depends a lot on the HBA] And you get better driver debugging because unexpected memory accesses from the devices will cause a trappable event. > > Does it slow anything down? It adds more overhead to each IO so yes. This patch: Add support for early detection and parsing of DMAR's (DMA Remapping) reported to OS via ACPI tables. DMA remapping(DMAR) devices support enables independent address translations for Direct Memory Access(DMA) from Devices. These DMA remapping devices are reported via ACPI tables and includes pci device scope covered by these DMA remapping device. For detailed info on the specification of "Intel(R) Virtualization Technology for Directed I/O Architecture" please see http://www.intel.com/technology/virtualization/index.htm Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ashok Raj <ashok.raj@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Christoph Lameter <clameter@sgi.com> Cc: Greg KH <greg@kroah.com> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-22 08:13:18 -07:00
Jan Kara	89910cccb8	ext2: avoid rec_len overflow with 64KB block size With 64KB blocksize, a directory entry can have size 64KB which does not fit into 16 bits we have for entry length. So we store 0xffff instead and convert the value when read from / written to disk. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-22 08:13:18 -07:00
Serge E. Hallyn	b68680e473	capabilities: clean up file capability reading Simplify the vfs_cap_data structure. Also fix get_file_caps which was declaring __le32 v1caps[XATTR_CAPS_SZ] on the stack, but XATTR_CAPS_SZ is already * sizeof(__le32). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Andrew Morgan <morgan@kernel.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-22 08:13:18 -07:00
Yasunori Goto	b9049e2344	memory hotplug: make kmem_cache_node for SLUB on memory online avoid panic Fix a panic due to access NULL pointer of kmem_cache_node at discard_slab() after memory online. When memory online is called, kmem_cache_nodes are created for all SLUBs for new node whose memory are available. slab_mem_going_online_callback() is called to make kmem_cache_node() in callback of memory online event. If it (or other callbacks) fails, then slab_mem_offline_callback() is called for rollback. In memory offline, slab_mem_going_offline_callback() is called to shrink all slub cache, then slab_mem_offline_callback() is called later. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: locking fix] [akpm@linux-foundation.org: build fix] Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-22 08:13:17 -07:00
Yasunori Goto	7b78d335ac	memory hotplug: rearrange memory hotplug notifier Current memory notifier has some defects yet. (Fortunately, nothing uses it.) This patch is to fix and rearrange for them. - Add information of start_pfn, nr_pages, and node id if node status is changes from/to memoryless node for callback functions. Callbacks can't do anything without those information. - Add notification going-online status. It is necessary for creating per node structure before the node's pages are available. - Move GOING_OFFLINE status notification after page isolation. It is good place for return memory like cache for callback, because returned page is not used again. - Make CANCEL events for rollingback when error occurs. - Delete MEM_MAPPING_INVALID notification. It will be not used. - Fix compile error of (un)register_memory_notifier(). Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-22 08:13:17 -07:00
Rusty Russell	214541d1f3	add WEAK() for creating weak asm labels Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-22 08:13:17 -07:00
Jens Axboe	82f66fbef5	[SG] Add helpers for manipulating SG entries We can then transition drivers without changing the generated code. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>	2007-10-22 17:07:37 +02:00
Hans Verkuil	3bcc95760c	V4L/DVB (6321): Remove obsolete VIDIOC_S/G_MPEGCOMP ioctls Remove the obsolete VIDIOC_G_MPEGCOMP and VIDIOC_S_MPEGCOMP ioctls from the V4L2 API as per the removal schedule (October 2007). Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>	2007-10-22 12:01:30 -02:00
Mauro Carvalho Chehab	22c4a4e98e	V4L/DVB (6320): v4l core: remove the unused .hardware V4L1 field struct video_device used to define a .hardware field. While initialized on severl drivers, this field is never used inside V4L. However, drivers using it need to include the old V4L1 header. This seems to cause compilation troubles with some random configs. Better just to remove it from all drivers. Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>	2007-10-22 12:01:24 -02:00
Pavel Emelyanov	e3fa259bcb	[NET]: Cut off the queue_mapping field from sk_buff Just hide it behind the #ifdef, because nobody wants it now. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-22 02:59:57 -07:00
Pavel Emelyanov	668f895a85	[NET]: Hide the queue_mapping field inside netif_subqueue_stopped Many places get the queue_mapping field from skb to pass it to the netif_subqueue_stopped() which will be 0 in any case. Make the helper that works with sk_buff Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-22 02:59:56 -07:00
Pavel Emelyanov	4e3ab47a54	[NET]: Make and use skb_get_queue_mapping Make the helper for getting the field, symmetrical to the "set" one. Return 0 if CONFIG_NETDEVICES_MULTIQUEUE=n Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-22 02:59:56 -07:00
Jean Delvare	305e1e9691	[INET]: Let inet_diag and friends autoload By adding module aliases to inet_diag, tcp_diag and dccp_diag, we let them load automatically as needed. This makes tools like "ss" run faster. Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-22 02:59:54 -07:00
Herbert Xu	deea84b0ae	[NET]: Fix SKB_WITH_OVERHEAD calculation The calculation in SKB_WITH_OVERHEAD is incorrect in that it can cause an overflow across a page boundary which is what it's meant to prevent. In particular, the header length (X) should not be lumped together with skb_shared_info. The latter needs to be aligned properly while the header has no choice but to sit in front of wherever the payload is. Therefore the correct calculation is to take away the aligned size of skb_shared_info, and then subtract the header length. The resulting quantity L satisfies the following inequality: SKB_DATA_ALIGN(L + X) + sizeof(struct skb_shared_info) <= PAGE_SIZE This is the quantity used by alloc_skb to do the actual allocation. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-22 02:59:53 -07:00
Matt Carlson	6c7af27c8a	[TG3]: Add 5723 support This patch adds support for upcoming 5723 devices. Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-22 02:59:49 -07:00
Marcel Holtmann	2cb3377a29	[Bluetooth] Add constant for Bluetooth socket options level Assign the next free socket options level to be used by the Bluetooth protocol and address family. Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2007-10-22 02:59:48 -07:00
Márton Németh	553a05b882	Input: i8042 - export i8042_command() Export the i8042_command() function which manages the mutual exclusion with the help of the i8042_lock spinlock. This allows to access i8042 safely from other parts of the kernel. Signed-off-by: Márton Németh <nm127@freemail.hu> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>	2007-10-22 00:56:52 -04:00
Al Viro	74c3cbe33b	[PATCH] audit: watching subtrees New kind of audit rule predicates: "object is visible in given subtree". The part that can be sanely implemented, that is. Limitations: * if you have hardlink from outside of tree, you'd better watch it too (or just watch the object itself, obviously) * if you mount something under a watched tree, tell audit that new chunk should be added to watched subtrees * if you umount something in a watched tree and it's still mounted elsewhere, you will get matches on events happening there. New command tells audit to recalculate the trees, trimming such sources of false positives. Note that it's _not_ about path - if something mounted in several places (multiple mount, bindings, different namespaces, etc.), the match does _not_ depend on which one we are using for access. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2007-10-21 02:37:45 -04:00
Al Viro	455434d450	[PATCH] new helper - inotify_evict_watch() Kicks the watch out without dropping it. Called under ->inotify_mutex Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2007-10-21 02:37:38 -04:00
Al Viro	b9efe8a234	[PATCH] new helper - inotify_clone_watch() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2007-10-21 02:37:32 -04:00
Al Viro	8aec080945	[PATCH] new helpers - collect_mounts() and release_collected_mounts() Get a snapshot of a subtree, creating private clones of vfsmounts for all its components and release such snapshot resp. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2007-10-21 02:37:25 -04:00
Al Viro	5a190ae697	[PATCH] pass dentry to audit_inode()/audit_inode_child() makes caller simpler and allows to scan ancestors Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2007-10-21 02:37:18 -04:00
Linus Torvalds	c00046c279	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (74 commits) fix do_sys_open() prototype sysfs: trivial: fix sysfs_create_file kerneldoc spelling mistake Documentation: Fix typo in SubmitChecklist. Typo: depricated -> deprecated Add missing profile=kvm option to Documentation/kernel-parameters.txt fix typo about TBI in e1000 comment proc.txt: Add /proc/stat field small documentation fixes Fix compiler warning in smount example program from sharedsubtree.txt docs/sysfs: add missing word to sysfs attribute explanation documentation/ext3: grammar fixes Documentation/java.txt: typo and grammar fixes Documentation/filesystems/vfs.txt: typo fix include/asm-*/system.h: remove unused set_rmb(), set_wmb() macros trivial copy_data_pages() tidy up Fix typo in arch/x86/kernel/tsc_32.c file link fix for Pegasus USB net driver help remove unused return within void return function Typo fixes retrun -> return x86 hpet.h: remove broken links ...	2007-10-19 20:36:17 -07:00
Linus Torvalds	5f737085be	Merge master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6: (50 commits) ide: remove inclusion of non-existent io_trace.h ide-disk: add get_smart_data() helper ide: fix ->data_phase in taskfile_load_raw() ide: check drive->using_dma in flagged_taskfile() ide: check ->dma_setup() return value in flagged_taskfile() dtc2278: note on docs qd65xx: remove pointless qd_{read,write}_reg() (take 2) ide: PCI BMDMA initialization fixes (take 2) ide: remove stale comments from ide-taskfile.c ide: remove dead code from ide_driveid_update() ide: use __ide_end_request() in ide_end_dequeued_request() ide: enhance ide_setup_pci_noise() cs5530: remove needless ide_lock taking ide: take ide_lock for prefetch disable/enable in do_special() ht6560b: fix deadlock on error handling cmd640: fix deadlock on error handling slc90e66: fix deadlock on error handling opti621: fix deadlock on error handling qd65xx: fix deadlock on error handling dtc2278: fix deadlock on error handling ...	2007-10-19 19:36:05 -07:00
Jason Uhlenkott	8e8a1407ac	fix do_sys_open() prototype Fix an argument name in do_sys_open()'s prototype. Signed-off-by: Jason Uhlenkott <juhlenko@akamai.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-20 03:16:18 +02:00
Rolf Eike Beer	405bbe9fa3	Typo: depricated -> deprecated Typo: depricated -> deprecated Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-20 03:10:57 +02:00
Mike Anderson	7a8c3d3b92	dm: uevent generate events This patch adds support for the dm_path_event dm_send_event functions which create and send udev events. Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2007-10-20 02:01:26 +01:00
Mike Anderson	96a1f7dba6	dm: export name and uuid This patch adds a function to obtain a copy of a mapped device's name and uuid. Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>	2007-10-20 02:01:23 +01:00
Milan Broz	027d50f92e	dm io:ctl use constant struct size Make size of dm_ioctl struct always 312 bytes on all supported architectures. This change retains compatibility with already-compiled code because it uses an embedded offset to locate the payload that follows the structure. On 64-bit architectures there is no change at all; on 32-bit we are increasing the size of dm-ioctl from 308 to 312 bytes. Currently with 32-bit userspace / 64-bit kernel on x86_64 some ioctls (including rename, message) are incorrectly rejected by the comparison against 'param + 1'. This breaks userspace lvrename and multipath 'fail_if_no_path' changes, for example. (BTW Device-mapper uses its own versioning and ignores the ioctl size bits. Only the generic ioctl compat code on mixed arches checks them, and that will continue to accept both sizes for now, but we intend to list 308 as deprecated and eventually remove it.) Signed-off-by: Milan Broz <mbroz@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: Guido Guenther <agx@sigxcpu.org> Cc: Kevin Corry <kevcorry@us.ibm.com> Cc: stable@kernel.org	2007-10-20 02:00:58 +01:00
Serge Hallyn	6da34bae29	fix up security_socket_getpeersec_* documentation Update the security_socket_peersec documentation in include/linux/security.h. security_socket_peersec has been split into two functions - _stream and _dgram, with new capabilities. Signed-off-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-20 00:53:30 +02:00
Bartlomiej Zolnierkiewicz	8562043606	ide: constify struct ide_port_info Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-20 00:32:34 +02:00
Bartlomiej Zolnierkiewicz	039788e153	ide: replace ide_pci_device_t by struct ide_port_info * Rename struct ide_pci_device_s to struct ide_port_info. * Remove ide_pci_device_t typedef. While at it: * Fix __ide_pci_register_driver() comment. * Fix aec62xx_init_one() comment. * Remove unused 'cds' field from ide_hwgroup_t. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-20 00:32:34 +02:00
Bartlomiej Zolnierkiewicz	9239b33393	ide: remove write-only hwif->hw Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-20 00:32:33 +02:00
Bartlomiej Zolnierkiewicz	18e181fe13	ide: add hwif->ack_intr hook * Add hwif->ack_intr hook and use it instead of hwif->hw.ack_intr. * Add missing brackets to cris-v32 and powerpc ide_ack_intr() macros. Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-20 00:32:32 +02:00
Bartlomiej Zolnierkiewicz	86f3a492bb	icside: use ec->dma directly * hwif->hwif_data contains pointer to struct expansion_card so use ec->dma directly instead of caching it in hwif->hw.dma. * Remove no longer needed hw_regs_t.dma and NO_DMA define. Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-20 00:32:32 +02:00
Bartlomiej Zolnierkiewicz	847ddd2bbe	ide: add CONFIG_IDE_ARCH_OBSOLETE_INIT Add CONFIG_IDE_ARCH_OBSOLETE_INIT to drivers/ide/Kconfig and use it instead of defining IDE_ARCH_OBSOLETE_INIT in <arch/ide.h>. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-20 00:32:32 +02:00
Bartlomiej Zolnierkiewicz	f9b9309737	ide: remove redundant comments from ide.h There is better documentation for these functions in drivers/ide/. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-20 00:32:31 +02:00
Bartlomiej Zolnierkiewicz	baa8f3e94b	ide: add ide_find_port() helper * Add ide_find_port() helper. * Convert icside, rapide and ide_platform host drivers to use it. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-20 00:32:31 +02:00
Bartlomiej Zolnierkiewicz	8447d9d52a	ide: add ide_device_add() * Add ide_device_add() helper and convert host drivers to use it instead of open-coded variants. * Make ide_pci_setup_ports() and do_ide_setup_pci_device() take 'u8 idx' argument instead of 'ata_index_t index'. * Remove no longer needed ata_index_t. * Unexport probe_hwif_init() and make it static. * Unexport ide_proc_register_port(). There should be no functionality changes caused by this patch (sgiioc4.c: ide_proc_register_port() requires hwif->present to be set and it won't be set if probe_hwif_init() fails). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-20 00:32:31 +02:00
Bartlomiej Zolnierkiewicz	fd9bb53942	ide: add ->fixup method to ide_hwif_t * Add ->fixup method to ide_hwif_t. * Set hwif->fixup in ide_pci_setup_ports() to d->fixup. * Use hwif->fixup in probe_hwif(). * Use probe_hwif_init() instead of probe_hwif_init_with_fixup() in ide_setup_pci_device(). * Add 'fixup' argument to ide_register_hw() and use it to set hwif->fixup, update all ide_register_hw() users accordingly. * Convert ide-cs/delkin_cb host drivers to use ide_register_hw(). * Restore hwif->fixup in ide_hwif_restore(). * Remove ide_register_hw_with_fixup(), probe_hwif_init_with_fixup() and 'fixup' argument from probe_hwif(). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-20 00:32:31 +02:00
Bartlomiej Zolnierkiewicz	caea7602f3	ide: add IDE_HFLAG_{IO_32BIT,UNMASK_IRQS} host flags Add IDE_HFLAG_{IO_32BIT,UNMASK_IRQS} host flag to tell ide_pci_setup_ports() to set drive->{io_32bit,unmask} for both drives on the interface. Convert amd74xx, sl82c105 and via82cxxx host drivers to use these new host flags. While at it: * Add IDE_HFLAGS_AMD define (amd74xx host driver). * Add IDE_HFLAGS_VIA define (via82cxxx host driver). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-20 00:32:30 +02:00
Bartlomiej Zolnierkiewicz	272a370900	ide: add IDE_HFLAG_RQSIZE_256 host flag Add IDE_HFLAG_RQSIZE_256 host flag to tell ide_pci_setup_ports() to set hwif->rqsize to 256 sectors. Convert pdc202xx_old host driver to use it. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-20 00:32:30 +02:00
Bartlomiej Zolnierkiewicz	8acf28c090	ide: add IDE_HFLAG_FORCE_LEGACY_IRQS host flag Add IDE_HFLAG_FORCE_LEGACY_IRQS host flag to tell ide_pci_setup_ports() to always set hwif->irq to legacy IRQ 14/15 and convert generic IDE PCI and via82cxxx host drivers to use it. While at it: * Add IDE_HFLAGS_UMC define (generic IDE PCI host driver). * Remove no longer needed init_hwif_generic() (generic IDE PCI host driver). * Set d->udma_mask instead of hwif->ultra_mask (via82cxxx host driver). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-20 00:32:30 +02:00
Bartlomiej Zolnierkiewicz	528a572dae	ide: add ->chipset field to ide_pci_device_t Add ->chipset field to ide_pci_device_t and use it in ide_hwif_configure() to set hwif->chipset. Convert cmd64x, cy82c693, rz1000 and trm290 host drivers to use this new ability. While at it define hwif_chipset_t as u8 to save some space in hw_regs_t, ide_hwif_t and ide_pci_device_t instances. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-20 00:32:30 +02:00
Bartlomiej Zolnierkiewicz	44a59ad59f	ide: remove unused ->next field from ide_pci_device_t Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-20 00:32:29 +02:00
Linus Torvalds	60812a4a99	Merge ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86 * ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86: (33 commits) x86: convert cpuinfo_x86 array to a per_cpu array x86: introduce frame_pointer() and stack_pointer() x86 & generic: change to __builtin_prefetch() i386: do not BUG_ON() when MSR is unknown x86: acpi use cpu_physical_id x86: convert cpu_llc_id to be a per cpu variable x86: convert cpu_to_apicid to be a per cpu variable i386: introduce "used_vectors" bitmap which can be used to reserve vectors. x86: use raw locks during oopses x86: honor _PAGE_PSE bit on page walks i386: do cpuid_device_create() in CPU_UP_PREPARE instead of CPU_ONLINE. x86: implement missing x86_64 function smp_call_function_mask() x86: use descriptor's functions instead of inline assembly i386: consolidate show_regs and show_registers for i386 i386: make callgraph use dump_trace() on i386/x86_64 x86: enable iommu_merge by default i386: i386 add AMD64 Barcelona PMU MSR definitions to msr.h x86: Unify i386 and x86-64 early quirks x86: enable HPET on ICH3 and ICH4 x86: force enable HPET on VT8235/8237 chipsets ... Manually fix trivial conflict with task pid container helper changes in arch/x86/kernel/process_32.c	2007-10-19 15:06:00 -07:00
Linus Torvalds	b04cde34cf	Merge git://git.linux-nfs.org/pub/linux/nfs-2.6 * git://git.linux-nfs.org/pub/linux/nfs-2.6: NFSv4: Fix an rpc_cred reference leakage in fs/nfs/delegation.c NFSv4: Ensure that we wait for the CLOSE request to complete NFS: Fix a race in sillyrename NFS: Fix a writeback race...	2007-10-19 14:31:42 -07:00
Jan Engelhardt	96de0e252c	Convert files to UTF-8 and some cleanups * Convert files to UTF-8. * Also correct some people's names (one example is Eißfeldt, which was found in a source file. Given that the author used an ß at all in a source file indicates that the real name has in fact a 'ß' and not an 'ss', which is commonly used as a substitute for 'ß' when limited to 7bit.) * Correct town names (Goettingen -> Göttingen) * Update Eberhard Mönkeberg's address (http://lkml.org/lkml/2007/1/8/313) Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-19 23:21:04 +02:00
Trond Myklebust	565277f63c	NFS: Fix a race in sillyrename lookup() and sillyrename() can race one another because the sillyrename() completion cannot take the parent directory's inode->i_mutex since the latter may be held by whoever is calling dput(). We therefore have little option but to add extra locking to ensure that nfs_lookup() and nfs_atomic_open() do not race with the sillyrename completion. If somebody has looked up the sillyrenamed file in the meantime, we just transfer the sillydelete information to the new dentry. Please refer to the bug-report at http://bugzilla.linux-nfs.org/show_bug.cgi?id=150 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2007-10-19 17:19:16 -04:00
Robert P. J. Day	3a4fa0a25d	Fix misspellings of "system", "controller", "interrupt" and "necessary". Fix the various misspellings of "system", controller", "interrupt" and "[un]necessary". Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-19 23:10:43 +02:00
John Anthony Kazos Jr	18735dd8d2	crypto: convert crypto.h to UTF-8 Convert the encoding of <include/linux/crypto.h> from ISO-8859-1 to UTF-8. Signed-off-by: John Anthony Kazos Jr. <jakj@j-a-k-j.com> Signed-off-by: Adrian Bunk <bunk@kernel.org>	2007-10-19 23:07:36 +02:00
Linus Torvalds	c4ec207173	Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (41 commits) ACPICA: hw: Don't carry spinlock over suspend ACPICA: hw: remove use_lock flag from acpi_hw_register_{read, write} ACPI: cpuidle: port idle timer suspend/resume workaround to cpuidle ACPI: clean up acpi_enter_sleep_state_prep Hibernation: Make sure that ACPI is enabled in acpi_hibernation_finish ACPI: suppress uninitialized var warning cpuidle: consolidate 2.6.22 cpuidle branch into one patch ACPI: thinkpad-acpi: skip blanks before the data when parsing sysfs ACPI: AC: Add sysfs interface ACPI: SBS: Add sysfs alarm ACPI: SBS: Add ACPI_PROCFS around procfs handling code. ACPI: SBS: Add support for power_supply class (and sysfs) ACPI: SBS: Make SBS reads table-driven. ACPI: SBS: Simplify data structures in SBS ACPI: SBS: Split host controller (ACPI0001) from SBS driver (ACPI0002) ACPI: EC: Add new query handler to list head. ACPI: Add acpi_bus_generate_event4() function ACPI: Battery: add sysfs alarm ACPI: Battery: Add sysfs support ACPI: Battery: Misc clean-ups, no functional changes ... Fix up conflicts in drivers/misc/thinkpad_acpi.[ch] manually	2007-10-19 13:12:46 -07:00
Mathieu Desnoyers	31155bc03e	Linux Kernel Markers - Samples Module example showing how to use the Linux Kernel Markers. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:55 -07:00
Mathieu Desnoyers	8256e47cdc	Linux Kernel Markers The marker activation functions sits in kernel/marker.c. A hash table is used to keep track of the registered probes and armed markers, so the markers within a newly loaded module that should be active can be activated at module load time. marker_query has been removed. marker_get_first, marker_get_next and marker_release should be used as iterators on the markers. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Acked-by: "Frank Ch. Eigler" <fche@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Mike Mason <mmlnx@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:54 -07:00
Srivatsa Vaddagiri	68318b8e0b	Hook up group scheduler with control groups Enable "cgroup" (formerly containers) based fair group scheduling. This will let administrator create arbitrary groups of tasks (using "cgroup" pseudo filesystem) and control their cpu bandwidth usage. [akpm@linux-foundation.org: fix cpp condition] Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:51 -07:00
Bernhard Walle	cba63c3089	Extended crashkernel command line This patch adds a extended crashkernel syntax that makes the value of reserved system RAM dependent on the system RAM itself: crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset] range=start-[end] For example: crashkernel=512M-2G:64M,2G-:128M The motivation comes from distributors that configure their crashkernel command line automatically with some configuration tool (YaST, you know ;)). Of course that tool knows the value of System RAM, but if the user removes RAM, then the system becomes unbootable or at least unusable and error handling is very difficult. This series implements this change for i386, x86_64, ia64, ppc64 and sh. That should be all platforms that support kdump in current mainline. I tested all platforms except sh due to the lack of a sh processor. This patch: This is the generic part of the patch. It adds a parse_crashkernel() function in kernel/kexec.c that is called by the architecture specific code that actually reserves the memory. That function takes the whole command line and looks itself for "crashkernel=" in it. If there are multiple occurrences, then the last one is taken. The advantage is that if you have a bootloader like lilo or elilo which allows you to append a command line parameter but not to remove one (like in GRUB), then you can add another crashkernel value for testing at the boot command line and this one overwrites the command line in the configuration then. Signed-off-by: Bernhard Walle <bwalle@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Vivek Goyal <vgoyal@in.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:49 -07:00
Kirill Korotaev	3ac88a41ff	virtualization of sysv msg queues is incomplete Virtualization of sysv msg queues is incomplete: msg_hdrs and msg_bytes variables visible from userspace are global. Let's make them per-namespace. Signed-off-by: Alexey Kuznetsov <alexey@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Cc: Pierre Peiffer <pierre.peiffer@bull.net> Cc: Nadia Derbey <Nadia.Derbey@bull.net> Cc: Serge Hallyn <serue@us.ibm.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:48 -07:00
Nadia Derbey	7ca7e564e0	ipc: store ipcs into IDRs This patch introduces ipcs storage into IDRs. The main changes are: . This ipc_ids structure is changed: the entries array is changed into a root idr structure. . The grow_ary() routine is removed: it is not needed anymore when adding an ipc structure, since we are now using the IDR facility. . The ipc_rmid() routine interface is changed: . there is no need for this routine to return the pointer passed in as argument: it is now declared as a void . since the id is now part of the kern_ipc_perm structure, no need to have it as an argument to the routine Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:44 -07:00
Cliff Wickman	470fd64644	hotplug cpu: migrate a task within its cpuset When a cpu is disabled, move_task_off_dead_cpu() is called for tasks that have been running on that cpu. Currently, such a task is migrated: 1) to any cpu on the same node as the disabled cpu, which is both online and among that task's cpus_allowed 2) to any cpu which is both online and among that task's cpus_allowed It is typical of a multithreaded application running on a large NUMA system to have its tasks confined to a cpuset so as to cluster them near the memory that they share. Furthermore, it is typical to explicitly place such a task on a specific cpu in that cpuset. And in that case the task's cpus_allowed includes only a single cpu. This patch would insert a preference to migrate such a task to some cpu within its cpuset (and set its cpus_allowed to its entire cpuset). With this patch, migrate the task to: 1) to any cpu on the same node as the disabled cpu, which is both online and among that task's cpus_allowed 2) to any online cpu within the task's cpuset 3) to any cpu which is both online and among that task's cpus_allowed In order to do this, move_task_off_dead_cpu() must make a call to cpuset_cpus_allowed_locked(), a new subset of cpuset_cpus_allowed(), that will not block. (name change - per Oleg's suggestion) Calls are made to cpuset_lock() and cpuset_unlock() in migration_call() to set the cpuset mutex during the whole migrate_live_tasks() and migrate_dead_tasks() procedure. [akpm@linux-foundation.org: build fix] [pj@sgi.com: Fix indentation and spacing] Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Christoph Lameter <clameter@sgi.com> Cc: Paul Jackson <pj@sgi.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:44 -07:00
Pavel Emelyanov	ba25f9dcc4	Use helpers to obtain task pid in printks The task_struct->pid member is going to be deprecated, so start using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in the kernel. The first thing to start with is the pid, printed to dmesg - in this case we may safely use task_pid_nr(). Besides, printks produce more (much more) than a half of all the explicit pid usage. [akpm@linux-foundation.org: git-drm went and changed lots of stuff] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Dave Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:43 -07:00
Pavel Emelyanov	9a2e70572e	Isolate the explicit usage of signal->pgrp The pgrp field is not used widely around the kernel so it is now marked as deprecated with appropriate comment. The initialization of INIT_SIGNALS is trimmed because a) they are set to 0 automatically; b) gcc cannot properly initialize two anonymous (the second one is the one with the session) unions. In this particular case to make it compile we'd have to add some field initialized right before the .pgrp. This is the same patch as the `1ec320afdc` one (from Cedric), but for the pgrp field. Some progress report: We have to deprecate the pid, tgid, session and pgrp fields on struct task_struct and struct signal_struct. The session and pgrp are already deprecated. The tgid value is close to being such - the worst known usage in in fs/locks.c and audit code. The pid field deprecation is mainly blocked by numerous printk-s around the kernel that print the tsk->pid to log. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:43 -07:00
Jiri Slaby	14ed9d23aa	remove BITS_TO_TYPE macro remove BITS_TO_TYPE macro I realized, that it is actually the same as DIV_ROUND_UP, use it instead. [akpm@linux-foundation.org: build fix] Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:42 -07:00
Jiri Slaby	93043ece03	define global BIT macro define global BIT macro move all local BIT defines to the new globally define macro. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Kumar Gala <galak@gate.crashing.org> Cc: Dmitry Torokhov <dtor@mail.ru> Cc: Jeff Garzik <jeff@garzik.org> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Russell King <rmk@arm.linux.org.uk> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: "John W. Linville" <linville@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:42 -07:00
Jiri Slaby	7b19ada2ed	get rid of input BIT* duplicate defines get rid of input BIT* duplicate defines use newly global defined macros for input layer. Also remove includes of input.h from non-input sources only for BIT macro definiton. Define the macro temporarily in local manner, all those local definitons will be removed further in this patchset (to not break bisecting). BIT macro will be globally defined (1<<x) Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: <dtor@mail.ru> Acked-by: Jiri Kosina <jkosina@suse.cz> Cc: <lenb@kernel.org> Acked-by: Marcel Holtmann <marcel@holtmann.org> Cc: <perex@suse.cz> Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: <vernux@us.ibm.com> Cc: <malattia@linux.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:42 -07:00
Jiri Slaby	d05be13bcc	define first set of BIT* macros define first set of BIT* macros - move BITOP_MASK and BITOP_WORD from asm-generic/bitops/atomic.h to include/linux/bitops.h and rename it to BIT_MASK and BIT_WORD - move BITS_TO_LONGS and BITS_PER_BYTE to bitops.h too and allow easily define another BITS_TO_something (e.g. in event.c) by BITS_TO_TYPE macro Remaining (and common) BIT macro will be defined after all occurences and conflicts will be sorted out in the patches. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:42 -07:00
Jiri Slaby	1977f03272	remove asm/bitops.h includes remove asm/bitops.h includes including asm/bitops directly may cause compile errors. don't include it and include linux/bitops instead. next patch will deny including asm header directly. Cc: Adrian Bunk <bunk@kernel.org> Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:41 -07:00
Jiri Slaby	bc552f7715	Misc: phantom, improved data passing This new version guarantees amb_bit switch in small enough intervals, so that the device won't stop working in the middle of a movement anymore. However it preserves old (openhaptics) functionality. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:41 -07:00
Paul Menage	8707d8b8c0	Fix cpusets update_cpumask Cause writes to cpuset "cpus" file to update cpus_allowed for member tasks: - collect batches of tasks under tasklist_lock and then call set_cpus_allowed() on them outside the lock (since this can sleep). - add a simple generic priority heap type to allow efficient collection of batches of tasks to be processed without duplicating or missing any tasks in subsequent batches. - make "cpus" file update a no-op if the mask hasn't changed - fix race between update_cpumask() and sched_setaffinity() by making sched_setaffinity() post-check that it's not running on any cpus outside cpuset_cpus_allowed(). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Paul Menage <menage@google.com> Cc: Paul Jackson <pj@sgi.com> Cc: David Rientjes <rientjes@google.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:41 -07:00
Paul Jackson	029190c515	cpuset sched_load_balance flag Add a new per-cpuset flag called 'sched_load_balance'. When enabled in a cpuset (the default value) it tells the kernel scheduler that the scheduler should provide the normal load balancing on the CPUs in that cpuset, sometimes moving tasks from one CPU to a second CPU if the second CPU is less loaded and if that task is allowed to run there. When disabled (write "0" to the file) then it tells the kernel scheduler that load balancing is not required for the CPUs in that cpuset. Now even if this flag is disabled for some cpuset, the kernel may still have to load balance some or all the CPUs in that cpuset, if some overlapping cpuset has its sched_load_balance flag enabled. If there are some CPUs that are not in any cpuset whose sched_load_balance flag is enabled, the kernel scheduler will not load balance tasks to those CPUs. Moreover the kernel will partition the 'sched domains' (non-overlapping sets of CPUs over which load balancing is attempted) into the finest granularity partition that it can find, while still keeping any two CPUs that are in the same shed_load_balance enabled cpuset in the same element of the partition. This serves two purposes: 1) It provides a mechanism for real time isolation of some CPUs, and 2) it can be used to improve performance on systems with many CPUs by supporting configurations in which load balancing is not done across all CPUs at once, but rather only done in several smaller disjoint sets of CPUs. This mechanism replaces the earlier overloading of the per-cpuset flag 'cpu_exclusive', which overloading was removed in an earlier patch: cpuset-remove-sched-domain-hooks-from-cpusets See further the Documentation and comments in the code itself. [akpm@linux-foundation.org: don't be weird] Signed-off-by: Paul Jackson <pj@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:41 -07:00
Pavel Emelyanov	2f2a3a46fc	Uninline the task_xid_nr_ns() calls Since these are expanded into call to pid_nr_ns() anyway, it's OK to move the whole routine out-of-line. This is a cheap way to save ~100 bytes from vmlinux. Together with the previous two patches, it saves half-a-kilo from the vmlinux. Un-inline other (currently inlined) functions must be done with additional performance testing. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:41 -07:00
Pavel Emelyanov	8990571eb5	Uninline find_pid etc set of functions The find_pid/_vpid/_pid_ns functions are used to find the struct pid by its id, depending on whic id - global or virtual - is used. The find_vpid() is a macro that pushes the current->nsproxy->pid_ns on the stack to call another function - find_pid_ns(). It turned out, that this dereference together with the push itself cause the kernel text size to grow too much. Move all these out-of-line. Together with the previous patch this saves a bit less that 400 bytes from .text section. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:41 -07:00
Pavel Emelyanov	bac0abd617	Isolate some explicit usage of task->tgid With pid namespaces this field is now dangerous to use explicitly, so hide it behind the helpers. Also the pid and pgrp fields o task_struct and signal_struct are to be deprecated. Unfortunately this patch cannot be sent right now as this leads to tons of warnings, so start isolating them, and deprecate later. Actually the p->tgid == pid has to be changed to has_group_leader_pid(), but Oleg pointed out that in case of posix cpu timers this is the same, and thread_group_leader() is more preferable. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:40 -07:00
Pavel Emelyanov	19b9b9b54e	pid namespaces: remove the struct pid unneeded fields Since we've switched from using pid->nr to pid->upids->nr some fields on struct pid are no longer needed Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:40 -07:00
Pavel Emelyanov	228ebcbe63	Uninline find_task_by_xxx set of functions The find_task_by_something is a set of macros are used to find task by pid depending on what kind of pid is proposed - global or virtual one. All of them are wrappers above the most generic one - find_task_by_pid_type_ns() - and just substitute some args for it. It turned out, that dereferencing the current->nsproxy->pid_ns construction and pushing one more argument on the stack inline cause kernel text size to grow. This patch moves all this stuff out-of-line into kernel/pid.c. Together with the next patch it saves a bit less than 400 bytes from the .text section. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:40 -07:00
Sukadev Bhattiprolu	3eb07c8c8a	pid namespaces: destroy pid namespace on init's death Terminate all processes in a namespace when the reaper of the namespace is exiting. We do this by walking the pidmap of the namespace and sending SIGKILL to all processes. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:40 -07:00
Pavel Emelyanov	6f4e643353	pid namespaces: initialize the namespace's proc_mnt The namespace's proc_mnt must be kern_mount-ed to make this pointer always valid, independently of whether the user space mounted the proc or not. This solves raced in proc_flush_task, etc. with the proc_mnt switching from NULL to not-NULL. The initialization is done after the init's pid is created and hashed to make proc_get_sb() finr it and get for root inode. Sice the namespace holds the vfsmnt, vfsmnt holds the superblock and the superblock holds the namespace we must explicitly break this circle to destroy all the stuff. This is done after the init of the namespace dies. Running a few steps forward - when init exits it will kill all its children, so no proc_mnt will be needed after its death. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:40 -07:00
Pavel Emelyanov	30e49c263e	pid namespaces: allow cloning of new namespace When clone() is invoked with CLONE_NEWPID, create a new pid namespace and then create a new struct pid for the new process. Allocate pid_t's for the new process in the new pid namespace and all ancestor pid namespaces. Make the newly cloned process the session and process group leader. Since the active pid namespace is special and expected to be the first entry in pid->upid_list, preserve the order of pid namespaces. The size of 'struct pid' is dependent on the the number of pid namespaces the process exists in, so we use multiple pid-caches'. Only one pid cache is created during system startup and this used by processes that exist only in init_pid_ns. When a process clones its pid namespace, we create additional pid caches as necessary and use the pid cache to allocate 'struct pids' for that depth. Note, that with this patch the newly created namespace won't work, since the rest of the kernel still uses global pids, but this is to be fixed soon. Init pid namespace still works. [oleg@tv-sign.ru: merge fix] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:39 -07:00
Pavel Emelyanov	b461cc0382	pid namespaces: miscellaneous preparations for pid namespaces * remove pid.h from pid_namespaces.h; * rework is_(cgroup\|global)_init; * optimize (get\|put)_pid_ns for init_pid_ns; * declare task_child_reaper to return actual reaper. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:39 -07:00
Pavel Emelyanov	07543f5c75	pid namespaces: make proc have multiple superblocks - one for each namespace Each pid namespace have to be visible through its own proc mount. Thus we need to have per-namespace proc trees with their own superblocks. We cannot easily show different pid namespace via one global proc tree, since each pid refers to different tasks in different namespaces. E.g. pid 1 refers to the init task in the initial namespace and to some other task when seeing from another namespace. Moreover - pid, exisintg in one namespace may not exist in the other. This approach has one move advantage is that the tasks from the init namespace can see what tasks live in another namespace by reading entries from another proc tree. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:39 -07:00
Pavel Emelyanov	198fe21b0a	pid namespaces: helpers to find the task by its numerical ids When searching the task by numerical id on may need to find it using global pid (as it is done now in kernel) or by its virtual id, e.g. when sending a signal to a task from one namespace the sender will specify the task's virtual id and we should find the task by this value. [akpm@linux-foundation.org: fix gfs2 linkage] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:39 -07:00
Pavel Emelyanov	7af5729474	pid namespaces: helpers to obtain pid numbers When showing pid to user or getting the pid numerical id for in-kernel use the value of this id may differ depending on the namespace. This set of helpers is used to get the global pid nr, the virtual (i.e. seen by task in its namespace) nr and the nr as it is seen from the specified namespace. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:39 -07:00
Pavel Emelyanov	8ef047aaae	pid namespaces: make alloc_pid(), free_pid() and put_pid() work with struct upid Each struct upid element of struct pid has to be initialized properly, i.e. its nr mst be allocated from appropriate pidmap and ns set to appropriate namespace. When allocating a new pid, we need to know the namespace this pid will live in, so the additional argument is added to alloc_pid(). On the other hand, the rest of the kernel still uses the pid->nr and pid->pid_chain fields, so these ones are still initialized, but this will be removed soon. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:39 -07:00
Pavel Emelyanov	faacbfd3a6	pid namespaces: add support for pid namespaces hierarchy Each namespace has a parent and is characterized by its "level". Level is the number of the namespace generation. E.g. init namespace has level 0, after cloning new one it will have level 1, the next one - 2 and so on and so forth. This level is not explicitly limited. True hierarchy must have some way to find each namespace's children, but it is not used in the patches, so this ability is not added (yet). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:38 -07:00
Sukadev Bhattiprolu	4c3f2ead5a	pid namespaces: introduce struct upid Since task will be visible from different pid namespaces each of them have to be addressed by multiple pids. struct upid is to store the information about which id refers to which namespace. The constuciton looks like this. Each struct pid carried the reference counter and the list of tasks attached to this pid. At its end it has a variable length array of struct upid-s. Each struct upid has a numerical id (pid itself), pointer to the namespace, this ID is valid in and is hashed into a pid_hash for searching the pids. The nr and pid_chain fields are kept in struct pid for a while to make kernel still work (no patch initialize the upids yet), but it will be removed at the end of this series when we switch to upids completely. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:38 -07:00
Pavel Emelyanov	60347f6716	pid namespaces: prepare proc_flust_task() to flush entries from multiple proc trees The first part is trivial - we just make the proc_flush_task() to operate on arbitrary vfsmount with arbitrary ids and pass the pid and global proc_mnt to it. The other change is more tricky: I moved the proc_flush_task() call in release_task() higher to address the following problem. When flushing task from many proc trees we need to know the set of ids (not just one pid) to find the dentries' names to flush. Thus we need to pass the task's pid to proc_flush_task() as struct pid is the only object that can provide all the pid numbers. But after __exit_signal() task has detached all his pids and this information is lost. This creates a tiny gap for proc_pid_lookup() to bring some dentries back to tree and keep them in hash (since pids are still alive before __exit_signal()) till the next shrink, but since proc_flush_task() does not provide a 100% guarantee that the dentries will be flushed, this is OK to do so. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:38 -07:00
Pavel Emelyanov	8bf9725c29	pid namespaces: introduce MS_KERNMOUNT flag This flag tells the .get_sb callback that this is a kern_mount() call so that it can trust data pointer to be valid in-kernel one. If this flag is passed from the user process, it is cleared since the data pointer is not a valid kernel object. Running a few steps forward - this will be needed for proc to create the superblock and store a valid pid namespace on it during the namespace creation. The reason, why the namespace cannot live without proc mount is described in the appropriate patch. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:38 -07:00
Johannes Berg	4e6045f134	workqueue: debug flushing deadlocks with lockdep In the following scenario: code path 1: my_function() -> lock(L1); ...; flush_workqueue(); ... code path 2: run_workqueue() -> my_work() -> ...; lock(L1); ... you can get a deadlock when my_work() is queued or running but my_function() has acquired L1 already. This patch adds a pseudo-lock to each workqueue to make lockdep warn about this scenario. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:38 -07:00
Pavel Emelyanov	cf7b708c8d	Make access to task's nsproxy lighter When someone wants to deal with some other taks's namespaces it has to lock the task and then to get the desired namespace if the one exists. This is slow on read-only paths and may be impossible in some cases. E.g. Oleg recently noticed a race between unshare() and the (sent for review in cgroups) pid namespaces - when the task notifies the parent it has to know the parent's namespace, but taking the task_lock() is impossible there - the code is under write locked tasklist lock. On the other hand switching the namespace on task (daemonize) and releasing the namespace (after the last task exit) is rather rare operation and we can sacrifice its speed to solve the issues above. The access to other task namespaces is proposed to be performed like this: rcu_read_lock(); nsproxy = task_nsproxy(tsk); if (nsproxy != NULL) { / * * work with the namespaces here * e.g. get the reference on one of them * / } / * * NULL task_nsproxy() means that this task is * almost dead (zombie) * / rcu_read_unlock(); This patch has passed the review by Eric and Oleg :) and, of course, tested. [clg@fr.ibm.com: fix unshare()] [ebiederm@xmission.com: Update get_net_ns_by_pid] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:37 -07:00
Serge E. Hallyn	b460cbc581	pid namespaces: define is_global_init() and is_container_init() is_init() is an ambiguous name for the pid==1 check. Split it into is_global_init() and is_container_init(). A cgroup init has it's tsk->pid == 1. A global init also has it's tsk->pid == 1 and it's active pid namespace is the init_pid_ns. But rather than check the active pid namespace, compare the task structure with 'init_pid_ns.child_reaper', which is initialized during boot to the /sbin/init process and never changes. Changelog: 2.6.22-rc4-mm2-pidns1: - Use 'init_pid_ns.child_reaper' to determine if a given task is the global init (/sbin/init) process. This would improve performance and remove dependence on the task_pid(). 2.6.21-mm2-pidns2: - [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc, ppc,avr32}/traps.c for the _exception() call to is_global_init(). This way, we kill only the cgroup if the cgroup's init has a bug rather than force a kernel panic. [akpm@linux-foundation.org: fix comment] [sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c] [bunk@stusta.de: kernel/pid.c: remove unused exports] [sukadev@us.ibm.com: Fix capability.c to work with threaded init] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Acked-by: Pavel Emelianov <xemul@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Herbert Poetzel <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:37 -07:00
Sukadev Bhattiprolu	88f21d8182	pid namespaces: rename child_reaper() function Rename the child_reaper() function to task_child_reaper() to be similar to other task_* functions and to distinguish the function from 'struct pid_namspace.child_reaper'. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Pavel Emelianov <xemul@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Herbert Poetzel <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:37 -07:00
Sukadev Bhattiprolu	2894d650cd	pid namespaces: define and use task_active_pid_ns() wrapper With multiple pid namespaces, a process is known by some pid_t in every ancestor pid namespace. Every time the process forks, the child process also gets a pid_t in every ancestor pid namespace. While a process is visible in >=1 pid namespaces, it can see pid_t's in only one pid namespace. We call this pid namespace it's "active pid namespace", and it is always the youngest pid namespace in which the process is known. This patch defines and uses a wrapper to find the active pid namespace of a process. The implementation of the wrapper will be changed in when support for multiple pid namespaces are added. Changelog: 2.6.22-rc4-mm2-pidns1: - [Pavel Emelianov, Alexey Dobriyan] Back out the change to use task_active_pid_ns() in child_reaper() since task->nsproxy can be NULL during task exit (so child_reaper() continues to use init_pid_ns). to implement child_reaper() since init_pid_ns.child_reaper to implement child_reaper() since tsk->nsproxy can be NULL during exit. 2.6.21-rc6-mm1: - Rename task_pid_ns() to task_active_pid_ns() to reflect that a process can have multiple pid namespaces. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Acked-by: Pavel Emelianov <xemul@openvz.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Herbert Poetzel <herbert@13thfloor.at> Cc: Kirill Korotaev <dev@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:37 -07:00
Pavel Emelianov	baf8f0f82d	pid namespaces: dynamic kmem cache allocator for pid namespaces Add kmem_cache to pid_namespace to allocate pids from. Since both implementations expand the struct pid to carry more numerical values each namespace should have separate cache to store pids of different sizes. Each kmem cache is name "pid_<NR>", where <NR> is the number of numerical ids on the pid. Different namespaces with same level of nesting will have same caches. This patch has two FIXMEs that are to be fixed after we reach the consensus about the struct pid itself. The first one is that the namespace to free the pid from in free_pid() must be taken from pid. Now the init_pid_ns is used. The second FIXME is about the cache allocation. When we do know how long the object will be then we'll have to calculate this size in create_pid_cachep. Right now the sizeof(struct pid) value is used. [akpm@linux-foundation.org: coding-style repair] Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Cedric Le Goater <clg@fr.ibm.com> Acked-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:37 -07:00
Pavel Emelianov	a05f7b15de	pid namespaces: make get_pid_ns() return the namespace itself Make get_pid_ns() return the namespace itself to look like the other getters and make the code using it look nicer. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:37 -07:00
Pavel Emelianov	a47afb0f9d	pid namespaces: round up the API The set of functions process_session, task_session, process_group and task_pgrp is confusing, as the names can be mixed with each other when looking at the code for a long time. The proposals are to * equip the functions that return the integer with _nr suffix to represent that fact, * and to make all functions work with task (not process) by making the common prefix of the same name. For monotony the routines signal_session() and set_signal_session() are replaced with task_session_nr() and set_task_session(), especially since they are only used with the explicit task->signal dereference. Signed-off-by: Pavel Emelianov <xemul@openvz.org> Acked-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:37 -07:00
Serge E. Hallyn	858d72ead4	cgroups: implement namespace tracking subsystem When a task enters a new namespace via a clone() or unshare(), a new cgroup is created and the task moves into it. This version names cgroups which are automatically created using cgroup_clone() as "node_<pid>" where pid is the pid of the unsharing or cloned process. (Thanks Pavel for the idea) This is safe because if the process unshares again, it will create /cgroups/(...)/node_<pid>/node_<pid> The only possibilities (AFAICT) for a -EEXIST on unshare are 1. pid wraparound 2. a process fails an unshare, then tries again. Case 1 is unlikely enough that I ignore it (at least for now). In case 2, the node_<pid> will be empty and can be rmdir'ed to make the subsequent unshare() succeed. Changelog: Name cloned cgroups as "node_<pid>". [clg@fr.ibm.com: fix order of cgroup subsystems in init/Kconfig] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:37 -07:00
Balbir Singh	846c7bb055	Add cgroupstats This patch is inspired by the discussion at http://lkml.org/lkml/2007/4/11/187 and implements per cgroup statistics as suggested by Andrew Morton in http://lkml.org/lkml/2007/4/11/263. The patch is on top of 2.6.21-mm1 with Paul's cgroups v9 patches (forward ported) This patch implements per cgroup statistics infrastructure and re-uses code from the taskstats interface. A new set of cgroup operations are registered with commands and attributes. It should be very easy to extend per cgroup statistics, by adding members to the cgroupstats structure. The current model for cgroupstats is a pull, a push model (to post statistics on interesting events), should be very easy to add. Currently user space requests for statistics by passing the cgroup file descriptor. Statistics about the state of all the tasks in the cgroup is returned to user space. TODO's/NOTE: This patch provides an infrastructure for implementing cgroup statistics. Based on the needs of each controller, we can incrementally add more statistics, event based support for notification of statistics, accumulation of taskstats into cgroup statistics in the future. Sample output # ./cgroupstats -C /cgroup/a sleeping 2, blocked 0, running 1, stopped 0, uninterruptible 0 # ./cgroupstats -C /cgroup/ sleeping 154, blocked 0, running 0, stopped 0, uninterruptible 0 If the approach looks good, I'll enhance and post the user space utility for the same Feedback, comments, test results are always welcome! [akpm@linux-foundation.org: build fix] Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Paul Menage <menage@google.com> Cc: Jay Lan <jlan@engr.sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:36 -07:00
Paul Menage	006cb99200	Task Control Groups: simple task cgroup debug info subsystem This example subsystem exports debugging information as an aid to diagnosing refcount leaks, etc, in the cgroup framework. Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:36 -07:00
Paul Menage	62d0df6406	Task Control Groups: example CPU accounting subsystem This example demonstrates how to use the generic cgroup subsystem for a simple resource tracker that counts, for the processes in a cgroup, the total CPU time used and the %CPU used in the last complete 10 second interval. Portions contributed by Balbir Singh <balbir@in.ibm.com> Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:36 -07:00
Paul Menage	8793d854ed	Task Control Groups: make cpusets a client of cgroups Remove the filesystem support logic from the cpusets system and makes cpusets a cgroup subsystem The "cpuset" filesystem becomes a dummy filesystem; attempts to mount it get passed through to the cgroup filesystem with the appropriate options to emulate the old cpuset filesystem behaviour. Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:36 -07:00
Paul Menage	81a6a5cdd2	Task Control Groups: automatic userspace notification of idle cgroups Add the following files to the cgroup filesystem: notify_on_release - configures/reports whether the cgroup subsystem should attempt to run a release script when this cgroup becomes unused release_agent - configures/reports the release agent to be used for this hierarchy (top level in each hierarchy only) releasable - reports whether this cgroup would have been auto-released if notify_on_release was true and a release agent was configured (mainly useful for debugging) To avoid locking issues, invoking the userspace release agent is done via a workqueue task; cgroups that need to have their release agents invoked by the workqueue task are linked on to a list. [pj@sgi.com: Need to include kmod.h] Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:36 -07:00
Paul Menage	817929ec27	Task Control Groups: shared cgroup subsystem group arrays Replace the struct css_set embedded in task_struct with a pointer; all tasks that have the same set of memberships across all hierarchies will share a css_set object, and will be linked via their css_sets field to the "tasks" list_head in the css_set. Assuming that many tasks share the same cgroup assignments, this reduces overall space usage and keeps the size of the task_struct down (three pointers added to task_struct compared to a non-cgroups kernel, no matter how many subsystems are registered). [akpm@linux-foundation.org: fix a printk] [akpm@linux-foundation.org: build fix] Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:36 -07:00
Paul Menage	a424316ca1	Task Control Groups: add procfs interface Add: /proc/cgroups - general system info /proc/*/cgroup - per-task cgroup membership info [a.p.zijlstra@chello.nl: cgroups: bdi init hooks] Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:36 -07:00
Paul Menage	697f416108	Task Control Groups: add cgroup_clone() interface Add support for cgroup_clone(), a way to create new cgroups intended to be used for systems such as namespace unsharing. A new subsystem callback, post_clone(), is added to allow subsystems to automatically configure cloned cgroups. Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:36 -07:00
Paul Menage	b4f48b6363	Task Control Groups: add fork()/exit() hooks This adds the necessary hooks to the fork() and exit() paths to ensure that new children inherit their parent's cgroup assignments, and that exiting processes release reference counts on their cgroups. Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:36 -07:00
Paul Menage	355e0c48b7	Add cgroup write_uint() helper method Add write_uint() helper method for cgroup subsystems This helper is analagous to the read_uint() helper method for reporting u64 values to userspace. It's designed to reduce the amount of boilerplate requierd for creating new cgroup subsystems. Signed-off-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:36 -07:00
Paul Menage	bbcb81d091	Task Control Groups: add tasks file interface Add the per-directory "tasks" file for cgroupfs mounts; this allows the user to determine which tasks are members of a cgroup by reading a cgroup's "tasks", and to move a task into a cgroup by writing its pid to its "tasks". Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:36 -07:00
Paul Menage	ddbcc7e8e5	Task Control Groups: basic task cgroup framework Generic Process Control Groups -------------------------- There have recently been various proposals floating around for resource management/accounting and other task grouping subsystems in the kernel, including ResGroups, User BeanCounters, NSProxy cgroups, and others. These all need the basic abstraction of being able to group together multiple processes in an aggregate, in order to track/limit the resources permitted to those processes, or control other behaviour of the processes, and all implement this grouping in different ways. This patchset provides a framework for tracking and grouping processes into arbitrary "cgroups" and assigning arbitrary state to those groupings, in order to control the behaviour of the cgroup as an aggregate. The intention is that the various resource management and virtualization/cgroup efforts can also become task cgroup clients, with the result that: - the userspace APIs are (somewhat) normalised - it's easier to test e.g. the ResGroups CPU controller in conjunction with the BeanCounters memory controller, or use either of them as the resource-control portion of a virtual server system. - the additional kernel footprint of any of the competing resource management systems is substantially reduced, since it doesn't need to provide process grouping/containment, hence improving their chances of getting into the kernel This patch: Add the main task cgroups framework - the cgroup filesystem, and the basic structures for tracking membership and associating subsystem state objects to tasks. Signed-off-by: Paul Menage <menage@google.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Paul Jackson <pj@sgi.com> Cc: Kirill Korotaev <dev@openvz.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:36 -07:00
Randy Dunlap	8f731f7d83	kernel-api docbook: fix content problems Fix kernel-api docbook contents problems. docproc: linux-2.6.23-git13/include/asm-x86/unaligned_32.h: No such file or directory Warning(linux-2.6.23-git13//include/linux/list.h:482): bad line: of list entry Warning(linux-2.6.23-git13//mm/filemap.c:864): No description found for parameter 'ra' Warning(linux-2.6.23-git13//block/ll_rw_blk.c:3760): No description found for parameter 'req' Warning(linux-2.6.23-git13//include/linux/input.h:1077): No description found for parameter 'private' Warning(linux-2.6.23-git13//include/linux/input.h:1077): No description found for parameter 'cdev' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Jens Axboe <jens.axboe@oracle.com> Cc: WU Fengguang <wfg@mail.ustc.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:35 -07:00
Jeff Mahoney	cb680c1be6	reiserfs: ignore on disk s_bmap_nr value Implement support for file systems larger than 8 TiB. The reiserfs superblock contains a 16 bit value for counting the number of bitmap blocks. The rest of the disk format supports file systems up to 2^32 blocks, but the bitmap block limitation artificially limits this to 8 TiB with a 4KiB block size. Rather than trust the superblock's 16-bit bitmap block count, we calculate it dynamically based on the number of blocks in the file system. When an incorrect value is observed in the superblock, it is zeroed out, ensuring that older kernels will not be able to mount the file system. Userspace support has already been implemented and shipped in reiserfsprogs 3.6.20. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:35 -07:00
Jeff Mahoney	4d20851d37	reiserfs: remove first_zero_hint The first_zero_hint metadata caching was never actually used, and it's of dubious optimization quality. This patch removes it. It doesn't actually shrink the size of the reiserfs_bitmap_info struct, since that doesn't work with block sizes larger than 8K. There was a big fixme in there, and with all the work lately in allowing block size > page size, I might as well kill the fixme as well. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:35 -07:00
Jeff Mahoney	3ee1667042	reiserfs: fix usage of signed ints for block numbers Do a quick signedness check for block numbers. There are a number of places where signed integers are used for block numbers, which limits the usable file system size to 8 TiB. The disk format, excepting a problem which will be fixed in the following patch, supports file systems up to 16 TiB in size. This patch cleans up those sites so that we can enable the full usable size. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:35 -07:00
Jose R. Santos	c2a9159cdd	jbd: config_jbd_debug cannot create /proc entry The jbd-debug file used to be located in /proc/sys/fs/jbd-debug, but create_proc_entry() does not do lookups on file names that are more that one directory deep. This causes the entry creation to fail and hence, no proc file is created. Instead of fixing this on procfs might as well move the jbd2-debug file to debugfs which would be the preferred location for this kind of tunable. The new location is now /sys/kernel/debug/jbd/jbd-debug. [akpm@linux-foundation.org: zillions of cleanups] Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Acked-by: Jan Kara <jack@suse.cz> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:35 -07:00
Chris Snook	1c09924448	jbd: remove printk() from J_ASSERT macros Remove printk from J_ASSERT to preserve registers during BUG. Signed-off-by: Chris Snook <csnook@redhat.com> Cc: "Stephen C. Tweedie" <sct@redhat.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:34 -07:00
Samuel Thibault	b293d75847	Console events and accessibility Some external modules like Speakup need to monitor console output. This adds a VT notifier that such modules can use to get console output events: allocation, deallocation, writes, other updates (cursor position, switch, etc.) [akpm@linux-foundation.org: fix headers_check] Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:34 -07:00
Alexey Dobriyan	fe9d4f5763	Add kernel/notifier.c There is separate notifier header, but no separate notifier .c file. Extract notifier code out of kernel/sys.c which will remain for misc syscalls I hope. Merge kernel/die_notifier.c into kernel/notifier.c. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:34 -07:00
Samuel Thibault	41ab4396e1	Console keyboard events and accessibility Some blind people use a kernel engine called Speakup which uses hardware synthesis to speak what gets displayed on the screen. They use the PC keyboard to control this engine (start/stop, accelerate, ...) and also need to get keyboard feedback (to make sure to know what they are typing, the caps lock status, etc.) Up to now, the way it was done was very ugly. Below is a patch to add a notifier list for permitting a far better implementation, see ChangeLog above for details. You may wonder why this can't be done at the input layer. The problem is that what people want to monitor is the console keyboard, i.e. all input keyboards that got attached to the console, and with the currently active keymap (i.e. keysyms, not only keycodes). This adds a keyboard notifier that such modules can use to get the keyboard events and possibly eat them, at several stages: - keycodes: even before translation into keysym. - unbound keycodes: when no keysym is bound. - unicode: when the keycode would get translated into a unicode character. - keysym: when the keycode would get translated into a keysym. - post_keysym: after the keysym got interpreted, so as to see the result (caps lock, etc.) This also provides access to k_handler so as to permit simulation of keypresses. [akpm@linux-foundation.org: various fixes] Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Dmitry Torokhov <dtor@mail.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:33 -07:00
Miklos Szeredi	c18479fe01	put declaration of put_filesystem() in fs.h Declarations go into headers. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Ram Pai <linuxram@us.ibm.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-19 11:53:33 -07:00
Andi Kleen	ab483570a1	x86 & generic: change to __builtin_prefetch() gcc 3.2+ supports __builtin_prefetch, so it's possible to use it on all architectures. Change the generic fallback in linux/prefetch.h to use it instead of noping it out. gcc should do the right thing when the architecture doesn't support prefetching Undefine the x86-64 inline assembler version and use the fallback. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-19 20:35:04 +02:00
Linus Torvalds	4fa4d23fa2	Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: pcnet32: remove private net_device_stats structure vortex_up should initialize "err" pcnet32: remove compile warnings in non-napi mode pcnet32: fix non-napi packet reception fix EMAC driver for proper napi_synchronize API sky2: shutdown cleanup napi_synchronize: waiting for NAPI forcedeth msi bugfix gianfar: fix obviously wrong #ifdef CONFIG_GFAR_NAPI placement fs_enet: Update for API changes gianfar: remove orphan struct. forcedeth: fix rx-work condition in nv_rx_process_optimized() too	2007-10-18 19:31:54 -07:00
Linus Torvalds	a9e82d3a02	Merge master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6: (37 commits) ide: set drive->autotune in ide_pci_setup_ports() triflex: always tune PIO opti621: always tune PIO cy82c693: always tune PIO cs5520: always tune PIO alim15x3: always tune PIO ide: add IDE_HFLAG_LEGACY_IRQS host flag ide: add IDE_HFLAG_SERIALIZE host flag ide: add IDE_HFLAG_ERROR_STOPS_FIFO host flag piix: add DECLARE_ICH_DEV() macro pdc202xx_old: add DECLARE_PDC2026X_DEV() macro pdc202xx_new: add DECLARE_PDCNEW_DEV() macro aec62xx: no need to disable UDMA in ->init_hwif method for ATP850UF ide: remove .init_setup from ide_pci_device_t serverworks: remove ->init_setup scc_pata: remove ->init_setup pdc202xx_old: remove ->init_setup pdc202xx_new: remove ->init_setup hpt366: remove ->init_setup cmd64x: remove ->init_setup ...	2007-10-18 16:00:02 -07:00
Bartlomiej Zolnierkiewicz	3985ee3b4c	ide: add IDE_HFLAG_LEGACY_IRQS host flag Add IDE_HFLAG_LEGACY_IRQS host flag to tell ide_pci_setup_ports() to set hwif->irq to legacy IRQ 14/15 (iff hwif->irq is not already set) and convert atiixp, piix, serverworks, sis5513 and slc90e66 host drivers to use it. While at it: * In piix.c add IDE_HFLAGS_PIIX define and don't use ->init_hwif for MPIIX. Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-19 00:30:11 +02:00
Bartlomiej Zolnierkiewicz	1c51361a98	ide: add IDE_HFLAG_SERIALIZE host flag Add IDE_HFLAG_SERIALIZE host flag to tell ide_pci_setup_ports() to set hwif/mate->serialized and convert aec62xx, cs5530 and sc1200 host drivers to use it. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-19 00:30:10 +02:00
Bartlomiej Zolnierkiewicz	ed67b92385	ide: add IDE_HFLAG_ERROR_STOPS_FIFO host flag Add IDE_HFLAG_ERROR_STOPS_FIFO host flag and use it instead of hwif->err_stops_fifo. As a side-effect this change fixes hwif->err_stops_fifo not being restored by ide_hwif_restore(). Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-19 00:30:10 +02:00
Bartlomiej Zolnierkiewicz	942278ef64	ide: remove .init_setup from ide_pci_device_t Now that all users were fixed we can safely remove it. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-19 00:30:09 +02:00
Bartlomiej Zolnierkiewicz	5f8b6c3485	ide: add ->mwdma_mask and ->swdma_mask to ide_pci_device_t (take 2) * Add ->mwdma_mask and ->swdma_mask to ide_pci_device_t. * Set ide_hwif_t DMA masks using DMA masks from ide_pci_device_t in setup-pci.c::ide_pci_setup_ports() (iff DMA base is valid and ->init_hwif method may still override them). * Convert IDE PCI host drivers to use ide_pci_device_t DMA masks. While at it: * Use ATA_{UDMA,MWDMA,SWDMA}* defines. * hpt34x.c: add separate ide_pci_device_t instances for HPT343 and HPT345. * serverworks.c: fix DMA masks being set before checking DMA base. v2: * Add missing masks to DECLARE_GENERIC_PCI_DEV() macro. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-19 00:30:07 +02:00
Bartlomiej Zolnierkiewicz	238e4f142c	ide: add IDE_HFLAG_NO_LBA48 and IDE_HFLAG_NO_LBA48_DMA host flags Add IDE_HFLAG_NO_LBA48[_DMA] host flags, use it instead of hwif->no_lba48[_dma] and then remove no longer needed hwif->no_lba48[_dma]. As a side-effect this change fixes hwif->no_lba48_dma not being restored by ide_hwif_restore(). Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-19 00:30:07 +02:00
Bartlomiej Zolnierkiewicz	9ffcf364f9	ide: remove ->init_setup_dma from ide_pci_device_t (take 2) * Make ide_pci_device_t.host_flags u32 and add IDE_HFLAG_CS5520 host flag. * Pass ide_pci_device_t d to setup-pci.c::ide_get_or_set_dma_base() and use d->name instead of hwif->cds->name. Set IDE_HFLAG_CS5520 host flag in cs5520 host driver and use it in ide_get_or_set_dma_base() to find out which PCI BAR to use, remove no longer needed cs5520.c::cs5520_init_setup_dma() and ide_pci_device_t.init_setup_dma. This fixes PCI bus-mastering not being checked for CS5510/CS5520 hosts. v2: * It is wrong to check simplex bits on CS5510/CS5520 as v1 did. (Noticed by Alan). Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-19 00:30:07 +02:00
Bartlomiej Zolnierkiewicz	47b687882c	ide: add IDE_HFLAG_NO_{DMA,AUTODMA} host flags Add IDE_HFLAG_NO_{DMA,AUTODMA} host flags. Convert all host drivers using ide_pci_device_t to use these flags instead of d->autodma and then remove no longer needed d->autodma. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-19 00:30:06 +02:00
Bartlomiej Zolnierkiewicz	7cab14a799	ide: add IDE_HFLAG_BOOTABLE host flag Add IDE_HFLAG_BOOTABLE host flag and IDE_HFLAG_OFF_BOARD define. Convert all host drivers using ide_pci_device_t to use IDE_HFLAG_{BOOTABLE,OFF_BOARD} instead of d->bootable and then remove no longer needed d->bootable. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-19 00:30:06 +02:00
Bartlomiej Zolnierkiewicz	33c1002ed9	ide: add IDE_HFLAG_NO_ATAPI_DMA host flag Add IDE_HFLAG_NO_ATAPI_DMA host flag and set it in host drivers which don't support ATAPI DMA. Then remove no longer needed hwif->atapi_dma. Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-19 00:30:06 +02:00
Benjamin Herrenschmidt	1b67834712	ide: Add ide_get_paired_drive() helper This adds a helper to get to the "other" drive on a pair connected to a given hwif. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andrew Morton <akpm@osdl.org> Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>	2007-10-19 00:30:05 +02:00
Linus Torvalds	58f9b52ee8	Merge ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt * ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt: hrtimer: hook compat_sys_nanosleep up to high res timer code hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier	2007-10-18 15:12:41 -07:00
Linus Torvalds	2af170dd24	Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev: [libata] kill ata_sg_is_last() Update libata driver for bf548 atapi controller against the 2.6.24 tree. libata-sff: Correct use of check_status() drivers/ata: add support to Freescale 3.0Gbps SATA Controller pata_acpi: fix build breakage if !CONFIG_PM	2007-10-18 15:08:35 -07:00
Linus Torvalds	54e840dd50	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched: reduce schedstat variable overhead a bit sched: add KERN_CONT annotation sched: cleanup, make struct rq comments more consistent sched: cleanup, fix spacing sched: fix return value of wait_for_completion_interruptible()	2007-10-18 14:54:03 -07:00
Linus Torvalds	a57793651f	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (51 commits) [IPV6]: Fix again the fl6_sock_lookup() fixed locking [NETFILTER]: nf_conntrack_tcp: fix connection reopening fix [IPV6]: Fix race in ipv6_flowlabel_opt() when inserting two labels [IPV6]: Lost locking in fl6_sock_lookup [IPV6]: Lost locking when inserting a flowlabel in ipv6_fl_list [NETFILTER]: xt_sctp: fix mistake to pass a pointer where array is required [NET]: Fix OOPS due to missing check in dev_parse_header(). [TCP]: Remove lost_retrans zero seqno special cases [NET]: fix carrier-on bug? [NET]: Fix uninitialised variable in ip_frag_reasm() [IPSEC]: Rename mode to outer_mode and add inner_mode [IPSEC]: Disallow combinations of RO and AH/ESP/IPCOMP [IPSEC]: Use the top IPv4 route's peer instead of the bottom [IPSEC]: Store afinfo pointer in xfrm_mode [IPSEC]: Add missing BEET checks [IPSEC]: Move type and mode map into xfrm_state.c [IPSEC]: Fix length check in xfrm_parse_spi [IPSEC]: Move ip_summed zapping out of xfrm6_rcv_spi [IPSEC]: Get nexthdr from caller in xfrm6_rcv_spi [IPSEC]: Move tunnel parsing for IPv4 out of xfrm4_input ...	2007-10-18 14:40:30 -07:00
Linus Torvalds	9cf52b2921	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6: [SPARC/64]: Consolidate of_register_driver [SPARC] Videopix Frame Grabber: Convert device_lock_sem to mutex [SPARC]: Support for new termios. [SPARC64]: Check of_get_property() return in pci_determine_mem_io_space(). [SPARC64]: Fix boot failures due to bootmem. [SPARC64]: Implement atomic backoff.	2007-10-18 14:39:44 -07:00
Corey Minyard	d8c98618f4	IPMI: add 0.9 support Add support for IPMI 0.9 systems to the IPMI driver. Just handle a shorter get device ID command with less information. Signed-off-by: Corey Minyard <cminyard@mvista.com> Cc: Stian Jordet <liste@jordet.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:32 -07:00
Corey Minyard	fcfa472411	IPMI: add polled interface Currently the IPMI watchdog timer sets the watchdog timeout on a panic, but it doesn't actually poll the interface to make sure the message goes out. Add an interface for polling the IPMI driver, and add code to the IPMI watchdog timer to poll the interface when the timer is set from a panic. Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:32 -07:00
Ralf Baechle	e8c44319c6	Replace __attribute_pure__ with __pure To be consistent with the use of attributes in the rest of the kernel replace all use of __attribute_pure__ with __pure and delete the definition of __attribute_pure__. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: Russell King <rmk@arm.linux.org.uk> Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: Bryan Wu <bryan.wu@analog.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:32 -07:00
Miklos Szeredi	0e9663ee45	fuse: add blksize field to fuse_attr There are cases when the filesystem will be passed the buffer from a single read or write call, namely: 1) in 'direct-io' mode (not O_DIRECT), read/write requests don't go through the page cache, but go directly to the userspace fs 2) currently buffered writes are done with single page requests, but if Nick's ->perform_write() patch goes it, it will be possible to do larger write requests. But only if the original write() was also bigger than a page. In these cases the filesystem might want to give a hint to the app about the optimal I/O size. Allow the userspace filesystem to supply a blksize value to be returned by stat() and friends. If the field is zero, it defaults to the old PAGE_CACHE_SIZE value. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:31 -07:00
Miklos Szeredi	f33321141b	fuse: add support for mandatory locking For mandatory locking the userspace filesystem needs to know the lock ownership for read, write and truncate operations. This patch adds the necessary fields to the protocol. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:31 -07:00
Miklos Szeredi	b25e82e567	fuse: add helper for asynchronous writes This patch adds a new helper function fuse_write_fill() which makes it possible to send WRITE requests asynchronously. A new flag for WRITE requests is also added which indicates that this a write from the page cache, and not a "normal" file write. This patch is in preparation for writable mmap support. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:31 -07:00
Miklos Szeredi	a9ff4f8705	fuse: support BSD locking semantics It is trivial to add support for flock(2) semantics to the existing protocol, by setting the lock owner field to the file pointer, and passing a new FUSE_LK_FLOCK flag with the locking request. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:31 -07:00
Miklos Szeredi	6ff958edbf	fuse: add atomic open+truncate support This patch allows fuse filesystems to implement open(..., O_TRUNC) as a single request, instead of separate truncate and open requests. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:31 -07:00
Miklos Szeredi	17637cbaba	fuse: improve utimes support Add two new flags for setattr: FATTR_ATIME_NOW and FATTR_MTIME_NOW. These mean, that atime or mtime should be changed to the current time. Also it is now possible to update atime or mtime individually, not just together. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:30 -07:00
Miklos Szeredi	d139d7ffd0	VFS: allow filesystems to implement atomic open+truncate Add a new attribute flag ATTR_OPEN, with the meaning: "truncation was initiated by open() due to the O_TRUNC flag". This way filesystems wanting to implement truncation within their ->open() method can ignore such truncate requests. This is a quick & dirty hack, but it comes for free. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:30 -07:00
Miklos Szeredi	c79e322f63	fuse: add file handle to getattr operation Add necessary protocol changes for supplying a file handle with the getattr operation. Step the API version to 7.9. This patch doesn't actually supply the file handle, because that needs some kind of VFS support, which we haven't yet been able to agree upon. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:30 -07:00
Takashi Sato	0f0a89ebe1	ext3: support large blocksize up to PAGESIZE This patch set supports large block size(>4k, <=64k) in ext3 just enlarging the block size limit. But it is NOT possible to have 64kB blocksize on ext3 without some changes to the directory handling code. The reason is that an empty 64kB directory block would have a rec_len == (__u16)2^16 == 0, and this would cause an error to be hit in the filesystem. The proposed solution is treat 64k rec_len with a an impossible value like rec_len = 0xffff to handle this. The Patch-set consists of the following 2 patches. [1/2] ext3: enlarge blocksize - Allow blocksize up to pagesize [2/2] ext3: fix rec_len overflow - prevent rec_len from overflow with 64KB blocksize Now on 64k page ppc64 box runs with this patch set we could create a 64k block size ext3, and able to handle empty directory block. Signed-off-by: Takashi Sato <sho@tnes.nec.co.jp> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:29 -07:00
Nick Piggin	b8dc93cbe9	bit_spin_lock: use lock bitops Convert bit_spin_lock to new locking bitops. Slub can use the non-atomic store version to clear (Christoph?) Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:29 -07:00
Satyam Sharma	761bb43190	Redefine {un}register_hotcpu_notifier() !HOTPLUG_CPU stubs The return of the present "do {} while" based stub definition of register_hotcpu_notifier() cannot be checked. This makes the stub asymmetric w.r.t. the real HOTPLUG_CPU=y implementation that is int-returning. So let us redefine this to be consistent with the full version. Also do the same for unregister_hotcpu_notifier(). We cannot define these as static inline functions due to an existing GCC bug (#33172). So define as macros that return appropriately instead (int '0' for the register_hotcpu_notifier case and void for unregister_hotcpu_notifier). Signed-off-by: Satyam Sharma <satyam@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:28 -07:00
Michael Neuling	f494f8fcb1	add-scaled-time-to-taskstats-based-process-accounting fix This moves the new items to the end of the taskstats struct as requested by Balbir and yourself. Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@engr.sgi.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:28 -07:00
Michael Neuling	c66f08be7e	Add scaled time to taskstats based process accounting This adds items to the taststats struct to account for user and system time based on scaling the CPU frequency and instruction issue rates. Adds account_(user\|system)_time_scaled callbacks which architectures can use to account for time using this mechanism. Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Jay Lan <jlan@engr.sgi.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:28 -07:00
Jiri Slaby	65f76a82ec	Char: cyclades, fix some -W warnings Most of them are signedness, the rest unused function parameters. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:26 -07:00
Jiri Slaby	ebafeeff0f	Char: cyclades, remove bottom half processing The work done in bottom half doesn't cost much cpu time (e.g. tty_hangup itself schedules its own bottom half), it's possible to do the work in isr directly and save hence some .text. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Paul Fulghum <paulkf@microgate.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:26 -07:00
Andrew Morgan	72c2d5823f	V3 file capabilities: alter behavior of cap_setpcap The non-filesystem capability meaning of CAP_SETPCAP is that a process, p1, can change the capabilities of another process, p2. This is not the meaning that was intended for this capability at all, and this implementation came about purely because, without filesystem capabilities, there was no way to use capabilities without one process bestowing them on another. Since we now have a filesystem support for capabilities we can fix the implementation of CAP_SETPCAP. The most significant thing about this change is that, with it in effect, no process can set the capabilities of another process. The capabilities of a program are set via the capability convolution rules: pI(post-exec) = pI(pre-exec) pP(post-exec) = (X(aka cap_bset) & fP) \| (pI(post-exec) & fI) pE(post-exec) = fE ? pP(post-exec) : 0 at exec() time. As such, the only influence the pre-exec() program can have on the post-exec() program's capabilities are through the pI capability set. The correct implementation for CAP_SETPCAP (and that enabled by this patch) is that it can be used to add extra pI capabilities to the current process - to be picked up by subsequent exec()s when the above convolution rules are applied. Here is how it works: Let's say we have a process, p. It has capability sets, pE, pP and pI. Generally, p, can change the value of its own pI to pI' where (pI' & ~pI) & ~pP = 0. That is, the only new things in pI' that were not present in pI need to be present in pP. The role of CAP_SETPCAP is basically to permit changes to pI beyond the above: if (pE & CAP_SETPCAP) { pI' = anything; /* ie., even (pI' & ~pI) & ~pP != 0 */ } This capability is useful for things like login, which (say, via pam_cap) might want to raise certain inheritable capabilities for use by the children of the logged-in user's shell, but those capabilities are not useful to or needed by the login program itself. One such use might be to limit who can run ping. You set the capabilities of the 'ping' program to be "= cap_net_raw+i", and then only shells that have (pI & CAP_NET_RAW) will be able to run it. Without CAP_SETPCAP implemented as described above, login(pam_cap) would have to also have (pP & CAP_NET_RAW) in order to raise this capability and pass it on through the inheritable set. Signed-off-by: Andrew Morgan <morgan@kernel.org> Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:24 -07:00
Eric W. Biederman	fc6cd25b73	sysctl: Error on bad sysctl tables After going through the kernels sysctl tables several times it has become clear that code review and testing is just not effective in prevent problematic sysctl tables from being used in the stable kernel. I certainly can't seem to fix the problems as fast as they are introduced. Therefore this patch adds sysctl_check_table which is called when a sysctl table is registered and checks to see if we have a problematic sysctl table. The biggest part of the code is the table of valid binary sysctl entries, but since we have frozen our set of binary sysctls this table should not need to change, and it makes it much easier to detect when someone unintentionally adds a new binary sysctl value. As best as I can determine all of the several hundred errors spewed on boot up now are legitimate. [bunk@kernel.org: kernel/sysctl_check.c must #include <linux/string.h>] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:23 -07:00
Eric W. Biederman	f429cd37a2	sysctl: properly register the irda binary sysctl numbers Grumble. These numbers should have been in sysctl.h from the beginning if we ever expected anyone to use them. Oh well put them there now so we can find them and make maintenance easier. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Samuel Ortiz <samuel@sortiz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:23 -07:00
Eric W. Biederman	25398a158d	sysctl: parport remove binary paths The sysctl binary paths don't look as if they even code work, .data is not filled in, and all of the proc_handlers look at extra1 and there is not strategy routine. So just kill the binary paths. In addition this patch removes the setting of extra1 on directories. It doesn't look like the parport code ever examines it, and it's bad sysctl form. [bunk@kernel.org: remove parport_device_num()] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:23 -07:00
Eric W. Biederman	49a0c45833	sysctl: Factor out sysctl_data. There as been no easy way to wrap the default sysctl strategy routine except for returning 0. Which is not always what we want. The few instances I have seen that want different behaviour have written their own version of sysctl_data. While not too hard it is unnecessary code and has the potential for extra bugs. So to make these situations easier and make that part of sysctl more symetric I have factord sysctl_data out of do_sysctl_strategy and exported as a function everyone can use. Further having sysctl_data be an explicit function makes checking for badly formed sysctl tables much easier. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:22 -07:00
Eric W. Biederman	d8217f076b	sysctl core: Stop using the unnecessary ctl_table typedef In sysctl.h the typedef struct ctl_table ctl_table violates coding style isn't needed and is a bit of a nuisance because it makes it harder to recognize ctl_table is a type name. So this patch removes it from the generic sysctl code. Hopefully I will have enough energy to send the rest of my patches will follow and to remove it from the rest of the kernel. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:22 -07:00
Andrew Morton	8f286c33f1	stop using DMA_xxBIT_MASK Now that we have DMA_BIT_MASK(), these macros are pointless. Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:21 -07:00
Borislav Petkov	34c6538413	unify DMA_..BIT_MASK definitions: v3.1 Remove redundant DMA_..BIT_MASK definitions across two drivers. The computation of the majority of the bitmasks is done by the compiler. The initial split of the patch touching each a different file got removed due to possible git bisect breakage. Signed-off-by: Borislav Petkov <bbpetkov@yahoo.de> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Jeff Garzik <jeff@garzik.org> Cc: James Bottomley <James.Bottomley@steeleye.com> Reviewed-by: Satyam Sharma <satyam@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:21 -07:00
Tony Breeds	2c62214831	Fix discrepancy between VDSO based gettimeofday() and sys_gettimeofday(). On platforms that copy sys_tz into the vdso (currently only x86_64, soon to include powerpc), it is possible for the vdso to get out of sync if a user calls (admittedly unusual) settimeofday(NULL, ptr). This patch adds a hook for architectures that set CONFIG_GENERIC_TIME_VSYSCALL to ensure when sys_tz is updated they can also updatee their copy in the vdso. Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Cc: Andi Kleen <ak@suse.de> Cc: Tony Luck <tony.luck@intel.com> Acked-by: John Stultz <johnstul@us.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:20 -07:00
Alexey Dobriyan	6212e3a388	Remove struct task_struct::io_wait Hell knows what happened in commit 63b05203af57e7de4f3bb63b8b81d43bc196d32b during 2.6.9 development. Commit introduced io_wait field which remained write-only than and still remains write-only. Also garbage collect macros which "use" io_wait. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:20 -07:00
Rafael J. Wysocki	c7e0831d38	Hibernation: Check if ACPI is enabled during restore in the right place The following scenario leads to total confusion of the platform firmware on some boxes (eg. HPC nx6325): * Hibernate with ACPI enabled * Resume passing "acpi=off" to the boot kernel To prevent this from happening it's necessary to check if ACPI is enabled (and enable it if that's not the case) _right_ _after_ control has been transfered from the boot kernel to the image kernel, before device_power_up() is called (ie. with interrupts disabled). Enabling ACPI after calling device_power_up() turns out to be insufficient. For this reason, introduce new hibernation callback ->leave() that will be executed before device_power_up() by the restored image kernel. To make it work, it also is necessary to move swsusp_suspend() from swsusp.c to disk.c (it's name is changed to "create_image", which is more up to the point). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:20 -07:00
Andres Salomon	8f4ce8c32f	serial: turn serial console suspend a boot rather than compile time option Currently, there's a CONFIG_DISABLE_CONSOLE_SUSPEND that allows one to stop the serial console from being suspended when the rest of the machine goes to sleep. This is incredibly useful for debugging power management-related things; however, having it as a compile-time option has proved to be incredibly inconvenient for us (OLPC). There are plenty of times that we want serial console to not suspend, but for the most part we'd like serial console to be suspended. This drops CONFIG_DISABLE_CONSOLE_SUSPEND, and replaces it with a kernel boot parameter (no_console_suspend). By default, the serial console will be suspended along with the rest of the system; by passing 'no_console_suspend' to the kernel during boot, serial console will remain alive during suspend. For now, this is pretty serial console specific; further fixes could be applied to make this work for things like netconsole. Signed-off-by: Andres Salomon <dilinger@debian.org> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@suspend2.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:19 -07:00
Rafael J. Wysocki	e42837bcd3	freezer: introduce freezer-friendly waiting macros Introduce freezer-friendly wrappers around wait_event_interruptible() and wait_event_interruptible_timeout(), originally defined in <linux/wait.h>, to be used in freezable kernel threads. Make some of the freezable kernel threads use them. This is necessary for the freezer to stop sending signals to kernel threads, which is implemented in the next patch. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:19 -07:00
Rafael J. Wysocki	b3dac3b304	PM: Rename hibernation_ops to platform_hibernation_ops Rename 'struct hibernation_ops' to 'struct platform_hibernation_ops' in analogy with 'struct platform_suspend_ops'. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:18 -07:00
Rafael J. Wysocki	74f270af0c	PM: Rework struct hibernation_ops During hibernation we also need to tell the ACPI core that we're going to put the system into the S4 sleep state. For this reason, an additional method in 'struct hibernation_ops' is needed, playing the role of set_target() in 'struct platform_suspend_operations'. Moreover, the role of the .prepare() method is now different, so it's better to introduce another method, that in general may be different from .prepare(), that will be used to prepare the platform for creating the hibernation image (.prepare() is used anyway to notify the platform that we're going to enter the low power state after the image has been saved). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:18 -07:00
Rafael J. Wysocki	f242d9196f	PM: Make suspend_ops static The variable suspend_ops representing the set of global platform-specific suspend-related operations, used by the PM core, need not be exported outside of kernel/power/main.c . Make it static. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:18 -07:00
Rafael J. Wysocki	e6c5eb9541	PM: Rework struct platform_suspend_ops There is no reason why the .prepare() and .finish() methods in 'struct platform_suspend_ops' should take any arguments, since architectures don't use these methods' argument in any practically meaningful way (ie. either the target system sleep state is conveyed to the platform by .set_target(), or there is only one suspend state supported and it is indicated to the PM core by .valid(), or .prepare() and .finish() aren't defined at all). There also is no reason why .finish() should return any result. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:18 -07:00
Rafael J. Wysocki	26398a70ea	PM: Rename struct pm_ops and related things The name of 'struct pm_ops' suggests that it is related to the power management in general, but in fact it is only related to suspend. Moreover, its name should indicate what this structure is used for, so it seems reasonable to change it to 'struct platform_suspend_ops'. In that case, the name of the global variable of this type used by the PM core and the names of related functions should be changed accordingly. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:18 -07:00
Rafael J. Wysocki	95d9ffbe01	PM: Move definition of struct pm_ops to suspend.h Move the definition of 'struct pm_ops' and related functions from <linux/pm.h> to <linux/suspend.h> . There are, at least, the following reasons to do that: * 'struct pm_ops' is specifically related to suspend and not to the power management in general. * As long as 'struct pm_ops' is defined in <linux/pm.h>, any modification of it causes the entire kernel to be recompiled, which is unnecessary and annoying. * Some suspend-related features are already defined in <linux/suspend.h>, so it is logical to move the definition of 'struct pm_ops' into there. * 'struct hibernation_ops', being the hibernation-related counterpart of 'struct pm_ops', is defined in <linux/suspend.h> . Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-18 14:37:18 -07:00
Anton Blanchard	04c227140f	hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier Pull the copy_to_user out of hrtimer_nanosleep and into the callers (common_nsleep, sys_nanosleep) in preparation for converting compat_sys_nanosleep to use hrtimers. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2007-10-18 22:54:18 +02:00
Jeff Garzik	3be6cbd73f	[libata] kill ata_sg_is_last() Short term, this works around a bug introduced by early sg-chaining work. Long term, removing this function eliminates a branch from a hot path loop in each scatter/gather table build. Also, as this code demonstrates, we don't need to _track_ the end of the s/g list, as long as we mark it in some way. And doing so programatically is nice. So its a useful cleanup, regardless of its short term effects. Based conceptually on a quick patch by Jens Axboe. Signed-off-by: Jeff Garzik <jgarzik@redhat.com>	2007-10-18 16:21:18 -04:00
Ken Chen	480b9434c5	sched: reduce schedstat variable overhead a bit schedstat is useful in investigating CPU scheduler behavior. Ideally, I think it is beneficial to have it on all the time. However, the cost of turning it on in production system is quite high, largely due to number of events it collects and also due to its large memory footprint. Most of the fields probably don't need to be full 64-bit on 64-bit arch. Rolling over 4 billion events will most like take a long time and user space tool can be made to accommodate that. I'm proposing kernel to cut back most of variable width on 64-bit system. (note, the following patch doesn't affect 32-bit system). Signed-off-by: Ken Chen <kenchen@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2007-10-18 21:32:56 +02:00
Li Zefan	009e8c965f	[NETFILTER]: xt_sctp: fix mistake to pass a pointer where array is required Macros like SCTP_CHUNKMAP_XXX(chukmap) require chukmap to be an array, but match_packet() passes a pointer to these macros. Also remove the ELEMCOUNT macro and fix a bug in SCTP_CHUNKMAP_COPY. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-18 05:12:21 -07:00
Patrick McHardy	1b83336bb9	[NET]: Fix OOPS due to missing check in dev_parse_header(). [ This is kernel bugzilla 9174 "linux-2.6.23-git11 kernel panic" ] The device in question is an IPv6-over-IPv4 tunnel, which doesn't have any header_ops, so the crash happens in dev_parse_header when dereferencing them. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-18 05:09:28 -07:00
Pavel Emelyanov	55b333253d	[NET]: Introduce the sk_detach_filter() call Filter is attached in a separate function, so do the same for filter detaching. This also removes one variable sock_setsockopt(). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 21:21:26 -07:00
Stephen Rothwell	5c45708352	[SPARC/64]: Consolidate of_register_driver Also of_unregister_driver. These will be shortly also used by the PowerPC code. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-10-17 21:17:42 -07:00
Stephen Hemminger	c264c3dee9	napi_synchronize: waiting for NAPI Some drivers with shared NAPI need a synchronization barrier. Also suggested by Benjamin Herrenschmidt for EMAC. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-10-17 20:17:34 -04:00
Aneesh Kumar K.V	d8dd0b4543	ext4: Convert ext4_extent_idx.ei_leaf to ext4_extent_idx.ei_leaf_lo Convert ext4_extent_idx.ei_leaf ext4_extent_idx.ei_leaf_lo This helps in finding BUGs due to direct partial access of these split 48 bit values. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2007-10-17 18:50:03 -04:00
Aneesh Kumar K.V	b377611d11	ext4: Convert ext4_extent.ee_start to ext4_extent.ee_start_lo Convert ext4_extent.ee_start to ext4_extent.ee_start_lo This helps in finding BUGs due to direct partial access of these split 48 bit values Also fix direct partial access in ext4 code Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2007-10-17 18:50:03 -04:00
Aneesh Kumar K.V	308ba3ece7	ext4: Convert s_r_blocks_count and s_free_blocks_count Convert s_r_blocks_count and s_free_blocks_count to s_r_blocks_count_lo and s_free_blocks_count_lo This helps in finding BUGs due to direct partial access of these split 64 bit values Also fix direct partial access in ext4 code Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-10-17 18:50:02 -04:00
Aneesh Kumar K.V	6bc9feff14	ext4: Convert s_blocks_count to s_blocks_count_lo Convert s_blocks_count to s_blocks_count_lo This helps in finding BUGs due to direct partial access of these split 64 bit values Also fix direct partial access in ext4 code Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2007-10-17 18:50:02 -04:00
Aneesh Kumar K.V	5272f83727	ext4: Convert bg_inode_bitmap and bg_inode_table Convert bg_inode_bitmap and bg_inode_table to bg_inode_bitmap_lo and bg_inode_table_lo. This helps in finding BUGs due to direct partial access of these split 64 bit values Also fix one direct partial access Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2007-10-17 18:50:02 -04:00
Aneesh Kumar K.V	3a14589cce	ext4: Convert bg_block_bitmap to bg_block_bitmap_lo Convert bg_block_bitmap to bg_block_bitmap_lo This helps in catching some BUGS due to direct partial access of these split fields. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2007-10-17 18:50:01 -04:00
Jose R. Santos	ce42158179	ext4: FLEX_BG Kernel support v2. This feature relaxes check restrictions on where each block groups meta data is located within the storage media. This allows for the allocation of bitmaps or inode tables outside the block group boundaries in cases where bad blocks forces us to look for new blocks which the owning block group can not satisfy. This will also allow for new meta-data allocation schemes to improve performance and scalability. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-10-17 18:50:01 -04:00
Aneesh Kumar K.V	c1bddad949	ext4: Fix sparse warnings Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-10-17 18:50:01 -04:00
Andreas Dilger	717d50e497	Ext4: Uninitialized Block Groups In pass1 of e2fsck, every inode table in the fileystem is scanned and checked, regardless of whether it is in use. This is this the most time consuming part of the filesystem check. The unintialized block group feature can greatly reduce e2fsck time by eliminating checking of uninitialized inodes. With this feature, there is a a high water mark of used inodes for each block group. Block and inode bitmaps can be uninitialized on disk via a flag in the group descriptor to avoid reading or scanning them at e2fsck time. A checksum of each group descriptor is used to ensure that corruption in the group descriptor's bit flags does not cause incorrect operation. The feature is enabled through a mkfs option mke2fs /dev/ -O uninit_groups A patch adding support for uninitialized block groups to e2fsprogs tools has been posted to the linux-ext4 mailing list. The patches have been stress tested with fsstress and fsx. In performance tests testing e2fsck time, we have seen that e2fsck time on ext3 grows linearly with the total number of inodes in the filesytem. In ext4 with the uninitialized block groups feature, the e2fsck time is constant, based solely on the number of used inodes rather than the total inode count. Since typical ext4 filesystems only use 1-10% of their inodes, this feature can greatly reduce e2fsck time for users. With performance improvement of 2-20 times, depending on how full the filesystem is. The attached graph shows the major improvements in e2fsck times in filesystems with a large total inode count, but few inodes in use. In each group descriptor if we have EXT4_BG_INODE_UNINIT set in bg_flags: Inode table is not initialized/used in this group. So we can skip the consistency check during fsck. EXT4_BG_BLOCK_UNINIT set in bg_flags: No block in the group is used. So we can skip the block bitmap verification for this group. We also add two new fields to group descriptor as a part of uninitialized group patch. __le16 bg_itable_unused; /* Unused inodes count / __le16 bg_checksum; / crc16(sb_uuid+group+desc) */ bg_itable_unused: If we have EXT4_BG_INODE_UNINIT not set in bg_flags then bg_itable_unused will give the offset within the inode table till the inodes are used. This can be used by fsck to skip list of inodes that are marked unused. bg_checksum: Now that we depend on bg_flags and bg_itable_unused to determine the block and inode usage, we need to make sure group descriptor is not corrupt. We add checksum to group descriptor to detect corruption. If the descriptor is found to be corrupt, we mark all the blocks and inodes in the group used. Signed-off-by: Avantika Mathur <mathur@us.ibm.com> Signed-off-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	2007-10-17 18:50:00 -04:00
Eric Sandeen	4074fe3736	ext4: remove #ifdef CONFIG_EXT4_INDEX CONFIG_EXT4_INDEX is not an exposed config option in the kernel, and it is unconditionally defined in ext4_fs.h. tune2fs is already able to turn off dir indexing, so at this point it's just cluttering up the code. Remove it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2007-10-17 18:50:00 -04:00
Coly Li	f077d0d7ea	ext4: Remove (partial, never completed) fragment support Fragment support in ext2/3/4 was never implemented, and it probably will never be implemented. So remove it from ext4. Signed-off-by: Coly Li <coyli@suse.de> Acked-by: Andreas Dilger <adilger@clusterfs.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-10-17 18:49:59 -04:00
Mingming Cao	cd02ff0b14	jbd2: JBD_XXX to JBD2_XXX naming cleanup change JBD_XXX macros to JBD2_XXX in JBD2/Ext4 Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>	2007-10-17 18:49:58 -04:00
Mingming Cao	2d917969bc	JBD2: replace jbd_kmalloc with kmalloc directly. This patch cleans up jbd_kmalloc and replace it with kmalloc directly Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2007-10-17 18:49:57 -04:00
Mingming Cao	a5005da204	JBD: replace jbd_kmalloc with kmalloc directly This patch cleans up jbd_kmalloc and replace it with kmalloc directly Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2007-10-17 18:49:57 -04:00
Mingming Cao	af1e76d6b3	JBD2: jbd2 slab allocation cleanups JBD2: Replace slab allocations with page allocations JBD2 allocate memory for committed_data and frozen_data from slab. However JBD2 should not pass slab pages down to the block layer. Use page allocator pages instead. This will also prepare JBD for the large blocksize patchset. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2007-10-17 18:49:56 -04:00
Mingming Cao	c089d490df	JBD: JBD slab allocation cleanups JBD: Replace slab allocations with page allocations JBD allocate memory for committed_data and frozen_data from slab. However JBD should not pass slab pages down to the block layer. Use page allocator pages instead. This will also prepare JBD for the large blocksize patchset. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com>	2007-10-17 18:49:56 -04:00
Pierre Ossman	727c26ed78	net: libertas sdio driver Add driver for Marvell's Libertas 8385 and 8686 wifi chips. Signed-off-by: Pierre Ossman <drzeus@drzeus.cx> Acked-by: Dan Williams <dcbw@redhat.com>	2007-10-17 22:51:13 +02:00
Linus Torvalds	e6d5a11dad	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched: fix new task startup crash sched: fix !SYSFS build breakage sched: fix improper load balance across sched domain sched: more robust sd-sysctl entry freeing	2007-10-17 09:11:18 -07:00
Linus Torvalds	c548f08a4f	Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (24 commits) [POWERPC] Fix vmemmap warning in init_64.c [POWERPC] Fix 64 bits vDSO DWARF info for CR register [POWERPC] Add 1TB workaround for PA6T [POWERPC] Enable NO_HZ and high res timers for pseries and ppc64 configs [POWERPC] Quieten cache information at boot [POWERPC] Quieten clockevent printk [POWERPC] Enable SLUB in *_defconfig [POWERPC] Fix 1TB segment detection [POWERPC] Fix iSeries_hpte_insert prototype [POWERPC] Fix copyright symbol [POWERPC] ibmebus: Move to of_device and of_platform_driver, match eHCA and eHEA drivers [POWERPC] ibmebus: Add device creation and bus probing based on of_device [POWERPC] ibmebus: Remove bus match/probe/remove functions [POWERPC] Move of_device allocation into of_device.[ch] [POWERPC] mpc52xx: device tree changes for FEC and MDIO [POWERPC] bestcomm: GenBD task support [POWERPC] bestcomm: FEC task support [POWERPC] bestcomm: ATA task support [POWERPC] bestcomm: core bestcomm support for Freescale MPC5200 [POWERPC] mpc52xx: Update mpc52xx_psc structure with B revision changes ...	2007-10-17 09:05:55 -07:00
Linus Torvalds	5c8e191e84	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-x86setup * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-x86setup: Remove magic macros for screen_info structure members [x86] remove uses of magic macros for boot_params access	2007-10-17 09:00:30 -07:00
Adrian Bunk	cbfee34520	security/ cleanups This patch contains the following cleanups that are now possible: - remove the unused security_operations->inode_xattr_getsuffix - remove the no longer used security_operations->unregister_security - remove some no longer required exit code - remove a bunch of no longer used exports Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:07 -07:00
Serge E. Hallyn	b53767719b	Implement file posix capabilities Implement file posix capabilities. This allows programs to be given a subset of root's powers regardless of who runs them, without having to use setuid and giving the binary all of root's powers. This version works with Kaigai Kohei's userspace tools, found at http://www.kaigai.gr.jp/index.php. For more information on how to use this patch, Chris Friedhoff has posted a nice page at http://www.friedhoff.org/fscaps.html. Changelog: Nov 27: Incorporate fixes from Andrew Morton (security-introduce-file-caps-tweaks and security-introduce-file-caps-warning-fix) Fix Kconfig dependency. Fix change signaling behavior when file caps are not compiled in. Nov 13: Integrate comments from Alexey: Remove CONFIG_ ifdef from capability.h, and use %zd for printing a size_t. Nov 13: Fix endianness warnings by sparse as suggested by Alexey Dobriyan. Nov 09: Address warnings of unused variables at cap_bprm_set_security when file capabilities are disabled, and simultaneously clean up the code a little, by pulling the new code into a helper function. Nov 08: For pointers to required userspace tools and how to use them, see http://www.friedhoff.org/fscaps.html. Nov 07: Fix the calculation of the highest bit checked in check_cap_sanity(). Nov 07: Allow file caps to be enabled without CONFIG_SECURITY, since capabilities are the default. Hook cap_task_setscheduler when !CONFIG_SECURITY. Move capable(TASK_KILL) to end of cap_task_kill to reduce audit messages. Nov 05: Add secondary calls in selinux/hooks.c to task_setioprio and task_setscheduler so that selinux and capabilities with file cap support can be stacked. Sep 05: As Seth Arnold points out, uid checks are out of place for capability code. Sep 01: Define task_setscheduler, task_setioprio, cap_task_kill, and task_setnice to make sure a user cannot affect a process in which they called a program with some fscaps. One remaining question is the note under task_setscheduler: are we ok with CAP_SYS_NICE being sufficient to confine a process to a cpuset? It is a semantic change, as without fsccaps, attach_task doesn't allow CAP_SYS_NICE to override the uid equivalence check. But since it uses security_task_setscheduler, which elsewhere is used where CAP_SYS_NICE can be used to override the uid equivalence check, fixing it might be tough. task_setscheduler note: this also controls cpuset:attach_task. Are we ok with CAP_SYS_NICE being used to confine to a cpuset? task_setioprio task_setnice sys_setpriority uses this (through set_one_prio) for another process. Need same checks as setrlimit Aug 21: Updated secureexec implementation to reflect the fact that euid and uid might be the same and nonzero, but the process might still have elevated caps. Aug 15: Handle endianness of xattrs. Enforce capability version match between kernel and disk. Enforce that no bits beyond the known max capability are set, else return -EPERM. With this extra processing, it may be worth reconsidering doing all the work at bprm_set_security rather than d_instantiate. Aug 10: Always call getxattr at bprm_set_security, rather than caching it at d_instantiate. [morgan@kernel.org: file-caps clean up for linux/capability.h] [bunk@kernel.org: unexport cap_inode_killpriv] Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Andrew Morgan <morgan@kernel.org> Signed-off-by: Andrew Morgan <morgan@kernel.org> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:07 -07:00
Alexey Dobriyan	57c521ce61	ifdef struct task_struct::security For those who don't care about CONFIG_SECURITY. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: James Morris <jmorris@namei.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:07 -07:00
James Morris	20510f2f4e	security: Convert LSM into a static interface Convert LSM into a static interface, as the ability to unload a security module is not required by in-tree users and potentially complicates the overall security architecture. Needlessly exported LSM symbols have been unexported, to help reduce API abuse. Parameters for the capability and root_plug modules are now specified at boot. The SECURITY_FRAMEWORK_VERSION macro has also been removed. In a nutshell, there is no safe way to unload an LSM. The modular interface is thus unecessary and broken infrastructure. It is used only by out-of-tree modules, which are often binary-only, illegal, abusive of the API and dangerous, e.g. silently re-vectoring SELinux. [akpm@linux-foundation.org: cleanups] [akpm@linux-foundation.org: USB Kconfig fix] [randy.dunlap@oracle.com: fix LSM kernel-doc] Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: "Serge E. Hallyn" <serue@us.ibm.com> Acked-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:07 -07:00
Dave Hansen	ce8d2cdf3d	r/o bind mounts: filesystem helpers for custom 'struct file's Why do we need r/o bind mounts? This feature allows a read-only view into a read-write filesystem. In the process of doing that, it also provides infrastructure for keeping track of the number of writers to any given mount. This has a number of uses. It allows chroots to have parts of filesystems writable. It will be useful for containers in the future because users may have root inside a container, but should not be allowed to write to somefilesystems. This also replaces patches that vserver has had out of the tree for several years. It allows security enhancement by making sure that parts of your filesystem read-only (such as when you don't trust your FTP server), when you don't want to have entire new filesystems mounted, or when you want atime selectively updated. I've been using the following script to test that the feature is working as desired. It takes a directory and makes a regular bind and a r/o bind mount of it. It then performs some normal filesystem operations on the three directories, including ones that are expected to fail, like creating a file on the r/o mount. This patch: Some filesystems forego the vfs and may_open() and create their own 'struct file's. This patch creates a couple of helper functions which can be used by these filesystems, and will provide a unified place which the r/o bind mount code may patch. Also, rename an existing, static-scope init_file() to a less generic name. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:04 -07:00
Bjorn Helgaas	402b310cb6	PNP: remove null pointer checks Remove some null pointer checks. Null pointers in these areas indicate programming errors, and I think it's better to oops immediately rather than return an error that is easily ignored. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Adam Belay <ambx1@neo.rr.com> Cc: Len Brown <lenb@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:04 -07:00
Adrian Bunk	5ebf2c1260	bitmap.h: remove dead artifacts bitmap_active() no longer exists and BITMAP_ACTIVE is no longer used. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Neil Brown <neilb@suse.de> Cc: "J. Bruce Fields" <bfields@fieldses.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:03 -07:00
Martin J. Bligh	a686cd898b	ext2 reservations Val's cross-port of the ext3 reservations code into ext2. [mbligh@mbligh.org: Small type error for printk [akpm@linux-foundation.org: fix types, sync with ext3] [mbligh@mbligh.org: Bring ext2 reservations code in line with latest ext3] [akpm@linux-foundation.org: kill noisy printk] [akpm@linux-foundation.org: remember to dirty the gdp's block] [akpm@linux-foundation.org: cross-port the missed `5dea5176e5`] [akpm@linux-foundation.org: cross-port `e6022603b9`] [akpm@linux-foundation.org: Port the omitted `08fb306fe6`] [akpm@linux-foundation.org: Backport the missed `20acaa18d0`] [akpm@linux-foundation.org: fixes] [cmm@us.ibm.com: fix reservation extension] [bunk@stusta.de: make ext2_get_blocks() static] [hugh@veritas.com: fix hang] [hugh@veritas.com: ext2_new_blocks should reset the reservation window size] [hugh@veritas.com: ext2 balloc: fix off-by-one against rsv_end] [hugh@veritas.com: grp_goal 0 is a genuine goal (unlike -1), so ext2_try_to_allocate_with_rsv should treat it as such] [hugh@veritas.com: rbtree usage cleanup] [pbadari@us.ibm.com: Fix for ext2 reservation] [bunk@kernel.org: remove fs/ext2/balloc.c:reserve_blocks()] [hugh@veritas.com: ext2 balloc: use io_error label] Cc: "Martin J. Bligh" <mbligh@mbligh.org> Cc: Valerie Henson <val_henson@linux.intel.com> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:02 -07:00
Joern Engel	1c0eeaf569	introduce I_SYNC I_LOCK was used for several unrelated purposes, which caused deadlock situations in certain filesystems as a side effect. One of the purposes now uses the new I_SYNC bit. Also document the various bits and change their order from historical to logical. [bunk@stusta.de: make fs/inode.c:wake_up_inode() static] Signed-off-by: Joern Engel <joern@wohnheim.fh-wedel.de> Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Cc: David Chinner <dgc@sgi.com> Cc: Anton Altaparmakov <aia21@cam.ac.uk> Cc: Al Viro <viro@ftp.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:02 -07:00
Fengguang Wu	2e6883bdf4	writeback: introduce writeback_control.more_io to indicate more io After making dirty a 100M file, the normal behavior is to start the writeback for all data after 30s delays. But sometimes the following happens instead: - after 30s: ~4M - after 5s: ~4M - after 5s: all remaining 92M Some analyze shows that the internal io dispatch queues goes like this: s_io s_more_io ------------------------- 1) 100M,1K 0 2) 1K 96M 3) 0 96M 1) initial state with a 100M file and a 1K file 2) 4M written, nr_to_write <= 0, so write more 3) 1K written, nr_to_write > 0, no more writes(BUG) nr_to_write > 0 in (3) fools the upper layer to think that data have all been written out. The big dirty file is actually still sitting in s_more_io. We cannot simply splice s_more_io back to s_io as soon as s_io becomes empty, and let the loop in generic_sync_sb_inodes() continue: this may starve newly expired inodes in s_dirty. It is also not an option to draw inodes from both s_more_io and s_dirty, an let the loop go on: this might lead to live locks, and might also starve other superblocks in sync time(well kupdate may still starve some superblocks, that's another bug). We have to return when a full scan of s_io completes. So nr_to_write > 0 does not necessarily mean that "all data are written". This patch introduces a flag writeback_control.more_io to indicate this situation. With it the big dirty file no longer has to wait for the next kupdate invocation 5s later. Cc: David Chinner <dgc@sgi.com> Cc: Ken Chen <kenchen@google.com> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:02 -07:00
Fengguang Wu	08d8e9749e	writeback: fix ntfs with sb_has_dirty_inodes() NTFS's if-condition on dirty inodes is not complete. Fix it with sb_has_dirty_inodes(). Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Ken Chen <kenchen@google.com> Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:02 -07:00
Ken Chen	0e0f4fc22e	writeback: fix periodic superblock dirty inode flushing Current -mm tree has bucketful of bug fixes in periodic writeback path. However, we still hit a glitch where dirty pages on a given inode aren't completely flushed to the disk, and system will accumulate large amount of dirty pages beyond what dirty_expire_interval is designed for. The problem is __sync_single_inode() will move an inode to sb->s_dirty list even when there are more pending dirty pages on that inode. If there is another inode with a small number of dirty pages, we hit a case where the loop iteration in wb_kupdate() terminates prematurely because wbc.nr_to_write > 0. Thus leaving the inode that has large amount of dirty pages behind and it has to wait for another dirty_writeback_interval before we flush it again. We effectively only write out MAX_WRITEBACK_PAGES every dirty_writeback_interval. If the rate of dirtying is sufficiently high, the system will start accumulate a large number of dirty pages. So fix it by having another sb->s_more_io list on which to park the inode while we iterate through sb->s_io and to allow each dirty inode which resides on that sb to have an equal chance of flushing some amount of dirty pages. Signed-off-by: Ken Chen <kenchen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:02 -07:00
Ingo Molnar	4749252776	printk: add KERN_CONT annotation printk: add the KERN_CONT annotation (which is empty string but via which checkpatch.pl can notice that the lacking KERN_ level is fine). This useful for multiple calls of hand-crafted printk output done by early debug code or similar. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:01 -07:00
Ulrich Drepper	22d2b35b20	F_DUPFD_CLOEXEC implementation One more small change to extend the availability of creation of file descriptors with FD_CLOEXEC set. Adding a new command to fcntl() requires no new system call and the overall impact on code size if minimal. If this patch gets accepted we will also add this change to the next revision of the POSIX spec. To test the patch, use the following little program. Adjust the value of F_DUPFD_CLOEXEC appropriately. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #include <errno.h> #include <fcntl.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #ifndef F_DUPFD_CLOEXEC # define F_DUPFD_CLOEXEC 12 #endif int main (int argc, char *argv[]) { if (argc > 1) { if (fcntl (3, F_GETFD) == 0) { puts ("descriptor not closed"); exit (1); } if (errno != EBADF) { puts ("error not EBADF"); exit (1); } exit (0); } int fd = fcntl (STDOUT_FILENO, F_DUPFD_CLOEXEC, 0); if (fd == -1 && errno == EINVAL) { puts ("F_DUPFD_CLOEXEC not supported"); return 0; } if (fd != 3) { puts ("program called with descriptors other than 0,1,2"); return 1; } execl ("/proc/self/exe", "/proc/self/exe", "1", NULL); puts ("execl failed"); return 1; } ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Ulrich Drepper <drepper@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: <linux-arch@vger.kernel.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:01 -07:00
Alexey Dobriyan	18796aa002	task_struct: move ->fpu_counter and ->oomkilladj There is nice 2 byte hole after struct task_struct::ioprio field into which we can put two 1-byte fields: ->fpu_counter and ->oomkilladj. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:01 -07:00
Davide Libenzi	96358de6bc	rename signalfd_siginfo fields For Michael Kerrisk request, the following patch renames signalfd_siginfo fields in order to keep them consistent with the siginfo_t ones. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:01 -07:00
Eric Sandeen	059590f495	ext3: remove #ifdef CONFIG_EXT3_INDEX CONFIG_EXT3_INDEX is not an exposed config option in the kernel, and it is unconditionally defined in ext3_fs.h. tune2fs is already able to turn off dir indexing, so at this point it's just cluttering up the code. Remove it. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:01 -07:00
Ahmed S. Darwish	b4471cbb09	Completely remove deprecated IRQ flags (SA_) Only very little files use the deprecated SA_ IRQ flags in latest pull. This patch series removes such macros from the tree and transfrom old code to the new IRQF_* flags. I've grepped the whole tree to make sure that no more files than the patched ones use such deprecated macros. I hope this series won't introduce build errors. Signed-off-by: Ahmed S. Darwish <darwish.07@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Matthew Wilcox <willy@debian.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:00 -07:00
Andrey Mirkin	fd5eea4214	change inotifyfs magic as the same magic is used for futexfs Right now futexfs and inotifyfs have one magic 0xBAD1DEA, that looks a little bit confusing. Use 0xBAD1DEA as magic for futexfs and 0x2BAD1DEA as magic for inotifyfs. Signed-off-by: Andrey Mirkin <major@openvz.org> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:00 -07:00
Olaf Hering	4f9a58d75b	increase AT_VECTOR_SIZE to terminate saved_auxv properly include/asm-powerpc/elf.h has 6 entries in ARCH_DLINFO. fs/binfmt_elf.c has 14 unconditional NEW_AUX_ENT entries and 2 conditional NEW_AUX_ENT entries. So in the worst case, saved_auxv does not get an AT_NULL entry at the end. The saved_auxv array must be terminated with an AT_NULL entry. Make the size of mm_struct->saved_auxv arch dependend, based on the number of ARCH_DLINFO entries. Signed-off-by: Olaf Hering <olh@suse.de> Cc: Roland McGrath <roland@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:43:00 -07:00
Pavel Emelyanov	1efd24fa05	Remove unused member from nsproxy The nslock spinlock is not used in the kernel at all. Remove it. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:59 -07:00
Alexey Dobriyan	970a8645ca	user.c: #ifdef ->mq_bytes For those who deselect POSIX message queues. Reduces SLAB size of user_struct from 64 to 32 bytes here, SLUB size -- from 40 bytes to 32 bytes. [akpm@linux-foundation.org: fix build] Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:59 -07:00
Alan Cox	5f519d7281	tty: expose new methods needed for drivers to get termios right This adds three new functions (or in one case to be more exact makes it always available) tty_termios_copy_hw Copies all the hardware settings from one termios structure to the other. This is intended for drivers that support little or no hardware setting tty_termios_encode_baud_rate Allows you to set the input and output baud rate in a termios structure. A driver is supposed to set the resulting baud rate from a request so most will want to use this function to set the resulting input and output rates to match the hardware values. Internally it knows about keeping Bxxx encoding when possible to maximise compatibility. tty_encode_baud_rate As above but for the tty's own current termios structure I suspect this will initially need some tweaking as it gets enabled by driver patches over the next few mm cycles so consider this lot -mm only for the moment so it can stabilize and end up neat before it goes to base. I've tried not to break any obscure architectures - if you get a speed you can't represent the code will print warnings on non updated termios systems but not break. Once this is merged and seems sane I've got a growing pile of driver updates to use it - notably for USB serial drivers. [akpm@linux-foundation.org: cleanups] Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:58 -07:00
Denys Vlasenko	b9ec0339d8	add consts where appropriate in fs/nls/* Add const modifiers to a few struct nls_table's member pointers in include/linux/nls.h and adds a lot of const's in fs/nls/*.c files. Resulting changes as visible by size: text data bss dec hex filename 113612 481216 2368 597196 91ccc nls.org/built-in.o 593548 3296 288 597132 91c8c nls/built-in.o Apparently compiler managed to optimize code a bit better because of const-ness. No other changes are made. Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:58 -07:00
Denis V. Lunev	37c42524d6	shrink_dcache_sb speedup This patch makes shrink_dcache_sb consistent with dentry pruning policy. On the first pass we iterate over dentry unused list and prepare some dentries for removal. However, since the existing code moves evicted dentries to the beginning of the LRU it can happen that fresh dentries from other superblocks will be inserted before our dentries. This can result in significant slowdown of shrink_dcache_sb(). Moreover, for virtual filesystems like unionfs which can call dput() during dentries kill existing code results in O(n^2) complexity. We observed 2 minutes shrink_dcache_sb() with only 35000 dentries. To avoid this effects we propose to isolate sb dentries at the end of LRU list. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Andrey Mirkin <amirkin@openvz.org> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:57 -07:00
Emil Medve	1f7c8234c7	Make the pr_() family of macros in kernel.h complete Other/Some pr_() macros are already defined in kernel.h, but pr_err() was defined multiple times in several other places Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com> Cc: Jean Delvare <khali@linux-fr.org> Cc: Jeff Garzik <jeff@garzik.org> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Tony Lindgren <tony@atomide.com> Reviewed-by: Satyam Sharma <satyam@infradead.org> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:57 -07:00
David Howells	76181c134f	KEYS: Make request_key() and co fundamentally asynchronous Make request_key() and co fundamentally asynchronous to make it easier for NFS to make use of them. There are now accessor functions that do asynchronous constructions, a wait function to wait for construction to complete, and a completion function for the key type to indicate completion of construction. Note that the construction queue is now gone. Instead, keys under construction are linked in to the appropriate keyring in advance, and that anyone encountering one must wait for it to be complete before they can use it. This is done automatically for userspace. The following auxiliary changes are also made: (1) Key type implementation stuff is split from linux/key.h into linux/key-type.h. (2) AF_RXRPC provides a way to allocate null rxrpc-type keys so that AFS does not need to call key_instantiate_and_link() directly. (3) Adjust the debugging macros so that they're -Wformat checked even if they are disabled, and make it so they can be enabled simply by defining __KDEBUG to be consistent with other code of mine. (3) Documentation. [alan@lxorguk.ukuu.org.uk: keys: missing word in documentation] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:57 -07:00
Matti Linnanvuori	f20fda4861	Mutex documentation is unclear about software interrupts, tasklets and timers Acked-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:57 -07:00
Bill Nottingham	2e8ecb9db0	add CONFIG_VT_UNICODE As of now, the kernel defaults to non-unicode and XLATE for the keyboard. We've been changing this in Fedora, but that requires patching the defaults in the kernel. The attached introduces CONFIG_VT_UNICODE, which sets the console in unicode mode by default on boot, including both the virtual terminal and the keyboard driver. Signed-off-by: Bill Nottingham <notting@redhat.com> Cc: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Dmitry Torokhov <dtor@mail.ru> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:56 -07:00
Jan Beulich	22e48eaf58	constify string/array kparam tracking structures .. in an effort to make read-only whatever can be made, so that CONFIG_DEBUG_RODATA can catch as many issues as possible. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:56 -07:00
Jan Beulich	d5aa0daf6d	store __setup_str_* in a more compact way __setup_str_* are referenced only during boot, hence there's no need to waste image space for aligning these strings (with the aim of improving performance). Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:56 -07:00
Robert P. J. Day	b311e921b3	Add a "rounddown_pow_of_two" routine to log2.h To go along with the existing "roundup_pow_of_two" routine, add one for rounding down since that operation appears to crop up on a regular basis in the source tree. [m.kozlowski@tuxland.pl: fix unbalanced parentheses] Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:56 -07:00
Jan Kara	8e8934695d	quota: send messages via netlink Implement sending of quota messages via netlink interface. The advantage is that in userspace we can better decide what to do with the message - for example display a dialogue in your X session or just write the message to the console. As a bonus, we can get rid of problems with console locking deep inside filesystem code once we remove the old printing mechanism. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:56 -07:00
Adrian Bunk	b012d346c0	make kernel/profile.c:time_hook static {,un}register_timer_hook() is the API that should be used. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:55 -07:00
Adrian Bunk	cba4fbbff2	remove include/asm-/ipc.h All asm/ipc.h files do only #include <asm-generic/ipc.h>. This patch therefore removes all include/asm-/ipc.h files and moves the contents of include/asm-generic/ipc.h to include/linux/ipc.h. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:55 -07:00
Alexey Dobriyan	4af3c9cc4f	Drop some headers from mm.h mm.h doesn't use directly anything from mutex.h and backing-dev.h, so remove them and add them back to files which need them. Cross-compile tested on many configs and archs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:55 -07:00
Paul Clements	7fdfd4065c	NBD: allow hung network I/O to be cancelled Allow NBD I/O to be cancelled when a network outage occurs. Previously, I/O would just hang, and if enough I/O was hung in nbd, the system (at least user-level) would completely hang until a TCP timeout (default, 15 minutes) occurred. The patch introduces a new ioctl NBD_SET_TIMEOUT that allows a transmit timeout value (in seconds) to be specified. Any network send that exceeds the timeout will be cancelled and the nbd connection will be shut down. I've tested with various timeout values and 6 seconds seems to be a good choice for the timeout. If the NBD_SET_TIMEOUT ioctl is not called, you get the old (I/O hang) behavior. Signed-off-by: Paul Clements <paul.clements@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-10-17 08:42:55 -07:00

... 3 4 5 6 7 ...

8410 Коммитов