* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-lguest: (45 commits)
Use "struct boot_params" in example launcher
Loading bzImage directly.
Revert lguest magic and use hook in head.S
Update lguest documentation to reflect the new virtual block device name.
generalize lgread_u32/lgwrite_u32.
Example launcher handle guests not being ready for input
Update example launcher for virtio
Lguest support for Virtio
Remove old lguest I/O infrrasructure.
Remove old lguest bus and drivers.
Virtio helper routines for a descriptor ringbuffer implementation
Module autoprobing support for virtio drivers.
Virtio console driver
Block driver using virtio.
Net driver using virtio
Virtio interface
Boot with virtual == physical to get closer to native Linux.
Allow guest to specify syscall vector to use.
Rename "cr3" to "gpgdir" to avoid x86-specific naming.
Pagetables to use normal kernel types
...
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Add Vitaly Bordug as PPC8xx maintainer
[POWERPC] 4xx: Enable EMAC on Bamboo board
[POWERPC] 4xx: Enable EMAC for PPC405 Walnut board
[POWERPC] 4xx: Fix timebase clock selection on Walnut
[POWERPC] 4xx: Enable EMAC on the PPC 440GP Ebony board
[POWERPC] 4xx: Split early debug output and early boot console for 44x
[POWERPC] 4xx: Enable NEW EMAC support for Sequoia 440EPx.
[POWERPC] 4xx: Add RGMII support for Sequoia 440EPx
Reduce the function pointer mess of the m68knommu timer code by calling
directly to the local hardware's timer setup, and expose the local
common timer interrupt handler to the lower level hardware timer.
Ultimately this will save definitions of all these functions across all
the platform code to setup the function pointers (which for any given
m68knommu CPU family member can be only one set of hardware timer
functions).
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix system call restart handling. We can call directly to the
restart handler, no need to back track through trap that isn't
even implemented on m68knommu.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add make support for the Savant/Rosie1 board.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is no separate stack region addresses to print at startup time,
so remove it from the debug listing
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add configure support for the Savant/Rosie1 board.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Updated defconfig with new options for m68knommu.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix the system call code for handling syscall tracing, so strace
and gdbserver work properly.
This fix originally developed by Philippe De Muyter and Stuart Hughes.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zero out dma_length in the entry immediately following the last mapped
entry for unmap_sg.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Implement at32_add_device_cf() which will add a platform_device for
the at32_cf driver (not merged yet). Separate out most of the
at32_add_device_ide() code and use it to implement
at32_add_device_cf() as well.
This changes the API in the following ways:
* The board code must initialize data->cs to the chipselect ID to
use before calling any of these functions.
* The board code must use GPIO_PIN_NONE to indicate unused CF pins.
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Implement functions for adding platform devices for TWI, MCI, AC97C
and ABDAC. They may need to be modified to cope with platform data,
etc. when the corresponding drivers are ready to be merged, but such
changes are much less likely to conflict than adding support for a
whole new type of device.
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
This patch adds platform code for PATA devices on the AP7000.
[hskinnemoen@atmel.com: board code left out for now since stk1000
doesn't support IDE out of the box]
Signed-off-by: Kristoffer Nyborg Gregertsen <kngregertsen@norway.atmel.com>
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
arch/parisc/kernel/pci-dma.c: In function 'pa11_dma_map_sg':
arch/parisc/kernel/pci-dma.c:487: error: 'struct scatterlist' has no member named 'page'
arch/parisc/kernel/pci-dma.c: In function 'pa11_dma_unmap_sg':
arch/parisc/kernel/pci-dma.c:508: error: 'struct scatterlist' has no member named 'page'
arch/parisc/kernel/pci-dma.c:508: error: 'struct scatterlist' has no member named 'page'
arch/parisc/kernel/pci-dma.c: In function 'pa11_dma_sync_sg_for_cpu':
arch/parisc/kernel/pci-dma.c:535: error: 'struct scatterlist' has no member named 'page'
arch/parisc/kernel/pci-dma.c:535: error: 'struct scatterlist' has no member named 'page'
arch/parisc/kernel/pci-dma.c: In function 'pa11_dma_sync_sg_for_device':
arch/parisc/kernel/pci-dma.c:545: error: 'struct scatterlist' has no member named 'page'
arch/parisc/kernel/pci-dma.c:545: error: 'struct scatterlist' has no member named 'page'
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
arch/arm/common/dmabounce.c: In function 'dma_map_sg':
arch/arm/common/dmabounce.c:445: error: implicit declaration of function 'sg_page'
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Version 2.07 of the boot protocol uses 0x23C for the hardware_subarch
field, that for lguest is "1". This allows us to use the standard
boot entry point rather than the "GenuineLguest" string hack.
The standard entry point also clears the BSS and copies the boot parameters
and commandline for us, saving more code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This makes lguest able to use the virtio devices.
We change the device descriptor page from a simple array to a variable
length "type, config_len, status, config data..." format, and
implement virtio_config_ops to read from that config data.
We use the virtio ring implementation for an efficient Guest <-> Host
virtqueue mechanism, and the new LHCALL_NOTIFY hypercall to kick the
host when it changes.
We also use LHCALL_NOTIFY on kernel addresses for very very early
console output. We could have another hypercall, but this hack works
quite well.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This gets rid of the lguest bus, drivers and DMA mechanism, to make
way for a generic virtio mechanism.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
These helper routines supply most of the virtqueue_ops for hypervisors
which want to use a ring for virtio. Unlike the previous lguest
implementation:
1) The rings are variable sized (2^n-1 elements).
2) They have an unfortunate limit of 65535 bytes per sg element.
3) The page numbers are always 64 bit (PAE anyone?)
4) They no longer place used[] on a separate page, just a separate
cacheline.
5) We do a modulo on a variable. We could be tricky if we cared.
6) Interrupts and notifies are suppressed using flags within the rings.
Users need only get the ring pages and provide a notify hook (KVM
wants the guest to allocate the rings, lguest does it sanely).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dor Laor <dor.laor@qumranet.com>
1) This allows us to get alot closer to booting bzImages.
2) It means we don't have to know page_offset.
3) The Guest needs to modify the boot pagetables to create the
PAGE_OFFSET mapping before jumping to C code.
4) guest_pa() walks the page tables rather than using page_offset.
5) We don't use page_offset to figure out whether to emulate: it was
always kinda quesationable, and won't work for instructions done
before remapping (bzImage unpacking in particular).
6) We still want the kernel address for tlb flushing: have the initial
hypercall give us that, too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
(Based on Ron Minnich's LGUEST_PLAN9_SYSCALL patch).
This patch allows Guests to specify what system call vector they want,
and we try to reserve it. We only allow one non-Linux system call
vector, to try to avoid DoS on the Host.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Clean up the hypercall code to make the code in hypercalls.c
architecture independent. First process the common hypercalls and
then call lguest_arch_do_hcall() if the call hasn't been handled.
Rename struct hcall_ring to hcall_args.
This patch requires the previous patch which reorganize the layout of
struct lguest_regs on i386 so they match the layout of struct
hcall_args.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Separate i386 architecture specific from core.c and move it to
x86/core.c and add x86/lguest.h header file to match.
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Lguest has two sides: host support (to launch guests) and guest
support (replacement boot path and paravirt_ops). This moves the
guest side to arch/x86/lguest where it's closer to related code.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
1) Group all the "guest OS" support options together, under a PARAVIRT_GUEST
menu.
2) Make those options select CONFIG_PARAVIRT, as suggested by Andi.
3) Make kconfig help titles consistent.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Andi Kleen <ak@suse.de>
Cc: Zach Amsden <zach@vmware.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Chris Wright <chrisw@sous-sol.org>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/cooloney/blackfin-2.6:
Blackfin arch: use KBUILD_CFLAGS and KBUILD_AFLAGS in Makefile
Blackfin arch: Javier Herrer writes: fix building when icache and dcache is disabled
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
[POWERPC] Enable restart support for lite5200 board
[POWERPC] Add restart support for mpc52xx based platforms
[POWERPC] Update device tree binding for mpc5200 gpt
[POWERPC] Add mpc52xx_find_and_map_path(), refactor utility functions
[POWERPC] bestcomm: Restrict bus prefetch bugfix to original mpc5200 silicon.
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] time: Make c0_compare_int_usable more bullet proof
[MIPS] Kbuild: Use the new cc-cross-prefix feature.
[MIPS] Fix include wrapper symbol to something sane.
[MIPS] Malta: Delete dead code.
[MIPS] time: Add GT641xx timer0 clockevent driver
[MIPS] time: SMP-proofing of Sibyte clockevent/clocksource code.
[MIPS] time: SMP/NUMA-proofing of IP27 HUB RT timer code.
[MIPS] time: Fix calculation in clockevent_set_clock()
* 'sg' of git://git.kernel.dk/linux-2.6-block:
Add CONFIG_DEBUG_SG sg validation
Change table chaining layout
Update arch/ to use sg helpers
Update swiotlb to use sg helpers
Update net/ to use sg helpers
Update fs/ to use sg helpers
[SG] Update drivers to use sg helpers
[SG] Update crypto/ to sg helpers
[SG] Update block layer to use sg helpers
[SG] Add helpers for manipulating SG entries
And make use of it for Cobalt. A few others such as the Malta could make
use of it as well.
Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The BCM148 has 4 cores but there are also just 4 generic timers available
so use the ZBbus cycle counter instead of it. In addition the ZBbus
counter also offers a much higher resolution and 64-bit counting so I'm
considering a later complete conversion to it once I figure out if all
members of the Sibyte SOC family support it - the docs seem to agree but
the headers files seem to disagree ...
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Add the BSS to the resource tree just as kernel text and kernel data are in
the resource tree. The main reason behind this is to avoid crashkernel
reservation in that area.
While it's not strictly necessary to have the BSS in the resource tree (the
actual collision detection is done in the reserve_bootmem() function before),
the usage of the BSS resource should be presented to the user in /proc/iomem
just as Kernel data and Kernel code.
Note: The patch currently is only implemented for x86 and ia64 (because
efi_initialize_iomem_resources() has the same signature on i386 and ia64).
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This config option (DMAR_FLPY_WA) sets up 1:1 mapping for the floppy device so
that the floppy device which does not use DMA api's will continue to work.
Once the floppy driver starts using DMA api's this config option can be turn
off or this patch can be yanked out of kernel at that time.
[akpm@linux-foundation.org: cleanups, rename things, build fix]
[jengelh@computergmbh.de: Kconfig fixes]
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we fix all the opensource gfx drivers to use the DMA api's, at that time
we can yank this config options out.
[jengelh@computergmbh.de: Kconfig fixes]
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
MSI interrupt handler registrations and fault handling support for Intel-IOMMU
hadrware.
This patch enables the MSI interrupts for the DMA remapping units and in the
interrupt handler read the fault cause and outputs the same on to the console.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Actual intel IOMMU driver. Hardware spec can be found at:
http://www.intel.com/technology/virtualization
This driver sets X86_64 'dma_ops', so hook into standard DMA APIs. In this
way, PCI driver will get virtual DMA address. This change is transparent to
PCI drivers.
[akpm@linux-foundation.org: remove unneeded cast]
[akpm@linux-foundation.org: build fix]
[bunk@stusta.de: fix duplicate CONFIG_DMAR Makefile line]
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch supports the upcomming Intel IOMMU hardware a.k.a. Intel(R)
Virtualization Technology for Directed I/O Architecture and the hardware spec
for the same can be found here
http://www.intel.com/technology/virtualization/index.htm
FAQ! (questions from akpm, answers from ak)
> So... what's all this code for?
>
> I assume that the intent here is to speed things up under Xen, etc?
Yes in some cases, but not this code. That would be the Xen version of this
code that could potentially assign whole devices to guests. I expect this to
be only useful in some special cases though because most hardware is not
virtualizable and you typically want an own instance for each guest.
Ok at some point KVM might implement this too; i likely would use this code
for this.
> Do we
> have any benchmark results to help us to decide whether a merge would be
> justified?
The main advantage for doing it in the normal kernel is not performance, but
more safety. Broken devices won't be able to corrupt memory by doing random
DMA.
Unfortunately that doesn't work for graphics yet, for that need user space
interfaces for the X server are needed.
There are some potential performance benefits too:
- When you have a device that cannot address the complete address range an
IOMMU can remap its memory instead of bounce buffering. Remapping is likely
cheaper than copying.
- The IOMMU can merge sg lists into a single virtual block. This could
potentially speed up SG IO when the device is slow walking SG lists. [I
long ago benchmarked 5% on some block benchmark with an old MPT Fusion; but
it probably depends a lot on the HBA]
And you get better driver debugging because unexpected memory accesses from
the devices will cause a trappable event.
>
> Does it slow anything down?
It adds more overhead to each IO so yes.
This patch:
Add support for early detection and parsing of DMAR's (DMA Remapping) reported
to OS via ACPI tables.
DMA remapping(DMAR) devices support enables independent address translations
for Direct Memory Access(DMA) from Devices. These DMA remapping devices are
reported via ACPI tables and includes pci device scope covered by these DMA
remapping device.
For detailed info on the specification of "Intel(R) Virtualization Technology
for Directed I/O Architecture" please see
http://www.intel.com/technology/virtualization/index.htm
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Greg KH <greg@kroah.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch uses the updated boot protocol to do paravirtualized boot.
If the boot version is >= 2.07, then it will do two things:
1. Check the bootparams loadflags to see if we should reload the
segment registers and clear interrupts. This is appropriate
for normal native boot and some paravirtualized environments, but
inapproprate for others.
2. Check the hardware architecture, and dispatch to the appropriate
kernel entrypoint. If the bootloader doesn't set this, then we
simply do the normal boot sequence.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Updates for version 2.07 of the boot protocol. This includes:
load_flags.KEEP_SEGMENTS- flag to request/inhibit segment reloads
hardware_subarch - what subarchitecture we're booting under
hardware_subarch_data - per-architecture data
The intention of these changes is to make booting a paravirtualized
kernel work via the normal Linux boot protocol.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- De-confuse the defines for the address-space-control-elements
and the segment/region table entries.
- Create out of line functions for page table allocation / freeing.
- Simplify get_shadow_xxx functions.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The current tlb flushing code for page table entries violates the
s390 architecture in a small detail. The relevant section from the
principles of operation (SA22-7832-02 page 3-47):
"A valid table entry must not be changed while it is attached
to any CPU and may be used for translation by that CPU except to
(1) invalidate the entry by using INVALIDATE PAGE TABLE ENTRY or
INVALIDATE DAT TABLE ENTRY, (2) alter bits 56-63 of a page-table
entry, or (3) make a change by means of a COMPARE AND SWAP AND
PURGE instruction that purges the TLB."
That means if one thread of a multithreaded applciation uses a vma
while another thread does an unmap on it, the page table entries of
that vma needs to get removed with IPTE, IDTE or CSP. In some strange
and rare situations a cpu could check-stop (die) because a entry has
been pushed out of the TLB that is still needed to complete a
(milli-coded) instruction. I've never seen it happen with the current
code on any of the supported machines, so right now this is a
theoretical problem. But I want to fix it nevertheless, to avoid
headaches in the futures.
To get this implemented correctly without changing common code the
primitives ptep_get_and_clear, ptep_get_and_clear_full and
ptep_set_wrprotect need to use the IPTE instruction to invalidate the
pte before the new pte value gets stored. If IPTE is always used for
the three primitives three important operations will have a performace
hit: fork, mprotect and exit_mmap. Time for some workarounds:
* 1: ptep_get_and_clear_full is used in unmap_vmas to remove page
tables entries in a batched tlb gather operation. If the mmu_gather
context passed to unmap_vmas has been started with full_mm_flush==1
or if only one cpu is online or if the only user of a mm_struct is the
current process then the fullmm indication in the mmu_gather context is
set to one. All TLBs for mm_struct are flushed by the tlb_gather_mmu
call. No new TLBs can be created while the unmap is in progress. In
this case ptep_get_and_clear_full clears the ptes with a simple store.
* 2: ptep_get_and_clear is used in change_protection to clear the
ptes from the page tables before they are reentered with the new
access flags. At the end of the update flush_tlb_range clears the
remaining TLBs. In general the ptep_get_and_clear has to issue IPTE
for each pte and flush_tlb_range is a nop. But if there is only one
user of the mm_struct then ptep_get_and_clear uses simple stores
to do the update and flush_tlb_range will flush the TLBs.
* 3: Similar to 2, ptep_set_wrprotect is used in copy_page_range
for a fork to make all ptes of a cow mapping read-only. At the end of
of copy_page_range dup_mmap will flush the TLBs with a call to
flush_tlb_mm. Check for mm->mm_users and if there is only one user
avoid using IPTE in ptep_set_wrprotect and let flush_tlb_mm clear the
TLBs.
Overall for single threaded programs the tlb flush code now performs
better, for multi threaded programs it is slightly worse. In particular
exit_mmap() now does a single IDTE for the mm and then just frees every
page cache reference and every page table page directly without a delay
over the mmu_gather structure.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Currently the ccw method is used to ipl the DASD dump record under LPAR.
This mechanism is not reliable, which can cause dump failures. This fix
now uses the diag 308 ipl method for all machines, which have diag308
subcode 5 and 4 support.
Signed-off-by: Michael Holzheu <holzheu@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add two new sysfs entries per cpu: idle_count and idle_time.
idle_count contains the number of times a cpu went into idle state.
idle_time contains the time a cpu spent in idle state in microseconds.
This can be used e.g. by powertop to tell how often idle state is
entered and left.
# cat /sys/devices/system/cpu/cpu0/idle_count
504
# cat /sys/devices/system/cpu/cpu0/idle_time
469734037 us
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
A few trivial Makefile cleanups
- dependencipes in head.o was all wrong - deleted
- CMODEL_CFLAG was not used anywhere
- NEW_GCC was then not used outside sparc64/Makefe - do not export it
- FIXME seems not appropriate - all other put oprofile in drivers-y too
- No reason to do -I. (and it still builds)
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Invoke the desc->handle_irq directly in the top-level dispatch,
just like other sophisticated ports.
This will allow us to decrease the cost of the MSI queue dispatch.
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the watchdog timer to implement board restart support.
Signed-off-by: Marian Balakowicz <m8@semihalf.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Add common helper routines: mpc52xx_map_wdt() and mpc52xx_restart().
Signed-off-by: Marian Balakowicz <m8@semihalf.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Add 'fsl,' prefix to 'compatible' property for gpt nodes.
Add 'fsl,' prefix to empty, GPT0 specific 'has-wdt' property.
The fsl, prefix is being added to better match the convention of prefixing
manufacturer specific properties and values with the vendors name.
Signed-off-by: Marian Balakowicz <m8@semihalf.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Add helper routine mpc52xx_find_and_map_path(). Extract common code to
mpc52xx_map_node() and refactor mpc52xx_find_and_map().
Signed-off-by: Jan Wrobel <wrr@semihalf.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
now that -mno-fdpic works, force it on so that
we can use any blackfin toolchain to build up the
kernel and kernel modules
wrap -mno-fdpic in $(call cc-option,-mno-fdpic) so that older
toolchains will still work
Signed-off-by: Mike Frysinger <michael.frysinger@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
We must balance calls to get_task_mm with corresponding mmput calls, otherwise
refcounting is screwed up and mms don't get freed when their task exits.
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
Fix up /proc/cpuinfo so it is like everyone else, and gets
parsed by various applications properly. Still needs some tweaking on
parts without full L1 sram, like 532, 531, so it doesn't print out L1
bank info that doesn't exist.
Signed-off-by: Robin Getz <robin.getz@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
force irq_flags into the .data section by initializing it to
the hardware masks that cannot be disabled. this way if we
use irq enable/disable functions before the .bss has been
zeroed out (as does our l1 relocate/dma functions), we dont
hit a problem where bss contains bogus crap.
Signed-off-by: Mike Frysinger <michael.frysinger@analog.com>
Signed-off-by: Bryan Wu <bryan.wu@analog.com>
* 'master' of hera.kernel.org:/pub/scm/linux/kernel/git/kyle/parisc-2.6: (29 commits)
[PARISC] fix uninitialized variable warning in asm/rtc.h
[PARISC] Port checkstack.pl to parisc
[PARISC] Make palo target work when $obj != $src
[PARISC] Zap unused variable warnings in pci.c
[PARISC] Fix tests in palo target
[PARISC] Fix palo target
[PARISC] Restore palo target
[PARISC] Attempt to clean up parisc/Makefile
[PARISC] Fix infinite loop in /proc/iomem
[PARISC] Quiet sysfs_create_link __must_check warnings in pdc_stable
[PARISC] Squelch pci_enable_device __must_check warning in superio
[PARISC] Kill off broken irqstack code
[PARISC] Remove hardcoded uses of PAGE_SIZE
[PARISC] Clean up pointless ASM_PAGE_SIZE_DIV use
[PARISC] Kill off the last vestiges of ASM_PAGE_SIZE
[PARISC] Kill off ASM_PAGE_SIZE use
[PARISC] Beautify parisc vmlinux.lds.S
[PARISC] Clean up a resource_size_t warning in sba_iommu
[PARISC] Kill incorrect cast warning in unwinder
[PARISC] Kill zone_to_nid printk warning
...
Fixed trivial conflict in include/asm-parisc/tlbflush.h manually
The vector stride of the double-precision vector instructions must be changed
to 1-2 from even 2-4, because the double registers numbering has been
changed to 0-15 from even 0-30 by
1356c1948d commit.
Signed-off-by: Takashi Ohmasa <ohmasa.takashi@jp.panasonic.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
All exception flags of the FPEXC register must be cleared before
returning from exception code to user code, including FP2V and OFC.
Signed-off-by: Takashi Ohmasa <ohmasa.takashi@jp.panasonic.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This fixes a typo in MFP address map.
Signed-off-by: Aleksey Makarov <amakarov@ru.mvista.com>
Acked-by: Eric Miao <eric.y.miao@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Commit 9a39e273d4 removed the boot directory
addition to CFLAGS that was being used by the subdirectory builds. For the
other files, that patch set EXTRA_CFLAGS, but Makefile.build explicitly
sets that to empty as it is explicitly for a single directory only.
Append to KBUILD_CFLAGS instead.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
'bus' was basically useless and 'hba' is only applicable on
64bit. Sigh, there's got to be a cleaner way to do this...
Signed-off-by: Kyle McMartin <kyle@mcmartin.ca>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (74 commits)
fix do_sys_open() prototype
sysfs: trivial: fix sysfs_create_file kerneldoc spelling mistake
Documentation: Fix typo in SubmitChecklist.
Typo: depricated -> deprecated
Add missing profile=kvm option to Documentation/kernel-parameters.txt
fix typo about TBI in e1000 comment
proc.txt: Add /proc/stat field
small documentation fixes
Fix compiler warning in smount example program from sharedsubtree.txt
docs/sysfs: add missing word to sysfs attribute explanation
documentation/ext3: grammar fixes
Documentation/java.txt: typo and grammar fixes
Documentation/filesystems/vfs.txt: typo fix
include/asm-*/system.h: remove unused set_rmb(), set_wmb() macros
trivial copy_data_pages() tidy up
Fix typo in arch/x86/kernel/tsc_32.c
file link fix for Pegasus USB net driver help
remove unused return within void return function
Typo fixes retrun -> return
x86 hpet.h: remove broken links
...
Most of these fixes were already submitted for old kernel versions, and were
approved, but for some reason they never made it into the releases.
Because this is a consolidation of a couple old missed patches, it touches both
Kconfigs and documentation texts.
Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Spelling fixes in arch/um/.
Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Acked-by: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Spelling fixes in arch/frv/.
Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Acked-By: David Howells <dhowells@redhat.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
a rather obvious fix given the opening of the function:
...
static __init int no_ipi_broadcast(char *str)
{
get_option(&str, &no_broadcast);
...
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
* ssh://master.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-x86: (33 commits)
x86: convert cpuinfo_x86 array to a per_cpu array
x86: introduce frame_pointer() and stack_pointer()
x86 & generic: change to __builtin_prefetch()
i386: do not BUG_ON() when MSR is unknown
x86: acpi use cpu_physical_id
x86: convert cpu_llc_id to be a per cpu variable
x86: convert cpu_to_apicid to be a per cpu variable
i386: introduce "used_vectors" bitmap which can be used to reserve vectors.
x86: use raw locks during oopses
x86: honor _PAGE_PSE bit on page walks
i386: do cpuid_device_create() in CPU_UP_PREPARE instead of CPU_ONLINE.
x86: implement missing x86_64 function smp_call_function_mask()
x86: use descriptor's functions instead of inline assembly
i386: consolidate show_regs and show_registers for i386
i386: make callgraph use dump_trace() on i386/x86_64
x86: enable iommu_merge by default
i386: i386 add AMD64 Barcelona PMU MSR definitions to msr.h
x86: Unify i386 and x86-64 early quirks
x86: enable HPET on ICH3 and ICH4
x86: force enable HPET on VT8235/8237 chipsets
...
Manually fix trivial conflict with task pid container helper changes in
arch/x86/kernel/process_32.c
* Convert files to UTF-8.
* Also correct some people's names
(one example is Eißfeldt, which was found in a source file.
Given that the author used an ß at all in a source file
indicates that the real name has in fact a 'ß' and not an 'ss',
which is commonly used as a substitute for 'ß' when limited to
7bit.)
* Correct town names (Goettingen -> Göttingen)
* Update Eberhard Mönkeberg's address (http://lkml.org/lkml/2007/1/8/313)
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Fix some device tree omissions that prevented the new EMAC driver from
setting up ethernet on the Bamboo board correctly and update the Bamboo
defconfig.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This patch enables the ibm_newemac driver for the Walnut board. It fixes the
device tree for the walnut board to order the MAL interrupts correctly and
adds the local-mac-address property to the EMAC node. The bootwrapper is also
updated to extract the MAC address from the OpenBIOS offset where it is stored.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
The current bootwrapper fails to set the timebase clock to the CPU clock
which causes the time to increment incorrectly. This fixes it by using the
correct #define for the CPC0_CR1 register.
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Currently there's no way to enable early boot console on PowerPC 44x
not specifying uart's physical address in kernel config, which is used
for very early debug messages. This patch splits very early debug output
(which needs uart physical address in kernel config) and early boot console
(which searches for uarts in the device tree using find_legacy_serial_ports).
We enable early boot console for all 44x processors, while (dangerous)
early debug is user-selectable.
Signed-off-by: Valentine Barshak <vbarshak@ru.mvista.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This patch enables NEW EMAC support for PowerPC 440EPx Sequoia board.
Signed-off-by: Valentine Barshak <vbarshak@ru.mvista.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
This adds RGMII support to Sequoia DTS and sets correct phy-mode
for EMACs. According to Sequoia datasheet, both ethernet ports
are connected to RGMII interface, while ZMII is used only for MDIO.
Signed-off-by: Valentine Barshak <vbarshak@ru.mvista.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Fix the various misspellings of "system", controller", "interrupt" and
"[un]necessary".
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (41 commits)
ACPICA: hw: Don't carry spinlock over suspend
ACPICA: hw: remove use_lock flag from acpi_hw_register_{read, write}
ACPI: cpuidle: port idle timer suspend/resume workaround to cpuidle
ACPI: clean up acpi_enter_sleep_state_prep
Hibernation: Make sure that ACPI is enabled in acpi_hibernation_finish
ACPI: suppress uninitialized var warning
cpuidle: consolidate 2.6.22 cpuidle branch into one patch
ACPI: thinkpad-acpi: skip blanks before the data when parsing sysfs
ACPI: AC: Add sysfs interface
ACPI: SBS: Add sysfs alarm
ACPI: SBS: Add ACPI_PROCFS around procfs handling code.
ACPI: SBS: Add support for power_supply class (and sysfs)
ACPI: SBS: Make SBS reads table-driven.
ACPI: SBS: Simplify data structures in SBS
ACPI: SBS: Split host controller (ACPI0001) from SBS driver (ACPI0002)
ACPI: EC: Add new query handler to list head.
ACPI: Add acpi_bus_generate_event4() function
ACPI: Battery: add sysfs alarm
ACPI: Battery: Add sysfs support
ACPI: Battery: Misc clean-ups, no functional changes
...
Fix up conflicts in drivers/misc/thinkpad_acpi.[ch] manually
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] Delete totally outdated Documentation/mips/time.README
[MIPS] Kill duplicated setup_irq() for cp0 timer
[MIPS] Sibyte: Finish conversion to modern time APIs.
[MIPS] time: Helpers to compute clocksource/event shift and mult values.
[MIPS] SMTC: Build fix.
[MIPS] time: Delete dead code.
[MIPS] MIPSsim: Strip defconfig file to the bones.
Quoting Randy:
"It seems sad that this patch sources Kconfig.marker, a 7-line file,
20-something times. Yes, you (we) don't want to put those 7 lines into
20-something different files, so sourcing is the right thing.
However, what you did for avr32 seems more on the right track to me: make
_one_ Instrumentation support menu that includes PROFILING, OPROFILE, KPROBES,
and MARKERS and then use (source) that in all of the arches."
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch removes the crashkernel parsing from arch/sh/kernel/machine_kexec.c
and calls the generic function, introduced in the generic patch, in
setup_bootmem_allocator().
This is necessary because the amount of System RAM must be known in this
function now because of the new syntax.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adapts the ppc64 code to use the generic parse_crashkernel()
function introduced in the generic patch of that series.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch adapts IA64 to use the generic parse_crashkernel() function instead
of its own parsing for the crashkernel command line.
Because the total amount of System RAM must be known when calling this
function, efi_memmap_init() is modified to return its accumulated total_memory
variable.
Also, the crashkernel handling is moved in an own function in
arch/ia64/kernel/setup.c to make the code more readable.
[kamalesh@linux.vnet.ibm.com: build fix]
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch removes the crashkernel parsing from
arch/x86_64/kernel/machine_kexec.c and calls the generic function, introduced
in the last patch, in setup_bootmem_allocator().
This is necessary because the amount of System RAM must be known in this
function now because of the new syntax.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch removes the crashkernel parsing from
arch/i386/kernel/machine_kexec.c and calls the generic function, introduced in
the last patch, in setup_bootmem_allocator().
This is necessary because the amount of System RAM must be known in this
function now because of the new syntax.
Signed-off-by: Bernhard Walle <bwalle@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Vivek Goyal <vgoyal@in.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One of the easiest things to isolate is the pid printed in kernel log.
There was a patch, that made this for arch-independent code, this one makes
so for arch/xxx files.
It took some time to cross-compile it, but hopefully these are all the
printks in arch code.
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
define global BIT macro
move all local BIT defines to the new globally define macro.
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Kumar Gala <galak@gate.crashing.org>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Russell King <rmk@arm.linux.org.uk>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Cc: "John W. Linville" <linville@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
remove asm/bitops.h includes
including asm/bitops directly may cause compile errors. don't include it
and include linux/bitops instead. next patch will deny including asm header
directly.
Cc: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the largest patch in the set. Make all (I hope) the places where
the pid is shown to or get from user operate on the virtual pids.
The idea is:
- all in-kernel data structures must store either struct pid itself
or the pid's global nr, obtained with pid_nr() call;
- when seeking the task from kernel code with the stored id one
should use find_task_by_pid() call that works with global pids;
- when showing pid's numerical value to the user the virtual one
should be used, but however when one shows task's pid outside this
task's namespace the global one is to be used;
- when getting the pid from userspace one need to consider this as
the virtual one and use appropriate task/pid-searching functions.
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: nuther build fix]
[akpm@linux-foundation.org: yet nuther build fix]
[akpm@linux-foundation.org: remove unneeded casts]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Paul Menage <menage@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
is_init() is an ambiguous name for the pid==1 check. Split it into
is_global_init() and is_container_init().
A cgroup init has it's tsk->pid == 1.
A global init also has it's tsk->pid == 1 and it's active pid namespace
is the init_pid_ns. But rather than check the active pid namespace,
compare the task structure with 'init_pid_ns.child_reaper', which is
initialized during boot to the /sbin/init process and never changes.
Changelog:
2.6.22-rc4-mm2-pidns1:
- Use 'init_pid_ns.child_reaper' to determine if a given task is the
global init (/sbin/init) process. This would improve performance
and remove dependence on the task_pid().
2.6.21-mm2-pidns2:
- [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
ppc,avr32}/traps.c for the _exception() call to is_global_init().
This way, we kill only the cgroup if the cgroup's init has a
bug rather than force a kernel panic.
[akpm@linux-foundation.org: fix comment]
[sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
[bunk@stusta.de: kernel/pid.c: remove unused exports]
[sukadev@us.ibm.com: Fix capability.c to work with threaded init]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Acked-by: Pavel Emelianov <xemul@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Herbert Poetzel <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The set of functions process_session, task_session, process_group and
task_pgrp is confusing, as the names can be mixed with each other when looking
at the code for a long time.
The proposals are to
* equip the functions that return the integer with _nr suffix to
represent that fact,
* and to make all functions work with task (not process) by making
the common prefix of the same name.
For monotony the routines signal_session() and set_signal_session() are
replaced with task_session_nr() and set_task_session(), especially since they
are only used with the explicit task->signal dereference.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Herbert Poetzl <herbert@13thfloor.at>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In pre-cgroup cpusets, a few config files enabled cpusets by default.
Signed-off-by: Paul Jackson <pj@sgi.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch uses vm_get_page_prot() to setup vma->vm_page_prot.
Though inside vm_get_page_prot() the protection flags is AND with
(VM_READ|VM_WRITE|VM_EXEC|VM_SHARED), it does not hurt correct code.
Signed-off-by: Coly Li <coyli@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
copy_oldmem_page should not return leaving a page frame from the
previous kernel mapped.
Signed-off-by: Fernando Luis Vázquez Cao <fernando@oss.ntt.co.jp>
Acked-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cpu_data is currently an array defined using NR_CPUS. This means that
we overallocate since we will rarely really use maximum configured cpus.
When NR_CPU count is raised to 4096 the size of cpu_data becomes
3,145,728 bytes.
These changes were adopted from the sparc64 (and ia64) code. An
additional field was added to cpuinfo_x86 to be a non-ambiguous cpu
index. This corresponds to the index into a cpumask_t as well as the
per_cpu index. It's used in various places like show_cpuinfo().
cpu_data is defined to be the boot_cpu_data structure for the NON-SMP
case.
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Christoph Lameter <clameter@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Mark M. Hoffman <mhoffman@lightlink.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch defines frame_pointer() and stack_pointer() similar to the
already defined instruction_pointer(). Thus the oprofile code can be
written in a more readable fashion.
[ tglx: arch/x86 adaptation ]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Here is a small patch to change the behavior of the PMU msr allocator
to avoid BUG_ON() when the MSR is unknwon. Instead, it now returns
ok, which means "I do not manage". The current allocator is not
yet managing the full set of PMU registers (e.g., GLOBAL_* on Core 2).
[watchdog] do not BUG_ON() in the MSR allocator if MSR is unknown, return ok
instead
Signed-off-by: Stephane Eranian <eranian@hpl.hp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This is from an earlier message from Christoph Lameter:
processor_core.c currently tries to determine the apicid by special casing
for IA64 and x86. The desired information is readily available via
cpu_physical_id()
on IA64, i386 and x86_64.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Additionally, boot_cpu_id needed to be exported to fix compile errors in
dma code when !CONFIG_SMP.
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Convert cpu_llc_id from a static array sized by NR_CPUS to a per_cpu
variable. This saves sizeof(cpu_llc_id) * NR unused cpus. Access is
mostly from startup and CPU HOTPLUG functions.
Note there's an additional change of the type of cpu_llc_id from int to
u8 for ARCH i386 to correspond with the same type in ARCH x86_64.
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch converts the x86_cpu_to_apicid array to be a per cpu
variable. This saves sizeof(apicid) * NR unused cpus. Access is mostly
from startup and CPU HOTPLUG functions.
MP_processor_info() is one of the functions that require access to the
x86_cpu_to_apicid array before the per_cpu data area is setup. For this
case, a pointer to the __initdata array is initialized in setup_arch()
and removed in smp_prepare_cpus() after the per_cpu data area is
initialized.
A second change is included to change the initial array value of ARCH
i386 from 0xff to BAD_APICID to be consistent with ARCH x86_64.
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This simplifies the io_apic.c __assign_irq_vector() logic and removes
the explicit SYSCALL_VECTOR check, and also allows for vectors to be
reserved by other mechanisms (ie. lguest).
[ tglx: arch/x86 adaptation ]
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Don't want any lockdep or other fragile machinery to run during oopses.
Use raw spinlocks directly for oops locking.
Also disables irq flag tracing there.
[ tglx: arch/x86 adaptation ]
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch defines the missing function smp_call_function_mask() for x86_64,
this is more or less a cut&paste of i386 function. It removes also some
duplicate code.
This function is needed by KVM to execute a function on some CPUs.
AK: Fixed description
AK: Moved WARN_ON(irqs_disabled) one level up to not warn in the panic case.
[ tglx: arch/x86 adaptation ]
Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch provides a new set of functions for managing the descriptor
tables that can be used instead of putting the raw assembly in .c files.
Remodeling of store_tr() suggested by Frederik Deweerdt.
[ tglx: arch/x86 adaptation ]
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Both functions printk the same information, except for CRx and
debug registers in the show_registers() one and a bit different
manner. So move the common code into one place. This is already
done for x86_64, so I think it's worth having the same on i386.
This saves 100 bytes of .rodata section :) ...
but only 8 from .text :(
[ tglx: arch/x86 adaptation ]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch improves oprofile callgraphs for i386/x86_64. The old
backtracing code was unable to produce kernel backtraces if the
kernel wasn't compiled with framepointers. The code now uses
dump_trace().
[ tglx: arch/x86 adaptation ]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
They were already very similar; just use the same file now.
[ tglx: arch/x86 adaptation ]
Cc: lenb@kernel.org
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
ICH3 and ICH4 have undocumented HPET capabilities. This patch enables
HPET for platforms based around these ICHs.
Tested on various ICH3 and ICH4 platforms.
Because HPET is not officially documented for ICH3/4 and may not have
been validated by chipset folks, we're on thin ice here. I'd recommend
testing this patch in -hrt or -mm for a while and wait for
success/failure reports before feeding it upstream.
tglx: depends on the force_hpet command line option !
Signed-off-by: Udo A. Steinberg <us15@os.inf.tu-dresden.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This patch adds quirks to force enable HPET on Via VT8235 and
VT8237 chipsets. The datasheet for 8237 documents HPET
functionality (although wrongly) whereas HPET is undocumented
for 8235.
Tested on A7V880 (8237) and K7VT4A+ (8235) boards.
tglx: depends on the force_hept commandline option
Signed-off-by: Udo A. Steinberg <us15@os.inf.tu-dresden.de>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Andi Kleen <ak@suse.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
add force_hpet boot option.
(this will be useful to make the forced-enable quirks depend on.)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
don't zero pad addresses in segfault message. Matches the other trap
messages. This leaves some more space for the new file name.
[ mingo: arch/x86 adaptation ]
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This makes x86-64's ia32 code use the new linux/elfcore-compat.h, reducing
some hand-copied duplication.
Signed-off-by: Roland McGrath <roland@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Old debugging code that is not really needed anymore. If someone
wants it it would be better replaced with a systemtap script or
kprobe.
This avoids a potential cache miss during page fault processing.
[ mingo: arch/x86 adaptation ]
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The helptext for IA32_EMULATION in arch/x86_64/Kconfig is wider than 80
chars, thus failing to be displayed in 80x24 screens.
This patch re-breaks lines.
Signed-off-by: Adrian Knoth <adi@drcomp.erfurt.thur.de>
Simple cosmetic update for the cs5536 reboot fixup; define the MSR that's used
for rebooting in geode.h, and use the define.
Signed-off-by: Andres Salomon <dilinger@debian.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
x86 NUMA kernels crash in the scheduler setup code if "nosmp" or
"maxcpus=0" is passed on the boot command line:
| Brought up 1 CPUs
| BUG: unable to handle kernel NULL pointer dereference at virtual address 00000000
| printing eip: c011f0b5 *pde = 00000000
| Oops: 0000 [#1] SMP
|
| Pid: 1, comm: swapper Not tainted (2.6.23 #67)
| EIP: 0060:[<c011f0b5>] EFLAGS: 00010246 CPU: 0
| EIP is at sd_degenerate+0x35/0x40
the reason is sloppy spaghetti code in smpboot_32.c that resulted in a
missing map_cpu_to_logical_apicid() call - which also had the side-effect
of setting up the cpu_2_node[] entry for the lone CPU. That resulted in
node_to_cpumask(0) resulting in 00000000 - confusing the sched-domains
setup code.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Use relative symlinks so we can refer bzImage from different tree
structures aka via NFS.
Moved the creation of directory + symlink after a successful build of
the boot parts.
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Appended patch fixes an oops while changing the vsyscall sysctl.
I am sure no one tested this code before integrating into mainline :(
BTW, using ioremap() in vsyscall_sysctl_change() to get the virtual
address of a kernel symbol sounds like an over kill.. I wonder if we
can define a simple __va_vsymbol() which will return directly the
kernel direct mapping. comments in the code which says gcc has trouble
with __va(__pa()) sounds bogus to me. __pa() on a vsyscall address will
not work anyhow :(
And also, the whole nop out syscall in vsyscall page infrastructure
(vsyscall_sysctl_change()) is added to make some attacks difficult,
and yet I don't see this nop out being done by default. This area
requires more cleanups?
Fix an oops with __pa_vsymbol(). VSYSCALL_FIRST_PAGE is a fixmap index.
We want the starting virtual address of the vsyscall page and not the index.
[ mingo: arch/x86 adaptation ]
Reported-by: Yanmin Zhang <yanmin.zhang@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
vdso / vsycall create .so.dbg files now.
Add *.so.dbg to the main .ignore file
Exclude the compile time created boot directory in arch/x86_64 as well
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
While we were reviewing pageattr_32/64.c for unification,
Thomas Gleixner noticed the following serious SMP bug in
global_flush_tlb():
down_read(&init_mm.mmap_sem);
list_replace_init(&deferred_pages, &l);
up_read(&init_mm.mmap_sem);
this is SMP-unsafe because list_replace_init() done on two CPUs in
parallel can corrupt the list.
This bug has been introduced about a year ago in the 64-bit tree:
commit ea7322decb
Author: Andi Kleen <ak@suse.de>
Date: Thu Dec 7 02:14:05 2006 +0100
[PATCH] x86-64: Speed and clean up cache flushing in change_page_attr
down_read(&init_mm.mmap_sem);
- dpage = xchg(&deferred_pages, NULL);
+ list_replace_init(&deferred_pages, &l);
up_read(&init_mm.mmap_sem);
the xchg() based version was SMP-safe, but list_replace_init() is not.
So this "cleanup" introduced a nasty bug.
why this bug never become prominent is a mystery - it can probably be
explained with the (still) relative obscurity of the x86_64 architecture.
the safe fix for now is to write-lock init_mm.mmap_sem.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
[MIPS] time: Move R4000 clockevent device code to separate configurable file
[MIPS] time: Delete dead cycles_per_jiffy, mips_timer_ack and null_timer_ack
[MIPS] IP32: Retire use of plat_timer_setup.
[MIPS] Jazz: Retire use of plat_timer_setup.
[MIPS] IP27: Convert to clock_event_device.
[MIPS] JMR3927: Convert to clock_event_device.
[MIPS] Always do the ARC64_TWIDDLE_PC thing.
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (51 commits)
[IPV6]: Fix again the fl6_sock_lookup() fixed locking
[NETFILTER]: nf_conntrack_tcp: fix connection reopening fix
[IPV6]: Fix race in ipv6_flowlabel_opt() when inserting two labels
[IPV6]: Lost locking in fl6_sock_lookup
[IPV6]: Lost locking when inserting a flowlabel in ipv6_fl_list
[NETFILTER]: xt_sctp: fix mistake to pass a pointer where array is required
[NET]: Fix OOPS due to missing check in dev_parse_header().
[TCP]: Remove lost_retrans zero seqno special cases
[NET]: fix carrier-on bug?
[NET]: Fix uninitialised variable in ip_frag_reasm()
[IPSEC]: Rename mode to outer_mode and add inner_mode
[IPSEC]: Disallow combinations of RO and AH/ESP/IPCOMP
[IPSEC]: Use the top IPv4 route's peer instead of the bottom
[IPSEC]: Store afinfo pointer in xfrm_mode
[IPSEC]: Add missing BEET checks
[IPSEC]: Move type and mode map into xfrm_state.c
[IPSEC]: Fix length check in xfrm_parse_spi
[IPSEC]: Move ip_summed zapping out of xfrm6_rcv_spi
[IPSEC]: Get nexthdr from caller in xfrm6_rcv_spi
[IPSEC]: Move tunnel parsing for IPv4 out of xfrm4_input
...
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC/64]: Consolidate of_register_driver
[SPARC] Videopix Frame Grabber: Convert device_lock_sem to mutex
[SPARC]: Support for new termios.
[SPARC64]: Check of_get_property() return in pci_determine_mem_io_space().
[SPARC64]: Fix boot failures due to bootmem.
[SPARC64]: Implement atomic backoff.
To be consistent with the use of attributes in the rest of the kernel
replace all use of __attribute_pure__ with __pure and delete the definition
of __attribute_pure__.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Get rid of sparse related warnings from places that use integer as NULL
pointer.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Matt Mackall <mpm@selenic.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
msr_class_cpu_callback() can be marked __cpuinit, being the notifier callback
for a __cpuinitdata notifier_block. So can be marked msr_device_create() too,
called only from the newly-__cpuinit msr_class_cpu_callback() or from
__init-marked msr_init().
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Cc: Andi Kleen <ak@suse.de>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds POWERPC specific hooks for scaled time accounting.
POWER6 includes a SPURR register. The SPURR is based off the PURR register
but is scaled based on CPU frequency and issue rates. This gives a more
accurate account of the instructions used per task. The PURR and timebase
will be constant relative to the wall clock, irrespective of the CPU
frequency.
This implementation reads the SPURR register in account_system_vtime which
is only call called on context witch and hard and soft irq entry and exit.
The percentage of user and system time is then estimated using the ratio of
these accounted by the PURR. If the SPURR is not present, the PURR read.
An earlier implementation of this patch read the SPURR whenever the PURR
was read, which included the system call entry and exit path.
Unfortunately this showed a performance regression on lmbench runs, so was
re-implemented.
I've included the lmbench results here when run bare metal on POWER6. 1st
column is the unpatch results. 2nd column is the results using the below
patch and the 3rd is the % diff of these results from the base. 4th and
5th columns are the results and % differnce from the base using the older
patch (SPURR read in syscall entry/exit path).
Base Scaled-Acct SPURR-in-syscall
Result Result % diff Result % diff
Simple syscall: 0.3086 0.3086 0.0000 0.3452 11.8600
Simple read: 0.4591 0.4671 1.7425 0.5044 9.86713
Simple write: 0.4364 0.4366 0.0458 0.4731 8.40971
Simple stat: 2.0055 2.0295 1.1967 2.0669 3.06158
Simple fstat: 0.5962 0.5876 -1.442 0.6368 6.80979
Simple open/close: 3.1283 3.1009 -0.875 3.2088 2.57328
Select on 10 fd's: 0.8554 0.8457 -1.133 0.8667 1.32101
Select on 100 fd's: 3.5292 3.6329 2.9383 3.6664 3.88756
Select on 250 fd's: 7.9097 8.1881 3.5197 8.2242 3.97613
Select on 500 fd's: 15.2659 15.836 3.7357 15.873 3.97814
Select on 10 tcp fd's: 0.9576 0.9416 -1.670 0.9752 1.83792
Select on 100 tcp fd's: 7.248 7.2254 -0.311 7.2685 0.28283
Select on 250 tcp fd's: 17.7742 17.707 -0.375 17.749 -0.1406
Select on 500 tcp fd's: 35.4258 35.25 -0.496 35.286 -0.3929
Signal handler installation: 0.6131 0.6075 -0.913 0.647 5.52927
Signal handler overhead: 2.0919 2.1078 0.7600 2.1831 4.35967
Protection fault: 0.7345 0.7478 1.8107 0.8031 9.33968
Pipe latency: 33.006 16.398 -50.31 33.475 1.42368
AF_UNIX sock stream latency: 14.5093 30.910 113.03 30.715 111.692
Process fork+exit: 219.8 222.8 1.3648 229.37 4.35623
Process fork+execve: 876.14 873.28 -0.32 868.66 -0.8533
Process fork+/bin/sh -c: 2830 2876.5 1.6431 2958 4.52296
File /var/tmp/XXX write bw: 1193497 1195536 0.1708 118657 -0.5799
Pagefaults on /var/tmp/XXX: 3.1272 3.2117 2.7020 3.2521 3.99398
Also, kernel compile times show no difference with this patch applied.
[pbadari@us.ibm.com: Avoid unnecessary PURR reading]
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Jay Lan <jlan@engr.sgi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Found these while looking at printk uses.
Add missing newlines to dev_<level> uses
Add missing KERN_<level> prefixes to multiline dev_<level>s
Fixed a wierd->weird spelling typo
Added a newline to a printk
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Mark M. Hoffman <mhoffman@lightlink.com>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Tilman Schmidt <tilman@imap.cc>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: James Smart <James.Smart@Emulex.Com>
Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Fix resource leakage in error case within detect_cache_attributes()
- Don't register hotcpu notifier when cache_add_dev() returns error
- Introduce cache_dev_map cpumask to track whether cache interface for
CPU is successfully added by cache_add_dev() or not.
cache_add_dev() may fail with out of memory error. In order to
avoid cache_remove_dev() with that uninitialized cache interface when
CPU_DEAD event is delivered we need to have the cache_dev_map cpumask.
(We cannot change cache_add_dev() from CPU_ONLINE event handler
to CPU_UP_PREPARE event handler. Because cache_add_dev() needs
to do cpuid and store the results with its CPU online.)
[nix.or.die@googlemail.com: fix a section mismatch warning]
Cc: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andi Kleen <ak@suse.de>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Gabriel Craciunescu <nix.or.die@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Clear kobject in percpu device_mce before calling sysdev_register() with
Because mce_create_device() may fail and it leaves kobject filled with
junk. It will be the problem when mce_create_device() will be called
next time.
- Fix error handling in mce_create_device()
Error handling should not do sysdev_remove_file() with not yet added
attributes.
- Don't register hotcpu notifier when mce_create_device() returns error
- Do mce_create_device() in CPU_UP_PREPARE instead of CPU_ONLINE
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Andi Kleen <ak@suse.de>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>