Граф коммитов

5493 Коммитов

Автор SHA1 Сообщение Дата
Nathan Lynch 93197a36a9 powerpc: Rewrite sysfs processor cache info code
The current code for providing processor cache information in sysfs
has the following deficiencies:
- several complex functions that are hard to understand
- implicit recursion (cache_desc_release -> kobject_put -> cache_desc_release)
- explicit recursion (create_cache_index_info)
- use of two per-cpu arrays when one would suffice
- duplication of work on systems where CPUs share cache

Also, when I looked at implementing support for a shared_cpu_map
attribute, it was pretty much impossible to handle hotplug without
checking every single online CPU's cache_desc list and fixing things
up... not that this is a hot path, but it would have introduced
O(n^2)-ish behavior during boot.  Addressing this involved rethinking
the core data structures used, which didn't lend itself to an
incremental approach.

This implementation maintains a "forest" (potentially more than one
tree) of cache objects which reflects the system's cache topology.
Cache objects are instantiated as needed as CPUs come online.  A
per-cpu array is used mainly for sysfs-related bookkeeping; the
objects in the array just point to the appropriate points in the
forest.

This maintains compatibility with the existing code and includes some
enhancements:
- Implement the shared_cpu_map attribute, which is essential for
  enabling userspace to discover the system's overall cache topology.
- Use cache-block-size properties if cache-line-size is not available.

I chose to place this implementation in a new file since it would have
roughly doubled the size of sysfs.c, which is already kind of messy.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:10 +11:00
Michael Ellerman 5c9a2606bc powerpc/iseries: Kexec is known not to work on iseries
The non-zero return from the prepare callback is returned by sys_kexec_load()
to userspace, indicating that kexec is not supported on the machine.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:10 +11:00
Grant Likely 29f1aff2cc powerpc: Copy bootable images in the default install script
This patch makes the default install script (arch/powerpc/boot/install.sh)
copy the bootable image files into the install directory.  Before this
patch only the vmlinux image file was copied.

This patch makes the default 'make install' command useful for embedded
development when $(INSTALL_PATH) is set in the environment.

As a side effect, this patch changes the calling convention of the
install.sh script.  Instead of a single 5th parameter, the script is now
passed a list of all the target images stored in the $(image-y) Makefile
variable.  This should be backwards compatible with existing install scripts
since it just adds additional arguments and does not change existing ones.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:09 +11:00
Dave Hansen 893473df78 powerpc/mm: Cleanup careful_allocation(): consolidate memset()
Both users of careful_allocation() immediately memset() the
result.  So, just do it in one place.

Also give careful_allocation() a 'z' prefix to bring it in
line with kzmalloc() and friends.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:09 +11:00
Dave Hansen 0be210fd66 powerpc/mm: Make careful_allocation() return virtual addrs
Since we memset() the result in both of the uses here,
just make careful_alloc() return a virtual address.
Also, add a separate variable to store the physial
address that comes back from the lmb_alloc() functions.
This makes it less likely that someone will screw it up
forgetting to convert before returning since the vaddr
is always in a void* and the paddr is always in an
unsigned long.

I admit this is arbitrary since one of its users needs
a paddr and one a vaddr, but it does remove a good
number of casts.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:08 +11:00
Dave Hansen 5d21ea2b0e powerpc/mm:: Cleanup careful_allocation(): bootmem already panics
If we fail a bootmem allocation, the bootmem code itself
panics.  No need to redo it here.

Also change the wording of the other panic.  We don't
strictly have to allocate memory on the specified node.
It is just a hint and that node may not even *have* any
memory on it.  In that case we can and do fall back to
other nodes.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:08 +11:00
Dave Hansen c555e520ef powerpc/mm: Add better comment on careful_allocation()
The behavior in careful_allocation() really confused me
at first.  Add a comment to hopefully make it easier
on the next doofus that looks at it.

Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:08 +11:00
Nicolas Palix afcb065450 powerpc/powermac: Add missing of_node_put
This patch fixes some unbalanced OF node references in the
powermac code

Signed-off-by: Nicolas Palix <npalix@diku.dk>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:07 +11:00
Benjamin Herrenschmidt c1f343028d powerpc/pci: Reserve legacy regions on PCI
There's a problem on some embedded platforms when we re-assign
everything on PCI, such as 44x. The generic code tries to avoid
assigning devices to addresses overlapping the low legacy
addresses such as VGA hard decoded areas using constants that
are unfortunately no good for us, as they don't take into account
the address translation we do to access PCI busses.

Thus we end up allocating things like IO BARs to 0, which is
technically legal, but will shadow hard decoded ports for use
by things like VGA cards.

This works around it by attempting to reserve legacy regions
before we try to assign addresses.

NOTE: This may have nasty side effects in cases I haven't tested
yet:

 - We try to use FW mappings (ie. powermac) and the FW has allocated
a conflicting address over those legacy regions. This will typically
happen. I would expect the new code to just fail with an informative
message without harm but I haven't had a chance to test that scenario
yet.

 - A device with fixed BARs overlapping those legacy addresses such
as an IDE controller in legacy mode is in the system. I don't know
for sure yet what will happen there, I have to test :-)

Ideally, we should change PCIBIOS_MIN_IO/MIN_MEM accross the board
to take a bus pointer so they can provide appropriate per-bus translated
values to the generic code but that's a more invasive patch. I will
do that in the future, but in the meantime, this fixes the problem
locally

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:07 +11:00
Benjamin Herrenschmidt 24f030175d Merge commit 'origin/master' into next 2009-01-08 16:24:38 +11:00
Linus Torvalds b424e8d3b4 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (98 commits)
  PCI PM: Put PM callbacks in the order of execution
  PCI PM: Run default PM callbacks for all devices using new framework
  PCI PM: Register power state of devices during initialization
  PCI PM: Call pci_fixup_device from legacy routines
  PCI PM: Rearrange code in pci-driver.c
  PCI PM: Avoid touching devices behind bridges in unknown state
  PCI PM: Move pci_has_legacy_pm_support
  PCI PM: Power-manage devices without drivers during suspend-resume
  PCI PM: Add suspend counterpart of pci_reenable_device
  PCI PM: Fix poweroff and restore callbacks
  PCI: Use msleep instead of cpu_relax during ASPM link retraining
  PCI: PCIe portdrv: Add kerneldoc comments to remining core funtions
  PCI: PCIe portdrv: Rearrange code so that related things are together
  PCI: PCIe portdrv: Fix suspend and resume of PCI Express port services
  PCI: PCIe portdrv: Add kerneldoc comments to some core functions
  x86/PCI: Do not use interrupt links for devices using MSI-X
  net: sfc: Use pci_clear_master() to disable bus mastering
  PCI: Add pci_clear_master() as opposite of pci_set_master()
  PCI hotplug: remove redundant test in cpq hotplug
  PCI: pciehp: cleanup register and field definitions
  ...
2009-01-07 15:41:01 -08:00
Linus Torvalds 7c7758f99d Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (123 commits)
  wimax/i2400m: add CREDITS and MAINTAINERS entries
  wimax: export linux/wimax.h and linux/wimax/i2400m.h with headers_install
  i2400m: Makefile and Kconfig
  i2400m/SDIO: TX and RX path backends
  i2400m/SDIO: firmware upload backend
  i2400m/SDIO: probe/disconnect, dev init/shutdown and reset backends
  i2400m/SDIO: header for the SDIO subdriver
  i2400m/USB: TX and RX path backends
  i2400m/USB: firmware upload backend
  i2400m/USB: probe/disconnect, dev init/shutdown and reset backends
  i2400m/USB: header for the USB bus driver
  i2400m: debugfs controls
  i2400m: various functions for device management
  i2400m: RX and TX data/control paths
  i2400m: firmware loading and bootrom initialization
  i2400m: linkage to the networking stack
  i2400m: Generic probe/disconnect, reset and message passing
  i2400m: host/device procotol and core driver definitions
  i2400m: documentation and instructions for usage
  wimax: Makefile, Kconfig and docbook linkage for the stack
  ...
2009-01-07 15:37:24 -08:00
Trent Piepho 6fd8be4bf7 powerpc/fsl-booke: Remove num_tlbcam_entries
This is a global variable defined in fsl_booke_mmu.c with a value that gets
initialized in assembly code in head_fsl_booke.S.

It's never used.

If some code ever does want to know the number of entries in TLB1, then
"numcams = mfspr(SPRN_TLB1CFG) & 0xfff", is a whole lot simpler than a
global initialized during kernel boot from assembly.

Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-01-07 15:33:07 -06:00
Trent Piepho 19f5465e82 powerpc/fsl-booke: Don't hard-code size of struct tlbcam
Some assembly code in head_fsl_booke.S hard-coded the size of struct tlbcam
to 20 when it indexed the TLBCAM table.  Anyone changing the size of struct
tlbcam would not know to expect that.

The kernel already has a system to get the size of C structures into
assembly language files, asm-offsets, so let's use it.

The definition of the struct gets moved to a header, so that asm-offsets.c
can include it.

Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-01-07 15:33:06 -06:00
Trent Piepho 565f37642c powerpc/fsl-pci: Set relaxed ordering on prefetchable ranges
Provides a small speedup when accessing pefetchable ranges.  To indicate
that a memory range is prefetchable, mark it in the dts file with 42000000
instead of 02000000.

A powepc pci_controller is allowed three memory ranges, any of which may be
prefetchable.  However, the PCI-PCI bridge configuration space only has one
field for "non-prefetchable memory behind bridge", which has a 32 bit
address, and one field for "prefetchable memory behind bridge", which may
have a 64 bit address.  These are PCI bus addresses, not CPU physical
addresses.

So really you're only allowed one memory range of each type.  And if you
want the range at a PCI address above 32 bits you must make it
prefetchable.

Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-01-07 15:33:05 -06:00
Trent Piepho a097a78c1e powerpc/fsl-pci: Better ATMU setup for 85xx/86xx
The code that sets up the outbound ATMU windows, which is used to map CPU
physical addresses into PCI bus addresses where BARs will be mapped, didn't
work so well.

For one, it leaked the ioremap() of the ATMU registers.  Another small bug
was the high 20 bits of the PCI bus address were left as zero.  It's legal
for prefetchable memory regions to be above 32 bits, so the high 20 bits
might not be zero.

Mainly, it couldn't handle ranges that were not a power of two in size or
were not naturally aligned.  The ATMU windows have these requirements (size
& alignment), but the code didn't bother to check if the ranges it was
programming met them.  If they didn't, the windows would silently be
programmed incorrectly.

This new code can handle ranges which are not power of two sized nor
naturally aligned.  It simply splits the ranges into multiple valid ATMU
windows.  As there are only four windows, pooly aligned or sized ranges
(which didn't even work before) may run out of windows.  In this case an
error is printed and an effort is made to disable the unmapped resources.

An improvement that could be made would be to make use of the default
outbound window.  Iff hose->pci_mem_offset is zero, then it's possible that
some or all of the ranges might not need an outbound window and could just
use the default window.

The default ATMU window can support a pci_mem_offset less than zero too,
but pci_mem_offset is unsigned.  One could say the abilities allowed a
powerpc pci_controller is neither subset nor a superset of the abilities of
a Freescale PCIe controller.  Thankfully, the most useful bits are in the
intersection of the two abilities.

Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-01-07 15:32:54 -06:00
Linus Torvalds 57c44c5f6f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
  trivial: chack -> check typo fix in main Makefile
  trivial: Add a space (and a comma) to a printk in 8250 driver
  trivial: Fix misspelling of "firmware" in docs for ncr53c8xx/sym53c8xx
  trivial: Fix misspelling of "firmware" in powerpc Makefile
  trivial: Fix misspelling of "firmware" in usb.c
  trivial: Fix misspelling of "firmware" in qla1280.c
  trivial: Fix misspelling of "firmware" in a100u2w.c
  trivial: Fix misspelling of "firmware" in megaraid.c
  trivial: Fix misspelling of "firmware" in ql4_mbx.c
  trivial: Fix misspelling of "firmware" in acpi_memhotplug.c
  trivial: Fix misspelling of "firmware" in ipw2100.c
  trivial: Fix misspelling of "firmware" in atmel.c
  trivial: Fix misspelled firmware in Kconfig
  trivial: fix an -> a typos in documentation and comments
  trivial: fix then -> than typos in comments and documentation
  trivial: update Jesper Juhl CREDITS entry with new email
  trivial: fix singal -> signal typo
  trivial: Fix incorrect use of "loose" in event.c
  trivial: printk: fix indentation of new_text_line declaration
  trivial: rtc-stk17ta8: fix sparse warning
  ...
2009-01-07 11:31:52 -08:00
Bjorn Helgaas 3f9455d488 PCI: powerpc: use generic pci_swizzle_interrupt_pin()
Use the generic pci_swizzle_interrupt_pin() instead of arch-specific code.

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:52 -08:00
Vitaly Bordug 796bcae736 USB: powerpc: Workaround for the PPC440EPX USBH_23 errata [take 3]
A published errata for ppc440epx states, that when running Linux with
both EHCI and OHCI modules loaded, the EHCI module experiences a fatal
error when a high-speed device is connected to the USB2.0, and
functions normally if OHCI module is not loaded.

There used to be recommendation to use only hi-speed or full-speed
devices with specific conditions, when respective module was unloaded.
Later, it was observed that ohci suspend is enough to keep things
going, and it was turned into workaround, as explained below.

Quote from original descriprion:

The 440EPx USB 2.0 Host controller is an EHCI compliant controller.  In
USB 2.0 Host controllers, each EHCI controller has one or more companion
controllers, which may be OHCI or UHCI.  An USB 2.0 Host controller will
contain one or more ports.  For each port, only one of the controllers
is connected at any one time. In the 440EPx, there is only one OHCI
companion controller, and only one USB 2.0 Host port.
All ports on an USB 2.0 controller default to the companion
controller.  If you load only an ohci driver, it will have control of
the ports and any deviceplugged in will operate, although high speed
devices will be forced to operate at full speed.  When an ehci driver
is loaded, it explicitly takes control of the ports.  If there is a
device connected, and / or every time there is a new device connected,
the ehci driver determines if the device is high speed or not.  If it
is high speed, the driver retains control of the port.  If it is not,
the driver explicitly gives the companion controller control of the
port.

The is a software workaround that uses
Initial version of the software workaround was posted to
linux-usb-devel:

http://www.mail-archive.com/linux-usb-devel@lists.sourceforge.net/msg54019.html

and later available from amcc.com:
http://www.amcc.com/Embedded/Downloads/download.html?cat=1&family=15&ins=2

The patch below is generally based on the latter, but reworked to
powerpc/of_device USB drivers, and uses a few devicetree inquiries to
get rid of (some) hardcoded defines.

Signed-off-by: Vitaly Bordug <vitb@kernel.crashing.org>
Signed-off-by: Stefan Roese <sr@denx.de>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-07 09:59:52 -08:00
Timur Tabi fdd4e8152f powerpc/qe: add Ethernet UPSMR definitions to QE library
Add the UCC_GETH_UPSMR_xxx definitions to qe.h.  The ucc_geth driver will
eventually use these instead of the UPSMR_ macros it currently defines.

Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-01-07 09:18:53 -06:00
Kumar Gala be122d6d8b powerpc/85xx: Fix PCIe error interrupts
The PCIe interrupts for 8544ds and 8572ds were incorrect.  The 8572 case
was found by Liu Yu.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-01-07 09:18:52 -06:00
Jason Jin 1433fa7d8d powerpc: Fix the ide suspend function in uli1575
The general pci resume code can only restore part of the
configuration registers. We need to reconfigure those
registers in the FIXUP_RESUME.

Signed-off-by: Jason Jin <Jason.jin@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2009-01-07 09:18:52 -06:00
Harvey Harrison 156ca2bbf6 powerpc: introduce asm/swab.h
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 18:10:27 -08:00
Masami Hiramatsu 1294156078 kprobes: add kprobe_insn_mutex and cleanup arch_remove_kprobe()
Add kprobe_insn_mutex for protecting kprobe_insn_pages hlist, and remove
kprobe_mutex from architecture dependent code.

This allows us to call arch_remove_kprobe() (and free_insn_slot) while
holding kprobe_mutex.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:20 -08:00
Matthew Wilcox ea43546750 atomic_t: unify all arch definitions
The atomic_t type cannot currently be used in some header files because it
would create an include loop with asm/atomic.h.  Move the type definition
to linux/types.h to break the loop.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:10 -08:00
Gary Hade c04fc586c1 mm: show node to memory section relationship with symlinks in sysfs
Show node to memory section relationship with symlinks in sysfs

Add /sys/devices/system/node/nodeX/memoryY symlinks for all
the memory sections located on nodeX.  For example:
/sys/devices/system/node/node1/memory135 -> ../../memory/memory135
indicates that memory section 135 resides on node1.

Also revises documentation to cover this change as well as updating
Documentation/ABI/testing/sysfs-devices-memory to include descriptions
of memory hotremove files 'phys_device', 'phys_index', and 'state'
that were previously not described there.

In addition to it always being a good policy to provide users with
the maximum possible amount of physical location information for
resources that can be hot-added and/or hot-removed, the following
are some (but likely not all) of the user benefits provided by
this change.
Immediate:
  - Provides information needed to determine the specific node
    on which a defective DIMM is located.  This will reduce system
    downtime when the node or defective DIMM is swapped out.
  - Prevents unintended onlining of a memory section that was
    previously offlined due to a defective DIMM.  This could happen
    during node hot-add when the user or node hot-add assist script
    onlines _all_ offlined sections due to user or script inability
    to identify the specific memory sections located on the hot-added
    node.  The consequences of reintroducing the defective memory
    could be ugly.
  - Provides information needed to vary the amount and distribution
    of memory on specific nodes for testing or debugging purposes.
Future:
  - Will provide information needed to identify the memory
    sections that need to be offlined prior to physical removal
    of a specific node.

Symlink creation during boot was tested on 2-node x86_64, 2-node
ppc64, and 2-node ia64 systems.  Symlink creation during physical
memory hot-add tested on a 2-node x86_64 system.

Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:59:00 -08:00
Mel Gorman 3340289ddf mm: report the MMU pagesize in /proc/pid/smaps
The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the
kernel to back a VMA.  This matches the size used by the MMU in the
majority of cases.  However, one counter-example occurs on PPC64 kernels
whereby a kernel using 64K as a base pagesize may still use 4K pages for
the MMU on older processor.  To distinguish, this patch reports
MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-06 15:58:58 -08:00
Nick Andrew 2a94739c70 trivial: Fix misspelling of "firmware" in powerpc Makefile
Fix misspelling of "firmware" in powerpc Makefile

It's spelled "firmware".

Signed-off-by: Nick Andrew <nick@nick-andrew.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-01-06 11:28:09 +01:00
Frederik Schwarzer 025dfdafe7 trivial: fix then -> than typos in comments and documentation
- (better, more, bigger ...) then -> (...) than

Signed-off-by: Frederik Schwarzer <schwarzerf@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-01-06 11:28:06 +01:00
Al Viro 56ff5efad9 zero i_uid/i_gid on inode allocation
... and don't bother in callers.  Don't bother with zeroing i_blocks,
while we are at it - it's already been zeroed.

i_mode is not worth the effort; it has no common default value.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-01-05 11:54:28 -05:00
Benjamin Herrenschmidt 4aa12f7b92 Merge commit 'kumar/kumar-next' into next 2009-01-05 14:16:48 +11:00
Linus Torvalds 7d3b56ba37 Merge branch 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (77 commits)
  x86: setup_per_cpu_areas() cleanup
  cpumask: fix compile error when CONFIG_NR_CPUS is not defined
  cpumask: use alloc_cpumask_var_node where appropriate
  cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t
  x86: use cpumask_var_t in acpi/boot.c
  x86: cleanup some remaining usages of NR_CPUS where s/b nr_cpu_ids
  sched: put back some stack hog changes that were undone in kernel/sched.c
  x86: enable cpus display of kernel_max and offlined cpus
  ia64: cpumask fix for is_affinity_mask_valid()
  cpumask: convert RCU implementations, fix
  xtensa: define __fls
  mn10300: define __fls
  m32r: define __fls
  h8300: define __fls
  frv: define __fls
  cris: define __fls
  cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
  cpumask: zero extra bits in alloc_cpumask_var_node
  cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/
  cpumask: convert mm/
  ...
2009-01-03 12:04:39 -08:00
Linus Torvalds 61420f59a5 Merge branch 'cputime' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
* 'cputime' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
  [PATCH] fast vdso implementation for CLOCK_THREAD_CPUTIME_ID
  [PATCH] improve idle cputime accounting
  [PATCH] improve precision of idle time detection.
  [PATCH] improve precision of process accounting.
  [PATCH] idle cputime accounting
  [PATCH] fix scaled & unscaled cputime accounting
2009-01-03 11:56:24 -08:00
Mike Travis 7eb1955336 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask into merge-rr-cpumask
Conflicts:
	arch/x86/kernel/io_apic.c
	kernel/rcuclassic.c
	kernel/sched.c
	kernel/time/tick-sched.c

Signed-off-by: Mike Travis <travis@sgi.com>
[ mingo@elte.hu: backmerged typo fix for io_apic.c ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-03 18:53:31 +01:00
Linus Torvalds b840d79631 Merge branch 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
  x86: export vector_used_by_percpu_irq
  x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
  sched: nominate preferred wakeup cpu, fix
  x86: fix lguest used_vectors breakage, -v2
  x86: fix warning in arch/x86/kernel/io_apic.c
  sched: fix warning in kernel/sched.c
  sched: move test_sd_parent() to an SMP section of sched.h
  sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
  sched: activate active load balancing in new idle cpus
  sched: bias task wakeups to preferred semi-idle packages
  sched: nominate preferred wakeup cpu
  sched: favour lower logical cpu number for sched_mc balance
  sched: framework for sched_mc/smt_power_savings=N
  sched: convert BALANCE_FOR_xx_POWER to inline functions
  x86: use possible_cpus=NUM to extend the possible cpus allowed
  x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
  x86: update io_apic.c to the new cpumask code
  x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
  x86: xen: use smp_call_function_many()
  x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
  ...

Fixed up trivial conflict in kernel/time/tick-sched.c manually
2009-01-02 11:44:09 -08:00
Linus Torvalds 597b0d2162 Merge branch 'kvm-updates/2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm
* 'kvm-updates/2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (140 commits)
  KVM: MMU: handle large host sptes on invlpg/resync
  KVM: Add locking to virtual i8259 interrupt controller
  KVM: MMU: Don't treat a global pte as such if cr4.pge is cleared
  MAINTAINERS: Maintainership changes for kvm/ia64
  KVM: ia64: Fix kvm_arch_vcpu_ioctl_[gs]et_regs()
  KVM: x86: Rework user space NMI injection as KVM_CAP_USER_NMI
  KVM: VMX: Fix pending NMI-vs.-IRQ race for user space irqchip
  KVM: fix handling of ACK from shared guest IRQ
  KVM: MMU: check for present pdptr shadow page in walk_shadow
  KVM: Consolidate userspace memory capability reporting into common code
  KVM: Advertise the bug in memory region destruction as fixed
  KVM: use cpumask_var_t for cpus_hardware_enabled
  KVM: use modern cpumask primitives, no cpumask_t on stack
  KVM: Extract core of kvm_flush_remote_tlbs/kvm_reload_remote_mmus
  KVM: set owner of cpu and vm file operations
  anon_inodes: use fops->owner for module refcount
  x86: KVM guest: kvm_get_tsc_khz: return khz, not lpj
  KVM: MMU: prepopulate the shadow on invlpg
  KVM: MMU: skip global pgtables on sync due to cr3 switch
  KVM: MMU: collapse remote TLB flushes on root sync
  ...
2009-01-02 11:41:11 -08:00
Rusty Russell 9150641dd1 cpumask: Introduce topology_core_cpumask()/topology_thread_cpumask(): powerpc
Impact: New API

The old topology_core_siblings() and topology_thread_siblings() return
a cpumask_t; these new ones return a (const) struct cpumask *.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
2009-01-01 10:12:21 +10:30
Al Viro 18d8fda7c3 take init_fs to saner place
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31 18:07:42 -05:00
Nick Piggin c2452f3278 shrink struct dentry
struct dentry is one of the most critical structures in the kernel. So it's
sad to see it going neglected.

With CONFIG_PROFILING turned on (which is probably the common case at least
for distros and kernel developers), sizeof(struct dcache) == 208 here
(64-bit). This gives 19 objects per slab.

I packed d_mounted into a hole, and took another 4 bytes off the inline
name length to take the padding out from the end of the structure. This
shinks it to 200 bytes. I could have gone the other way and increased the
length to 40, but I'm aiming for a magic number, read on...

I then got rid of the d_cookie pointer. This shrinks it to 192 bytes. Rant:
why was this ever a good idea? The cookie system should increase its hash
size or use a tree or something if lookups are a problem. Also the "fast
dcookie lookups" in oprofile should be moved into the dcookie code -- how
can oprofile possibly care about the dcookie_mutex? It gets dropped after
get_dcookie() returns so it can't be providing any sort of protection.

At 192 bytes, 21 objects fit into a 4K page, saving about 3MB on my system
with ~140 000 entries allocated. 192 is also a multiple of 64, so we get
nice cacheline alignment on 64 and 32 byte line systems -- any given dentry
will now require 3 cachelines to touch all fields wheras previously it
would require 4.

I know the inline name size was chosen quite carefully, however with the
reduction in cacheline footprint, it should actually be just about as fast
to do a name lookup for a 36 character name as it was before the patch (and
faster for other sizes). The memory footprint savings for names which are
<= 32 or > 36 bytes long should more than make up for the memory cost for
33-36 byte names.

Performance is a feature...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-12-31 18:07:38 -05:00
Avi Kivity ca9edaee1a KVM: Consolidate userspace memory capability reporting into common code
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:55:46 +02:00
Hollis Blanchard 7b7015914b KVM: ppc: mostly cosmetic updates to the exit timing accounting code
The only significant changes were to kvmppc_exit_timing_write() and
kvmppc_exit_timing_show(), both of which were dramatically simplified.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:55:41 +02:00
Hollis Blanchard 73e75b416f KVM: ppc: Implement in-kernel exit timing statistics
Existing KVM statistics are either just counters (kvm_stat) reported for
KVM generally or trace based aproaches like kvm_trace.
For KVM on powerpc we had the need to track the timings of the different exit
types. While this could be achieved parsing data created with a kvm_trace
extension this adds too much overhead (at least on embedded PowerPC) slowing
down the workloads we wanted to measure.

Therefore this patch adds a in-kernel exit timing statistic to the powerpc kvm
code. These statistic is available per vm&vcpu under the kvm debugfs directory.
As this statistic is low, but still some overhead it can be enabled via a
.config entry and should be off by default.

Since this patch touched all powerpc kvm_stat code anyway this code is now
merged and simplified together with the exit timing statistic code (still
working with exit timing disabled in .config).

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:55:41 +02:00
Hollis Blanchard c5fbdffbda KVM: ppc: save and restore guest mappings on context switch
Store shadow TLB entries in memory, but only use it on host context switch
(instead of every guest entry). This improves performance for most workloads on
440 by reducing the guest TLB miss rate.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:55:09 +02:00
Hollis Blanchard 7924bd4109 KVM: ppc: directly insert shadow mappings into the hardware TLB
Formerly, we used to maintain a per-vcpu shadow TLB and on every entry to the
guest would load this array into the hardware TLB. This consumed 1280 bytes of
memory (64 entries of 16 bytes plus a struct page pointer each), and also
required some assembly to loop over the array on every entry.

Instead of saving a copy in memory, we can just store shadow mappings directly
into the hardware TLB, accepting that the host kernel will clobber these as
part of the normal 440 TLB round robin. When we do that we need less than half
the memory, and we have decreased the exit handling time for all guest exits,
at the cost of increased number of TLB misses because the host overwrites some
guest entries.

These savings will be increased on processors with larger TLBs or which
implement intelligent flush instructions like tlbivax (which will avoid the
need to walk arrays in software).

In addition to that and to the code simplification, we have a greater chance of
leaving other host userspace mappings in the TLB, instead of forcing all
subsequent tasks to re-fault all their mappings.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:55:09 +02:00
Hollis Blanchard c0ca609c5f powerpc/44x: declare tlb_44x_index for use in C code
KVM currently ignores the host's round robin TLB eviction selection, instead
maintaining its own TLB state and its own round robin index. However, by
participating in the normal 44x TLB selection, we can drop the alternate TLB
processing in KVM. This results in a significant performance improvement,
since that processing currently must be done on *every* guest exit.

Accordingly, KVM needs to be able to access and increment tlb_44x_index.
(KVM on 440 cannot be a module, so there is no need to export this symbol.)

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:55:09 +02:00
Hollis Blanchard 891686188f KVM: ppc: support large host pages
KVM on 440 has always been able to handle large guest mappings with 4K host
pages -- we must, since the guest kernel uses 256MB mappings.

This patch makes KVM work when the host has large pages too (tested with 64K).

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:55:07 +02:00
Hollis Blanchard fe4e771d5c KVM: ppc: fix userspace mapping invalidation on context switch
We used to defer invalidating userspace TLB entries until jumping out of the
kernel. This was causing MMU weirdness most easily triggered by using a pipe in
the guest, e.g. "dmesg | tail". I believe the problem was that after the guest
kernel changed the PID (part of context switch), the old process's mappings
were still present, and so copy_to_user() on the "return to new process" path
ended up using stale mappings.

Testing with large pages (64K) exposed the problem, probably because with 4K
pages, pressure on the TLB faulted all process A's mappings out before the
guest kernel could insert any for process B.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:26 +02:00
Hollis Blanchard df9b856c45 KVM: ppc: use prefetchable mappings for guest memory
Bare metal Linux on 440 can "overmap" RAM in the kernel linear map, so that it
can use large (256MB) mappings even if memory isn't a multiple of 256MB. To
prevent the hardware prefetcher from loading from an invalid physical address
through that mapping, it's marked Guarded.

However, KVM must ensure that all guest mappings are backed by real physical
RAM (since a deliberate access through a guarded mapping could still cause a
machine check). Accordingly, we don't need to make our mappings guarded, so
let's allow prefetching as the designers intended.

Curiously this patch didn't affect performance at all on the quick test I
tried, but it's clearly the right thing to do anyways and may improve other
workloads.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:26 +02:00
Hollis Blanchard bf5d4025c9 KVM: ppc: use MMUCR accessor to obtain TID
We have an accessor; might as well use it.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:25 +02:00
Hollis Blanchard 74ef740da6 KVM: ppc: fix Kconfig constraints
Make sure that CONFIG_KVM cannot be selected without processor support
(currently, 440 is the only processor implementation available).

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:25 +02:00
Hollis Blanchard fcfdbd266a KVM: ppc: improve trap emulation
set ESR[PTR] when emulating a guest trap. This allows Linux guests to
properly handle WARN_ON() (i.e. detect that it's a non-fatal trap).

Also remove debugging printk in trap emulation.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:24 +02:00
Hollis Blanchard d4cf3892e5 KVM: ppc: optimize irq delivery path
In kvmppc_deliver_interrupt is just one case left in the switch and it is a
rare one (less than 8%) when looking at the exit numbers. Therefore we can
at least drop the switch/case and if an if. I inserted an unlikely too, but
that's open for discussion.

In kvmppc_can_deliver_interrupt all frequent cases are in the default case.
I know compilers are smart but we can make it easier for them. By writing
down all options and removing the default case combined with the fact that
ithe values are constants 0..15 should allow the compiler to write an easy
jump table.
Modifying kvmppc_can_deliver_interrupt pointed me to the fact that gcc seems
to be unable to reduce priority_exception[x] to a build time constant.
Therefore I changed the usage of the translation arrays in the interrupt
delivery path completely. It is now using priority without translation to irq
on the full irq delivery path.
To be able to do that ivpr regs are stored by their priority now.

Additionally the decision made in kvmppc_can_deliver_interrupt is already
sufficient to get the value of interrupt_msr_mask[x]. Therefore we can replace
the 16x4byte array used here with a single 4byte variable (might still be one
miss, but the chance to find this in cache should be better than the right
entry of the whole array).

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:23 +02:00
Hollis Blanchard 9ab80843c0 KVM: ppc: optimize find first bit
Since we use a unsigned long here anyway we can use the optimized __ffs.

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:23 +02:00
Hollis Blanchard 1b6766c7f3 KVM: ppc: optimize kvm stat handling
Currently we use an unnecessary if&switch to detect some cases.
To be honest we don't need the ligh_exits counter anyway, because we can
calculate it out of others. Sum_exits can also be calculated, so we can
remove that too.
MMIO, DCR  and INTR can be counted on other places without these
additional control structures (The INTR case was never hit anyway).

The handling of BOOKE_INTERRUPT_EXTERNAL/BOOKE_INTERRUPT_DECREMENTER is
similar, but we can avoid the additional if when copying 3 lines of code.
I thought about a goto there to prevent duplicate lines, but rewriting three
lines should be better style than a goto cross switch/case statements (its
also not enough code to justify a new inline function).

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:23 +02:00
Hollis Blanchard b8fd68ac8d KVM: ppc: fix set regs to take care of msr change
When changing some msr bits e.g. problem state we need to take special
care of that. We call the function in our mtmsr emulation (not needed for
wrtee[i]), but we don't call kvmppc_set_msr if we change msr via set_regs
ioctl.
It's a corner case we never hit so far, but I assume it should be
kvmppc_set_msr in our arch set regs function (I found it because it is also
a corner case when using pv support which would miss the update otherwise).

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:23 +02:00
Hollis Blanchard 5cf8ca2214 KVM: ppc: adjust vcpu types to support 64-bit cores
However, some of these fields could be split into separate per-core structures
in the future.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:22 +02:00
Hollis Blanchard db93f5745d KVM: ppc: create struct kvm_vcpu_44x and introduce container_of() accessor
This patch doesn't yet move all 44x-specific data into the new structure, but
is the first step down that path. In the future we may also want to create a
struct kvm_vcpu_booke.

Based on patch from Liu Yu <yu.liu@freescale.com>.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:22 +02:00
Hollis Blanchard 5cbb5106f5 KVM: ppc: Move the last bits of 44x code out of booke.c
Needed to port to other Book E processors.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:22 +02:00
Hollis Blanchard 75f74f0dbe KVM: ppc: refactor instruction emulation into generic and core-specific pieces
Cores provide 3 emulation hooks, implemented for example in the new
4xx_emulate.c:
kvmppc_core_emulate_op
kvmppc_core_emulate_mtspr
kvmppc_core_emulate_mfspr

Strictly speaking the last two aren't necessary, but provide for more
informative error reporting ("unknown SPR").

Long term I'd like to have instruction decoding autogenerated from tables of
opcodes, and that way we could aggregate universal, Book E, and core-specific
instructions more easily and without redundant switch statements.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:21 +02:00
Hollis Blanchard c381a04313 ppc: Create disassemble.h to extract instruction fields
This is used in a couple places in KVM, but isn't KVM-specific.

However, this patch doesn't modify other in-kernel emulation code:
- xmon uses a direct copy of ppc_opc.c from binutils
- emulate_instruction() doesn't need it because it can use a series
  of mask tests.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:21 +02:00
Hollis Blanchard 9dd921cfea KVM: ppc: Refactor powerpc.c to relocate 440-specific code
This introduces a set of core-provided hooks. For 440, some of these are
implemented by booke.c, with the rest in (the new) 44x.c.

Note that these hooks are link-time, not run-time. Since it is not possible to
build a single kernel for both e500 and 440 (for example), using function
pointers would only add overhead.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:21 +02:00
Hollis Blanchard d9fbd03d24 KVM: ppc: combine booke_guest.c and booke_host.c
The division was somewhat artificial and cumbersome, and had no functional
benefit anyways: we can only guests built for the real host processor.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:51:50 +02:00
Hollis Blanchard 0f55dc481e KVM: ppc: Rename "struct tlbe" to "struct kvmppc_44x_tlbe"
This will ease ports to other cores.

Also remove unused "struct kvm_tlb" while we're at it.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:51:50 +02:00
Hollis Blanchard a0d7b9f246 KVM: ppc: Move 440-specific TLB code into 44x_tlb.c
This will make it easier to provide implementations for other cores.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:51:50 +02:00
Martin Schwidefsky 79741dd357 [PATCH] idle cputime accounting
The cpu time spent by the idle process actually doing something is
currently accounted as idle time. This is plain wrong, the architectures
that support VIRT_CPU_ACCOUNTING=y can do better: distinguish between the
time spent doing nothing and the time spent by idle doing work. The first
is accounted with account_idle_time and the second with account_system_time.
The architectures that use the account_xxx_time interface directly and not
the account_xxx_ticks interface now need to do the check for the idle
process in their arch code. In particular to improve the system vs true
idle time accounting the arch code needs to measure the true idle time
instead of just testing for the idle process.
To improve the tick based accounting as well we would need an architecture
primitive that can tell us if the pt_regs of the interrupted context
points to the magic instruction that halts the cpu.

In addition idle time is no more added to the stime of the idle process.
This field now contains the system time of the idle process as it should
be. On systems without VIRT_CPU_ACCOUNTING this will always be zero as
every tick that occurs while idle is running will be accounted as idle
time.

This patch contains the necessary common code changes to be able to
distinguish idle system time and true idle time. The architectures with
support for VIRT_CPU_ACCOUNTING need some changes to exploit this.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-12-31 15:11:46 +01:00
Martin Schwidefsky 457533a7d3 [PATCH] fix scaled & unscaled cputime accounting
The utimescaled / stimescaled fields in the task structure and the
global cpustat should be set on all architectures. On s390 the calls
to account_user_time_scaled and account_system_time_scaled never have
been added. In addition system time that is accounted as guest time
to the user time of a process is accounted to the scaled system time
instead of the scaled user time.
To fix the bugs and to prevent future forgetfulness this patch merges
account_system_time_scaled into account_system_time and
account_user_time_scaled into account_user_time.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: Michael Neuling <mikey@neuling.org>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-12-31 15:11:46 +01:00
Rusty Russell 2ca1a61583 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
Conflicts:

	arch/x86/kernel/io_apic.c
2008-12-31 23:05:57 +10:30
Linus Torvalds 5f34fe1cfc Merge branch 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
  stacktrace: provide save_stack_trace_tsk() weak alias
  rcu: provide RCU options on non-preempt architectures too
  printk: fix discarding message when recursion_bug
  futex: clean up futex_(un)lock_pi fault handling
  "Tree RCU": scalable classic RCU implementation
  futex: rename field in futex_q to clarify single waiter semantics
  x86/swiotlb: add default swiotlb_arch_range_needs_mapping
  x86/swiotlb: add default phys<->bus conversion
  x86: unify pci iommu setup and allow swiotlb to compile for 32 bit
  x86: add swiotlb allocation functions
  swiotlb: consolidate swiotlb info message printing
  swiotlb: support bouncing of HighMem pages
  swiotlb: factor out copy to/from device
  swiotlb: add arch hook to force mapping
  swiotlb: allow architectures to override phys<->bus<->phys conversions
  swiotlb: add comment where we handle the overflow of a dma mask on 32 bit
  rcu: fix rcutorture behavior during reboot
  resources: skip sanity check of busy resources
  swiotlb: move some definitions to header
  swiotlb: allow architectures to override swiotlb pool allocation
  ...

Fix up trivial conflicts in
  arch/x86/kernel/Makefile
  arch/x86/mm/init_32.c
  include/linux/hardirq.h
as per Ingo's suggestions.
2008-12-30 16:10:19 -08:00
Anton Vorontsov 068e8c9d02 powerpc/qe: Select QE_USB with USB_GADGET_FSL_QE
Boards should know when QE_USB is used, so that they can configure USB
clocks and pins.

Another option would be to add 'select QE_USB' into USB_GADGET_FSL_QE,
but selects are evil since they don't support dependencies.

While at it, also remove 'host' from the symbol description, since the
QE_USB code is used to support the gadget driver as well.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-12-30 12:12:12 -06:00
Julia Lawall 870029a682 powerpc/85xx: Add local_irq_restore in error handling code
There is a call to local_irq_restore in the normal exit case, so it would
seem that there should be one on an error return as well.

The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@@
expression l;
expression E,E1,E2;
@@

local_irq_save(l);
... when != local_irq_restore(l)
    when != spin_unlock_irqrestore(E,l)
    when any
    when strict
(
if (...) { ... when != local_irq_restore(l)
               when != spin_unlock_irqrestore(E1,l)
+   local_irq_restore(l);
    return ...;
}
|
if (...)
+   {local_irq_restore(l);
    return ...;
+   }
|
spin_unlock_irqrestore(E2,l);
|
local_irq_restore(l);
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-12-30 11:35:30 -06:00
Kumar Gala 8bd3947afd powerpc/85xx: Add SMP support to MPC8572 DS
Enable SMP support on the MPC8572 DS reference board.

Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-12-30 11:30:42 -06:00
Kumar Gala 00c4b95c44 powerpc/85xx: Enable SMP support
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-12-30 11:30:41 -06:00
Becky Bruce 47f80a325c powerpc/86xx: Update 8641hpcn dts file to match latest u-boot
The newest revision of uboot reworks the memory map for this
board to look more like the 85xx boards.  Also, some regions
which were far larger than the actual hardware have been scaled
back to match the board, and the imaginary second flash bank has
been removed. Rapidio and PCI are mutually exclusive in the hardware,
and they now are occupying the same space in the address map.
The Rapidio node is commented out of the .dts since PCI is the
common use case.

Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-12-30 11:30:40 -06:00
Anton Vorontsov be11d3b354 powerpc/qe: Fix few build errors with CONFIG_QUICC_ENGINE=n
Some 83xx boards were not ready for the optional QUICC Engine support.

This patch fixes following build errors:

arch/powerpc/platforms/built-in.o: In function `flush_disable_caches':
(.text+0xb308): undefined reference to `par_io_data_set'
arch/powerpc/platforms/built-in.o: In function `flush_disable_caches':
(.text+0xb334): undefined reference to `par_io_data_set'
arch/powerpc/platforms/built-in.o: In function `flush_disable_caches':
(.text+0xb408): undefined reference to `qe_ic_get_high_irq'
arch/powerpc/platforms/built-in.o: In function `flush_disable_caches':
(.text+0xb478): undefined reference to `qe_ic_get_low_irq'
arch/powerpc/platforms/built-in.o: In function `mpc832x_spi_init':
mpc832x_rdb.c:(.init.text+0x574c): undefined reference to `par_io_config_pin'
mpc832x_rdb.c:(.init.text+0x5768): undefined reference to `par_io_config_pin'
mpc832x_rdb.c:(.init.text+0x5784): undefined reference to `par_io_config_pin'
mpc832x_rdb.c:(.init.text+0x57a0): undefined reference to `par_io_config_pin'
mpc832x_rdb.c:(.init.text+0x57bc): undefined reference to `par_io_config_pin'
arch/powerpc/platforms/built-in.o:mpc832x_rdb.c:(.init.text+0x57d8): more undefined references to `par_io_config_pin' follow
arch/powerpc/platforms/built-in.o: In function `mpc836x_rdk_init_IRQ':
mpc836x_rdk.c:(.init.text+0x5e84): undefined reference to `qe_ic_init'
arch/powerpc/platforms/built-in.o: In function `mpc836x_rdk_setup_arch':
mpc836x_rdk.c:(.init.text+0x5f10): undefined reference to `qe_reset'
make: *** [.tmp_vmlinux1] Error 1

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-12-30 11:14:06 -06:00
Anton Vorontsov 20cfb41ba8 powerpc/83xx: Fix few build errors with CONFIG_QUICC_ENGINE=n
Some 83xx boards were not ready for the optional QUICC Engine support.

This patch fixes following build errors:

arch/powerpc/platforms/built-in.o: In function `flush_disable_caches':
(.text+0xb308): undefined reference to `par_io_data_set'
arch/powerpc/platforms/built-in.o: In function `flush_disable_caches':
(.text+0xb334): undefined reference to `par_io_data_set'
arch/powerpc/platforms/built-in.o: In function `flush_disable_caches':
(.text+0xb408): undefined reference to `qe_ic_get_high_irq'
arch/powerpc/platforms/built-in.o: In function `flush_disable_caches':
(.text+0xb478): undefined reference to `qe_ic_get_low_irq'
arch/powerpc/platforms/built-in.o: In function `mpc832x_spi_init':
mpc832x_rdb.c:(.init.text+0x574c): undefined reference to `par_io_config_pin'
mpc832x_rdb.c:(.init.text+0x5768): undefined reference to `par_io_config_pin'
mpc832x_rdb.c:(.init.text+0x5784): undefined reference to `par_io_config_pin'
mpc832x_rdb.c:(.init.text+0x57a0): undefined reference to `par_io_config_pin'
mpc832x_rdb.c:(.init.text+0x57bc): undefined reference to `par_io_config_pin'
arch/powerpc/platforms/built-in.o:mpc832x_rdb.c:(.init.text+0x57d8): more undefined references to `par_io_config_pin' follow
arch/powerpc/platforms/built-in.o: In function `mpc836x_rdk_init_IRQ':
mpc836x_rdk.c:(.init.text+0x5e84): undefined reference to `qe_ic_init'
arch/powerpc/platforms/built-in.o: In function `mpc836x_rdk_setup_arch':
mpc836x_rdk.c:(.init.text+0x5f10): undefined reference to `qe_reset'
make: *** [.tmp_vmlinux1] Error 1

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-12-30 11:14:05 -06:00
Anton Vorontsov c9dadffbe9 powerpc/fsl_pci: Fix sparse warnings
This patch fixes following sparse warnings:

  CHECK   fsl_pci.c
fsl_pci.c:32:13: warning: symbol 'setup_pci_atmu' was not declared. Should it be static?
fsl_pci.c:89:13: warning: symbol 'setup_pci_cmd' was not declared. Should it be static?
fsl_pci.c:133:12: warning: symbol 'fsl_pcie_check_link' was not declared. Should it be static?

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-12-30 11:14:02 -06:00
Anton Vorontsov 25adde18e6 powerpc/83xx: Add USB Host support for MPC8360E-RDK boards
Simply add the usb node to support USB host on the MPC8360E-RDK
boards.

Currently U-Boot doesn't fill the clock-frequency property for
timer nodes, so for now we have to fill it manually.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-12-30 11:13:47 -06:00
Anton Vorontsov c9c5e52d44 powerpc/83xx: Add USB Host/Gadget support for MPC8360E-MDS boards
- Update the device tree per QE USB bindings;
- Add timer (FSL GTM) node;
- Add gpio-controller node for BCSR13 bank (GPIOs on that bank
  are used to control the USB transceiver);
- Set up other BCSR registers;
- Configure the QE Par IO.

The work is loosely based on Li Yang's patch[1], which was used
to support peripheral mode only.

[1] http://ozlabs.org/pipermail/linuxppc-dev/2008-August/061357.html

The s-o-b line of the original patch preserved here.

Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-12-30 11:13:46 -06:00
Anton Vorontsov 3d64de9c50 powerpc: Implement GPIO driver for simple memory-mapped banks
The driver supports very simple GPIO controllers, that is, when a
controller provides just a 'data' register. Such controllers may be
found in various BCSRs (Board's FPGAs used to control board's
switches, LEDs, chip-selects, Ethernet/USB PHY power, etc).

So far we support only 1-byte GPIO banks. Support for other widths may
be implemented when/if needed.

p.s.
To avoid "made up" compatible entries (like compatible = "simple-gpio"),
boards must call simple_gpiochip_init() to pass the compatible string.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-12-30 11:13:45 -06:00
Anton Vorontsov 1b9e89046c powerpc/qe: Implement QE Pin Multiplexing API
With this API we're able to set a QE pin to the GPIO mode or a dedicated
peripheral function.

The API relies on the fact that QE gpio controllers are registered. If
they aren't, the API won't work (gracefully though).

There is one caveat though: if anybody occupied the node->data before us,
or overwrote it, then bad things will happen. Luckily this is all in the
platform code that we fully control, so this should never happen.

I could implement more checks (for example we could create a list of
successfully registered QE controllers, and compare the node->data in the
qe_pin_request()), but this is unneeded if nobody is going to do silly
things behind our back.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-12-30 11:13:43 -06:00
Anton Vorontsov 78c7705037 powerpc/83xx: Fix sparse warnings in mpc836x_mds.c
This patch fixes following sparse warnings:

  CHECK   mpc836x_mds.c
mpc836x_mds.c:75:12: warning: Using plain integer as NULL pointer
mpc836x_mds.c:79:13: warning: incorrect type in assignment (different address spaces)
mpc836x_mds.c:79:13:    expected unsigned char [usertype] *static [toplevel] bcsr_regs
mpc836x_mds.c:79:13:    got void [noderef] <asn:2>*
mpc836x_mds.c:105:3: warning: incorrect type in argument 1 (different address spaces)
mpc836x_mds.c:105:3:    expected unsigned char volatile [noderef] [usertype] <asn:2>*addr
mpc836x_mds.c:105:3:    got unsigned char [usertype] *
mpc836x_mds.c:105:3: warning: incorrect type in argument 1 (different address spaces)
mpc836x_mds.c:105:3:    expected unsigned char const volatile [noderef] [usertype] <asn:2>*addr
mpc836x_mds.c:105:3:    got unsigned char [usertype] *
mpc836x_mds.c:107:3: warning: incorrect type in argument 1 (different address spaces)
mpc836x_mds.c:107:3:    expected unsigned char volatile [noderef] [usertype] <asn:2>*addr
mpc836x_mds.c:107:3:    got unsigned char [usertype] *
mpc836x_mds.c:107:3: warning: incorrect type in argument 1 (different address spaces)
mpc836x_mds.c:107:3:    expected unsigned char const volatile [noderef] [usertype] <asn:2>*addr
mpc836x_mds.c:107:3:    got unsigned char [usertype] *
mpc836x_mds.c:131:11: warning: incorrect type in argument 1 (different address spaces)
mpc836x_mds.c:131:11:    expected void volatile [noderef] <asn:2>*addr
mpc836x_mds.c:131:11:    got unsigned char [usertype] *static [toplevel] bcsr_regs

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-12-30 11:13:42 -06:00
Anton Vorontsov 81b36a0b6e powerpc/83xx: Fix sparse warnings in board files
This patch fixes following sparse warnings:

  CHECK   83xx/usb.c
83xx/usb.c:205:5: warning: symbol 'mpc837x_usb_cfg' was not declared. Should it be static?
  CHECK   83xx/mpc831x_rdb.c
83xx/mpc831x_rdb.c:45:13: warning: symbol 'mpc831x_rdb_init_IRQ' was not declared. Should it be static?
  CHECK   83xx/mpc832x_rdb.c
83xx/mpc832x_rdb.c:133:13: warning: symbol 'mpc832x_rdb_init_IRQ' was not declared. Should it be static?
  CHECK   83xx/mpc832x_mds.c
83xx/mpc832x_mds.c:68:12: warning: Using plain integer as NULL pointer
83xx/mpc832x_mds.c:72:13: warning: incorrect type in assignment (different address spaces)
83xx/mpc832x_mds.c:72:13:    expected unsigned char [usertype] *static [toplevel] bcsr_regs
83xx/mpc832x_mds.c:72:13:    got void [noderef] <asn:2>*
83xx/mpc832x_mds.c:99:11: warning: incorrect type in argument 1 (different address spaces)
83xx/mpc832x_mds.c:99:11:    expected void volatile [noderef] <asn:2>*addr
83xx/mpc832x_mds.c:99:11:    got unsigned char [usertype] *static [toplevel] bcsr_regs

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-12-30 11:13:41 -06:00
Anton Vorontsov a5dae76a3d powerpc: Implement get_brgfreq() and get_baudrate() stubs
This is needed to not bother with ugly #ifdefs in the drivers.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-12-30 11:13:40 -06:00
Rusty Russell 33edcf133b Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2008-12-30 08:02:35 +10:30
Linus Torvalds 3c92ec8ae9 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (144 commits)
  powerpc/44x: Support 16K/64K base page sizes on 44x
  powerpc: Force memory size to be a multiple of PAGE_SIZE
  powerpc/32: Wire up the trampoline code for kdump
  powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M
  powerpc/32: Allow __ioremap on RAM addresses for kdump kernel
  powerpc/32: Setup OF properties for kdump
  powerpc/32/kdump: Implement crash_setup_regs() using ppc_save_regs()
  powerpc: Prepare xmon_save_regs for use with kdump
  powerpc: Remove default kexec/crash_kernel ops assignments
  powerpc: Make default kexec/crash_kernel ops implicit
  powerpc: Setup OF properties for ppc32 kexec
  powerpc/pseries: Fix cpu hotplug
  powerpc: Fix KVM build on ppc440
  powerpc/cell: add QPACE as a separate Cell platform
  powerpc/cell: fix build breakage with CONFIG_SPUFS disabled
  powerpc/mpc5200: fix error paths in PSC UART probe function
  powerpc/mpc5200: add rts/cts handling in PSC UART driver
  powerpc/mpc5200: Make PSC UART driver update serial errors counters
  powerpc/mpc5200: Remove obsolete code from mpc5200 MDIO driver
  powerpc/mpc5200: Add MDMA/UDMA support to MPC5200 ATA driver
  ...

Fix trivial conflict in drivers/char/Makefile as per Paul's directions
2008-12-28 16:54:33 -08:00
Ilya Yanok ca9153a3a2 powerpc/44x: Support 16K/64K base page sizes on 44x
This adds support for 16k and 64k page sizes on PowerPC 44x processors.

The PGDIR table is much smaller than a page when using 16k or 64k
pages (512 and 32 bytes respectively) so we allocate the PGDIR with
kzalloc() instead of __get_free_pages().

One PTE table covers rather a large memory area when using 16k or 64k
pages (32MB or 512MB respectively), so we can easily put FIXMAP and
PKMAP in the area covered by one PTE table.

Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Vladimir Panfilov <pvr@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-29 09:53:25 +11:00
Hollis Blanchard 6ca4f7494b powerpc: Force memory size to be a multiple of PAGE_SIZE
Ensure that total memory size is page-aligned, because otherwise
mark_bootmem() gets upset.

This error case was triggered by using 64 KiB pages in the kernel
while arch/powerpc/boot/4xx.c arbitrarily reduced the amount of memory
by 4096 (to work around a chip bug that affects the last 256 bytes of
physical memory).

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-29 09:53:14 +11:00
Linus Torvalds 0191b625ca Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
  net: Allow dependancies of FDDI & Tokenring to be modular.
  igb: Fix build warning when DCA is disabled.
  net: Fix warning fallout from recent NAPI interface changes.
  gro: Fix potential use after free
  sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
  sfc: When disabling the NIC, close the device rather than unregistering it
  sfc: SFT9001: Add cable diagnostics
  sfc: Add support for multiple PHY self-tests
  sfc: Merge top-level functions for self-tests
  sfc: Clean up PHY mode management in loopback self-test
  sfc: Fix unreliable link detection in some loopback modes
  sfc: Generate unique names for per-NIC workqueues
  802.3ad: use standard ethhdr instead of ad_header
  802.3ad: generalize out mac address initializer
  802.3ad: initialize ports LACPDU from const initializer
  802.3ad: remove typedef around ad_system
  802.3ad: turn ports is_individual into a bool
  802.3ad: turn ports is_enabled into a bool
  802.3ad: make ntt bool
  ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
  ...

Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
to the conversion to %pI (in this networking merge) and the addition of
doing IPv6 addresses (from the earlier merge of CIFS).
2008-12-28 12:49:40 -08:00
Linus Torvalds 1db2a5c11e Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: (85 commits)
  [S390] provide documentation for hvc_iucv kernel parameter.
  [S390] convert ctcm printks to dev_xxx and pr_xxx macros.
  [S390] convert zfcp printks to pr_xxx macros.
  [S390] convert vmlogrdr printks to pr_xxx macros.
  [S390] convert zfcp dumper printks to pr_xxx macros.
  [S390] convert cpu related printks to pr_xxx macros.
  [S390] convert qeth printks to dev_xxx and pr_xxx macros.
  [S390] convert sclp printks to pr_xxx macros.
  [S390] convert iucv printks to dev_xxx and pr_xxx macros.
  [S390] convert ap_bus printks to pr_xxx macros.
  [S390] convert dcssblk and extmem printks messages to pr_xxx macros.
  [S390] convert monwriter printks to pr_xxx macros.
  [S390] convert s390 debug feature printks to pr_xxx macros.
  [S390] convert monreader printks to pr_xxx macros.
  [S390] convert appldata printks to pr_xxx macros.
  [S390] convert setup printks to pr_xxx macros.
  [S390] convert hypfs printks to pr_xxx macros.
  [S390] convert time printks to pr_xxx macros.
  [S390] convert cpacf printks to pr_xxx macros.
  [S390] convert cio printks to pr_xxx macros.
  ...
2008-12-28 12:33:21 -08:00
Linus Torvalds a39b863342 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (31 commits)
  sched: fix warning in fs/proc/base.c
  schedstat: consolidate per-task cpu runtime stats
  sched: use RCU variant of list traversal in for_each_leaf_rt_rq()
  sched, cpuacct: export percpu cpuacct cgroup stats
  sched, cpuacct: refactoring cpuusage_read / cpuusage_write
  sched: optimize update_curr()
  sched: fix wakeup preemption clock
  sched: add missing arch_update_cpu_topology() call
  sched: let arch_update_cpu_topology indicate if topology changed
  sched: idle_balance() does not call load_balance_newidle()
  sched: fix sd_parent_degenerate on non-numa smp machine
  sched: add uid information to sched_debug for CONFIG_USER_SCHED
  sched: move double_unlock_balance() higher
  sched: update comment for move_task_off_dead_cpu
  sched: fix inconsistency when redistribute per-cpu tg->cfs_rq shares
  sched/rt: removed unneeded defintion
  sched: add hierarchical accounting to cpu accounting controller
  sched: include group statistics in /proc/sched_debug
  sched: rename SCHED_NO_NO_OMIT_FRAME_POINTER => SCHED_OMIT_FRAME_POINTER
  sched: clean up SCHED_CPUMASK_ALLOC
  ...
2008-12-28 12:27:58 -08:00
Linus Torvalds b0f4b285d7 Merge branch 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (241 commits)
  sched, trace: update trace_sched_wakeup()
  tracing/ftrace: don't trace on early stage of a secondary cpu boot, v3
  Revert "x86: disable X86_PTRACE_BTS"
  ring-buffer: prevent false positive warning
  ring-buffer: fix dangling commit race
  ftrace: enable format arguments checking
  x86, bts: memory accounting
  x86, bts: add fork and exit handling
  ftrace: introduce tracing_reset_online_cpus() helper
  tracing: fix warnings in kernel/trace/trace_sched_switch.c
  tracing: fix warning in kernel/trace/trace.c
  tracing/ring-buffer: remove unused ring_buffer size
  trace: fix task state printout
  ftrace: add not to regex on filtering functions
  trace: better use of stack_trace_enabled for boot up code
  trace: add a way to enable or disable the stack tracer
  x86: entry_64 - introduce FTRACE_ frame macro v2
  tracing/ftrace: add the printk-msg-only option
  tracing/ftrace: use preempt_enable_no_resched_notrace in ring_buffer_time_stamp()
  x86, bts: correctly report invalid bts records
  ...

Fixed up trivial conflict in scripts/recordmcount.pl due to SH bits
being already partly merged by the SH merge.
2008-12-28 12:21:10 -08:00
Rusty Russell 86c6f274f5 cpumask: powerpc: Introduce cpumask_of_{node,pcibus} to replace {node,pcibus}_to_cpumask
Impact: New APIs

The old node_to_cpumask/node_to_pcibus returned a cpumask_t: these
return a pointer to a struct cpumask.  Part of removing cpumasks from
the stack.

(Also replaces powerpc internal uses of node_to_cpumask).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2008-12-26 22:23:39 +10:30
Ingo Molnar 4e202284e6 Merge branch 'sched/urgent'; commit 'v2.6.28' into sched/core 2008-12-25 13:42:23 +01:00
Martin Schwidefsky fc5243d98a [S390] arch_setup_additional_pages arguments
arch_setup_additional_pages currently gets two arguments, the binary
format descripton and an indication if the process uses an executable
stack or not. The second argument is not used by anybody, it could
be removed without replacement.

What actually does make sense is to pass an indication if the process
uses the elf interpreter or not. The glibc code will not use anything
from the vdso if the process does not use the dynamic linker, so for
statically linked binaries the architecture backend can choose not
to map the vdso.

Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2008-12-25 13:38:54 +01:00
James Morris cbacc2c7f0 Merge branch 'next' into for-linus 2008-12-25 11:40:09 +11:00
Dale Farnsworth f8f50b1bdd powerpc/32: Wire up the trampoline code for kdump
Wire up the trampoline code for ppc32 to relay exceptions from the
vectors at address 0 to vectors at address 32MB, and modify Kconfig
to enable Kdump support for all classic powerpcs.

Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-23 15:13:29 +11:00
Dale Farnsworth ccdcef72c2 powerpc/32: Add the ability for a classic ppc kernel to be loaded at 32M
Add the ability for a classic ppc kernel to be loaded at an address
of 32MB.  This done by fixing a few places that assume we are loaded
at address 0, and by changing several uses of KERNELBASE to use
PAGE_OFFSET, instead.

Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-23 15:13:29 +11:00
Anton Vorontsov 01695a9687 powerpc/32: Allow __ioremap on RAM addresses for kdump kernel
While for debugging it is good to catch bogus users of ioremap, though
for kdump support it is more convenient to use __ioremap for
copy_oldmem_page() (exactly as we do for PPC64 currently).

Note that copy_oldmem_page() calls __ioremap with flags set to '0',
so it should be safe with the regard to the caches.

The other option is to use kmap_atomic_pfn()[1], but it will not work
for kernels compiled without HIGHMEM.

That is, on a board with 256MB RAM and crashkernel=64M@32M case, the
!HIGHMEM capturing kernel maps 0-96M range, which does not include all
the memory needed to capture the dump. And, obviously, accessing
anything upper than 96M will cause faults.

[1] http://ozlabs.org/pipermail/linuxppc-dev/2007-November/046747.html

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-23 15:13:29 +11:00
Dale Farnsworth 6f29c3298b powerpc/32: Setup OF properties for kdump
Refactor the setting of kdump OF properties, moving the common code
from machine_kexec_64.c to machine_kexec.c where it can be used on
both ppc64 and ppc32.  This will be needed for kdump to work on ppc32
platforms.

Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-23 15:13:29 +11:00
Anton Vorontsov 7375331388 powerpc/32/kdump: Implement crash_setup_regs() using ppc_save_regs()
This replaces the dummy crash_setup_regs function with full-fledged
crash_setup_regs implementation.  On PPC32 we simply use the new
ppc_save_regs function to dump the registers.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-23 15:13:28 +11:00
Anton Vorontsov 322b439455 powerpc: Prepare xmon_save_regs for use with kdump
Today the arch/powerpc/xmon/setjmp.S file contains only the
xmon_save_regs function.  We want to use it for kdump purposes, so
let's move the file into arch/powerpc/kernel/ and give the function a
more generic name (ppc_save_regs).

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-23 15:13:28 +11:00
Anton Vorontsov 5be8554875 powerpc: Remove default kexec/crash_kernel ops assignments
Default ops are implicit now.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-23 15:13:28 +11:00
Anton Vorontsov 77733f8a33 powerpc: Make default kexec/crash_kernel ops implicit
This removes the need for each platform to specify default kexec and
crash kernel ops, thus effectively adds a working kexec support for
most 6xx/7xx/7xxx-based boards.

Platforms that can't cope with default ops will explode in some weird
way (a hang or reboot is most likely), which means that the board's
kexec support should be fixed or blacklisted via dummy _prepare
callback returning -ENOSYS.

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-23 15:13:28 +11:00
Dale Farnsworth 2e8e4f5b80 powerpc: Setup OF properties for ppc32 kexec
Refactor the setting of kexec OF properties, moving the common code
from machine_kexec_64.c to machine_kexec.c where it can be used on
both ppc64 and ppc32.  This is needed for kexec to work on ppc32
platforms.

Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-23 15:13:28 +11:00
Sebastien Dugue b906cfa397 powerpc/pseries: Fix cpu hotplug
Currently, pseries_cpu_die() calls msleep() while polling RTAS for
the status of the dying cpu.

However, if the cpu that is going down also happens to be the one
doing the tick then we're hosed as the tick_do_timer_cpu 'baton' is
only passed later on in tick_shutdown() when _cpu_down() does the
CPU_DEAD notification.  Therefore jiffies won't be updated anymore.

This replaces that msleep() with a cpu_relax() to make sure we're not
going to schedule at that point.

With this patch my test box survives a 100k iterations hotplug stress
test on _all_ cpus, whereas without it, it quickly dies after ~50
iterations.

Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-23 15:13:27 +11:00
Paul Mackerras fad7b9b51e powerpc: Fix KVM build on ppc440
Commit 2a4aca1144 ("powerpc/mm: Split
low level tlb invalidate for nohash processors") changed a call to
_tlbia to _tlbil_all but didn't include the header that defines
_tlbil_all, leading to a build failure on 440 if KVM is enabled.
This fixes it.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-23 14:58:30 +11:00
Benjamin Krill def434c231 powerpc/cell: add QPACE as a separate Cell platform
Since the QPACE (Chromodynamics Parallel Computing on the
Cell Broadband Engine) platform doesn't use a iommu, doesn't
have PCI devices and a MPIC much lesser setup and
configurations are needed. So far all devices are detected
as OF device. A notifier function is used to set the dma_ops
for the of_platform bus. Further this patch splits the
PPC_CELL_NATIVE into PPC_CELL_COMMON which are parts that are
shared with the QPACE platform and the rest.

Signed-off-by: Benjamin Krill <ben@codiert.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2008-12-22 22:19:19 +01:00
Arnd Bergmann e68558ddcd powerpc/cell: fix build breakage with CONFIG_SPUFS disabled
CBE_THERM and OPROFILE_CELL both cannot be built without
SPU_FS disabled, so make the dependency explicit.

Reported-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2008-12-22 22:08:26 +01:00
Wolfram Sang aec739e010 powerpc/mpc5200: add rts/cts handling in PSC UART driver
Add RTS/CTS-support for the PSC of the MPC5200B. Tested with a Phytec
MPC5200B-IO.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2008-12-21 02:54:32 -07:00
Tim Yamin 6b61e69e7b powerpc/mpc5200: Add MDMA/UDMA support to MPC5200 ATA driver
This patch adds MDMA/UDMA support using BestComm for DMA on the MPC5200
platform.  Based heavily on previous work by Freescale (Bernard Kuhn,
John Rigby) and Domen Puncer.

With this patch, a SanDisk Extreme IV CF card gets read speeds of
approximately 26.70 MB/sec.

Signed-off-by: Tim Yamin <plasm@roo.me.uk>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2008-12-21 02:54:29 -07:00
Grant Likely aaab5e83c2 powerpc/mpc5200: Disable bestcomm prefetching when ATA DMA enabled
When ATA DMA is enabled, bestcomm prefetching does not work.  This
patch adds a function to disable bestcomm prefetch when the ATA
Bestcomm task is initialized.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2008-12-21 02:54:28 -07:00
Tim Yamin e4efe3c271 powerpc/mpc5200: Bestcomm fixes to ATA support
1) ata.h has dst_pa in the wrong place (needs to match what the BestComm
   task microcode in bcom_ata_task.c expects); fix it.

2) The BestComm ATA task priority was changed to maximum in bestcomm_priv.h;
   this fixes a deadlock issue experienced with heavy DMA occurring on
   both the ATA and Ethernet BestComm tasks, e.g. when downloading a large
   file over a LAN to disk.

Signed-off-by: Tim Yamin <plasm@roo.me.uk>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2008-12-21 02:54:28 -07:00
Grant Likely 622882455a powerpc/mpc5200: Bugfix on handling variable sized buffer descriptors
The buffer descriptors for the ATA BestComm task are larger than the
current definition for bcom_bd.  This causes problems because the
various bcom_... functions dereference the buffer descriptor pointer
by using the array operator which doesn't work when the buffer
descriptors are a different size.

This patch adds the bcom_get_bd() function which uses the value in
bcom_task.bd_size to calculate the offset into the BD table.  This
patch also changes the definition of bcom_bd to specify a data size
of 0 instead of 1 so that it will never work if anyone attempts to
dereference the bd list as an array (as opposed to something that
might work even though it is wrong).

Finally, this patch moves the definition of bcom_bd up in the file
to eliminate a forward declaration.

Based on patch originally written by Tim Yamin.

Signed-off-by: Tim Yamin <plasm@roo.me.uk>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2008-12-21 02:54:27 -07:00
Grant Likely dd952cbb3d powerpc/mpc5200: Make internal 5200 PIC the default interrupt controller
The MPC5200 internal interrupt controller setup function needs to set
the default interrupt controller when it is called.  Without this
irq_create_of_mapping() cannot be called without first determining
the pointer to the irq controller (ie. call with controller = NULL).

Reported-by: Steven Cavanagh <scavanagh@secretlab.ca>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2008-12-21 02:54:27 -07:00
Grant Likely bcb73f5611 powerpc/mpc5200: Document and tidy irq driver
This patch adds documentation to the mpc5200 interrupt controller
driver and cleans up some minor coding conventions.  It also moves the
contents of mpc52xx_pic.h into the driver proper (except for a small
common bit that is moved to the common mpc52xx.h) because the
information encoded there is not required by any other part of kernel
code.  Finally for code readability sake, the L2_OFFSET shift value
is removed because the code using it resolves to a noop.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2008-12-21 02:54:26 -07:00
Benjamin Herrenschmidt a14953597b powerpc: Fix missing 'blr' in _tlbia()
Rework to MMU code dropped a much missed 'blr' instruction.

Brown-Paper-Bag-Worn-By: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2008-12-21 02:54:25 -07:00
Scott Wood 49e6e3f1ae powerpc/bootwrapper: Use the child-bus #address-cells to decide which range entry to use
The correct #address-cells was still used for the actual translation,
so the impact is only a possibility of choosing the wrong range entry
or failing to find any match.  Most common cases were not affected.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:17 +11:00
Grant Erickson e14d77490d powerpc: Const-qualify Device Node Argument to DCR Resource Extent API
Add const qualifier to device_node argument for
dcr_resource_{start,len} as of_get_property also const-qualifies this
argument.

Signed-off-by: Grant Erickson <gerickson@nuovations.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt 9dce3ce5c5 powerpc/44x: 44x TLB doesn't need "Guarded" set for all pages
After discussing with chip designers, it appears that it's not
necessary to set G everywhere on 440 cores. The various core
errata related to prefetch should be sorted out by firmware by
disabling icache prefetching in CCR0. We add the workaround to
the kernel however just in case oooold firmwares don't do it.

This is valid for -all- 4xx core variants. Later ones hard wire
the absence of prefetch but it doesn't harm to clear the bits
in CCR0 (they should already be cleared anyway).

We still leave G=1 on the linear mapping for now, we need to
stop over-mapping RAM to be able to remove it.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt 64b3d0e812 powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED
Currently, we never set _PAGE_COHERENT in the PTEs, we just OR it in
in the hash code based on some CPU feature bit.  We also manipulate
_PAGE_NO_CACHE and _PAGE_GUARDED by hand in all sorts of places.

This changes the logic so that instead, the PTE now contains
_PAGE_COHERENT for all normal RAM pages thay have I = 0 on platforms
that need it.  The hash code clears it if the feature bit is not set.

It also adds some clean accessors to setup various valid combinations
of access flags and change various bits of code to use them instead.

This should help having the PTE actually containing the bit
combinations that we really want.

I also removed _PAGE_GUARDED from _PAGE_BASE on 44x and instead
set it explicitely from the TLB miss.  I will ultimately remove it
completely as it appears that it might not be needed after all
but in the meantime, having it in the TLB miss makes things a
lot easier.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt 7752035180 powerpc/mm: Runtime allocation of mmu context maps for nohash CPUs
This makes the MMU context code used for CPUs with no hash table
(except 603) dynamically allocate the various maps used to track
the state of contexts.

Only the main free map and CPU 0 stale map are allocated at boot
time.  Other CPU maps are allocated when those CPUs are brought up
and freed if they are unplugged.

This also moves the initialization of the MMU context management
slightly later during the boot process, which should be fine as
it's really only needed when userland if first started anyways.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt 760ec0e02d powerpc/44x: No need to mask MSR:CE, ME or DE in _tlbil_va on 440
The handlers for Critical, Machine Check or Debug interrupts
will save and restore MMUCR nowadays, thus we only need to
disable normal interrupts when invalidating TLB entries.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt 2a4aca1144 powerpc/mm: Split low level tlb invalidate for nohash processors
Currently, the various forms of low level TLB invalidations are all
implemented in misc_32.S for 32-bit processors, in a fairly scary
mess of #ifdef's and with interesting duplication such as a whole
bunch of code for FSL _tlbie and _tlbia which are no longer used.

This moves things around such that _tlbie is now defined in
hash_low_32.S and is only used by the 32-bit hash code, and all
nohash CPUs use the various _tlbil_* forms that are now moved to
a new file, tlb_nohash_low.S.

I moved all the definitions for that stuff out of
include/asm/tlbflush.h as they are really internal mm stuff, into
mm/mmu_decl.h

The code should have no functional changes.  I kept some variants
inline for trivial forms on things like 40x and 8xx.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt f048aace29 powerpc/mm: Add SMP support to no-hash TLB handling
This commit moves the whole no-hash TLB handling out of line into a
new tlb_nohash.c file, and implements some basic SMP support using
IPIs and/or broadcast tlbivax instructions.

Note that I'm using local invalidations for D->I cache coherency.

At worst, if another processor is trying to execute the same and
has the old entry in its TLB, it will just take a fault and re-do
the TLB flush locally (it won't re-do the cache flush in any case).

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt 7c03d653cd powerpc/mm: Introduce MMU features
We're soon running out of CPU features and I need to add some new
ones for various MMU related bits, so this patch separates the MMU
features from the CPU features.  I moved over the 32-bit MMU related
ones, added base features for MMU type families, but didn't move
over any 64-bit only feature yet.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:16 +11:00
Benjamin Herrenschmidt 2ca8cf7389 powerpc/mm: Rework context management for CPUs with no hash table
This reworks the context management code used by 4xx,8xx and
freescale BookE.  It adds support for SMP by implementing a
concept of stale context map to lazily flush the TLB on
processors where a context may have been invalidated.  This
also contains the ground work for generalizing such lazy TLB
flushing by just picking up a new PID and marking the old one
stale.  This will be implemented later.

This is a first implementation that uses a global spinlock.

Ideally, we should try to get at least the fast path (context ID
already assigned) lockless or limited to a per context lock,
but for now this will do.

I tried to keep the UP case reasonably simple to avoid adding
too much overhead to 8xx which does a lot of context stealing
since it effectively has only 16 PIDs available.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:15 +11:00
Benjamin Herrenschmidt 5e696617c4 powerpc/mm: Split mmu_context handling
This splits the mmu_context handling between 32-bit hash based
processors, 64-bit hash based processors and everybody else.  This is
preliminary work for adding SMP support for BookE processors.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:15 +11:00
Benjamin Herrenschmidt 6d2170be45 powerpc/4xx: Extended DCR support v2
This adds supports to the "extended" DCR addressing via the indirect
mfdcrx/mtdcrx instructions supported by some 4xx cores (440H6 and
later).

I enabled the feature for now only on AMCC 460 chips.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:15 +11:00
Brian King fecba96268 powerpc: Add reboot notifier to Collaborative Memory Manager
When running Active Memory Sharing, pages can get marked as
"loaned" with the hypervisor by the CMM driver. This state gets
cleared by the system firmware when rebooting the partition.
When using kexec to boot a new kernel, this state never gets
cleared and the hypervisor and CMM driver can get out of sync
with respect to the number of pages currently marked "loaned".
Fix this by adding a reboot notifier to the CMM driver to deflate
the balloon and mark all pages as active.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:15 +11:00
Brian King 2218108e18 powerpc: Disable Collaborative Memory Manager for kdump
When running Active Memory Sharing, the Collaborative Memory Manager
(CMM) may mark some pages as "loaned" with the hypervisor.
Periodically, the CMM will query the hypervisor for a loan request,
which is a single signed value.  When kexec'ing into a kdump kernel,
the CMM driver in the kdump kernel is not aware of the pages the
previous kernel had marked as "loaned", so the hypervisor and the CMM
driver are out of sync.  This results in the CMM driver getting a
negative loan request, which can then get treated as a large unsigned
value and can cause kdump to hang due to the CMM driver inflating too
large.  Since there really is no clean way for the CMM driver in the
kdump kernel to clean this up, simply disable CMM in the kdump kernel.
This fixes hangs we were seeing doing kdump with AMS.

Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:15 +11:00
Stephen Rothwell 5d84e4bee0 powerpc/iseries: viodasd needs to depend on CONFIG_BLOCK
Otherwise you get lot of errors like these:

drivers/block/viodasd.c:72: error: dereferencing pointer to incomplete type
drivers/block/viodasd.c: In function 'viodasd_open':
drivers/block/viodasd.c:135: error: dereferencing pointer to incomplete type
drivers/block/viodasd.c: In function 'viodasd_release':
drivers/block/viodasd.c:184: error: dereferencing pointer to incomplete type
drivers/block/viodasd.c: In function 'viodasd_getgeo':
drivers/block/viodasd.c:209: error: dereferencing pointer to incomplete type
drivers/block/viodasd.c:214: error: implicit declaration of function 'get_capacity'
drivers/block/viodasd.c: At top level:
drivers/block/viodasd.c:222: error: variable 'viodasd_fops' has initializer but incomplete type
drivers/block/viodasd.c:223: error: unknown field 'owner' specified in initializer

Discovered by a randconfig build.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:15 +11:00
Tony Breeds 532774ec7f powerpc: Pass a valid token to rtas_call() in phyp-dump code
ibm_configure_kernel_dump is passed as the token to rtas_call() is
never initialised.  This sets it to something sane.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Acked-by: Nathan Lynch <ntl@pobox.com>
Acked-by: Manish Ahuja <mahujam@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:15 +11:00
Tony Breeds 7a2eab0d4e powerpc: Protect against NULL pointer deref in phyp-dump code
print_dump_header() will be called at least once with a NULL pointer in
a normal boot sequence.  If DEBUG is defined then we will dereference
the pointer and crash.  Add a quick fix to exit early in the NULL pointer
case.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Acked-by: Manish Ahuja <mahujam@gmail.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:14 +11:00
David Howells 8168b5400b powerpc: Rename struct vm_region to avoid conflict with NOMMU
Rename PowerPC's struct vm_region so that I can introduce my own
global version for NOMMU.  It's feasible that the PowerPC version may
wish to use my global one instead.

The NOMMU vm_region struct defines areas of the physical memory map
that are under mmap.  This may include chunks of RAM or regions of
memory mapped devices, such as flash.  It is also used to retain
copies of file content so that shareable private memory mappings of
files can be made.  As such, it may be compatible with what is
described in the banner comment for PowerPC's vm_region struct.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:14 +11:00
Nathan Lynch 13ba3c0092 powerpc: Convert sysfs cache code to of_find_next_cache_node()
Using the common code means that more complete cache information will
provided in sysfs on platforms that don't use the l2-cache property
convention.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:14 +11:00
Nathan Lynch b2ea25b958 powerpc: Convert cpu_to_l2cache() to of_find_next_cache_node()
The smp code uses cache information to populate cpu_core_map; change
it to use common code for cache lookup.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:14 +11:00
Nathan Lynch e523f723d6 powerpc: Add of_find_next_cache_node()
We have more than one piece of code that looks up cache nodes manually
using the "l2-cache" property.  Add a common helper routine which does
this and handles ePAPR's "next-level-cache" property as well as
powermac.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:14 +11:00
Ingo Molnar 30cd324e97 Merge branches 'tracing/ftrace', 'tracing/ring-buffer' and 'tracing/urgent' into tracing/core
Conflicts:
	include/linux/ftrace.h
2008-12-19 09:42:40 +01:00
Paul E. McKenney 64db4cfff9 "Tree RCU": scalable classic RCU implementation
This patch fixes a long-standing performance bug in classic RCU that
results in massive internal-to-RCU lock contention on systems with
more than a few hundred CPUs.  Although this patch creates a separate
flavor of RCU for ease of review and patch maintenance, it is intended
to replace classic RCU.

This patch still handles stress better than does mainline, so I am still
calling it ready for inclusion.  This patch is against the -tip tree.
Nevertheless, experience on an actual 1000+ CPU machine would still be
most welcome.

Most of the changes noted below were found while creating an rcutiny
(which should permit ejecting the current rcuclassic) and while doing
detailed line-by-line documentation.

Updates from v9 (http://lkml.org/lkml/2008/12/2/334):

o	Fixes from remainder of line-by-line code walkthrough,
	including comment spelling, initialization, undesirable
	narrowing due to type conversion, removing redundant memory
	barriers, removing redundant local-variable initialization,
	and removing redundant local variables.

	I do not believe that any of these fixes address the CPU-hotplug
	issues that Andi Kleen was seeing, but please do give it a whirl
	in case the machine is smarter than I am.

	A writeup from the walkthrough may be found at the following
	URL, in case you are suffering from terminal insomnia or
	masochism:

	http://www.kernel.org/pub/linux/kernel/people/paulmck/tmp/rcutree-walkthrough.2008.12.16a.pdf

o	Made rcutree tracing use seq_file, as suggested some time
	ago by Lai Jiangshan.

o	Added a .csv variant of the rcudata debugfs trace file, to allow
	people having thousands of CPUs to drop the data into
	a spreadsheet.	Tested with oocalc and gnumeric.  Updated
	documentation to suit.

Updates from v8 (http://lkml.org/lkml/2008/11/15/139):

o	Fix a theoretical race between grace-period initialization and
	force_quiescent_state() that could occur if more than three
	jiffies were required to carry out the grace-period
	initialization.  Which it might, if you had enough CPUs.

o	Apply Ingo's printk-standardization patch.

o	Substitute local variables for repeated accesses to global
	variables.

o	Fix comment misspellings and redundant (but harmless) increments
	of ->n_rcu_pending (this latter after having explicitly added it).

o	Apply checkpatch fixes.

Updates from v7 (http://lkml.org/lkml/2008/10/10/291):

o	Fixed a number of problems noted by Gautham Shenoy, including
	the cpu-stall-detection bug that he was having difficulty
	convincing me was real.  ;-)

o	Changed cpu-stall detection to wait for ten seconds rather than
	three in order to reduce false positive, as suggested by Ingo
	Molnar.

o	Produced a design document (http://lwn.net/Articles/305782/).
	The act of writing this document uncovered a number of both
	theoretical and "here and now" bugs as noted below.

o	Fix dynticks_nesting accounting confusion, simplify WARN_ON()
	condition, fix kerneldoc comments, and add memory barriers
	in dynticks interface functions.

o	Add more data to tracing.

o	Remove unused "rcu_barrier" field from rcu_data structure.

o	Count calls to rcu_pending() from scheduling-clock interrupt
	to use as a surrogate timebase should jiffies stop counting.

o	Fix a theoretical race between force_quiescent_state() and
	grace-period initialization.  Yes, initialization does have to
	go on for some jiffies for this race to occur, but given enough
	CPUs...

Updates from v6 (http://lkml.org/lkml/2008/9/23/448):

o	Fix a number of checkpatch.pl complaints.

o	Apply review comments from Ingo Molnar and Lai Jiangshan
	on the stall-detection code.

o	Fix several bugs in !CONFIG_SMP builds.

o	Fix a misspelled config-parameter name so that RCU now announces
	at boot time if stall detection is configured.

o	Run tests on numerous combinations of configurations parameters,
	which after the fixes above, now build and run correctly.

Updates from v5 (http://lkml.org/lkml/2008/9/15/92, bad subject line):

o	Fix a compiler error in the !CONFIG_FANOUT_EXACT case (blew a
	changeset some time ago, and finally got around to retesting
	this option).

o	Fix some tracing bugs in rcupreempt that caused incorrect
	totals to be printed.

o	I now test with a more brutal random-selection online/offline
	script (attached).  Probably more brutal than it needs to be
	on the people reading it as well, but so it goes.

o	A number of optimizations and usability improvements:

	o	Make rcu_pending() ignore the grace-period timeout when
		there is no grace period in progress.

	o	Make force_quiescent_state() avoid going for a global
		lock in the case where there is no grace period in
		progress.

	o	Rearrange struct fields to improve struct layout.

	o	Make call_rcu() initiate a grace period if RCU was
		idle, rather than waiting for the next scheduling
		clock interrupt.

	o	Invoke rcu_irq_enter() and rcu_irq_exit() only when
		idle, as suggested by Andi Kleen.  I still don't
		completely trust this change, and might back it out.

	o	Make CONFIG_RCU_TRACE be the single config variable
		manipulated for all forms of RCU, instead of the prior
		confusion.

	o	Document tracing files and formats for both rcupreempt
		and rcutree.

Updates from v4 for those missing v5 given its bad subject line:

o	Separated dynticks interface so that NMIs and irqs call separate
	functions, greatly simplifying it.  In particular, this code
	no longer requires a proof of correctness.  ;-)

o	Separated dynticks state out into its own per-CPU structure,
	avoiding the duplicated accounting.

o	The case where a dynticks-idle CPU runs an irq handler that
	invokes call_rcu() is now correctly handled, forcing that CPU
	out of dynticks-idle mode.

o	Review comments have been applied (thank you all!!!).
	For but one example, fixed the dynticks-ordering issue that
	Manfred pointed out, saving me much debugging.  ;-)

o	Adjusted rcuclassic and rcupreempt to handle dynticks changes.

Attached is an updated patch to Classic RCU that applies a hierarchy,
greatly reducing the contention on the top-level lock for large machines.
This passes 10-hour concurrent rcutorture and online-offline testing on
128-CPU ppc64 without dynticks enabled, and exposes some timekeeping
bugs in presence of dynticks (exciting working on a system where
"sleep 1" hangs until interrupted...), which were fixed in the
2.6.27 kernel.  It is getting more reliable than mainline by some
measures, so the next version will be against -tip for inclusion.
See also Manfred Spraul's recent patches (or his earlier work from
2004 at http://marc.info/?l=linux-kernel&m=108546384711797&w=2).
We will converge onto a common patch in the fullness of time, but are
currently exploring different regions of the design space.  That said,
I have already gratefully stolen quite a few of Manfred's ideas.

This patch provides CONFIG_RCU_FANOUT, which controls the bushiness
of the RCU hierarchy.  Defaults to 32 on 32-bit machines and 64 on
64-bit machines.  If CONFIG_NR_CPUS is less than CONFIG_RCU_FANOUT,
there is no hierarchy.  By default, the RCU initialization code will
adjust CONFIG_RCU_FANOUT to balance the hierarchy, so strongly NUMA
architectures may choose to set CONFIG_RCU_FANOUT_EXACT to disable
this balancing, allowing the hierarchy to be exactly aligned to the
underlying hardware.  Up to two levels of hierarchy are permitted
(in addition to the root node), allowing up to 16,384 CPUs on 32-bit
systems and up to 262,144 CPUs on 64-bit systems.  I just know that I
am going to regret saying this, but this seems more than sufficient
for the foreseeable future.  (Some architectures might wish to set
CONFIG_RCU_FANOUT=4, which would limit such architectures to 64 CPUs.
If this becomes a real problem, additional levels can be added, but I
doubt that it will make a significant difference on real hardware.)

In the common case, a given CPU will manipulate its private rcu_data
structure and the rcu_node structure that it shares with its immediate
neighbors.  This can reduce both lock and memory contention by multiple
orders of magnitude, which should eliminate the need for the strange
manipulations that are reported to be required when running Linux on
very large systems.

Some shortcomings:

o	More bugs will probably surface as a result of an ongoing
	line-by-line code inspection.

	Patches will be provided as required.

o	There are probably hangs, rcutorture failures, &c.  Seems
	quite stable on a 128-CPU machine, but that is kind of small
	compared to 4096 CPUs.  However, seems to do better than
	mainline.

	Patches will be provided as required.

o	The memory footprint of this version is several KB larger
	than rcuclassic.

	A separate UP-only rcutiny patch will be provided, which will
	reduce the memory footprint significantly, even compared
	to the old rcuclassic.  One such patch passes light testing,
	and has a memory footprint smaller even than rcuclassic.
	Initial reaction from various embedded guys was "it is not
	worth it", so am putting it aside.

Credits:

o	Manfred Spraul for ideas, review comments, and bugs spotted,
	as well as some good friendly competition.  ;-)

o	Josh Triplett, Ingo Molnar, Peter Zijlstra, Mathieu Desnoyers,
	Lai Jiangshan, Andi Kleen, Andy Whitcroft, and Andrew Morton
	for reviews and comments.

o	Thomas Gleixner for much-needed help with some timer issues
	(see patches below).

o	Jon M. Tollefson, Tim Pepper, Andrew Theurer, Jose R. Santos,
	Andy Whitcroft, Darrick Wong, Nishanth Aravamudan, Anton
	Blanchard, Dave Kleikamp, and Nathan Lynch for keeping machines
	alive despite my heavy abuse^Wtesting.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-18 21:56:04 +01:00
Ingo Molnar b9974dc6bd Merge branch 'linus' into cpus4096 2008-12-18 11:48:30 +01:00
Paul Mackerras c280266a32 Merge branch 'linux-2.6' into next 2008-12-18 11:06:12 +11:00
Guillaume Knispel af4d364386 powerpc: Fix corruption error in rh_alloc_fixed()
There is an error in rh_alloc_fixed() of the Remote Heap code:
If there is at least one free block blk won't be NULL at the end of the
search loop, so -ENOMEM won't be returned and the else branch of
"if (bs == s || be == e)" will be taken, corrupting the management
structures.

Signed-off-by: Guillaume Knispel <gknispel@proformatique.com>
Acked-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-12-17 10:06:14 -06:00
Dave Liu 28707af01b powerpc/fsl-booke: Fix the miss interrupt restore
The commit e5e774d883
powerpc/fsl-booke: Fix problem with _tlbil_va being interrupted
introduce one issue. that casue the problem like this:

Kernel BUG at c00b19fc [verbose debug info unavailable]
Oops: Exception in kernel mode, sig: 5 [#1]
MPC8572 DS
Modules linked in:
NIP: c00b19fc LR: c00b1c34 CTR: c0064e88
REGS: ef02b7b0 TRAP: 0700   Not tainted  (2.6.28-rc8-00057-g1bda712)
MSR: 00021000 <ME>  CR: 44048028  XER: 20000000
TASK = ef02c000[1] 'init' THREAD: ef02a000
GPR00: 00000001 ef02b860 ef02c000 eec201a0 c0dec2c0 00000000 000078a1 00000400
GPR08: c00b4e40 000078a1 c048ec00 a1780000 44048028 ecd26917 00000001 ef02b948
GPR16: ffffffea 0000020c 00000000 00000000 00000003 0000000a 00000000 000078a1
GPR24: eec201a0 00000000 ed849000 00000400 ef02b95c 00000001 ef02b978 ef02b984
NIP [c00b19fc] __find_get_block+0x24/0x238
LR [c00b1c34] __getblk+0x24/0x2a0
Call Trace:
[ef02b860] [c017b768] generic_make_request+0x290/0x328 (unreliable)
[ef02b8b0] [c00b1c34] __getblk+0x24/0x2a0
[ef02b910] [c00b4ae4] __bread+0x14/0xf8
[ef02b920] [c00fc228] ext2_get_branch+0xf0/0x138
[ef02b940] [c00fcc88] ext2_get_block+0xb8/0x828
[ef02ba00] [c00bbdc8] do_mpage_readpage+0x188/0x808
[ef02bac0] [c00bc5b4] mpage_readpages+0xec/0x144
[ef02bb50] [c00fba38] ext2_readpages+0x24/0x34
[ef02bb60] [c006ade0] __do_page_cache_readahead+0x150/0x230
[ef02bbb0] [c0064bdc] filemap_fault+0x31c/0x3e0
[ef02bbf0] [c00728b8] __do_fault+0x60/0x5b0
[ef02bc50] [c0011e0c] do_page_fault+0x2d8/0x4c4
[ef02bd10] [c000ed90] handle_page_fault+0xc/0x80
[ef02bdd0] [c00c7adc] set_brk+0x74/0x9c
[ef02bdf0] [c00c9274] load_elf_binary+0x70c/0x1180
[ef02be70] [c00945f0] search_binary_handler+0xa8/0x274
[ef02bea0] [c0095818] do_execve+0x19c/0x1d4
[ef02bed0] [c000766c] sys_execve+0x58/0x84
[ef02bef0] [c000e950] ret_from_syscall+0x0/0x3c
[ef02bfb0] [c009c6fc] sys_dup+0x24/0x6c
[ef02bfc0] [c0001e04] init_post+0xb0/0xf0
[ef02bfd0] [c046c1ac] kernel_init+0xcc/0xf4
[ef02bff0] [c000e6d0] kernel_thread+0x4c/0x68
Instruction dump:
4bffffa4 813f000c 4bffffac 9421ffb0 7c0802a6 7d800026 90010054 bf210034
91810030 7c0000a6 68008000 54008ffe <0f000000> 3d20c04e 3b29ffb8 38000008

The issue was the beqlr returns early but we haven't reenabled interrupts.

Signed-off-by: Dave Liu <daveliu@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
2008-12-17 10:06:13 -06:00
Ingo Molnar 1f3f424a6b Merge branch 'linus' into cpus4096 2008-12-17 13:07:48 +01:00
Paul Mackerras eddce368f9 Merge branch 'next' of master.kernel.org:/pub/scm/linux/kernel/git/jwboyer/powerpc-4xx into next 2008-12-17 11:01:43 +11:00
Andy Fleming b31a1d8b41 gianfar: Convert gianfar to an of_platform_driver
Does the same for the accompanying MDIO driver, and then modifies the TBI
configuration method.  The old way used fields in einfo, which no longer
exists.  The new way is to create an MDIO device-tree node for each instance
of gianfar, and create a tbi-handle property to associate ethernet controllers
with the TBI PHYs they are connected to.

Signed-off-by: Andy Fleming <afleming@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-12-16 15:29:15 -08:00
Ingo Molnar c3895b01e8 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-ingo into cpus4096 2008-12-16 12:24:38 +01:00
Ingo Molnar 3c68b4a807 Merge branch 'linus' into cpus4096 2008-12-16 12:24:30 +01:00
Kay Sievers aab0d375e0 powerpc: struct device - replace bus_id with dev_name(), dev_set_name()
Acked-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-16 15:53:38 +11:00
Benjamin Herrenschmidt f63837f058 powerpc/mm: Remove flush_HPTE()
The function flush_HPTE() is used in only one place, the implementation
of DEBUG_PAGEALLOC on ppc32.

It's actually a dup of flush_tlb_page() though it's -slightly- more
efficient on hash based processors.  We remove it and replace it by
a direct call to the hash flush code on those processors and to
flush_tlb_page() for everybody else.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-16 15:53:34 +11:00