Граф коммитов

31973 Коммитов

Автор SHA1 Сообщение Дата
Ingo Molnar f180053694 x86, mm: dont use non-temporal stores in pagecache accesses
Impact: standardize IO on cached ops

On modern CPUs it is almost always a bad idea to use non-temporal stores,
as the regression in this commit has shown it:

  30d697f: x86: fix performance regression in write() syscall

The kernel simply has no good information about whether using non-temporal
stores is a good idea or not - and trying to add heuristics only increases
complexity and inserts fragility.

The regression on cached write()s took very long to be found - over two
years. So dont take any chances and let the hardware decide how it makes
use of its caches.

The only exception is drivers/gpu/drm/i915/i915_gem.c: there were we are
absolutely sure that another entity (the GPU) will pick up the dirty
data immediately and that the CPU will not touch that data before the
GPU will.

Also, keep the _nocache() primitives to make it easier for people to
experiment with these details. There may be more clear-cut cases where
non-cached copies can be used, outside of filemap.c.

Cc: Salman Qazi <sqazi@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 11:06:49 +01:00
Michael Hennerich 0f29456a21 Blackfin arch: Make IRQ_EPPIx_ERROR naming consistent
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
2009-03-02 18:06:13 +08:00
Sonic Zhang 28e4cf22a3 Blackfin arch: Disable NAND option by default
Signed-off-by: Sonic Zhang <sonic.zhang@analog.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
2009-03-02 18:04:24 +08:00
Mike Frysinger a572e217c6 Blackfin arch: drop untested and useless "generic" board file
Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Bryan Wu <cooloney@kernel.org>
2009-03-02 17:22:36 +08:00
Pekka Paalanen 340430c572 x86 mmiotrace: fix race with release_kmmio_fault_page()
There was a theoretical possibility to a race between arming a page in
post_kmmio_handler() and disarming the page in
release_kmmio_fault_page():

cpu0                             cpu1
------------------------------------------------------------------
mmiotrace shutdown
enter release_kmmio_fault_page
                                 fault on the page
                                 disarm the page
disarm the page
                                 handle the MMIO access
                                 re-arm the page
put the page on release list
remove_kmmio_fault_pages()
                                 fault on the page
                                 page not known to mmiotrace
                                 fall back to do_page_fault()
                                 *KABOOM*

(This scenario also shows the double disarm case which is allowed.)

Fixed by acquiring kmmio_lock in post_kmmio_handler() and checking
if the page is being released from mmiotrace.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Stuart Bennett <stuart@freedesktop.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 10:20:37 +01:00
Stuart Bennett 3e39aa156a x86 mmiotrace: improve handling of secondary faults
Upgrade some kmmio.c debug messages to warnings.
Allow secondary faults on probed pages to fall through, and only log
secondary faults that are not due to non-present pages.

Patch edited by Pekka Paalanen.

Signed-off-by: Stuart Bennett <stuart@freedesktop.org>
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 10:20:37 +01:00
Pekka Paalanen 0b700a6a25 x86 mmiotrace: split set_page_presence()
From 36772dcb6ffbbb68254cbfc379a103acd2fbfefc Mon Sep 17 00:00:00 2001
From: Pekka Paalanen <pq@iki.fi>
Date: Sat, 28 Feb 2009 21:34:59 +0200

Split set_page_presence() in kmmio.c into two more functions set_pmd_presence()
and set_pte_presence(). Purely code reorganization, no functional changes.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Stuart Bennett <stuart@freedesktop.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 10:20:36 +01:00
Pekka Paalanen 5359b585fb x86 mmiotrace: fix save/restore page table state
From baa99e2b32449ec7bf147c234adfa444caecac8a Mon Sep 17 00:00:00 2001
From: Pekka Paalanen <pq@iki.fi>
Date: Sun, 22 Feb 2009 20:02:43 +0200

Blindly setting _PAGE_PRESENT in disarm_kmmio_fault_page() overlooks the
possibility, that the page was not present when it was armed.

Make arm_kmmio_fault_page() store the previous page presence in struct
kmmio_fault_page and use it on disarm.

This patch was originally written by Stuart Bennett, but Pekka Paalanen
rewrote it a little different.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Stuart Bennett <stuart@freedesktop.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 10:20:36 +01:00
Stuart Bennett e9d54cae8f x86 mmiotrace: WARN_ONCE if dis/arming a page fails
Print a full warning once, if arming or disarming a page fails.

Also, if initial arming fails, do not handle the page further. This
avoids the possibility of a page failing to arm and then later claiming
to have handled any fault on that page.

WARN_ONCE added by Pekka Paalanen.

Signed-off-by: Stuart Bennett <stuart@freedesktop.org>
Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 10:20:35 +01:00
Pekka Paalanen 5ff93697fc x86: add far read test to testmmiotrace
Apparently pages far into an ioremapped region might not actually be
mapped during ioremap(). Add an optional read test to try to trigger a
multiply faulting MMIO access. Also add more messages to the kernel log
to help debugging.

This patch is based on a patch suggested by
Stuart Bennett <stuart@freedesktop.org>
who discovered bugs in mmiotrace related to normal kernel space faults.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Stuart Bennett <stuart@freedesktop.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 10:20:35 +01:00
Pekka Paalanen fab852aaf7 x86: count errors in testmmiotrace.ko
Check the read values against the written values in the MMIO read/write
test. This test shows if the given MMIO test area really works as
memory, which is a prerequisite for a successful mmiotrace test.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Cc: Stuart Bennett <stuart@freedesktop.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-02 10:20:34 +01:00
Ingo Molnar 55f2b78995 Merge branch 'x86/urgent' into x86/pat 2009-03-01 12:47:58 +01:00
Ingo Molnar f5c1aa1537 Revert "gpu/drm, x86, PAT: PAT support for io_mapping_*"
This reverts commit 17581ad812.

Sitsofe Wheeler reported that /dev/dri/card0 is MIA on his EeePC 900
and bisected it to this commit.

Graphics card is an i915 in an EeePC 900:

 00:02.0 VGA compatible controller [0300]:
   Intel Corporation Mobile 915GM/GMS/910GML
     Express Graphics Controller [8086:2592] (rev 04)

( Most likely the ioremap() of the driver failed and hence the card
  did not initialize. )

Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Bisected-by: Sitsofe Wheeler <sitsofe@yahoo.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-01 12:47:49 +01:00
Tejun Heo d0c4f57027 bootmem, x86: further fixes for arch-specific bootmem wrapping
Impact: fix new breakages introduced by previous fix

Commit c132937556 tried to clean up
bootmem arch wrapper but it wasn't quite correct.  Before the commit,
the followings were broken.

* Low level interface functions prefixed with __ ignored arch
  preference.

* reserve_bootmem(...) can't be mapped into
  reserve_bootmem_node(NODE_DATA(0)->bdata, ...) because the node is
  not preference here.  The region specified MUST fall into the
  specified region; otherwise, it will panic.

After the commit,

* If allocation fails for the arch preferred node, it should fallback
  to whatever is available.  Instead, it simply failed allocation.

There are too many internal details to allow generic wrapping and
still keep things simple for archs.  Plus, all that arch wants is a
way to prefer certain node over another.

This patch drops the generic wrapping around alloc_bootmem_core() and
add alloc_bootmem_core() instead.  If necessary, arch can define
bootmem_arch_referred_node() macro or function which takes all
allocation information and returns the preferred node.  bootmem
generic code will always try the preferred node first and then
fallback to other nodes as usual.

Breakages noted and changes reviewed by Johannes Weiner.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
2009-03-01 16:06:56 +09:00
Tejun Heo af6326d72c alpha: fix typo in recent early vmalloc change
Impact: fix build

Add missing 'o' in variable name.  Compile tested.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
2009-03-01 16:03:16 +09:00
Jaswinder Singh Rajput 327f4387e3 x86: remove double copy of show_cpuinfo_core for 32 and 64 bit
Impact: unification

show_cpuinfo_core is identical for 32 and 64 bit and can be unified,
and CONFIG_X86_HT inherently depends on CONFIG_X86_SMP.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-28 19:26:33 -08:00
Ingo Molnar 92b9af9e4f x86: i915 needs pgprot_writecombine() and is_io_mapping_possible()
Impact: build fix

Theodore Ts reported that the i915 driver needs these symbols:

 ERROR: "pgprot_writecombine" [drivers/gpu/drm/i915/i915.ko] undefined!
 ERROR: "is_io_mapping_possible" [drivers/gpu/drm/i915/i915.ko] undefined!

Reported-by: Theodore Ts'o <tytso@mit.edu> wrote:
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-28 14:22:44 +01:00
Hiroshi Shimamoto 1fae0279ce x86: signal: introduce helper align_sigframe()
Impact: cleanup

Introduce helper align_sigframe() to align stack pointer for signal frame.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-28 09:17:31 +01:00
Hiroshi Shimamoto 75779f0526 x86: signal: unify get_sigframe()
Impact: cleanup

Unify get_sigframe().

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-28 09:17:30 +01:00
Hiroshi Shimamoto 36a4526583 x86: signal: use 16 bytes boundary for rt_sigframe
Impact: cleanup

Supporting xsave/xrestore introduces 64 bytes boundary for save_i387_xstate().
16 bytes boundary is OK for rt_sigframe.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-28 09:17:30 +01:00
Hiroshi Shimamoto 97286a2b64 x86: signal: intrroduce get_sigframe() and replace get_sigstack()
Impact: cleanup

Introduce get_sigframe() like 32-bit to replace get_sigstack().
Move the i387 stuff into get_sigframe().

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-28 09:17:29 +01:00
Hiroshi Shimamoto 144b0712dd x86: signal: add __user annotation
Impact: cleanup

Add missing __user annotation to the parameter of get_sigframe().
Also change cast type to void __user * of *fpstate.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-28 09:17:29 +01:00
Gustavo F. Padovan c577b098f9 x86, fixmap: unify fixmap.h
Impact: unification

This patch unify fixmap_32.h and fixmap_64.h into fixmap.h.
Things that we can't merge now are using CONFIG_X86_{32,64}
(e.g.:vsyscall and EFI)

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-27 20:57:48 -08:00
Gustavo F. Padovan c78f322ce8 x86, fixmap: prepare fixmap_32.h for unification
Impact: cleanup

Just prepare fixmap for later mechanic unification.
No real modification on code.

   text    data     bss     dec     hex filename
3831152  353188  372736 4557076  458914 vmlinux-32.after
3831152  353188  372736 4557076  458914 vmlinux-32.before

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-27 20:57:48 -08:00
Gustavo F. Padovan e365bcd921 x86, fixmap: prepare fixmap_64.h for unification
Impact: cleanup

Just prepare fixmap for later mechanic unification.
No real modification on code.

  text    data     bss     dec     hex filename
4312362  527192  421924 5261478  5048a6 vmlinux-64.after
4312362  527192  421924 5261478  5048a6 vmlinux-64.before

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-27 20:57:48 -08:00
Gustavo F. Padovan 5f403fa9de x86, fixmap: add CONFIG_EFI
Impact: new fixmap allocation

FIX_EFI_IO_MAP_FIRST_PAGE is used only when EFI is enabled.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-27 20:57:47 -08:00
Gustavo F. Padovan 2ae38daf25 x86, fixmap: add CONFIG_X86_{LOCAL,IO}_APIC
Impact: New fixmap allocations

Add CONFIG_X86_{LOCAL,IO}_APIC to enum fixed_address.
FIX_APIC_BASE is used only when CONFIG_X86_LOCAL_APIC is
enabled and FIX_IO_APIC_BASE_* are used only when
CONFIG_X86_IO_APIC is enabled.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-27 20:57:47 -08:00
Gustavo F. Padovan fd862dde18 x86, fixmap: define reserve_top_address for x86_64
Impact: new interface (not yet use)

Define reserve_top_address for x86_64; only for later x86 integration.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-27 20:57:47 -08:00
Gustavo F. Padovan ab93e3c45d x86, fixmap: define FIXADDR_BOOT_* and redefine FIX_ADDR_SIZE
Impact: new interface, not yet used

Now, with these macros, x86_64 code can know where start the
permanent and non-permanent fixed mapped address.
This patch make these macros equal fixmap_32.h for future
x86 integration.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-27 20:57:47 -08:00
Gustavo F. Padovan d09375a9ec x86, fixmap: rename __FIXADDR_SIZE and __FIXADDR_BOOT_SIZE
Impact: rename

Rename __FIXADDR_SIZE to FIXADDR_SIZE
and __FIXADDR_BOOT_SIZE to FIXADDR_BOOT_SIZE.

Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br>
Acked-by: Glauber Costa <gcosta@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-27 20:57:47 -08:00
Linus Torvalds 3c4f1158cd Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (21 commits)
  USB: musb: fix srp sysfs entry deletion
  USB: musb: resume suspended root hub on disconnect
  USB: musb: use right poll limit for low speed devices
  USB: musb: be careful with 64K+ transfer lengths, host side
  USB: musb: fix data toggle saving with shared FIFO
  USB: musb: host endpoint_disable() oops fixes
  USB: musb: fix urb_dequeue() method
  USB: musb: fix musb_host_tx() for shared endpoint FIFO
  USB: musb: be careful with 64K+ transfer lengths (gadget side)
  usb: musb: make Davinci *work* in mainline
  USB: usb_get_string should check the descriptor type
  USB: gadget: fix build error in omap_apollon_2420_defconfig
  USB: g_file_storage: automatically disable stalls under Atmel
  USB: usb-storage: add IGNORE_RESIDUE flag for Genesys Logic adapters
  USB: Quirk for Hummingbird huc56s / Conexant ACM modem
  USB: serial: add support for second revision of Ericsson F3507G WWAN card
  USB: cdc-acm: add usb id for motomagx phones
  USB: option: add BenQ 3g modem information
  usb: gadget: obex: select correct ep descriptors
  USB: EHCI: slow down ITD reuse
  ...
2009-02-27 16:49:26 -08:00
Linus Torvalds 7187adbf08 Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus:
  Revert "MIPS: Print irq handler description"
  MIPS: CVE-2009-0029: Enable syscall wrappers.
  MIPS: Alchemy: In plat_time_init() t reaches -1, tested: 0
  MIPS: Only allow Cavium OCTEON to be configured for boards that support it
2009-02-27 16:48:33 -08:00
Linus Torvalds 535d8e8f19 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: enable DMAR by default
  xen: disable interrupts early, as start_kernel expects
  gpu/drm, x86, PAT: io_mapping_create_wc and resource_size_t
  gpu/drm, x86, PAT: Handle io_mapping_create_wc() errors in a clean way
  x86, Voyager: fix compile by lifting the degeneracy of phys_cpu_present_map
  x86, doc: fix references to Documentation/x86/i386/boot.txt
2009-02-27 16:43:05 -08:00
Linus Torvalds 6febf65b29 Merge branch 'sh/for-2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
* 'sh/for-2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
  sh: ap325rxa: Revert ov772x support.
  serial: sh-sci: fix overrun error handling for SH7785 SCIF.
  sh: Storage class should be before const qualifier
2009-02-27 16:40:00 -08:00
David Brownell 34f32c9701 usb: musb: make Davinci *work* in mainline
Now that the musb build fixes for DaVinci got merged (RC3?), kick in
the other bits needed to get it finally *working* in mainline:

 - Use clk_enable()/clk_disable() ... the "always enable USB clocks"
   code this originally relied on has since been removed.

 - Initialize the USB device only after the relevant I2C GPIOs are
   available, so the host side can properly enable VBUS.

 - Tweak init sequencing to cope with mainline's relatively late init
   of the I2C system bus for power switches, transceivers, and so on.

Sanity tested on DM6664 EVM for host and peripheral modes; that system
won't boot with CONFIG_PM enabled, so OTG can't yet be tested.  Also
verified on OMAP3.

(Unrelated:  correct the MODULE_PARM_DESC spelling of musb_debug.)

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Felipe Balbi <me@felipebalbi.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-02-27 14:40:51 -08:00
Ralf Baechle 5312dc6bc0 Revert "MIPS: Print irq handler description"
This reverts commit 558d1de8ba.
2009-02-27 17:56:35 +00:00
Ralf Baechle dbda6ac089 MIPS: CVE-2009-0029: Enable syscall wrappers.
Thanks to David Daney helping with debugging and testing.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
2009-02-27 17:56:35 +00:00
Roel Kluin 4b0d3f5c28 MIPS: Alchemy: In plat_time_init() t reaches -1, tested: 0
With a postfix decrement t reaches -1 rather than 0, so the fall-back will
not occur.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: mano@roarinelk.homelinux.net
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2009-02-27 17:56:34 +00:00
David Daney 5e6833892e MIPS: Only allow Cavium OCTEON to be configured for boards that support it
Signed-off-by: David Daney <ddaney@caviumnetworks.com>
CC: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2009-02-27 17:56:34 +00:00
Paul Mundt 08c2f5b4d7 sh: ap325rxa: Revert ov772x support.
This change depends on some v4l changes that have been pushed back to
2.6.30, so drop this and fall back on the old soc_camera code until then.

Reported-by: Nobuhiro Iwamatsu <iwamatsu.nobuhiro@renesas.com>
Acked-by: Kuninori Morimoto <morimoto.kuninori@renesas.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2009-02-27 15:41:14 +09:00
Benjamin Herrenschmidt 1ac00cc213 powerpc/44x: Fix address decoding setup of PCI 2.x cells
The PCI 2.x cells used on some 44x SoCs only let us configure the decode
for the low 32-bit of the incoming PLB addresses. The top 4 bits (this
is a 36-bit bus) are hard wired to different values depending on the
specific SoC in use. Our code used to work "by accident" until I added
support for the ISA memory holes and while at it added more validity
checking of the addresses.

This patch should bring it back to working condition. It still relies
on the device-tree being correct but that's somewhat a pre-requisite
for anything to work anyway.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
2009-02-27 09:30:17 +11:00
Kyle McMartin f6be37fdc6 x86: enable DMAR by default
Now that the obvious bugs have been worked out, specifically
the iwlagn issue, and the write buffer errata, DMAR should be safe
to turn back on by default. (We've had it on since those patches were
first written a few weeks ago, without any noticeable bug reports
(most have been due to the dma-api debug patchset.))

Signed-off-by: Kyle McMartin <kyle@redhat.com>
Acked-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 20:59:47 +01:00
Ingo Molnar 3b900d4419 x86: fix !ACPI build for es7000_32.c
arch/x86/kernel/apic/es7000_32.c:702: error: 'es7000_acpi_madt_oem_check_cluster' undeclared here (not in a function)

Provide a es7000_acpi_madt_oem_check_cluster() definition in the !ACPI
case too.

Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 14:35:56 +01:00
Ingo Molnar 0b1da1c8fc x86: apic: simplify secondary CPU wakeup methods, fix
Impact: build fix

init_deasserted is only available on SMP. Make the secondary-wakeup
function conditional on SMP.

Also clean up the file some.

Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 14:11:06 +01:00
Ingo Molnar 1f5bcabf1b x86: apic: simplify secondary CPU wakeup methods
Impact: cleanup

- rename apic->wakeup_cpu  to apic->wakeup_secondary_cpu, to
  make it apparent that this is an SMP-only method

- handle NULL ->wakeup_secondary_cpus to mean the default INIT
  wakeup sequence - this allows simplification of the APIC
  driver templates.

Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 13:58:26 +01:00
Ingo Molnar 0917c01f8e x86: remove update_apic from x86_quirks, fix
Impact: build fix

wakeup_secondary_cpu_via_init(), the default platform method for
booting a secondary CPU, is always used on UP due to probe_32.c,
if CONFIG_X86_LOCAL_APIC is enabled but SMP is off.

So provide a UP wrapper inline as well.

Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 12:49:34 +01:00
Herbert Xu a760a6656e crypto: api - Fix module load deadlock with fallback algorithms
With the mandatory algorithm testing at registration, we have
now created a deadlock with algorithms requiring fallbacks.
This can happen if the module containing the algorithm requiring
fallback is loaded first, without the fallback module being loaded
first.  The system will then try to test the new algorithm, find
that it needs to load a fallback, and then try to load that.

As both algorithms share the same module alias, it can attempt
to load the original algorithm again and block indefinitely.

As algorithms requiring fallbacks are a special case, we can fix
this by giving them a different module alias than the rest.  Then
it's just a matter of using the right aliases according to what
algorithms we're trying to find.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2009-02-26 14:06:31 +08:00
Yinghai Lu 129d8bc828 x86: don't compile vsmp_64 for 32bit
Impact: cleanup

that is only needed when CONFIG_X86_VSMP is defined with 64bit
also remove dead code about PCI, because CONFIG_X86_VSMP depends on PCI

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 06:40:06 +01:00
Yinghai Lu 2b6163bf57 x86: remove update_apic from x86_quirks
Impact: cleanup

x86_quirks->update_apic() calling looks crazy. so try to remove it:

 1. every apic take wakeup_cpu member directly
 2. separate es7000_apic to es7000_apic_cluster
 3. use uv_wakeup_cpu directly

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 06:32:25 +01:00
Ingo Molnar ecc25fbd6b Merge branches 'x86/apic', 'x86/defconfig', 'x86/memtest', 'x86/mm' and 'linus' into x86/core 2009-02-26 06:31:32 +01:00
Ingo Molnar 801c0be814 Merge branches 'x86/urgent' and 'x86/pat' into x86/core
Conflicts:
	arch/x86/include/asm/pat.h
2009-02-26 06:31:23 +01:00
Ingo Molnar 13b2eda64d Merge branch 'x86/urgent' into x86/core
Conflicts:
	arch/x86/mach-voyager/voyager_smp.c
2009-02-26 06:30:42 +01:00
Mark Nelson f72b728bf1 powerpc: Fix 64bit __copy_tofrom_user() regression
This fixes a regression introduced by commit
a4e22f02f5 ("powerpc: Update 64bit
__copy_tofrom_user() using CPU_FTR_UNALIGNED_LD_STD").

The same bug that existed in the 64bit memcpy() also exists here so fix
it here too. The fix is the same as that applied to memcpy() with the
addition of fixes for the exception handling code required for
__copy_tofrom_user().

This stops us reading beyond the end of the source region we were told
to copy.

Signed-off-by: Mark Nelson <markn@au1.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-26 14:02:54 +11:00
Mark Nelson e423b9ecd6 powerpc: Fix 64bit memcpy() regression
This fixes a regression introduced by commit
25d6e2d7c5 ("powerpc: Update 64bit memcpy()
using CPU_FTR_UNALIGNED_LD_STD").

This commit allowed CPUs that have the CPU_FTR_UNALIGNED_LD_STD CPU
feature bit present to do the memcpy() with unaligned load doubles. But,
along with this came a bug where our final load double would read bytes
beyond a page boundary and into the next (unmapped) page. This was caught
by enabling CONFIG_DEBUG_PAGEALLOC,

The fix was to read only the number of bytes that we need to store rather
than reading a full 8-byte doubleword and storing only a portion of that.

In order to minimise the amount of existing code touched we use the
original do_tail for the src_unaligned case.

Below is an example of the regression, as reported by Sachin Sant:

Unable to handle kernel paging request for data at address 0xc00000003f380000
Faulting instruction address: 0xc000000000039574
cpu 0x1: Vector: 300 (Data Access) at [c00000003baf3020]
    pc: c000000000039574: .memcpy+0x74/0x244
    lr: d00000000244916c: .ext3_xattr_get+0x288/0x2f4 [ext3]
    sp: c00000003baf32a0
   msr: 8000000000009032
   dar: c00000003f380000
 dsisr: 40000000
  current = 0xc00000003e54b010
  paca    = 0xc000000000a53680
    pid   = 1840, comm = readahead
enter ? for help
[link register   ] d00000000244916c .ext3_xattr_get+0x288/0x2f4 [ext3]
[c00000003baf32a0] d000000002449104 .ext3_xattr_get+0x220/0x2f4 [ext3]
(unreliab
le)
[c00000003baf3390] d00000000244a6e8 .ext3_xattr_security_get+0x40/0x5c [ext3]
[c00000003baf3400] c000000000148154 .generic_getxattr+0x74/0x9c
[c00000003baf34a0] c000000000333400 .inode_doinit_with_dentry+0x1c4/0x678
[c00000003baf3560] c00000000032c6b0 .security_d_instantiate+0x50/0x68
[c00000003baf35e0] c00000000013c818 .d_instantiate+0x78/0x9c
[c00000003baf3680] c00000000013ced0 .d_splice_alias+0xf0/0x120
[c00000003baf3720] d00000000243e05c .ext3_lookup+0xec/0x134 [ext3]
[c00000003baf37c0] c000000000131e74 .do_lookup+0x110/0x260
[c00000003baf3880] c000000000134ed0 .__link_path_walk+0xa98/0x1010
[c00000003baf3970] c0000000001354a0 .path_walk+0x58/0xc4
[c00000003baf3a20] c000000000135720 .do_path_lookup+0x138/0x1e4
[c00000003baf3ad0] c00000000013645c .path_lookup_open+0x6c/0xc8
[c00000003baf3b70] c000000000136780 .do_filp_open+0xcc/0x874
[c00000003baf3d10] c0000000001251e0 .do_sys_open+0x80/0x140
[c00000003baf3dc0] c00000000016aaec .compat_sys_open+0x24/0x38
[c00000003baf3e30] c00000000000855c syscall_exit+0x0/0x40

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-26 14:02:53 +11:00
Michael Neuling 49f297f8df powerpc: Fix load/store float double alignment handler
When we introduced VSX, we changed the way FPRs are stored in the
thread_struct.  Unfortunately we missed the load/store float double
alignment handler code when updating how we access FPRs in the
thread_struct.

Below fixes this and merges the little/big endian case.

Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-02-26 14:02:53 +11:00
Ingo Molnar 13093cb0e5 gpu/drm, x86, PAT: PAT support for io_mapping_*, export symbols for modules
Impact: build fix

 ERROR: "reserve_io_memtype_wc" [drivers/gpu/drm/i915/i915.ko] undefined!
 ERROR: "free_io_memtype" [drivers/gpu/drm/i915/i915.ko] undefined!

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-26 03:43:53 +01:00
Roel Kluin 5b5923975f [IA64] Don't go beyond iosapic_intr_info's arraysize
vi arch/ia64/kernel/iosapic.c +142
static struct iosapic_intr_info {
	...
} iosapic_intr_info[NR_IRQS];

But at line 510 we have:
	for (i = 0; i <= NR_IRQS; i++) {

s/<=/</

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2009-02-25 11:50:53 -08:00
Roel Kluin aa2f63c954 [IA64] Do not go beyond ARRAY_SIZE of unw.hash
static struct {

... :114
        unsigned short hash[UNW_HASH_SIZE];

... :2152
	for (index = 0; index <= UNW_HASH_SIZE; ++index) {

This is a bug, isn't it?

s/<=/</

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2009-02-25 11:48:04 -08:00
Kyle McMartin 6b1ff036d4 [IA64] enable setting DMAR on by default
The previous commit which introduced the DMAR_DEFAULT_ON setting in
drivers/pci/dmar.c neglected to add the ability for ia64 to enable
the IOMMU by default. Rectify that mistake, doh!

Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2009-02-25 11:40:27 -08:00
Jeremy Fitzhardinge 55d8085671 xen: disable interrupts early, as start_kernel expects
This avoids a lockdep warning from:
	if (DEBUG_LOCKS_WARN_ON(unlikely(!early_boot_irqs_enabled)))
		return;
in trace_hardirqs_on_caller();

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: Xen-devel <xen-devel@lists.xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 18:51:57 +01:00
Linus Torvalds f8dacde8c0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc64: Fix crashes in jbusmc_print_dimm()
2009-02-25 09:31:56 -08:00
Ingo Molnar 2e31add2a7 Merge branch 'x86/urgent' into x86/pat 2009-02-25 16:40:10 +01:00
Peter Zijlstra 34754b69a6 x86: make vmap yell louder when it is used under irqs_disabled()
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 16:38:34 +01:00
Venkatesh Pallipadi 17581ad812 gpu/drm, x86, PAT: PAT support for io_mapping_*
Make io_mapping_create_wc and io_mapping_free go through PAT to make sure
that there are no memory type aliases.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 13:09:52 +01:00
Venkatesh Pallipadi 4ab0d47d0a gpu/drm, x86, PAT: io_mapping_create_wc and resource_size_t
io_mapping_create_wc should take a resource_size_t parameter in place of
unsigned long. With unsigned long, there will be no way to map greater than 4GB
address in i386/32 bit.

On x86, greater than 4GB addresses cannot be mapped on i386 without PAE. Return
error for such a case.

Patch also adds a structure for io_mapping, that saves the base, size and
type on HAVE_ATOMIC_IOMAP archs, that can be used to verify the offset on
io_mapping_map calls.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 13:09:51 +01:00
Venkatesh Pallipadi 7880f74645 gpu/drm, x86, PAT: routine to keep identity map in sync
Add a function to check and keep identity maps in sync, when changing
any memory type. One of the follow on patches will also use this
routine.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Keith Packard <keithp@keithp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 13:09:51 +01:00
Andreas Herrmann 63823126c2 x86: memtest: add additional (regular) test patterns
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 12:19:47 +01:00
Andreas Herrmann bfb4dc0da4 x86: memtest: wipe out test pattern from memory
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 12:19:46 +01:00
Andreas Herrmann 570c9e69aa x86: memtest: adapt log messages
- print test pattern instead of pattern number,
- show pattern as stored in memory,
- use proper priority flags,
- consistent use of u64 throughout the code

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 12:19:46 +01:00
Andreas Herrmann 7dad169e57 x86: memtest: cleanup memtest function
Impact: code cleanup

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 12:19:45 +01:00
Andreas Herrmann 6d74171bf7 x86: memtest: introduce array to select memtest patterns
Impact: code cleanup

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 12:19:45 +01:00
Andreas Herrmann 40823f737e x86: memtest: reuse test patterns when memtest parameter exceeds number of available patterns
Impact: fix unexpected behaviour when pattern number is out of range

Current implementation provides 4 patterns for memtest. The code doesn't
check whether the memtest parameter value exceeds the maximum pattern number.

Instead the memtest code pretends to test with non-existing patterns, e.g.
when booting with memtest=10 I've observed the following

  ...
  early_memtest: pattern num 10
  0000001000 - 0000006000 pattern 0
  ...
  0000001000 - 0000006000 pattern 1
  ...
  0000001000 - 0000006000 pattern 2
  ...
  0000001000 - 0000006000 pattern 3
  ...
  0000001000 - 0000006000 pattern 4
  ...
  0000001000 - 0000006000 pattern 5
  ...
  0000001000 - 0000006000 pattern 6
  ...
  0000001000 - 0000006000 pattern 7
  ...
  0000001000 - 0000006000 pattern 8
  ...
  0000001000 - 0000006000 pattern 9
  ...

But in fact Linux didn't test anything for patterns > 4 as the default
case in memtest() is to leave the function.

I suggest to use the memtest parameter as the number of tests to be
performed and to re-iterate over all existing patterns.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 12:19:44 +01:00
Ingo Molnar 95108fa34a x86: usercopy: check for total size when deciding non-temporal cutoff
Impact: make more types of copies non-temporal

This change makes the following simple fix:

  30d697f: x86: fix performance regression in write() syscall

A bit more sophisticated: we check the 'total' number of bytes
written to decide whether to copy in a cached or a non-temporal
way.

This will for example cause the tail (modulo 4096 bytes) chunk
of a large write() to be non-temporal too - not just the page-sized
chunks.

Cc: Salman Qazi <sqazi@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 10:20:05 +01:00
Ingo Molnar 3255aa2eb6 x86, mm: pass in 'total' to __copy_from_user_*nocache()
Impact: cleanup, enable future change

Add a 'total bytes copied' parameter to __copy_from_user_*nocache(),
and update all the callsites.

The parameter is not used yet - architecture code can use it to
more intelligently decide whether the copy should be cached or
non-temporal.

Cc: Salman Qazi <sqazi@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 10:20:03 +01:00
Ingo Molnar 95f66b3770 Merge branch 'x86/asm' into x86/mm 2009-02-25 08:27:46 +01:00
Tejun Heo d325100504 x86: convert cacheflush macros inline functions
Impact: cleanup

Unused macro parameters cause spurious unused variable warnings.
Convert all cacheflush macros to inline functions to avoid the
warnings and achieve better type checking.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-25 11:06:51 +09:00
Tejun Heo 24ff954233 x86, percpu: fix minor bugs in setup_percpu.c
Recent changes in setup_percpu.c made a now meaningless DBG()
statement fail to compile and introduced a
comparison-of-different-types warning.  Fix them.

Compile failure is reported by Ingo Molnar.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ingo Molnar <mingo@elte.hu>
2009-02-25 10:38:10 +09:00
H. Peter Anvin 638bee71c8 Merge branch 'x86/core' into x86/mce2 2009-02-24 16:11:51 -08:00
H. Peter Anvin 2aaa822984 Merge branch 'x86/defconfig' into x86/mce2
Conflicts (resolved):
	arch/x86/configs/x86_64_defconfig

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 16:01:17 -08:00
H. Peter Anvin 15d4fcd615 x86, mce: enable machine checks in 32-bit defconfig
Impact: defconfig change

Enable MCE in the 32-bit defconfig.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 15:52:58 -08:00
Andi Kleen 1250fbed14 x86, mce: enable machine checks in 64-bit defconfig
Impact: defconfig change

Enable MCE in the 64-bit defconfig.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-02-24 15:52:21 -08:00
Yinghai Lu 46cb27f516 x86: check range in reserve_early()
Impact: cleanup

one 32-bit system reports:

BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000001c000000 (usable)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
DMI 2.0 present.
last_pfn = 0x1c000 max_arch_pfn = 0x100000
kernel direct mapping tables up to 1c000000 @ 7000-c000
..
RAMDISK: 1bc69000 - 1bfef4fa
..
0MB HIGHMEM available.
448MB LOWMEM available.
  mapped low ram: 0 - 1c000000
  low ram: 00000000 - 1c000000
  bootmap 00002000 - 00005800
(9 early reservations) ==> bootmem [0000000000 - 001c000000]
  #0 [0000000000 - 0000001000]   BIOS data page ==> [0000000000 - 0000001000]
  #1 [0000001000 - 0000002000]    EX TRAMPOLINE ==> [0000001000 - 0000002000]
  #2 [0000006000 - 0000007000]       TRAMPOLINE ==> [0000006000 - 0000007000]
  #3 [0000400000 - 00009ed14c]    TEXT DATA BSS ==> [0000400000 - 00009ed14c]
  #4 [001bc69000 - 001bfef4fa]          RAMDISK ==> [001bc69000 - 001bfef4fa]
  #5 [00009ee000 - 00009f2000]    INIT_PG_TABLE ==> [00009ee000 - 00009f2000]
  #6 [000009f400 - 0000100000]    BIOS reserved ==> [000009f400 - 0000100000]
  #7 [0000007000 - 0000007000]          PGTABLE
  #8 [0000002000 - 0000006000]          BOOTMAP ==> [0000002000 - 0000006000]

Notice the strange blank PGTABLE entry.

The reason is init_pg_table is big enough, and zero range is called
with init_memory_mapping/reserve_early().

So try to check the range in reserve_early()

v2: fix the reversed compare

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: nickpiggin@yahoo.com.au
Cc: ink@jurassic.park.msu.ru
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24 22:43:15 +01:00
Andi Kleen be71b8553d x86, mce, cmci: recheck CMCI banks after APIC has been enabled on CPU #0
Impact: Fix marginal race condition

One the first CPU the machine checks are enabled early before
the local APIC is enabled. This could in theory lead
to some lost CMCI events very early during boot because
CMCIs cannot be delivered with disabled LAPIC.

The poller also doesn't recover from this because it doesn't
check CMCI banks.

Add an explicit CMCI banks check after the LAPIC is enabled.
This is only done for CPU #0, the other CPUs only initialize
machine checks after the LAPIC is on.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 13:41:01 -08:00
Andi Kleen 5ca8681ca1 x86, mce, cmci: disable CMCI on rebooting
Impact: Avoids confusing other OSes.

Disable the CMCI vector on reboot to avoid confusing other OS.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 13:41:01 -08:00
H. Peter Anvin df20e2eb3e x86, mce, cmci: remove incorrect __cpuinit/__cpuexit annotations
Impact: Bug fix on UP

The MCE code is reinitialized from resume, so we can't use
__cpuinit/__cpuexit for most of the code.  Remove those annotations
for anything downstream of mce_init().

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 13:41:01 -08:00
Andi Kleen 88ccbedd9c x86, mce, cmci: add CMCI support
Impact: Major new feature

Intel CMCI (Corrected Machine Check Interrupt) is a new
feature on Nehalem CPUs. It allows the CPU to trigger
interrupts on corrected events, which allows faster
reaction to them instead of with the traditional
polling timer.

Also use CMCI to discover shared banks. Machine check banks
can be shared by CPU threads or even cores. Using the CMCI enable
bit it is possible to detect the fact that another CPU already
saw a specific bank. Use this to assign shared banks only
to one CPU to avoid reporting duplicated events.

On CPU hot unplug bank sharing is re discovered. This is done
using a thread that cycles through all the CPUs.

To avoid races between the poller and CMCI we only poll
for banks that are not CMCI capable and only check CMCI
owned banks on a interrupt.

The shared banks ownership information is currently only used for
CMCI interrupts, not polled banks.

The sharing discovery code follows the algorithm recommended in the
IA32 SDM Vol3a 14.5.2.1

The CMCI interrupt handler just calls the machine check poller to
pick up the machine check event that caused the interrupt.

I decided not to implement a separate threshold event like
the AMD version has, because the threshold is always one currently
and adding another event didn't seem to add any value.

Some code inspired by Yunhong Jiang's Xen implementation,
which was in term inspired by a earlier CMCI implementation
by me.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 13:41:00 -08:00
Andi Kleen 03195c6b40 x86, mce, cmci: define MSR names and fields for new CMCI registers
Impact: New register definitions only

CMCI means support for raising an interrupt on a corrected machine
check event instead of having to poll for it. It's a new feature in
Intel Nehalem CPUs available on some machine check banks.

For details see the IA32 SDM Vol3a 14.5

Define the registers for it as a preparation for further patches.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 13:41:00 -08:00
Andi Kleen ee031c31d6 x86, mce, cmci: use polled banks bitmap in machine check poller
Define a per cpu bitmap that contains the banks polled by the machine
check poller. This is needed for the CMCI code in the next patches
to be able to disable polling on specific banks.

The bank by default contains all banks, so there is no behaviour
change. Only future code will remove some banks from the polling
set.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 13:26:05 -08:00
Andi Kleen 8457c84d68 x86, mce: replace machine check events logged interval with ratelimit
Impact: behavior change, use common code

Use a standard leaky bucket ratelimit for the machine check
warning print interval instead of waiting every check_interval.
Also decrease the limit to twice per minute.
This interacts better with threshold interrupts because
they can happen more often than check_interval.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 13:25:53 -08:00
Andi Kleen f9695df42c x86, mce, cmci: avoid potential reentry of threshold interrupt
Impact: minor bugfix

The threshold handler on AMD (and soon on Intel) could be theoretically
reentered by the hardware. This could lead to corrupted events
because the machine check poll code assumes it is not reentered.

Move the APIC ACK to the end of the interrupt handler to let
the hardware avoid that.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 13:24:42 -08:00
Andi Kleen b276268631 x86, mce, cmci: factor out threshold interrupt handler
Impact: cleanup; preparation for feature

The mce_amd_64 code has an own private MC threshold vector with an own
interrupt handler. Since Intel needs a similar handler
it makes sense to share the vector because both can not
be active at the same time.

I factored the common APIC handler code into a separate file which can
be used by both the Intel or AMD MC code.

This is needed for the next patch which adds an Intel specific
CMCI handler.

This patch should be a nop for AMD, it just moves some code
around.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 13:24:42 -08:00
Andi Kleen 41fdff322e x86, mce, cmci: export MAX_NR_BANKS
Impact: Cleanup (code movement)

Move MAX_NR_BANKS into mce.h because it's needed there
for followup patches.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 13:24:42 -08:00
Jiri Slaby b5f26d0556 x86_32: summit_32, de-inline functions
The ones which go only into struct genapic are de-inlined
by compiler anyway, so remove the inline specifier from them.

Afterwards, remove summit_setup_portio_remap completely as it
is unused.

Remove inline also from summit_cpu_mask_to_apicid, since it's
not worth it (it is used in struct genapic too).

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24 22:07:51 +01:00
Jiri Slaby 10b614eaa8 x86_32: summit_32, use BAD_APICID
Use BAD_APICID instead of 0xFF constants in summit_cpu_mask_to_apicid.

Also remove bogus comments about what we actually return.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24 22:07:51 +01:00
Ingo Molnar 0edcf8d692 Merge branch 'tj-percpu' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into core/percpu
Conflicts:
	arch/x86/include/asm/pgtable.h
2009-02-24 21:52:45 +01:00
Ingo Molnar a852cbfaaf Merge branches 'x86/acpi', 'x86/apic', 'x86/asm', 'x86/cleanups', 'x86/mm', 'x86/signal' and 'x86/urgent'; commit 'v2.6.29-rc6' into x86/core 2009-02-24 21:50:43 +01:00
James Bottomley ddf9499b3d x86, Voyager: fix compile by lifting the degeneracy of phys_cpu_present_map
This was changed to a physmap_t giving a clashing symbol redefinition,
but actually using a physmap_t consumes rather a lot of space on x86,
so stick with a private copy renamed with a voyager_ prefix and made
static.  Nothing outside of the Voyager code uses it, anyway.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-24 12:50:11 -08:00
Mark Brown c8532db7f2 [ARM] 5411/1: S3C64XX: Fix EINT unmask
Currently the unmask function for EINT interrupts was setting the mask
bit rather than clearing it.  This was also previously reported and
fixed by Kyungmin Park <kyungmin.park@samsung.com> and others.

Acked-By: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-24 19:12:31 +00:00
Russell King 531660ef56 Add i2c_board_info for RiscPC PCF8583
Add the necessary i2c_board_info structure to fix the lack of PCF8583
RTC on RiscPC.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
2009-02-24 19:19:50 +01:00
Cyrill Gorcunov 9f331119a4 x86: efi_stub_32,64 - add missing ENDPROCs
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: heukelum@fastmail.fm
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24 18:08:40 +01:00
Cyrill Gorcunov bc8b2b9258 x86: head_64.S - use GLOBAL macro
Impact: cleanup

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: heukelum@fastmail.fm
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24 18:08:40 +01:00
Cyrill Gorcunov b3baaa138c x86: entry_64.S - add missing ENDPROC
native_usergs_sysret64 is described as

	extern void native_usergs_sysret64(void)

so lets add ENDPROC here

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: heukelum@fastmail.fm
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24 18:08:39 +01:00
Cyrill Gorcunov 57e372932c x86: invalid_vm86_irq -- use predefined macros
Impact: cleanup

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: heukelum@fastmail.fm
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24 18:08:39 +01:00
Cyrill Gorcunov 5e112ae23b x86: head_64.S - use IDT_ENTRIES instead of hardcoded number
Impact: cleanup

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: heukelum@fastmail.fm
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24 18:08:38 +01:00
Cyrill Gorcunov 2a0b100111 x86: head_64.S - remove useless balign
Impact: cleanup

NEXT_PAGE already has 'balign' so no
need to keep this redundant one.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: heukelum@fastmail.fm
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24 18:08:38 +01:00
Salman Qazi 30d697fa3a x86: fix performance regression in write() syscall
While the introduction of __copy_from_user_nocache (see commit:
0812a579c9) may have been an improvement
for sufficiently large writes, there is evidence to show that it is
deterimental for small writes.  Unixbench's fstime test gives the
following results for 256 byte writes with MAX_BLOCK of 2000:

    2.6.29-rc6 ( 5 samples, each in KB/sec ):
    283750, 295200, 294500, 293000, 293300

    2.6.29-rc6 + this patch (5 samples, each in KB/sec):
    313050, 3106750, 293350, 306300, 307900

    2.6.18
    395700, 342000, 399100, 366050, 359850

    See w_test() in src/fstime.c in unixbench version 4.1.0.  Basically, the above test
    consists of counting how much we can write in this manner:

    alarm(10);
    while (!sigalarm) {
            for (f_blocks = 0; f_blocks < 2000; ++f_blocks) {
                   write(f, buf, 256);
            }
            lseek(f, 0L, 0);
    }

Note, there are other components to the write syscall regression
that are not addressed here.

Signed-off-by: Salman Qazi <sqazi@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-24 17:16:36 +01:00
Tejun Heo 8ac8375714 x86: add remapping percpu first chunk allocator
Impact: add better first percpu allocation for NUMA

On NUMA, embedding allocator can't be used as different units can't be
made to fall in the correct NUMA nodes.  To use large page mapping,
each unit needs to be remapped.  However, percpu areas are usually
much smaller than large page size and unused space hurts a lot as the
number of cpus grow.  This allocator remaps large pages for each chunk
but gives back unused part to the bootmem allocator making the large
pages mapped twice.

This adds slightly to the TLB pressure but is much better than using
4k mappings while still being NUMA-friendly.

Ingo suggested that this would be the correct approach for NUMA.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
2009-02-24 11:57:22 +09:00
Tejun Heo 89c9215165 x86: add embedding percpu first chunk allocator
Impact: add better first percpu allocation for !NUMA

On !NUMA, we can simply allocate contiguous memory and use it for the
first chunk without mapping it into vmalloc area.  As the memory area
is covered by the large page physical memory mapping, it allows the
dynamic perpcu allocator to not add any TLB overhead for the static
percpu area and whatever falls into the first chunk and the
implementation is very simple too.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-24 11:57:21 +09:00
Tejun Heo 5f5d8405d1 x86: separate out setup_pcpu_4k() from setup_per_cpu_areas()
Impact: modularize percpu first chunk allocation

x86 is gonna have a few different strategies for the first chunk
allocation.  Modularize it by separating out the current allocation
mechanism into pcpu_alloc_bootmem() and setup_pcpu_4k().

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-24 11:57:21 +09:00
Tejun Heo 8d408b4be3 percpu: give more latitude to arch specific first chunk initialization
Impact: more latitude for first percpu chunk allocation

The first percpu chunk serves the kernel static percpu area and may or
may not contain extra room for further dynamic allocation.
Initialization of the first chunk needs to be done before normal
memory allocation service is up, so it has its own init path -
pcpu_setup_static().

It seems archs need more latitude while initializing the first chunk
for example to take advantage of large page mapping.  This patch makes
the following changes to allow this.

* Define PERCPU_DYNAMIC_RESERVE to give arch hint about how much space
  to reserve in the first chunk for further dynamic allocation.

* Rename pcpu_setup_static() to pcpu_setup_first_chunk().

* Make pcpu_setup_first_chunk() much more flexible by fetching page
  pointer by callback and adding optional @unit_size, @free_size and
  @base_addr arguments which allow archs to selectively part of chunk
  initialization to their likings.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-24 11:57:21 +09:00
Tejun Heo 458a3e644c x86: update populate_extra_pte() and add populate_extra_pmd()
Impact: minor change to populate_extra_pte() and addition of pmd flavor

Update populate_extra_pte() to return pointer to the pte_t for the
specified address and add populate_extra_pmd() which only populates
till the pmd and returns pointer to the pmd entry for the address.

For 64bit, pud/pmd/pte fill functions are separated out from
set_pte_vaddr[_pud]() and used for set_pte_vaddr[_pud]() and
populate_extra_{pte|pmd}().

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-24 11:57:21 +09:00
Tejun Heo c0c0a29379 vmalloc: add @align to vm_area_register_early()
Impact: allow larger alignment for early vmalloc area allocation

Some early vmalloc users might want larger alignment, for example, for
custom large page mapping.  Add @align to vm_area_register_early().
While at it, drop docbook comment on non-existent @size.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
2009-02-24 11:57:21 +09:00
Tejun Heo c132937556 bootmem: clean up arch-specific bootmem wrapping
Impact: cleaner and consistent bootmem wrapping

By setting CONFIG_HAVE_ARCH_BOOTMEM_NODE, archs can define
arch-specific wrappers for bootmem allocation.  However, this is done
a bit strangely in that only the high level convenience macros can be
changed while lower level, but still exported, interface functions
can't be wrapped.  This not only is messy but also leads to strange
situation where alloc_bootmem() does what the arch wants it to do but
the equivalent __alloc_bootmem() call doesn't although they should be
able to be used interchangeably.

This patch updates bootmem such that archs can override / wrap the
backend function - alloc_bootmem_core() instead of the highlevel
interface functions to allow simpler and consistent wrapping.  Also,
HAVE_ARCH_BOOTMEM_NODE is renamed to HAVE_ARCH_BOOTMEM.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@saeurebad.de>
2009-02-24 11:57:20 +09:00
H. Peter Anvin dc731ca609 Merge branch 'x86/urgent' into x86/mce2 2009-02-23 14:05:56 -08:00
H. Peter Anvin ec5b3d3243 x86, mce: remove invalid __cpuinit/__cpuexit annotations
Impact: Bug fix when CPU hotplug is disabled

Correct the following broken __cpuinit/__cpuexit annotations:

- mce_cpu_features() is called from mce_resume(), and so cannot be
  __cpuinit.
- mce_disable_cpu() and mce_reenable_cpu() are called from
  mce_cpu_callback(), and so cannot be __cpuexit().

Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-02-23 14:01:04 -08:00
Stas Sergeev bda3a89745 x86: minor cleanup in the espfix code
Impact: Cleanup

Checkin be44d2aabc eliminates the use of
a 16-bit stack for espfix.  However, at least one instruction remained
that only operated on the low 16 bits of %esp.

This is not a bug per se because the kernel stack is always an aligned
4K or 8K block.  Therefore it cannot cross 64K boundaries; this code,
in fact, relies strictly on that fact.

However, it's a lot cleaner (and, for that matter, smaller) to operate
on the entire 32-bit register.

Signed-off-by: Stas Sergeev <stsp@aknet.ru>
CC: Zachary Amsden <zach@vmware.com>
CC: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-02-23 11:34:04 -08:00
Yinghai Lu ecda06289f x86: check mptable physptr with max_low_pfn on 32bit
Impact: fix early crash on LinuxBIOS systems

Kevin O'Connor reported that Coreboot aka LinuxBIOS tries to put
mptable somewhere very high, well above max_low_pfn (below which
BIOSes generally put the mptable), causing a panic.

The BIOS will probably be changed to be compatible with older
Linus versions, but nevertheless the MP-spec does not forbid
an MP-table in arbitrary system RAM, so make sure it all
works even if the table is in an unexpected place.

Check physptr with max_low_pfn * PAGE_SIZE.

Reported-by: Kevin O'Connor <kevin@koconnor.net>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Stefan Reinauer <stepan@coresystems.de>
Cc: coreboot@coreboot.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-23 07:41:31 +01:00
Ingo Molnar 8e6dafd6c7 x86: refactor x86_quirks support
Impact: cleanup

Make x86_quirks support more transparent. The highlevel
methods are now named:

  extern void x86_quirk_pre_intr_init(void);
  extern void x86_quirk_intr_init(void);

  extern void x86_quirk_trap_init(void);

  extern void x86_quirk_pre_time_init(void);
  extern void x86_quirk_time_init(void);

This makes it clear that if some platform extension has to
do something here that it is considered ... weird, and is
discouraged.

Also remove arch_hooks.h and move it into setup.h (and other
header files where appropriate).

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-23 00:08:11 +01:00
Ingo Molnar d85a881d78 x86: remove various unused subarch hooks
Impact: remove dead code

Remove:

 - pre_setup_arch_hook()
 - mca_nmi_hook()

If needed they can be added back via an x86_quirk handler.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-23 00:06:49 +01:00
Ingo Molnar 965c7ecaf2 x86: remove the Voyager 32-bit subarch
Impact: remove unused/broken code

The Voyager subarch last built successfully on the v2.6.26 kernel
and has been stale since then and does not build on the v2.6.27,
v2.6.28 and v2.6.29-rc5 kernels.

No actual users beyond the maintainer reported this breakage.
Patches were sent and most of the fixes were accepted but the
discussion around how to do a few remaining issues cleanly
fizzled out with no resolution and the code remained broken.

In the v2.6.30 x86 tree development cycle 32-bit subarch support
has been reworked and removed - and the Voyager code, beyond the
build problems already known, needs serious and significant
changes and probably a rewrite to support it.

CONFIG_X86_VOYAGER has been marked BROKEN then. The maintainer has
been notified but no patches have been sent so far to fix it.

While all other subarchs have been converted to the new scheme,
voyager is still broken. We'd prefer to receive patches which
clean up the current situation in a constructive way, but even in
case of removal there is no obstacle to add that support back
after the issues have been sorted out in a mutually acceptable
fashion.

So remove this inactive code for now.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-23 00:54:01 +01:00
Andrei Birjukov d82ad6d683 [ARM] at91: fix for Atmel AT91 powersaving
We've discovered that our AT91SAM9260 board consumed too much power when
returning from a slowclock low-power mode.  RAM self-refresh is enabled in
a bootloader in our case, this is how we saw a difference.  Estimated ca.
30mA more on 4V battery than the same state before powersaving.

After a small research we found that there seems to be a bogus
sdram_selfrefresh_disable() call at the end of at91_pm_enter() call, which
overwrites the LPR register with uninitialized value.  Please find the
suggested patch attached.

This patch fixes correct restoring of LPR register of the Atmel AT91 SDRAM
controller when returning from a power saving mode.

Signed-off-by: Andrei Birjukov <andrei.birjukov@artecdesign.ee>
Acked-by: Andrew Victor <linux@maxim.org.za>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-22 22:37:21 +00:00
Ravikiran G Thirumalai 8425091ff8 x86: improve the help text of X86_EXTENDED_PLATFORM
Change the CONFIG_X86_EXTENDED_PLATFORM help text to display the
32bit/64bit extended platform list. This is as suggested by Ingo.

Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: shai@scalex86.org
Cc: "Benzi Galili (Benzi@ScaleMP.com)" <benzi@scalemp.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 20:21:31 +01:00
Ingo Molnar fc6fc7f1b1 Merge branch 'linus' into x86/apic
Conflicts:
	arch/x86/mach-default/setup.c

Semantic conflict resolution:
	arch/x86/kernel/setup.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 20:05:19 +01:00
Rafael J. Wysocki 770824bdc4 PM: Split up sysdev_[suspend|resume] from device_power_[down|up]
Move the sysdev_suspend/resume from the callee to the callers, with
no real change in semantics, so that we can rework the disabling of
interrupts during suspend/hibernation.

This is based on an earlier patch from Linus.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-22 10:33:44 -08:00
Linus Torvalds 936577c61d x86: Add IRQF_TIMER to legacy x86 timer interrupt descriptors
Right now nobody cares, but the suspend/resume code will eventually want
to suspend device interrupts without suspending the timer, and will
depend on this flag to know.

The modern x86 timer infrastructure uses the local APIC timers and never
shows up as a device interrupt at all, so it isn't affected and doesn't
need any of this.

Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-22 10:27:49 -08:00
Linus Torvalds 7c24af498f Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  ACPI: remove CONFIG_ACPI_SYSTEM
  fujitsu-laptop: Use RFKILL support bitmask from firmware
  x86_64: Fix S3 fail path
  x86_64: acpi/wakeup_64 cleanup
  battery: don't assume we are fully charged when not charging or discharging
  ACPI: EC: Add delay for slow MSI controller
2009-02-22 09:28:46 -08:00
Geert Uytterhoeven 3d92e8f3ae m68k: atari - Rename "mfp" to "st_mfp"
http://kisskb.ellerman.id.au/kisskb/buildresult/72115/:
| net/mac80211/ieee80211_i.h:327: error: syntax error before 'volatile'
| net/mac80211/ieee80211_i.h:350: error: syntax error before '}' token
| net/mac80211/ieee80211_i.h:455: error: field 'sta' has incomplete type
| distcc[19430] ERROR: compile net/mac80211/main.c on sprygo/32 failed

This is caused by

| # define mfp ((*(volatile struct MFP*)MFP_BAS))

in arch/m68k/include/asm/atarihw.h, which conflicts with the new "mfp" enum in
net/mac80211/ieee80211_i.h.

Rename "mfp" to "st_mfp", as it's a way too generic name for a global #define.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-22 09:23:02 -08:00
Suresh Siddha ef1f87aa7b x86: select x2apic ops in early apic probe only if x2apic mode is enabled
If BIOS hands over the control to OS in legacy xapic mode, select
legacy xapic related ops in the early apic probe and shift to x2apic
ops later in the boot sequence, only after enabling x2apic mode.

If BIOS hands over the control in x2apic mode, select x2apic related
ops in the early apic probe.

This fixes the early boot panic, where we were selecting x2apic ops,
while the cpu is still in legacy xapic mode.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 18:20:50 +01:00
Hiroshi Shimamoto a967bb3fbe x86: ia32_signal: introduce {get|set}_user_seg()
Impact: cleanup

Introduce {get|set}_user_seg() and loadsegment_xx() macros to make code clean.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 17:54:47 +01:00
Hiroshi Shimamoto 8801ead40c x86: ia32_signal: introduce GET_SEG() macro
Impact: cleanup

introduce GET_SEG() macro like arch/x86/kernel/signal.c.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 17:54:47 +01:00
Hiroshi Shimamoto a47e3ec197 x86: ia32_signal: remove unused debug code
Impact: cleanup

DEBUG_SIG will not be used.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 17:54:46 +01:00
Ingo Molnar b319eed0aa x86, mm: fault.c, simplify kmmio_fault(), cleanup
Clarify the kmmio_fault() comment.

Acked-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 10:24:18 +01:00
Hannes Eder 2366c298b5 x86: numa_32.c: fix sparse warning: Using plain integer as NULL pointer
Fix this sparse warning:

  arch/x86/mm/numa_32.c:197:24: warning: Using plain integer as NULL pointer

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Cc: trivial@kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 09:27:12 +01:00
Hannes Eder fc6fcdfbb8 x86: kexec/i386: fix sparse warnings: Using plain integer as NULL pointer
Fix these sparse warnings:

  arch/x86/kernel/machine_kexec_32.c:124:22: warning: Using plain integer as NULL pointer
  arch/x86/kernel/traps.c:950:24: warning: Using plain integer as NULL pointer

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Cc: trivial@kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-22 09:27:11 +01:00
Jiri Slaby 6defa2fe20 x86_64: Fix S3 fail path
As acpi_enter_sleep_state can fail, take this into account in
do_suspend_lowlevel and don't return to the do_suspend_lowlevel's
caller. This would break (currently) fpu status and preempt count.

Technically, this means use `call' instead of `jmp' and `jmp' to
the `resume_point' after the `call' (i.e. if
acpi_enter_sleep_state returns=fails). `resume_point' will handle
the restore of fpu and preempt count gracefully.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-02-21 21:58:18 -05:00
Jiri Slaby e6bd6760c9 x86_64: acpi/wakeup_64 cleanup
- remove %ds re-set, it's already set in wakeup_long64
- remove double labels and alignment (ENTRY already adds both)
- use meaningful resume point labelname
- skip alignment while jumping from wakeup_long64 to the resume point
- remove .size, .type and unused labels
[v2]
- added ENDPROCs

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-02-21 21:58:18 -05:00
Linus Torvalds 460c1338fc Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mce: remove incorrect __cpuinit for mce_cpu_features()
  MAINTAINERS: paravirt-ops maintainers update
2009-02-21 09:15:39 -08:00
H. Peter Anvin cc3ca22063 x86, mce: remove incorrect __cpuinit for mce_cpu_features()
Impact: Bug fix on UP

Checkin 6ec68bff3c81e776a455f6aca95c8c5f1d630198:
    x86, mce: reinitialize per cpu features on resume

introduced a call to mce_cpu_features() in the resume path, in order
for the MCE machinery to get properly reinitialized after a resume.
However, this function (and its successors) was flagged __cpuinit,
which becomes __init on UP configurations (on SMP suspend/resume
requires CPU hotplug and so this would not be seen.)

Remove the offending __cpuinit annotations for mce_cpu_features() and
its successor functions.

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-20 23:40:40 -08:00
Linus Torvalds be71cb5b52 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: use the right protections for split-up pagetables
  x86, vmi: TSC going backwards check in vmi clocksource
2009-02-20 18:03:07 -08:00
Wei Yongjun d9190913b7 mn10300: fix typo && -> || in arch/mn10300/unit-asb2305/pci.c
Fix the typo && -> ||.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-20 17:57:48 -08:00
David Howells 58bafe72ad mn10300: fix oprofile
oprofile for MN10300 seems to have been broken by the advent of the new
tracing framework.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-20 17:57:48 -08:00
Luca Bigliardi 41a9e64ca4 uml: fix vde network backend in user mode linux
* Replace kmalloc() with uml_kmalloc() (fix build failure)

* Remove unnecessary UM_KERN_INFO in printk() (don't display '<6>' while
  printing info)

Signed-off-by: Luca Bigliardi <shammash@artha.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Reviewed-by: WANG Cong <wangcong@zeuux.org>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-20 17:57:48 -08:00
Ingo Molnar f8eeb2e6be x86, mm: fault.c, update copyrights
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-20 23:13:36 +01:00
Ingo Molnar cd1b68f08f x86, mm: fault.c, give another attempt at prefetch handing before SIGBUS
Impact: extend prefetch handling on 64-bit

Currently there's an extra is_prefetch() check done in do_sigbus(),
which we only do on 32 bits.

This is a last-ditch check before we terminate a task, so it's worth
giving prefetch instructions another chance - should none of our
existing quirks have caught a prefetch instruction related spurious
fault.

The only risk is if a prefetch causes a real sigbus, in that case
we'll not OOM but try another fault. But this code has been on
32-bit for a long time, so it should be fine in practice.

So do this on 64-bit too - and thus remove one more #ifdef.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:46 +01:00
Ingo Molnar 7c178a26d3 x86, mm: fault.c, remove #ifdef from fault_in_kernel_space()
Impact: cleanup

Removal of an #ifdef in fault_in_kernel_space(), by making
use of the new TASK_SIZE_MAX symbol which is now available
on 32-bit too.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:45 +01:00
Ingo Molnar d951734654 x86, mm: rename TASK_SIZE64 => TASK_SIZE_MAX
Impact: cleanup

Rename TASK_SIZE64 to TASK_SIZE_MAX, and provide the
define on 32-bit too. (mapped to TASK_SIZE)

This allows 32-bit code to make use of the (former-) TASK_SIZE64
symbol as well, in a clean way.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:44 +01:00
Ingo Molnar c3731c6866 x86, mm: fault.c, remove #ifdef from do_page_fault()
Impact: cleanup

do_page_fault() has this ugly #ifdef in its prototype:

  #ifdef CONFIG_X86_64
  asmlinkage
  #endif
  void __kprobes do_page_fault(struct pt_regs *regs, unsigned long error_code)

Replace it with 'dotraplinkage' which maps to exactly the above
construct: nothing on 32-bit and asmlinkage on 64-bit.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:44 +01:00
Ingo Molnar 1cc99544dd x86, mm: fault.c, unify oops handling
Impact: add oops-recursion check to 32-bit

Unify the oops state-machine, to the 64-bit version. It is
slightly more careful in that it does a recursion check
in oops_begin(), and is thus more likely to show the relevant
oops.

It also means that 32-bit will print one more line at the
end of pagefault triggered oopses:

 	printk(KERN_EMERG "CR2: %016lx\n", address);

Which is generally good information to be seen in partial-dump
digital-camera jpegs ;-)

The downside is the somewhat more complex critical path. Both
variants have been tested well meanwhile by kernel developers
crashing their boxes so i dont think this is a practical worry.

This removes 3 ugly #ifdefs from no_context() and makes the
function a lot nicer read.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:44 +01:00
Ingo Molnar 8f7661496c x86, mm: fault.c, unify oops printing
Impact: refine/extend page fault related oops printing on 64-bit

 - honor the pause_on_oops logic on 64-bit too
 - print out NX fault warnings on 64-bit as well
 - factor out the NX fault message to make it git-greppable and readable

Note that this means that we do the PF_INSTR check on 32-bit non-PAE
as well where it should not occur ... normally. Cannot hurt.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:43 +01:00
Ingo Molnar f2f13a8535 x86, mm: fault.c, reorder functions
Impact: cleanup

Avoid a couple more #ifdefs by moving fundamentally non-unifiable
functions into a single #ifdef 32-bit / #else / #endif block in
fault.c: vmalloc*(), dump_pagetable(), check_vm8086_mode().

No code changed:

   text	   data	    bss	    dec	    hex	filename
   4618	     32	     24	   4674	   1242	fault.o.before
   4618	     32	     24	   4674	   1242	fault.o.after

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:43 +01:00
Ingo Molnar b18018126f x86, mm, kprobes: fault.c, simplify notify_page_fault()
Impact: cleanup

Remove an #ifdef from notify_page_fault(). The function still
compiles to nothing in the !CONFIG_KPROBES case.

Introduce kprobes_built_in() and kprobe_fault_handler() helpers
to allow this - they returns 0 if !CONFIG_KPROBES.

No code changed:

   text	   data	    bss	    dec	    hex	filename
   4618	     32	     24	   4674	   1242	fault.o.before
   4618	     32	     24	   4674	   1242	fault.o.after

Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:42 +01:00
Ingo Molnar b814d41f09 x86, mm: fault.c, simplify kmmio_fault()
Impact: cleanup

Remove an #ifdef from kmmio_fault() - we can do this by
providing default implementations for is_kmmio_active()
and kmmio_handler(). The compiler optimizes it all away
in the !CONFIG_MMIOTRACE case.

Also, while at it, clean up mmiotrace.h a bit:

 - standard header guards
 - standard vertical spaces for structure definitions

No code changed (both with mmiotrace on and off in the config):

   text	   data	    bss	    dec	    hex	filename
   2947	     12	     12	   2971	    b9b	fault.o.before
   2947	     12	     12	   2971	    b9b	fault.o.after

Cc: Pekka Paalanen <pq@iki.fi>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:42 +01:00
Ingo Molnar 121d5d0a7e x86, mm: fault.c, enable PF_RSVD checks on 32-bit too
Impact: improve page fault handling robustness

The 'PF_RSVD' flag (bit 3) of the page-fault error_code is a
relatively recent addition to x86 CPUs, so the 32-bit do_fault()
implementation never had it. This flag gets set when the CPU
detects nonzero values in any reserved bits of the page directory
entries.

Extend the existing 64-bit check for PF_RSVD in do_page_fault()
to 32-bit too. If we detect such a fault then we print a more
informative oops and the pagetables.

This unifies the code some more, removes an ugly #ifdef and improves
the 32-bit page fault code robustness a bit. It slightly increases
the 32-bit kernel text size.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:41 +01:00
Ingo Molnar 8c938f9fae x86, mm: fault.c, factor out the vm86 fault check
Impact: cleanup

Instead of an ugly, open-coded, #ifdef-ed vm86 related legacy check
in do_page_fault(), put it into the check_v8086_mode() helper
function and merge it with an existing #ifdef.

Also, simplify the code flow a tiny bit in the helper.

No code changed:

arch/x86/mm/fault.o:

   text	   data	    bss	    dec	    hex	filename
   2711	     12	     12	   2735	    aaf	fault.o.before
   2711	     12	     12	   2735	    aaf	fault.o.after

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:41 +01:00
Ingo Molnar 107a03678c x86, mm: fault.c, refactor/simplify the is_prefetch() code
Impact: no functionality changed

Factor out the opcode checker into a helper inline.

The code got a tiny bit smaller:

   text	   data	    bss	    dec	    hex	filename
   4632	     32	     24	   4688	   1250	fault.o.before
   4618	     32	     24	   4674	   1242	fault.o.after

And it got cleaner / easier to review as well.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:40 +01:00
Ingo Molnar 2d4a71676f x86, mm: fault.c cleanup
Impact: cleanup, no code changed

Clean up various small details, which can be correctness checked
automatically:

 - tidy up the include file section
 - eliminate unnecessary includes
 - introduce show_signal_msg() to clean up code flow
 - standardize the code flow
 - standardize comments and other style details
 - more cleanups, pointed out by checkpatch

No code changed on either 32-bit nor 64-bit:

arch/x86/mm/fault.o:

   text	   data	    bss	    dec	    hex	filename
   4632	     32	     24	   4688	   1250	fault.o.before
   4632	     32	     24	   4688	   1250	fault.o.after

the md5 changed due to a change in a single instruction:

   2e8a8241e7f0d69706776a5a26c90bc0  fault.o.before.asm
   c5c3d36e725586eb74f0e10692f0193e  fault.o.after.asm

Because a __LINE__ reference in a WARN_ONCE() has changed.

On 32-bit a few stack offsets changed - no code size difference
nor any functionality difference.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-21 00:09:39 +01:00
Alok Kataria fdb17aeb28 x86, vmi: TSC going backwards check in vmi clocksource, cleanup
clean up vmi_read_cycles to use max()

Reported-b: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: Zach Amsden <zach@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-20 19:31:03 +01:00
Ingo Molnar c9e1585b1b Merge branch 'tip/x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into x86/mm 2009-02-20 18:51:43 +01:00
Ingo Molnar 7a5714e018 x86, pat: add large-PAT check to split_large_page()
Impact: future-proof the split_large_page() function

Linus noticed that split_large_page() is not safe wrt. the
PAT bit: it is bit 12 on the 1GB and 2MB page table level
(_PAGE_BIT_PAT_LARGE), and it is bit 7 on the 4K page
table level (_PAGE_BIT_PAT).

Currently it is not a problem because we never set
_PAGE_BIT_PAT_LARGE on any of the large-page mappings - but
should this happen in the future the split_large_page() would
silently lift bit 12 into the lowlevel 4K pte and would start
corrupting the physical page frame offset. Not fun.

So add a debug warning, to make sure if something ever sets
the PAT bit then this function gets updated too.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-20 17:48:49 +01:00
Steven Rostedt 3c3e5694ad x86: check PMD in spurious_fault handler
Impact: fix to prevent hard lockup on bad PMD permissions

If the PMD does not have the correct permissions for a page access,
but the PTE does, the spurious fault handler will mistake the fault
as a lazy TLB transaction. This will result in an infinite loop of:

 fault -> spurious_fault check (pass) -> return to code -> fault

This patch adds a check and a warn on if the PTE passes the permissions
but the PMD does not.

[ Updated: Ingo Molnar suggested using WARN_ONCE with some text ]

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2009-02-20 11:44:47 -05:00
Ingo Molnar 609162850d Merge branches 'x86/asm', 'x86/cleanups' and 'x86/headers' into x86/core 2009-02-20 17:40:50 +01:00
Ingo Molnar 3b6f7b9beb Merge branch 'x86/urgent' into x86/core 2009-02-20 17:40:43 +01:00
Vegard Nossum ecab22aa6d x86: use symbolic constants for MSR_IA32_MISC_ENABLE bits
Impact: Cleanup. No functional changes.

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-20 12:07:43 +01:00
Ingo Molnar 07a66d7c53 x86: use the right protections for split-up pagetables
Steven Rostedt found a bug in where in his modified kernel
ftrace was unable to modify the kernel text, due to the PMD
itself having been marked read-only as well in
split_large_page().

The fix, suggested by Linus, is to not try to 'clone' the
reference protection of a huge-page, but to use the standard
(and permissive) page protection bits of KERNPG_TABLE.

The 'cloning' makes sense for the ptes but it's a confused and
incorrect concept at the page table level - because the
pagetable entry is a set of all ptes and hence cannot
'clone' any single protection attribute - the ptes can be any
mixture of protections.

With the permissive KERNPG_TABLE, even if the pte protections
get changed after this point (due to ftrace doing code-patching
or other similar activities like kprobes), the resulting combined
protections will still be correct and the pte's restrictive
(or permissive) protections will control it.

Also update the comment.

This bug was there for a long time but has not caused visible
problems before as it needs a rather large read-only area to
trigger. Steve possibly hacked his kernel with some really
large arrays or so. Anyway, the bug is definitely worth fixing.

[ Huang Ying also experienced problems in this area when writing
  the EFI code, but the real bug in split_large_page() was not
  realized back then. ]

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Huang Ying <ying.huang@intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-20 08:35:03 +01:00
Tejun Heo 11124411aa x86: convert to the new dynamic percpu allocator
Impact: use new dynamic allocator, unified access to static/dynamic
        percpu memory

Convert to the new dynamic percpu allocator.

* implement populate_extra_pte() for both 32 and 64
* update setup_per_cpu_areas() to use pcpu_setup_static()
* define __addr_to_pcpu_ptr() and __pcpu_ptr_to_addr()
* define config HAVE_DYNAMIC_PER_CPU_AREA

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-20 16:29:09 +09:00
Tejun Heo f0aa661790 vmalloc: implement vm_area_register_early()
Impact: allow multiple early vm areas

There are places where kernel VM area needs to be allocated before
vmalloc is initialized.  This is done by allocating static vm_struct,
initializing several fields and linking it to vmlist and later vmalloc
initialization picking up these from vmlist.  This is currently done
manually and if there's more than one such areas, there's no defined
way to arbitrate who gets which address.

This patch implements vm_area_register_early(), which takes vm_area
struct with flags and size initialized, assigns address to it and puts
it on the vmlist.  This way, multiple early vm areas can determine
which addresses they should use.  The only current user - alpha mm
init - is converted to use it.

Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-20 16:29:08 +09:00
Rusty Russell b36128c830 alloc_percpu: change percpu_ptr to per_cpu_ptr
Impact: cleanup

There are two allocated per-cpu accessor macros with almost identical
spelling.  The original and far more popular is per_cpu_ptr (44
files), so change over the other 4 files.

tj: kill percpu_ptr() and update UP too

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: mingo@redhat.com
Cc: lenb@kernel.org
Cc: cpufreq@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-20 16:29:08 +09:00
Lai Jiangshan 42f8faecf7 x86: use percpu data for 4k hardirq and softirq stacks
Impact: economize memory for large NR_CPUS

percpu data is setup earlier than irq, we can use percpu data
to economize memory.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-02-20 16:26:10 +09:00
Alok N Kataria 48ffc70b67 x86, vmi: TSC going backwards check in vmi clocksource
Impact: fix time warps under vmware

Similar to the check for TSC going backwards in the TSC clocksource,
we also need this check for VMI clocksource.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
Cc: Zachary Amsden <zach@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: stable@kernel.org
2009-02-20 07:53:08 +01:00
H. Peter Anvin f6d1826dfa x86, mce: use %ll instead of %L for 64-bit numbers
Impact: Cleanup

The standard spelling of a printf pattern for long long is "ll", not
"L", which is for long double.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-02-19 15:44:58 -08:00
Andi Kleen b79109c3bb x86, mce: separate correct machine check poller and fatal exception handler
Impact: cleanup, performance enhancement

The machine check poller is diverging more and more from the fatal
exception handler. Instead of adding more special cases separate the code
paths completely. The corrected poll path is actually quite simple,
and this doesn't result in much code duplication.

This makes both handlers much easier to read and results in
cleaner code flow.  The exception handler now only needs to care
about uncorrected errors, which also simplifies the handling of multiple
errors. The corrected poller also now always runs in standard interrupt
context and does not need to do anything special to handle NMI context.

Minor behaviour changes:
- MCG status is now not cleared on polling.
- Only the banks which had corrected errors get cleared on polling
- The exception handler only clears banks with errors now

v2: Forward port to new patch order. Add "uc" argument.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-02-19 14:52:20 -08:00
Andi Kleen b5f2fa4ea0 x86, mce: factor out duplicated struct mce setup into one function
Impact: cleanup

This merely factors out duplicated code to set up
the initial struct mce state into a single function.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-02-19 14:51:39 -08:00
Andi Kleen 0d7482e3d7 x86, mce: implement dynamic machine check banks support
Impact: cleanup; making code future proof; memory saving on small systems

This patch replaces the hardcoded max number of machine check banks with 
dynamic allocation depending on what the CPU reports. The sysfs
data structures and the banks array are dynamically allocated.

There is still a hard bank limit (128) because the mcelog protocol uses
banks >= 128 as pseudo banks to escape other events. But we expect
that 128 banks is beyond any reasonable CPU for now.

This supersedes an earlier patch by Venki, but it solves the problem
more completely by making the limit fully dynamic (up to the 128
boundary).

This saves some memory on machines with less than 6 banks because
they won't need sysdevs for unused ones and also allows to 
use sysfs to control these banks on possible future CPUs with
more than 6 banks.

This is an updated patch addressing Venki's comments.  I also added in
another patch from Thomas which fixed the error allocation path (that
patch was previously separated)

Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-02-19 14:50:58 -08:00
Andi Kleen e35849e910 x86, mce: enable machine checks in 64-bit defconfig
Impact: Low priority fix

The 32-bit defconfig already had it enabled. And it's a pretty
fundamental feature, so better enable it on 64 bits too.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2009-02-19 14:48:55 -08:00
Linus Torvalds a5e7536388 Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] xen_domu build fix
  [IA64] fixes configs and add default config for ia64 xen domU
  [IA64] Remove redundant cpu_clear() in __cpu_disable path
  [IA64] Revert "prevent ia64 from invoking irq handlers on offline CPUs"
  [IA64] bte_copy of BTE_MAX_XFER trips BUG_ON.
  [IA64] Build fix for __early_pfn_to_nid() undefined link error
2009-02-19 13:09:20 -08:00
Tony Luck ec8148de85 [IA64] xen_domu build fix
arch/ia64/xen/xen_pv_ops.c:156: error: xen_init_ops causes a section type conflict
arch/ia64/xen/xen_pv_ops.c:340: error: xen_iosapic_ops causes a section type conflict

Signed-off-by: Tony Luck <tony.luck@intel.com>
2009-02-19 12:05:00 -08:00
Isaku Yamahata 1d5b20f490 [IA64] fixes configs and add default config for ia64 xen domU
This patch fixes xen related Kconfigs and add default config
file for ia64 xen domU.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
2009-02-19 11:39:06 -08:00
Alex Chiang c0acdea214 [IA64] Remove redundant cpu_clear() in __cpu_disable path
The second call to cpu_clear() is redundant, as we've already removed
the CPU from cpu_online_map before calling migrate_platform_irqs().

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
2009-02-19 11:32:50 -08:00
Alex Chiang 66db2e6331 [IA64] Revert "prevent ia64 from invoking irq handlers on offline CPUs"
This reverts commit e7b140365b.

Commit e7b14036 removes the targetted disabled CPU from the
cpu_online_map after calls to migrate_platform_irqs and fixup_irqs.

Paul McKenney states that the reasoning behind the patch was to
prevent irq handlers from running on CPUs marked offline because:

	RCU happily ignores CPUs that don't have their bits set in
	cpu_online_map, so if there are RCU read-side critical sections
	in the irq handlers being run, RCU will ignore them.  If the
	other CPUs were running, they might sequence through the RCU
	state machine, which could result in data structures being
	yanked out from under those irq handlers, which in turn could
	result in oopses or worse.

Unfortunately, both ia64 functions above look at cpu_online_map to find
a new CPU to migrate interrupts onto. This means we can potentially
migrate an interrupt off ourself back to... ourself. Uh oh.

This causes an oops when we finally try to process pending interrupts on
the CPU we want to disable. The oops results from calling __do_IRQ with
a NULL pt_regs:

Unable to handle kernel NULL pointer dereference (address 0000000000000040)
Call Trace:
 [<a000000100016930>] show_stack+0x50/0xa0
                                sp=e0000009c922fa00 bsp=e0000009c92214d0
 [<a0000001000171a0>] show_regs+0x820/0x860
                                sp=e0000009c922fbd0 bsp=e0000009c9221478
 [<a00000010003c700>] die+0x1a0/0x2e0
                                sp=e0000009c922fbd0 bsp=e0000009c9221438
 [<a0000001006e92f0>] ia64_do_page_fault+0x950/0xa80
                                sp=e0000009c922fbd0 bsp=e0000009c92213d8
 [<a00000010000c7a0>] ia64_native_leave_kernel+0x0/0x270
                                sp=e0000009c922fc60 bsp=e0000009c92213d8
 [<a0000001000ecdb0>] profile_tick+0xd0/0x1c0
                                sp=e0000009c922fe30 bsp=e0000009c9221398
 [<a00000010003bb90>] timer_interrupt+0x170/0x3e0
                                sp=e0000009c922fe30 bsp=e0000009c9221330
 [<a00000010013a800>] handle_IRQ_event+0x80/0x120
                                sp=e0000009c922fe30 bsp=e0000009c92212f8
 [<a00000010013aa00>] __do_IRQ+0x160/0x4a0
                                sp=e0000009c922fe30 bsp=e0000009c9221290
 [<a000000100012290>] ia64_process_pending_intr+0x2b0/0x360
                                sp=e0000009c922fe30 bsp=e0000009c9221208
 [<a0000001000112d0>] fixup_irqs+0xf0/0x2a0
                                sp=e0000009c922fe30 bsp=e0000009c92211a8
 [<a00000010005bd80>] __cpu_disable+0x140/0x240
                                sp=e0000009c922fe30 bsp=e0000009c9221168
 [<a0000001006c5870>] take_cpu_down+0x50/0xa0
                                sp=e0000009c922fe30 bsp=e0000009c9221148
 [<a000000100122610>] stop_cpu+0xd0/0x200
                                sp=e0000009c922fe30 bsp=e0000009c92210f0
 [<a0000001000e0440>] kthread+0xc0/0x140
                                sp=e0000009c922fe30 bsp=e0000009c92210c8
 [<a000000100014ab0>] kernel_thread_helper+0xd0/0x100
                                sp=e0000009c922fe30 bsp=e0000009c92210a0
 [<a00000010000a4c0>] start_kernel_thread+0x20/0x40
                                sp=e0000009c922fe30 bsp=e0000009c92210a0

I don't like this revert because it is fragile. ia64 is getting lucky
because we seem to only ever process timer interrupts in this path, but
if we ever race with an IPI here, we definitely use RCU and have the
potential of hitting an oops that Paul describes above.

Patching ia64's timer_interrupt() to check for NULL pt_regs is
insufficient though, as we still hit the above oops.

As a short term solution, I do think that this revert is the right
answer. The revert hold up under repeated testing (24+ hour test runs)
with this setup:

	- 8-way rx6600
	- randomly toggling CPU online/offline state every 2 seconds
	- running CPU exercisers, memory hog, disk exercisers, and
	  network stressors
	- average system load around ~160

In the long term, we really need to figure out why we set pt_regs = NULL
in ia64_process_pending_intr(). If it turns out that it is unnecessary
to do so, then we could safely re-introduce e7b14036 (along with some
other logic to be smarter about migrating interrupts).

One final note: x86 also removes the disabled CPU from cpu_online_map
and then re-enables interrupts for 1ms, presumably to handle any pending
interrupts:

arch/x86/kernel/irq_32.c (and irq_64.c):
cpu_disable_common:
	[remove cpu from cpu_online_map]

	fixup_irqs():
		for_each_irq:
			[break CPU affinities]

		local_irq_enable();
		mdelay(1);
		local_irq_disable();

So they are doing implicitly what ia64 is doing explicitly.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
2009-02-19 11:32:26 -08:00
Robin Holt 39d481cba2 [IA64] bte_copy of BTE_MAX_XFER trips BUG_ON.
BTE_MAX_XFER is wrong.  It is one greater than the number of cache
lines the BTE is actually able to transfer.  If you request a transfer
of exactly BTE_MAX_XFER size, you trip a very cryptic BUG_ON() which
should certainly be made more clear.

This patch fixes that constant and also cleans up the BUG_ON()s in
arch/ia64/sn/kernel/bte.c to test one condition per line.

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
2009-02-19 11:29:31 -08:00
Tony Luck 334f85b647 [IA64] Build fix for __early_pfn_to_nid() undefined link error
ia64 only defines __early_pfn_to_nid() for SPARSEMEM && NUMA configurations,
so the recent:

	commit: f2dbcfa738
	mm: clean up for early_pfn_to_nid()

ends up with some link problems for certain configuration files.

Fix arch/ia64/Kconfig to only define HAVE_ARCH_EARLY_PFN_TO_NID in the
cases where we do provide this function.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2009-02-19 11:22:36 -08:00
Linus Torvalds 402a917aca Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
  [ARM] 5405/1: ep93xx: remove unused gesbc9312.h header
  [ARM] 5404/1: Fix condition in arm_elf_read_implies_exec() to set READ_IMPLIES_EXEC
  [ARM] omap: fix clock reparenting in omap2_clk_set_parent()
  [ARM] 5403/1: pxa25x_ep_fifo_flush() *ep->reg_udccs always set to 0
  [ARM] 5402/1: fix a case of wrap-around in sanity_check_meminfo()
  [ARM] 5401/1: Orion: fix edge triggered GPIO interrupt support
  [ARM] 5400/1: Add support for inverted rdy_busy pin for Atmel nand device controller
  [ARM] 5391/1: AT91: Enable GPIO clocks earlier
  [ARM] 5390/1: AT91: Watchdog fixes
  [ARM] 5398/1: Add Wan ZongShun to MAINTAINERS for W90P910
  [ARM] omap: fix _omap2_clksel_get_src_field()
  [ARM] omap: fix omap2_divisor_to_clksel() error return value
2009-02-19 09:52:12 -08:00
Ingo Molnar e9ce0c37c2 Merge branch 'x86/untangle2' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen into x86/headers 2009-02-19 18:15:01 +01:00
Linus Torvalds bcf8951fc2 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mce: fix ifdef for 64bit thermal apic vector clear on shutdown
  x86, mce: use force_sig_info to kill process in machine check
  x86, mce: reinitialize per cpu features on resume
  x86, rcu: fix strange load average and ksoftirqd behavior
2009-02-19 09:14:35 -08:00
Hartley Sweeten 9dd446f657 [ARM] 5405/1: ep93xx: remove unused gesbc9312.h header
Remove the gesbc9312.h header since it is unused.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-19 16:13:02 +00:00
Cyrill Gorcunov cb425afd21 x86: compressed head_32 - use ENTRY,ENDPROC macros
Impact: clenaup

Linker script will put startup_32 at predefined
address so using startup_32 will not bloat the
code size.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 17:13:01 +01:00
Cyrill Gorcunov 2d4eeecb98 x86: compressed head_64 - use ENTRY,ENDPROC macros
Impact: clenaup

Linker script will put startup_32 at predefined
address so using ENTRY will not bloat the code
size.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 17:13:01 +01:00
Cyrill Gorcunov 324bda9e47 x86: pmjump - use GLOBAL,ENDPROC macros
Impact: cleanup

We are in setup stage so we use GLOBAL
instead of ENTRY and do not increase code
size.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 17:13:00 +01:00
Cyrill Gorcunov 2f79555097 x86: copy.S - use GLOBAL,ENDPROC macros
Impact: cleanup

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 17:13:00 +01:00
Cyrill Gorcunov 1b25f3b4e1 x86: linkage - get rid of _X86 macros
Impact: cleanup

There was an attempt to bring build-time checking for
missed ENTRY_X86/END_X86 and KPROBE... pairs. Using
them will add messy in code. Get just rid of them.
This commit could be easily restored if the need appear
in future.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 17:12:59 +01:00
Cyrill Gorcunov 95695547a7 x86: asm linkage - introduce GLOBAL macro
If the code is time critical and this entry is called
from other places we use ENTRY to have it globally defined
and especially aligned.

Contrary we have some snippets which are size
critical. So we use plane ".globl name; name:"
directive. Introduce GLOBAL macro for this.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 17:12:59 +01:00
Makito SHIOKAWA 9da616fb99 [ARM] 5404/1: Fix condition in arm_elf_read_implies_exec() to set READ_IMPLIES_EXEC
READ_IMPLIES_EXEC must be set when:
o binary _is_ an executable stack (i.e. not EXSTACK_DISABLE_X)
o processor architecture is _under_ ARMv6 (XN bit is supported from ARMv6)

Signed-off-by: Makito SHIOKAWA <lkhmkt@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-19 14:45:27 +00:00
Heiko Carstens 23d75d9cad [S390] fix "mem=" handling in case of standby memory
Standby memory detected with the sclp interface gets always registered
with add_memory calls without considering the limitationt that the
"mem=" kernel paramater implies.
So fix this and only register standby memory that is below the specified
limit.
This fixes zfcpdump since it uses "mem=32M". In case there is appr.
2GB standby memory present all of usable memory would be used for the
struct pages needed for standby memory.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-02-19 15:19:19 +01:00
Christian Borntraeger d5cd0343d2 [S390] Fix timeval regression on s390
commit aa5e97ce4b
[PATCH] improve precision of process accounting.

Introduced a timing regression:
-bash-3.2# time ls
real    0m0.006s
user    0m1.754s
sys     0m1.094s

The problem was introduced by an error in cputime_to_timeval.
Cputime is now 1/4096 microsecond, therefore, we have to divide
the remainder with 4096 to get the microseconds.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-02-19 15:19:19 +01:00
Russell King 41f3103fcf [ARM] omap: fix clock reparenting in omap2_clk_set_parent()
When changing the parent of a clock, it is necessary to keep the
clock use counts balanced otherwise things the parent state will
get corrupted.  Since we already disable and re-enable the clock,
we might as well use the recursive versions instead.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-19 13:25:16 +00:00
Hiroshi Shimamoto 71d8f9784a x86: syscalls.h: remove asmlinkage from declaration of sys_rt_sigreturn()
Impact: cleanup

asmlinkage for sys_rt_sigreturn() no longer exists in arch/x86/kernel/signal.c.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 12:18:54 +01:00
Nicolas Pitre 3fd9825c42 [ARM] 5402/1: fix a case of wrap-around in sanity_check_meminfo()
In the non highmem case, if two memory banks of 1GB each are provided,
the second bank would evade suppression since its virtual base would
be 0.  Fix this by disallowing any memory bank which virtual base
address is found to be lower than PAGE_OFFSET.

Reported-by: Lennert Buytenhek <buytenh@marvell.com>

Signed-off-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-19 09:49:45 +00:00
Jaswinder Singh Rajput de5483029b x86: include/asm/processor.h remove double declaration of print_cpu_info
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 10:12:18 +01:00
KAMEZAWA Hiroyuki cc2559bccc mm: fix memmap init for handling memory hole
Now, early_pfn_in_nid(PFN, NID) may returns false if PFN is a hole.
and memmap initialization was not done. This was a trouble for
sparc boot.

To fix this, the PFN should be initialized and marked as PG_reserved.
This patch changes early_pfn_in_nid() return true if PFN is a hole.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reported-by: David Miller <davem@davemlloft.net>
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18 15:37:55 -08:00
KAMEZAWA Hiroyuki f2dbcfa738 mm: clean up for early_pfn_to_nid()
What's happening is that the assertion in mm/page_alloc.c:move_freepages()
is triggering:

	BUG_ON(page_zone(start_page) != page_zone(end_page));

Once I knew this is what was happening, I added some annotations:

	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
		printk(KERN_ERR "move_freepages: Bogus zones: "
		       "start_page[%p] end_page[%p] zone[%p]\n",
		       start_page, end_page, zone);
		printk(KERN_ERR "move_freepages: "
		       "start_zone[%p] end_zone[%p]\n",
		       page_zone(start_page), page_zone(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
		       page_to_pfn(start_page), page_to_pfn(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_nid[%d] end_nid[%d]\n",
		       page_to_nid(start_page), page_to_nid(end_page));
 ...

And here's what I got:

	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
	move_freepages: start_nid[1] end_nid[0]

My memory layout on this box is:

[    0.000000] Zone PFN ranges:
[    0.000000]   Normal   0x00000000 -> 0x0081ff5d
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[8] active PFN ranges
[    0.000000]     0: 0x00000000 -> 0x00020000
[    0.000000]     1: 0x00800000 -> 0x0081f7ff
[    0.000000]     1: 0x0081f800 -> 0x0081fe50
[    0.000000]     1: 0x0081fed1 -> 0x0081fed8
[    0.000000]     1: 0x0081feda -> 0x0081fedb
[    0.000000]     1: 0x0081fedd -> 0x0081fee5
[    0.000000]     1: 0x0081fee7 -> 0x0081ff51
[    0.000000]     1: 0x0081ff59 -> 0x0081ff5d

So it's a block move in that 0x81f600-->0x81f7ff region which triggers
the problem.

This patch:

Declaration of early_pfn_to_nid() is scattered over per-arch include
files, and it seems it's complicated to know when the declaration is used.
 I think it makes fix-for-memmap-init not easy.

This patch moves all declaration to include/linux/mm.h

After this,
  if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -> Use static definition in include/linux/mm.h
  else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -> Use generic definition in mm/page_alloc.c
  else
     -> per-arch back end function will be called.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reported-by: David Miller <davem@davemlloft.net>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18 15:37:55 -08:00