1100 commits

Author SHA1 Message Date
Jack Steiner fa526d0d64 x86, pat: Fix cacheflush address in change_page_attr_set_clr()
Fix address passed to cpa_flush_range() when changing page
attributes from WB to UC. The address (*addr) is
modified by __change_page_attr_set_clr(). The result is that
the pages being flushed start at the _end_ of the changed range
instead of the beginning.
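The failure mode can be shown with a small userspace sketch in C. `advance_and_change()` is a hypothetical stand-in for __change_page_attr_set_clr() that advances the caller's address through the pointer it receives, and 4 KiB pages are assumed. The fix is simply to record the start address before the call.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL  /* assumption: 4 KiB pages */

/* Hypothetical stand-in for __change_page_attr_set_clr(): it walks the
 * range and leaves *addr pointing just past the last page it changed. */
static void advance_and_change(unsigned long *addr, unsigned long numpages)
{
	for (unsigned long i = 0; i < numpages; i++)
		*addr += PAGE_SIZE;   /* the real code changes attributes here */
}

/* Buggy pattern: reads *addr after the callee has moved it. */
static unsigned long flush_start_buggy(unsigned long *addr, unsigned long numpages)
{
	advance_and_change(addr, numpages);
	return *addr;             /* end of the range, not its start */
}

/* Fixed pattern: save the start address before calling. */
static unsigned long flush_start_fixed(unsigned long *addr, unsigned long numpages)
{
	unsigned long start = *addr;

	advance_and_change(addr, numpages);
	return start;             /* what the flush should start from */
}
```

With *addr = 0x100000 and four pages, the buggy variant reports 0x104000 as the flush start, while the fixed variant reports 0x100000.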

This should be considered for 2.6.30-stable and 2.6.31-stable.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Stable team <stable@kernel.org>
2009-09-09 14:05:24 -07:00
H. Peter Anvin b855192c08 Merge branch 'x86/urgent' into x86/pat
Reason: Change to is_new_memtype_allowed() in x86/urgent

Resolved semantic conflicts in:

	 arch/x86/mm/pat.c
	 arch/x86/mm/ioremap.c

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 17:24:28 -07:00
Venkatesh Pallipadi d886c73cd4 x86, pat: Sanity check remap_pfn_range for RAM region
Add a sanity check for remap_pfn_range of RAM regions using
lookup_memtype(). Previously, we did not have any way to get the type of
RAM memory regions, as they were tracked using a single bit in
struct page (WB, non-WB). Now we can get the actual type from struct page
(WB, WC, UC_MINUS) and make sure the requester gets that type.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:41:35 -07:00
Venkatesh Pallipadi 1087637616 x86, pat: Lookup the protection from memtype list on vm_insert_pfn()
Look up the reserved memtype during vm_insert_pfn() and use that memtype
for the new mapping. This takes care of handling the vm_insert_pfn()
interface in track_pfn_vma*/untrack_pfn_vma.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:41:32 -07:00
Venkatesh Pallipadi 637b86e75f x86, pat: Add lookup_memtype to get the current memtype of a paddr
Add a new routine lookup_memtype() to get the current memtype based on
the PAT reserves and frees.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:41:28 -07:00
Venkatesh Pallipadi f584174096 x86, pat: Use page flags to track memtypes of RAM pages
Change reserve_ram_pages_type and free_ram_pages_type to use 2 page
flags to track the UC_MINUS, WC, WB and default types. The previous RAM
tracking only distinguished WB from non-WB, which was incomplete: it did
not allow RAM to be tracked fully, and there was no way to get the actual
reserved type by looking at the page flags.

We use the memtype_lock spinlock for atomicity in dealing with
memtype tracking in struct page.
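A minimal sketch of how two flag bits can encode the four tracked types. The bit positions, `struct fake_page` and the helper names are hypothetical. Where the commit uses the memtype_lock spinlock, this sketch swaps in a C11 compare-and-swap loop to update the two bits atomically without disturbing the other flags.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical bit positions standing in for the two page flags. */
#define PGMT_BIT_A (1UL << 0)
#define PGMT_BIT_B (1UL << 1)
#define PGMT_MASK  (PGMT_BIT_A | PGMT_BIT_B)

enum memtype { MT_DEFAULT, MT_WC, MT_UC_MINUS, MT_WB };

/* Two bits give exactly four encodings, one per tracked type. */
static const unsigned long encode[] = {
	[MT_DEFAULT]  = 0,
	[MT_WC]       = PGMT_BIT_A,
	[MT_UC_MINUS] = PGMT_BIT_B,
	[MT_WB]       = PGMT_BIT_A | PGMT_BIT_B,
};

struct fake_page { _Atomic unsigned long flags; };

static enum memtype get_page_memtype(struct fake_page *pg)
{
	unsigned long bits = atomic_load(&pg->flags) & PGMT_MASK;

	for (int t = MT_DEFAULT; t <= MT_WB; t++)
		if (encode[t] == bits)
			return (enum memtype)t;
	return MT_DEFAULT; /* not reached: all four patterns are mapped */
}

/* Replace only the two memtype bits; retry if another thread raced us. */
static void set_page_memtype(struct fake_page *pg, enum memtype t)
{
	unsigned long old = atomic_load(&pg->flags);
	unsigned long new;

	do {
		new = (old & ~PGMT_MASK) | encode[t];
	} while (!atomic_compare_exchange_weak(&pg->flags, &old, new));
}
```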

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:41:24 -07:00
Venkatesh Pallipadi 335ef896d4 x86, pat: Add rbtree to do quick lookup in memtype tracking
PAT memtype tracking uses a linear linked list to keep track of IO
(non-RAM) regions and their memtypes. The code used a last_accessed
pointer as a cache to speed up the lookup. As per discussions with
H. Peter Anvin a while back, an rbtree here will avoid bad
performance in pathological cases where we may end up with a huge
linked list. This may not add any noticeable speedup in the normal
case, as the number of entries in the PAT memtype list tends to be in
the ~20-30 range. The patch removes the "cached_entry" logic, as the
rbtree gives us a more generic way of speeding up the lookup.

With this patch, we use the rbtree to do the quick lookup. We still use
the linked list, as the tracked memtype ranges can be of different sizes
and can overlap in different ways. We also keep track of usage counts
with the linked list.

Example:
Multiple ioremaps with different sizes
uncached-minus @ 0xfffff00000-0xfffff04000
uncached-minus @ 0xfffff02000-0xfffff03000

And one user-level mmap, after which the thread forks a new process
uncached-minus @ 0xbf453000-0xbf454000
uncached-minus @ 0xbf453000-0xbf454000
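The example ranges above can be checked with a half-open overlap test. This sketch assumes the printed ranges are [start, end) and uses a hypothetical `struct memtype_range` in a start-sorted array; it models only the overlap logic the linked list has to cope with, not the rbtree itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: a half-open physical range [start, end). */
struct memtype_range {
	uint64_t start;
	uint64_t end;
};

static int ranges_overlap(const struct memtype_range *a,
			  const struct memtype_range *b)
{
	return a->start < b->end && b->start < a->end;
}

/* Count entries overlapping [start, end) in an array sorted by start.
 * The scan can stop once an entry begins at or after `end`. */
static size_t count_overlaps(const struct memtype_range *list, size_t n,
			     uint64_t start, uint64_t end)
{
	struct memtype_range q = { start, end };
	size_t hits = 0;

	for (size_t i = 0; i < n && list[i].start < end; i++)
		if (ranges_overlap(&list[i], &q))
			hits++;
	return hits;
}
```

Duplicate entries such as the two identical mmap ranges are why usage counts are still kept alongside the list.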

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:41:19 -07:00
Venkatesh Pallipadi 9e36fda0b3 x86, pat: Add PAT reserve free to io_mapping* APIs
io_mapping_* interfaces were added, mainly for graphics drivers.
Make this interface go through the PAT reserve/free, instead of
hardcoding WC mapping. This makes sure that there are no
aliases due to unconditional WC setting.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:41:16 -07:00
Venkatesh Pallipadi 9fd126bc74 x86, pat: New i/f for driver to request memtype for IO regions
Add new routines to request a memtype for IO regions. For now they serve
as a backend for the io_mapping_* routines, but they can also be made
available to drivers directly in the future, if needed.

The reserve interface reserves the memory, makes sure we have a compatible
memory type available and keeps the identity map in sync when needed.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:41:10 -07:00
Venkatesh Pallipadi 279e669b3f x86, pat: ioremap to follow same PAT restrictions as other PAT users
ioremap has a hard-coded check of the new type against the requested
type. That check differs from other PAT users, such as /dev/mem mmap and
remap_pfn_range, in only one condition: the requested type is UC_MINUS and
the new type is WC. Under that condition ioremap fails, while the other PAT
interfaces succeed with a WC mapping.

Change ioremap to be in sync with the other PAT APIs and use the same
macro as they do. Also change the error print from pr_debug to KERN_ERR.
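A sketch of the shared compatibility rule, assuming (as a reading of the PAT code of that period, not a quote of it) that the only refused combinations are a WB result for a UC_MINUS or WC request. The enum and function names are hypothetical. Under this rule a UC_MINUS request that comes back as WC is accepted, which is the behavior ioremap now shares.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the PAT cache types. */
enum pat_type { PAT_WB, PAT_WC, PAT_UC_MINUS, PAT_UC };

/* May a mapping requested as `req` proceed with the type `got` that the
 * memtype tracker returned? Refuse only a write-back result for an
 * uncached-minus or write-combining request. */
static bool new_memtype_allowed(enum pat_type req, enum pat_type got)
{
	if ((req == PAT_UC_MINUS || req == PAT_WC) && got == PAT_WB)
		return false;
	return true;
}
```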

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:41:07 -07:00
Venkatesh Pallipadi 5fc517466d x86, pat: Keep identity maps consistent with mmaps even when pat_disabled
Make reserve_memtype handle the pat_disabled case internally and fall back
to default return values.

Remove the specific pat_disabled checks in the track_* routines.

Change kernel_map_sync_memtype to sync the identity map even when
pat_disabled.

This change ensures that we keep the identity map in sync even in the
pat_disabled case. Before this patch, in the pat_disabled case, ioremap()
kept the identity maps in sync while other APIs such as pci and /dev/mem
mmap did not, which was inconsistent.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-26 15:40:58 -07:00
Amerigo Wang 3e0e1e9c5a x86: Fix an incorrect argument of reserve_bootmem()
This line looks suspicious, because if this is true, then the
'flags' parameter of function reserve_bootmem_generic() will be
unused when !CONFIG_NUMA. I don't think this is what we want.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: akpm@linux-foundation.org
LKML-Reference: <20090821083709.5098.52505.sendpatchset@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-21 16:40:31 +02:00
Suresh Siddha 1adcaafe74 x86, pat: Allow ISA memory range uncacheable mapping requests
Max Vozeler reported:
>  Bug 13877 -  bogl-term broken with CONFIG_X86_PAT=y, works with =n
>
>  strace of bogl-term:
>  814   mmap2(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 0)
>				 = -1 EAGAIN (Resource temporarily unavailable)
>  814   write(2, "bogl: mmaping /dev/fb0: Resource temporarily unavailable\n",
>	       57) = 57

PAT code maps the ISA memory range as WB in the PAT attribute, so that
fixed range MTRR registers define the actual memory type (UC/WC/WT etc).

But the upper-level is_new_memtype_allowed() API checks fail, as the
request here is for UC and the returned tracked type is WB. (The tracked
type is WB because the MTRR type for this legacy range may be different
for each 4k page.)

Fix is_new_memtype_allowed() by making the ISA address range checks always
succeed, as the null PAT (WB) and default MTRR fixed-range register
settings satisfy the memory type needs of the applications that map the
ISA address range.
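A sketch of the ISA short-circuit, assuming the legacy ISA window runs from 0xa0000 up to (but not including) 0x100000 and that the normal requested-versus-tracked comparison is passed in as a flag. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Legacy ISA window: 640 KiB up to 1 MiB (end is exclusive). */
#define ISA_START 0xa0000ULL
#define ISA_END   0x100000ULL

static bool in_isa_range(uint64_t paddr, uint64_t size)
{
	/* An empty or wrapping range is treated as "not ISA". */
	if (size == 0 || paddr + size < paddr)
		return false;
	return paddr >= ISA_START && paddr + size <= ISA_END;
}

/* Hypothetical wrapper: requests wholly inside the ISA window are always
 * allowed; anything else falls back to the usual type comparison. */
static bool memtype_allowed(uint64_t paddr, uint64_t size, bool types_compatible)
{
	if (in_isa_range(paddr, size))
		return true;
	return types_compatible;
}
```

Assuming the 64 KiB bogl-term mapping lands on the VGA window at 0xa0000, memtype_allowed(0xa0000, 65536, false) returns true, while a request starting at 0x100000 still depends on the normal check.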

Reported-and-Tested-by: Max Vozeler <xam@debian.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-08-17 14:12:44 -07:00
Linus Torvalds 067e18133f Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Work around compilation warning in arch/x86/kernel/apm_32.c
  x86, UV: Complete IRQ interrupt migration in arch_enable_uv_irq()
  x86, 32-bit: Fix double accounting in reserve_top_address()
  x86: Don't use current_cpu_data in x2apic phys_pkg_id
  x86, UV: Fix UV apic mode
  x86, UV: Fix macros for accessing large node numbers
  x86, UV: Delete mapping of MMR rangs mapped by BIOS
  x86, UV: Handle missing blade-local memory correctly
  x86: fix assembly constraints in native_save_fl()
  x86, msr: execute on the correct CPU subset
  x86: Fix assert syntax in vmlinux.lds.S
  x86: Make 64-bit efi_ioremap use ioremap on MMIO regions
  x86: Add quirk to make Apple MacBook5,2 use reboot=pci
  x86: Fix CPA memtype reserving in the set_pages_array*() cases
  x86, pat: Fix set_memory_wc related corruption
  x86: fix section mismatch for i386 init code
2009-08-04 15:28:59 -07:00
Jan Beulich 6abf655109 x86, 32-bit: Fix double accounting in reserve_top_address()
With VMALLOC_END included in the calculation of MAXMEM (as of
2.6.28) it is no longer correct to also bump __VMALLOC_RESERVE
in reserve_top_address(). Doing so results in needlessly small
lowmem.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4A71DD2A020000780000D482@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-04 16:27:29 +02:00
Thomas Hellstrom 8523acfe40 x86: Fix CPA memtype reserving in the set_pages_array*() cases
The code was incorrectly reserving memtypes using the page
virtual address instead of the physical address. Furthermore,
the code was not ignoring highmem pages as it ought to.

( upstream does not pass in highmem pages yet - but upcoming
  graphics code will do it and there's no reason to not handle
  this properly in the CPA APIs.)

Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=13884

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: <stable@kernel.org>
Cc: dri-devel@lists.sourceforge.net
Cc: venkatesh.pallipadi@intel.com
LKML-Reference: <1249284345-7654-1-git-send-email-thellstrom@vmware.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-03 19:36:09 +02:00
Pallipadi, Venkatesh bdc6340f4e x86, pat: Fix set_memory_wc related corruption
Changeset 3869c4aa18, which went in after 2.6.30-rc1, was a seemingly small
change to _set_memory_wc() to make it compliant with SDM requirements. But it
introduced a nasty bug, which can result in crashes and/or strange corruptions
when set_memory_wc is used. One such crash was reported here:
http://lkml.org/lkml/2009/7/30/94

Actually, that changeset introduced two bugs.
* change_page_attr_set() takes &addr as its first argument, and the addr value
  may have changed on return, even for a single-page change_page_attr_set()
  call. That makes the second change_page_attr_set() in this routine
  operate on an unrelated addr, which can eventually cause strange corruptions
  and a bad page state crash.
* The second change_page_attr_set() call, before setting _PAGE_CACHE_WC, should
  clear the earlier _PAGE_CACHE_UC_MINUS, as otherwise the cache attribute will
  not be WC (it will be UC instead).

The patch below fixes both these problems. A single patch fixes both
problems, as the change is to the same line of code. The change to use an
addr_copy is not very clean, but it is simpler than making more changes
through various routines in pageattr.c.
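The second bug is plain bit arithmetic. This sketch assumes the x86 PTE cache bits PWT (bit 3) and PCD (bit 4) with the kernel's default PAT programming of that era, where WC is PWT alone, UC_MINUS is PCD alone and UC is both. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* x86 PTE cache-control bits: PWT is bit 3, PCD is bit 4. */
#define PTE_PWT        (UINT64_C(1) << 3)
#define PTE_PCD        (UINT64_C(1) << 4)
#define PTE_CACHE_MASK (PTE_PWT | PTE_PCD)

/* Cache encodings under the assumed default PAT layout. */
#define CACHE_WB       UINT64_C(0)
#define CACHE_WC       PTE_PWT
#define CACHE_UC_MINUS PTE_PCD
#define CACHE_UC       (PTE_PWT | PTE_PCD)

/* Set-only: what the buggy second call effectively did. */
static uint64_t attr_set(uint64_t prot, uint64_t set)
{
	return prot | set;
}

/* Clear-then-set: drop the old cache bits before applying the new ones. */
static uint64_t attr_set_clr(uint64_t prot, uint64_t set, uint64_t clr)
{
	return (prot & ~clr) | set;
}
```

Starting from UC_MINUS, setting WC alone yields PCD|PWT, which is UC; clearing UC_MINUS first yields WC as intended.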

A huge thanks to Jerome for reporting this problem and providing a simple test
case that helped us root cause the problem.

Reported-by: Jerome Glisse <glisse@freedesktop.org>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20090730214319.GA1889@linux-os.sc.intel.com>
Acked-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-07-30 17:48:34 -07:00
Linus Torvalds 84210aeb4a Merge branch 'drm-radeon-kms' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
* 'drm-radeon-kms' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (35 commits)
  drm/radeon: set fb aperture sizes for framebuffer handoff.
  drm/ttm: fix highuser vs dma32 confusion.
  drm/radeon: Fix size used for benchmarking BO copies.
  drm/radeon: Add radeon.test parameter for running BO GPU copy tests.
  drm/radeon/kms: allow interruptible waits for objects.
  drm/ttm: powerpc: Fix Highmem cache flushing.
  x86: Export kmap_atomic_prot() needed for TTM.
  drm/ttm: Fix ttm in-kernel copying of pages with non-standard caching attributes.
  drm/ttm: Fix an oops and sync object leak.
  drm/radeon/kms: vram sizing on certain r100 chips needs workaround.
  drm/radeon: Pay more attention to object placement requested by userspace.
  drm/radeon: Fall back to evicting BOs with memcpy if necessary.
  drm/radeon: Don't unreserve twice on failure to validate.
  drm/radeon/kms: fix bandwidth computation on avivo hardware
  drm/radeon/kms: add initial colortiling support.
  drm/radeon/kms: fix hotspot handling on pre-avivo chips
  drm/radeon/kms: enable frac fb divs on rs600/rs690/rs740
  drm/radeon/kms: add PLL flag to prefer frequencies <= the target freq
  drm/radeon/kms: block RN50 from using 3D engine.
  drm/radeon/kms: fix VRAM sizing like DDX does it.
  ...
2009-07-29 12:31:59 -07:00
Thomas Hellstrom 73ba651fc2 x86: Export kmap_atomic_prot() needed for TTM.
This functionality is needed to kmap_atomic() highmem pages that may
potentially have or are about to set up other mappings with
non-standard caching attributes.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2009-07-29 15:56:22 +10:00
Linus Torvalds ca597a02cd Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: geode: Mark mfgpt irq IRQF_TIMER to prevent resume failure
  x86, amd: Don't probe for extended APIC ID if APICs are disabled
  x86, mce: Rename incorrect macro name "CONFIG_X86_THRESHOLD"
  x86-64: Fix bad_srat() to clear all state
  x86, mce: Fix set_trigger() accessor
  x86: Fix movq immediate operand constraints in uaccess.h
  x86: Fix movq immediate operand constraints in uaccess_64.h
  x86: Add reboot fixup for SBC-fitPC2
  x86: Include all of .data.* sections in _edata on 64-bit
  x86: Add quirk for Intel DG45ID board to avoid low memory corruption
2009-07-27 12:18:09 -07:00
Benjamin Herrenschmidt 9e1b32caa5 mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()
Upcoming patches to support the new 64-bit "BookE" powerpc architecture
will need the virtual address corresponding to the PTE page when
freeing it, due to the way the HW table walker works.

Basically, the TLB can be loaded with "large" pages that cover the whole
virtual space (well, sort-of, half of it actually) represented by a PTE
page, and which contain an "indirect" bit indicating that this TLB entry
RPN points to an array of PTEs from which the TLB can then create direct
entries. Thus, in order to invalidate those when PTE pages are deleted,
we need the virtual address to pass to tlbilx or tlbivax instructions.

The old trick of sticking it somewhere in the PTE page's struct page sucks
too much; the address is almost readily available at all call sites and
almost everybody implements these as macros, so we may as well add the
argument everywhere. I added it to the pmd and pud variants for consistency.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Howells <dhowells@redhat.com> [MN10300 & FRV]
Acked-by: Nick Piggin <npiggin@suse.de>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-27 12:10:38 -07:00
Andi Kleen 429b2b319a x86-64: Fix bad_srat() to clear all state
Need to clear both nodes and nodes_add state for start/end.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <20090718065657.GA2898@basil.fritz.box>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: stable@kernel.org
2009-07-21 15:20:01 -07:00
Roland Dreier a1a08d1cb0 x86: Remove spurious printk level from segfault message
Since commit 5fd29d6c ("printk: clean up handling of log-levels
and newlines"), the kernel logs segfaults like:

    <6>gnome-power-man[24509]: segfault at 20 ip 00007f9d4950465a sp 00007fffbb50fc70 error 4 in libgobject-2.0.so.0.2103.0[7f9d494f7000+45000]

with the extra "<6>" being KERN_INFO.  This happens because the
printk in show_signal_msg() started with KERN_CONT and then
used "%s" to pass in the real level; and KERN_CONT is no longer
an empty string, and printk only pays attention to the level at
the very beginning of the format string.

Therefore, remove the KERN_CONT from this printk, since it is
now actively causing problems (and never really made any
sense).
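A userspace sketch of the parsing rule described above: only a prefix at the very start of the string is interpreted, so a level that follows a continuation marker ends up printed literally. The `<c>` spelling of the continuation marker, the default level and the function name are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define DEFAULT_LEVEL 4 /* hypothetical default log level */

/* Interpret at most one "<x>" prefix at the start of msg and return the
 * text that will actually be printed. Later prefixes stay in the text. */
static const char *strip_prefix(const char *msg, int *level, int *cont)
{
	*cont = 0;
	*level = DEFAULT_LEVEL;
	if (msg[0] == '<' && msg[1] != '\0' && msg[2] == '>') {
		if (msg[1] == 'c') {            /* hypothetical continuation marker */
			*cont = 1;
			return msg + 3;
		}
		if (isdigit((unsigned char)msg[1])) {
			*level = msg[1] - '0';
			return msg + 3;
		}
	}
	return msg;
}
```

Feeding it a continuation marker followed by "<6>" leaves "<6>" at the front of the printed text, which matches the stray "<6>" in the segfault line above.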

Signed-off-by: Roland Dreier <roland@digitalvampire.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <874otjitkj.fsf@shaolin.home.digitalvampire.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-11 09:56:19 +02:00
Yinghai Lu 44b5728095 x86: don't clear nodes_states[N_NORMAL_MEMORY] when numa is not compiled in
Alex found that specjbb2005 still cannot run with hugepages on an
x86-64 machine. This only happens when NUMA is not compiled in.

The root cause: node_set_state() will not set it back for us in that case,
so don't clear it when NUMA is not selected in the config.

[ v2: use node_clear_state instead ]
Reported-and-Tested-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-08 10:32:50 -07:00
Joe Perches ad361c9884 Remove multiple KERN_ prefixes from printk formats
Commit 5fd29d6ccb ("printk: clean up
handling of log-levels and newlines") changed printk semantics.  printk
lines with multiple KERN_<level> prefixes are no longer emitted as
before the patch.

<level> is now included in the output on each additional use.

Remove all uses of multiple KERN_<level>s in formats.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-08 10:30:03 -07:00
Linus Torvalds faf80d62e4 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: fix usage of bios intcall()
  x86: Remove unused function lapic_watchdog_ok()
  x86: Remove unused variable disable_x2apic
  x86, kvm: Fix section mismatches in kvm.c
  x86: Add missing annotation to arch/x86/lib/copy_user_64.S::copy_to_user
  x86: Fix fixmap page order for FIX_TEXT_POKE0,1
  amd-iommu: set evt_buf_size correctly
  amd-iommu: handle alias entries correctly in init code
  x86: Fix printk call in print_local_apic()
  x86: Declare check_efer() before it gets used
  x86: Mark device_nb as static and fix NULL noise
  x86: Remove double declaration of MSR_P6_EVNTSEL0 and MSR_P6_EVNTSEL1
  xen: Use kcalloc() in xen_init_IRQ()
  x86: Fix fixmap ordering
  x86: Fix symbol annotation for arch/x86/lib/clear_page_64.S::clear_page_c
2009-07-06 17:45:44 -07:00
Jaswinder Singh Rajput 76c06927f2 x86: Declare check_efer() before it gets used
This sparse warning:

  arch/x86/mm/init.c:83:16: warning: symbol 'check_efer' was not declared. Should it be static?

triggers because check_efer() is not declared before it is used.
asm/proto.h contains the declaration of check_efer(), so include
asm/proto.h to fix this, which also addresses the sparse warning.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1246458263.6940.22.camel@hpdv5.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-01 16:52:54 +02:00
Yinghai Lu 66918dcdf9 x86: only clear node_states for 64bit
Nathan reported that

| commit 73d60b7f74
| Author: Yinghai Lu <yinghai@kernel.org>
| Date:   Tue Jun 16 15:33:00 2009 -0700
|
|    page-allocator: clear N_HIGH_MEMORY map before we set it again
|
|    SRAT tables may contains nodes of very small size.  The arch code may
|    decide to not activate such a node.  However, currently the early boot
|    code sets N_HIGH_MEMORY for such nodes.  These nodes therefore seem to be
|    active although these nodes have no present pages.
|
|    For 64bit N_HIGH_MEMORY == N_NORMAL_MEMORY, so that works for 64 bit too

unintentionally and incorrectly clears the cpuset.mems cgroup attribute on
an i386 kvm guest, meaning that cpuset.mems cannot be used.

Fix this by clearing node_states[N_NORMAL_MEMORY] only for 64-bit, and
save/restore it in find_zone_movable_pfn.

Reported-by: Nathan Lynch <ntl@pobox.com>
Tested-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>,
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-30 18:56:01 -07:00
Pekka J Enberg 854c879f5a x86: Move init_gbpages() to setup_arch()
The init_gbpages() function is conditionally called from
init_memory_mapping() function. There are two call-sites where
this 'after_bootmem' condition can be true: setup_arch() and
mem_init() via pci_iommu_alloc().

Therefore, it's safe to move the call to init_gbpages() to
setup_arch() as it's always called before mem_init().

This removes an after_bootmem use - paving the way to remove
all uses of that state variable.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <Pine.LNX.4.64.0906221731210.19474@melkki.cs.Helsinki.FI>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-23 10:33:32 +02:00
Tejun Heo e59a1bb2fd x86: fix pageattr handling for lpage percpu allocator and re-enable it
The lpage allocator aliases a PMD page for each cpu and returns whatever
is unused to the page allocator. When the pageattr of the recycled
pages is changed, the two aliases end up pointing to overlapping
regions with different attributes, which isn't allowed and is known to
cause subtle data corruption in certain cases.

This can be handled in a similar manner to the x86_64 highmap alias:
the pageattr code should detect whether the target pages have a PMD alias,
split the PMD alias and synchronize the attributes.

The pcpur allocator is updated to keep the map of allocated PMD pages sorted
in ascending address order and to provide a pcpu_lpage_remapped() function,
which binary-searches the array to determine whether the given address
is aliased and, if so, to which address. pageattr is updated to use
pcpu_lpage_remapped() to detect the PMD alias and split it up as
necessary from cpa_process_alias().
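A sketch of the sorted-map lookup that pcpu_lpage_remapped() is described as doing. The `struct lpage_alias` layout, the function name and the 2 MiB PMD size are assumptions; each entry maps one PMD-sized chunk of the linear address range to its percpu alias.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PMD_SIZE (UINT64_C(2) << 20) /* assumption: 2 MiB large pages */

/* Hypothetical map entry: a PMD-sized chunk of the linear map and the
 * address it is also mapped at. Entries are sorted by `laddr`. */
struct lpage_alias {
	uint64_t laddr;
	uint64_t alias;
};

/* Return the aliased address for `addr`, or 0 if it is not remapped. */
static uint64_t lpage_remapped(const struct lpage_alias *map, size_t n,
			       uint64_t addr)
{
	size_t lo = 0, hi = n;              /* search in [lo, hi) */

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const struct lpage_alias *e = &map[mid];

		if (addr < e->laddr)
			hi = mid;
		else if (addr >= e->laddr + PMD_SIZE)
			lo = mid + 1;
		else
			return e->alias + (addr - e->laddr);
	}
	return 0;
}
```

Keeping the array sorted is what makes the O(log n) search possible; the offset within the chunk carries over unchanged to the alias.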

Jan Beulich spotted the original problem and the incorrect use of vaddr
instead of laddr for the lookup.

With this, lpage percpu allocator should work correctly.  Re-enable
it.

[ Impact: fix subtle lpage pageattr bug and re-enable lpage ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jan Beulich <JBeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
2009-06-22 11:56:24 +09:00
Tejun Heo 992f4c1c2c x86: reorganize cpa_process_alias()
Reorganize cpa_process_alias() so that new alias condition can be
added easily.

Jan Beulich spotted a problem in the original cleanup thread, which
incorrectly assumed the two existing conditions were mutually
exclusive.

[ Impact: code reorganization ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
2009-06-22 11:56:24 +09:00
Linus Torvalds d06063cc22 Move FAULT_FLAG_xyz into handle_mm_fault() callers
This allows the callers to now pass down the full set of FAULT_FLAG_xyz
flags to handle_mm_fault(). All callers have been (mechanically)
converted to the new calling convention; there's almost certainly room
for architectures to clean up their code and then add FAULT_FLAG_RETRY
when that support is added.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-21 13:08:22 -07:00
Linus Torvalds 12e24f34cb Merge branch 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perfcounters-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (49 commits)
  perfcounter: Handle some IO return values
  perf_counter: Push perf_sample_data through the swcounter code
  perf_counter tools: Define and use our own u64, s64 etc. definitions
  perf_counter: Close race in perf_lock_task_context()
  perf_counter, x86: Improve interactions with fast-gup
  perf_counter: Simplify and fix task migration counting
  perf_counter tools: Add a data file header
  perf_counter: Update userspace callchain sampling uses
  perf_counter: Make callchain samples extensible
  perf report: Filter to parent set by default
  perf_counter tools: Handle lost events
  perf_counter: Add event overlow handling
  fs: Provide empty .set_page_dirty() aop for anon inodes
  perf_counter: tools: Makefile tweaks for 64-bit powerpc
  perf_counter: powerpc: Add processor back-end for MPC7450 family
  perf_counter: powerpc: Make powerpc perf_counter code safe for 32-bit kernels
  perf_counter: powerpc: Change how processor-specific back-ends get selected
  perf_counter: powerpc: Use unsigned long for register and constraint values
  perf_counter: powerpc: Enable use of software counters on 32-bit powerpc
  perf_counter tools: Add and use isprint()
  ...
2009-06-20 11:29:32 -07:00
Linus Torvalds c4c5ab3089 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (45 commits)
  x86, mce: fix error path in mce_create_device()
  x86: use zalloc_cpumask_var for mce_dev_initialized
  x86: fix duplicated sysfs attribute
  x86: de-assembler-ize asm/desc.h
  i386: fix/simplify espfix stack switching, move it into assembly
  i386: fix return to 16-bit stack from NMI handler
  x86, ioapic: Don't call disconnect_bsp_APIC if no APIC present
  x86: Remove duplicated #include's
  x86: msr.h linux/types.h is only required for __KERNEL__
  x86: nmi: Add Intel processor 0x6f4 to NMI perfctr1 workaround
  x86, mce: mce_intel.c needs <asm/apic.h>
  x86: apic/io_apic.c: dmar_msi_type should be static
  x86, io_apic.c: Work around compiler warning
  x86: mce: Don't touch THERMAL_APIC_VECTOR if no active APIC present
  x86: mce: Handle banks == 0 case in K7 quirk
  x86, boot: use .code16gcc instead of .code16
  x86: correct the conversion of EFI memory types
  x86: cap iomem_resource to addressable physical memory
  x86, mce: rename _64.c files which are no longer 64-bit-specific
  x86, mce: mce.h cleanup
  ...

Manually fix up trivial conflict in arch/x86/mm/fault.c
2009-06-20 10:49:48 -07:00
Linus Torvalds 7f81890687 x86: don't use 'access_ok()' as a range check in get_user_pages_fast()
It's really not right to use 'access_ok()', since that is meant for the
normal "get_user()" and "copy_from/to_user()" accesses, which are done
through the TLB, rather than through the page tables.

Why? access_ok() does both too few, and too many checks.  Too many,
because it is meant for regular kernel accesses that will not honor the
'user' bit in the page tables, and because it honors the USER_DS vs
KERNEL_DS distinction that we shouldn't care about in GUP.  And too few,
because it doesn't do the 'canonical' check on the address on x86-64,
since the TLB will do that for us.

So instead of using a function that isn't meant for this, and does
something else and much more complicated, just do the real rules: we
don't want the range to overflow, and on x86-64, we want it to be a
canonical low address (on 32-bit, all addresses are canonical).
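The two rules can be written as a short predicate. This sketch assumes x86-64 with 4-level paging, so the canonical low half is [0, 2^47), and treats the range as [start, start + len); the function name is hypothetical. On 32-bit only the overflow test would apply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4-level paging, so user space is [0, 2^47). */
#define USER_VA_LIMIT (UINT64_C(1) << 47)

/* Accept [start, start + len) only if it does not wrap around and
 * ends inside the canonical low half. */
static bool gup_range_ok(uint64_t start, uint64_t len)
{
	uint64_t end = start + len;     /* unsigned: wraps on overflow */

	if (end < start)
		return false;               /* range overflowed */
	if (end > USER_VA_LIMIT)
		return false;               /* not a canonical low address */
	return true;
}
```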

Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-20 09:52:27 -07:00
Ingo Molnar 0c87197142 perf_counter, x86: Improve interactions with fast-gup
Improve a few details in perfcounter call-chain recording that
makes use of fast-GUP:

- Use ACCESS_ONCE() to observe the pte value. ptes are fundamentally
  racy and can be changed on another CPU, so we have to be careful
  about how we access them. The PAE branch is already careful with
  read-barriers - but the non-PAE and 64-bit side needs an
  ACCESS_ONCE() to make sure the pte value is observed only once.

- make the checks a bit stricter so that we can feed it any kind of
  cra^H^H^H user-space input ;-)
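A sketch of the single-read idiom. ACCESS_ONCE here follows the well-known volatile-cast definition (it needs the GCC/Clang `__typeof__` extension); `pte_val_t` is a hypothetical stand-in for pte_t, and the present (bit 0) and user (bit 2) masks are the x86 PTE bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Force exactly one load through a volatile access. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* x86 PTE bits: present is bit 0, user is bit 2. */
#define PTE_PRESENT (UINT64_C(1) << 0)
#define PTE_USER    (UINT64_C(1) << 2)

typedef uint64_t pte_val_t; /* hypothetical stand-in for pte_t */

/* Snapshot the entry once, then test only the snapshot, so both checks
 * see the same value even if another CPU rewrites *ptep meanwhile. */
static bool pte_user_present(pte_val_t *ptep)
{
	pte_val_t pte = ACCESS_ONCE(*ptep);

	return (pte & PTE_PRESENT) && (pte & PTE_USER);
}
```

Without the single snapshot, the compiler is free to reload *ptep for each test, so the two checks could observe different values.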

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-19 16:55:16 +02:00
Ingo Molnar a3d06cc6aa Merge branch 'linus' into perfcounters/core
Conflicts:
	arch/x86/include/asm/kmap_types.h
	include/linux/mm.h

	include/asm-generic/kmap_types.h

Merge reason: We crossed changes with kmap_types.h cleanups in mainline.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 13:06:17 +02:00
Ingo Molnar cc4949e1fd Merge branch 'linus' into x86/urgent
Merge reason: pull in latest to fix a bug in it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-17 08:59:10 +02:00
Ingo Molnar 5dfaf90f80 x86: mm: Read cr2 before prefetching the mmap_lock
Prefetch instructions can generate spurious faults on certain
models of older CPUs. The faults themselves cannot be stopped
and they can occur pretty much anywhere - so the way we solve
them is that we detect certain patterns and ignore the fault.

There is one small path of code where we must not take faults
though: the #PF handler execution leading up to the reading
of the CR2 (the faulting address). If we take a fault there
then we destroy the CR2 value (with that of the prefetching
instruction's) and possibly mishandle user-space or
kernel-space pagefaults.

It turns out that in current upstream we do exactly that:

	prefetchw(&mm->mmap_sem);

	/* Get the faulting address: */
	address = read_cr2();

This is not good.

So turn around the order: first read the cr2, then prefetch
the lock address. Reading cr2 is plenty fast (2 cycles), so
delaying the prefetch by this amount shouldn't be a big issue
performance-wise.

[ And this might explain a mysterious fault.c warning that sometimes
  occurs on an old AMD Sempron-based test system I have,
  which does have such prefetch problems. ]

Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Vegard Nossum <vegard.nossum@gmail.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
LKML-Reference: <20090616030522.GA22162@Krystal>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-16 10:23:32 +02:00
Peter Zijlstra 465a454f25 x86, mm: Add __get_user_pages_fast()
Introduce a gup_fast() variant which is usable from IRQ/NMI context.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Nick Piggin <npiggin@suse.de>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-15 15:57:51 +02:00
Vegard Nossum 722f2a6c87 Merge commit 'linus/master' into HEAD
Conflicts:
	MAINTAINERS

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15 15:50:49 +02:00
Vegard Nossum ac61a75796 kmemcheck: add opcode self-testing at boot
We've had some troubles in the past with weird instructions. This
patch adds a self-test framework which can be used to verify that
a certain set of opcodes are decoded correctly. Of course, the
opcodes which are not tested can still give the wrong results.

In short, this is just a safeguard to catch unintentional changes
in the opcode decoder. It does not mean that errors can't still
occur!
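A table-driven self-test of the kind described above might look like the following sketch. `decode_size()` is a hypothetical, heavily simplified decoder (x86-64 only, MOV-style opcodes 0x88/0x89 with an optional 0x66 or REX prefix), not kmemcheck's actual decoder. The encodings in the table are standard x86-64 ones.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical decoder: memory operand size in bytes for MOV r/m, r. */
static unsigned int decode_size(const uint8_t *op)
{
	unsigned int size = 4;			/* default operand size */

	if (*op == 0x66) {			/* operand-size override */
		size = 2;
		++op;
	}
	if ((*op & 0xf0) == 0x40) {		/* REX prefix (64-bit only) */
		if (*op & 0x08)			/* REX.W: 64-bit operand */
			size = 8;
		++op;
	}
	/* Bit 0 of these opcodes selects byte (0) vs. full size (1). */
	return (*op & 1) ? size : 1;
}

/* Known encodings and the size the decoder must report for each. */
static const struct {
	unsigned int expected;
	uint8_t insn[4];
	const char *desc;
} selftest_opcodes[] = {
	{ 1, { 0x88, 0x07 },       "mov %al, (%rdi)" },
	{ 2, { 0x66, 0x89, 0x07 }, "mov %ax, (%rdi)" },
	{ 4, { 0x89, 0x07 },       "mov %eax, (%rdi)" },
	{ 8, { 0x48, 0x89, 0x07 }, "mov %rax, (%rdi)" },
};

/* Runs every entry and returns the number of mismatches. */
static unsigned int run_opcode_selftest(void)
{
	unsigned int failures = 0;

	for (size_t i = 0; i < ARRAY_SIZE(selftest_opcodes); i++) {
		unsigned int got = decode_size(selftest_opcodes[i].insn);

		if (got != selftest_opcodes[i].expected) {
			printf("opcode selftest: %s: expected %u, got %u\n",
			       selftest_opcodes[i].desc,
			       selftest_opcodes[i].expected, got);
			failures++;
		}
	}
	return failures;
}
```

As the commit message says, a passing table only covers the listed encodings. It catches regressions in those cases and says nothing about opcodes outside the table.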

[rebased for mainline inclusion]
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15 15:49:22 +02:00
Vegard Nossum b1eeab6768 kmemcheck: add hooks for the page allocator
This adds support for tracking the initializedness of memory that
was allocated with the page allocator. Highmem requests are not
tracked.
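"Initializedness" tracking can be pictured as a per-byte shadow state. The sketch below is a userspace model with hypothetical names (`model_page_alloc()`, `model_write()`, `model_read_warns()`). It is not the kmemcheck shadow implementation. Allocation marks lowmem bytes uninitialized, stores mark them initialized, and highmem allocations are left untracked, as the commit describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-byte shadow states. */
enum model_shadow {
	SHADOW_UNTRACKED,	/* e.g. highmem: no checking at all */
	SHADOW_UNINITIALIZED,	/* allocated, never written */
	SHADOW_INITIALIZED,	/* written at least once */
};

#define ARENA_SIZE 64
static unsigned char arena[ARENA_SIZE];
static enum model_shadow shadow[ARENA_SIZE];

/* Allocation hook: new lowmem bytes start out uninitialized. */
static void *model_page_alloc(size_t off, size_t len, bool highmem)
{
	assert(off + len <= ARENA_SIZE);
	for (size_t i = off; i < off + len; i++)
		shadow[i] = highmem ? SHADOW_UNTRACKED : SHADOW_UNINITIALIZED;
	return &arena[off];
}

/* Store hook: writing a tracked byte marks it initialized. */
static void model_write(size_t off, unsigned char val)
{
	arena[off] = val;
	if (shadow[off] != SHADOW_UNTRACKED)
		shadow[off] = SHADOW_INITIALIZED;
}

/* Load hook: would this read be reported as uninitialized? */
static bool model_read_warns(size_t off)
{
	return shadow[off] == SHADOW_UNINITIALIZED;
}
```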

Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>

[build fix for !CONFIG_KMEMCHECK]
Signed-off-by: Ingo Molnar <mingo@elte.hu>

[rebased for mainline inclusion]
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15 15:48:33 +02:00
Vegard Nossum 9e730237c2 kmemcheck: don't track page tables
As these are allocated using the page allocator, we need to pass
__GFP_NOTRACK before we add page allocator support to kmemcheck.
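The opt-out is a flag tested by the allocator. The sketch below uses hypothetical flag values and helper names (`MODEL_GFP_*`, `alloc_would_track()`, `pgtable_gfp()`). The kernel's real gfp bits differ. It shows only the pattern: page-table allocations OR in a "do not track" bit, and the tracking hook skips any request that carries it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; the kernel's real gfp_t bits differ. */
typedef unsigned int model_gfp_t;
#define MODEL_GFP_KERNEL	0x01u
#define MODEL_GFP_NOTRACK	0x100u

/* Would the page allocator hook set up tracking for this request? */
static bool alloc_would_track(model_gfp_t flags)
{
	return !(flags & MODEL_GFP_NOTRACK);
}

/* Page-table pages are allocated with the opt-out bit set. */
static model_gfp_t pgtable_gfp(void)
{
	return MODEL_GFP_KERNEL | MODEL_GFP_NOTRACK;
}
```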

Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15 12:40:11 +02:00
Vegard Nossum f85612967c x86: add hooks for kmemcheck
The hooks that we modify are:
- Page fault handler (to handle kmemcheck faults)
- Debug exception handler (to hide pages after single-stepping
  the instruction that caused the page fault)
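The interplay of the two hooks can be modelled as a small state machine. This is a userspace sketch with hypothetical types, not the x86 handlers. A tracked page is normally hidden (non-present), so an access faults. The page fault hook makes the page present and sets the trap flag so the faulting instruction can complete under single-step. The debug exception hook then hides the page again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one tracked page. */
struct model_page {
	bool tracked;
	bool present;
};

/* Hypothetical per-CPU state. */
struct model_cpu {
	bool trap_flag;			/* models EFLAGS.TF */
	struct model_page *pending;	/* page to re-hide after the step */
};

/* Page fault hook: returns true if the fault belongs to kmemcheck. */
static bool model_page_fault(struct model_cpu *cpu, struct model_page *pg)
{
	if (!pg->tracked || pg->present)
		return false;		/* not a hidden tracked page */
	/* (the real hook would check the access against shadow memory here) */
	pg->present = true;		/* let the instruction run once */
	cpu->trap_flag = true;		/* trap right after it */
	cpu->pending = pg;
	return true;
}

/* Debug exception hook: runs after the single-stepped instruction. */
static bool model_debug_trap(struct model_cpu *cpu)
{
	if (!cpu->pending)
		return false;		/* not our single-step */
	cpu->pending->present = false;	/* hide the page again */
	cpu->pending = NULL;
	cpu->trap_flag = false;
	return true;
}
```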

Also redefine memset() to use the optimized version if kmemcheck is
enabled.

(Thanks to Pekka Enberg for minimizing the impact on the page fault
handler.)

As kmemcheck doesn't handle MMX/SSE instructions (yet), we also disable
the optimized xor code, and rely instead on the generic C implementation
in order to avoid false-positive warnings.
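For reference, a generic C XOR routine of the kind the fallback relies on fits in a few lines. `xor_2_generic()` is a hypothetical name, not the kernel's function. It XORs the second buffer into the first one word at a time, with no explicit MMX/SSE code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical generic XOR: p1[i] ^= p2[i] over 'bytes' bytes. */
static void xor_2_generic(size_t bytes, unsigned long *p1,
			  const unsigned long *p2)
{
	size_t words = bytes / sizeof(unsigned long);

	/* Sketch assumes the length is a whole number of words. */
	assert(bytes % sizeof(unsigned long) == 0);
	for (size_t i = 0; i < words; i++)
		p1[i] ^= p2[i];
}
```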

Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>

[whitespace fixlet]
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

[rebased for mainline inclusion]
Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
2009-06-15 12:40:02 +02:00
Pekka Enberg f8b4ece2a9 kmemcheck: use kmemcheck_pte_lookup() instead of open-coding it
Let's use kmemcheck_pte_lookup() in kmemcheck_fault() instead of
open-coding it there.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15 12:40:00 +02:00
Pekka Enberg 6d9609c132 kmemcheck: move 64-bit ifdef out of kmemcheck_opcode_decode()
This patch moves the CONFIG_X86_64 ifdef out of kmemcheck_opcode_decode() by
introducing a version of the function that always returns false for
CONFIG_X86_32.
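The pattern of moving an #ifdef out of a function body can be sketched as follows. The config-dependent test lives in a tiny helper with two definitions, one of which always returns false, so the decoder itself contains no #ifdef. `opcode_is_rex_prefix()`, `opcode_is_prefix()` and `opcode_offset()` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#ifdef CONFIG_X86_64
static bool opcode_is_rex_prefix(uint8_t b)
{
	return (b & 0xf0) == 0x40;
}
#else
static bool opcode_is_rex_prefix(uint8_t b)
{
	(void)b;
	return false;	/* no REX prefixes on 32-bit */
}
#endif

/* Legacy prefixes: lock/rep, segment overrides, operand/address size. */
static bool opcode_is_prefix(uint8_t b)
{
	return b == 0xf0 || b == 0xf2 || b == 0xf3 ||
	       b == 0x2e || b == 0x36 || b == 0x3e || b == 0x26 ||
	       b == 0x64 || b == 0x65 || b == 0x66 || b == 0x67;
}

/* Returns the offset of the opcode byte; note: no #ifdef in here. */
static unsigned int opcode_offset(const uint8_t *op)
{
	unsigned int i = 0;

	while (opcode_is_prefix(op[i]))
		i++;
	if (opcode_is_rex_prefix(op[i]))	/* REX must precede the opcode */
		i++;
	return i;
}
```

The compiler sees a constant false on 32-bit builds and can drop the REX branch, so the cleanup costs nothing at run time.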

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15 12:39:59 +02:00
Pekka Enberg 0c33cacd86 kmemcheck: remove multiple ifdef'd definitions of the same global variable
Multiple ifdef'd definitions of the same global variable are ugly and
error-prone. Fix that up.
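One common way to do this cleanup is sketched below with hypothetical config symbols and variable names (`CONFIG_DEMO_*`, `demo_enabled`). The preprocessor selects only the initial value, and the variable itself is defined exactly once, so its type and name cannot drift between branches.

```c
#include <assert.h>

/* Pick the default with the preprocessor... */
#if defined(CONFIG_DEMO_DISABLED_BY_DEFAULT)
# define DEMO_DEFAULT 0
#elif defined(CONFIG_DEMO_ONESHOT_BY_DEFAULT)
# define DEMO_DEFAULT 2
#else
# define DEMO_DEFAULT 1
#endif

/* ...and define the global exactly once. */
int demo_enabled = DEMO_DEFAULT;
```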

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15 12:39:57 +02:00
Pekka Enberg 5b53b76a61 kmemcheck: make initialization message less confusing
The "Bugs, beware!" printout during initialization is cute but makes users think
something bad happened, so change the text to the more boring "Initialized" message.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15 12:39:56 +02:00
Pekka Enberg 6a19638719 kmemcheck: remove forward declarations from error.c
This patch reorders code in error.c so that we can get rid of the forward
declarations.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com>
2009-06-15 12:39:47 +02:00