Blah. The patch [0] I recently sent fixing errors with
in_hugepage_area() and prepare_hugepage_range() for powerpc itself has
an off-by-one bug. Furthermore, the related functions
touches_hugepage_*_range() and within_hugepage_*_range() are also
buggy. Some of the bugs, like those addressed in [0], originated with
commit 7d24f0b8a5 where we tweaked the
semantics of where hugepages are allowed. Other bugs have been there
essentially forever, and are due to the undefined behaviour of '<<'
with shift counts greater than the type width (LOW_ESID_MASK could
return non-zero for high ranges with the right congruences).
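The shift problem can be shown in isolation. The following is a minimal sketch, not the kernel's actual macros: area_bit() and area_mask() are hypothetical helpers, and the 16-entry mask width is an assumption (16 segments of 256MB covering 4GB). Checking the index before shifting keeps the shift count below the type width, so out-of-range indices yield 0 instead of whatever the hardware shift happens to produce.

```c
#include <stdint.h>

#define NUM_LOW_AREAS 16u	/* assumption: 16 x 256MB segments below 4GB */

/*
 * Hypothetical helper: return the mask bit for segment index 'idx',
 * or 0 when the index is outside the range the mask describes.
 * The range check guarantees the shift count is below the type width.
 */
static inline uint32_t area_bit(unsigned int idx)
{
	if (idx >= NUM_LOW_AREAS)
		return 0;
	return (uint32_t)1 << idx;
}

/* Build the mask for segments [first, last], clamped to the low range. */
static inline uint32_t area_mask(unsigned int first, unsigned int last)
{
	uint32_t mask = 0;

	for (unsigned int i = first; i <= last && i < NUM_LOW_AREAS; i++)
		mask |= area_bit(i);
	return mask;
}
```

With these helpers, area_bit(48) is simply 0 rather than depending on how the CPU treats an oversized shift count.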
The good news is that I now have a testsuite which should pick up
things like this if they creep in again.
[0] "powerpc-fix-for-hugepage-areas-straddling-4gb-boundary"
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Commit 7d24f0b8a5 fixed bugs in the ppc64 SLB
miss handler with respect to hugepage handling, and in the process tweaked
the semantics of the hugepage address masks in mm_context_t.
Unfortunately, it left out a couple of necessary changes to go with that
change. First, the in_hugepage_area() macro was not updated to match.
Second, prepare_hugepage_range() was not updated to correctly handle
hugepage regions which straddled the 4GB point.
The latter appears only to cause process hangs when attempting to map such
a region, but the former can cause oopses if a get_user_pages() is
triggered at the wrong point. This patch addresses both bugs.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Restore an earlier mod which went missing in the powerpc reshuffle: the 4xx
mmu_mapin_ram does not need to take init_mm.page_table_lock.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Update comments (only) on page_table_lock and mmap_sem in arch/powerpc.
Removed the comment on page_table_lock from hash_huge_page: since it's no
longer taking page_table_lock itself, it's irrelevant whether others are; but
how it is safe (even against huge file truncation?) I can't say.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
asm-ppc64/imalloc.h is only included from files in arch/powerpc/mm.
We already have a header for mm local definitions,
arch/powerpc/mm/mmu_decl.h. Thus, this patch moves the contents of
imalloc.h into mmu_decl.h. The only exceptions are the definitions of
PHBS_IO_BASE, IMALLOC_BASE and IMALLOC_END. Those are moved into
pgtable.h, next to similar definitions of VMALLOC_START and
VMALLOC_SIZE.
Built for multiplatform 32bit and 64bit (ARCH=powerpc).
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Somewhere we lost the include of udbg.h in lmb.c. While we're there, add a DBG
macro like every other file has and use it in lmb_dump_all().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch should fix the crashes we have been seeing on 64-bit
powerpc systems with a memory hole when sparsemem is enabled.
I'd appreciate it if people who know more about NUMA and sparsemem
than me could look over it.
There were two bugs. The first was that if NUMA was enabled but there
was no NUMA information for the machine, the setup_nonnuma() function
was adding a single region, assuming memory was contiguous. The
second was that the loops in mem_init() and show_mem() assumed that
all pages within the span of a pgdat were valid (had a valid struct
page).
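The second bug can be illustrated with a toy model. This is only a sketch under stated assumptions: struct toy_node and toy_pfn_valid() are made-up stand-ins for a pgdat and pfn_valid(), and the node has a single hole. The point is that a loop over the span has to test each pfn rather than assume the span is dense.

```c
#include <stdbool.h>

/* Toy node: spans [start_pfn, start_pfn + spanned) and contains one hole. */
struct toy_node {
	unsigned long start_pfn;
	unsigned long spanned;		/* pages in the span, hole included */
	unsigned long hole_start;	/* first pfn of the hole */
	unsigned long hole_end;		/* first pfn after the hole */
};

/* Stand-in for pfn_valid(): true only where a struct page would exist. */
static inline bool toy_pfn_valid(const struct toy_node *n, unsigned long pfn)
{
	if (pfn < n->start_pfn || pfn >= n->start_pfn + n->spanned)
		return false;
	return pfn < n->hole_start || pfn >= n->hole_end;
}

/* Count the pages that are really present, skipping the hole. */
static inline unsigned long toy_count_pages(const struct toy_node *n)
{
	unsigned long count = 0;

	for (unsigned long i = 0; i < n->spanned; i++) {
		if (!toy_pfn_valid(n, n->start_pfn + i))
			continue;	/* no struct page here, do not touch it */
		count++;
	}
	return count;
}
```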
I also fixed the incorrect setting of num_physpages that Mike Kravetz
pointed out.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Changed jobs and the Freescale address is no longer valid.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch moves the vdso's to arch/powerpc, adds support for the 32
bits vdso to the 32 bits kernel, renames systemcfg (finally!), and adds
some new (still untested) routines to both vdso's: clock_gettime() with
support for CLOCK_REALTIME and CLOCK_MONOTONIC, clock_getres() (same
clocks) and get_tbfreq() for glibc to retrieve the timebase frequency.
Tom,Steve: The implementation of get_tbfreq() I've done for 32 bits
returns a long long (r3, r4) not a long. This is such that if we ever
add support for >4GHz timebases on ppc32, the userland interface won't
have to change.
I have tested gettimeofday() using some glibc patches in both ppc32 and
ppc64 kernels using 32 bits userland (I haven't had a chance to test a
64 bits userland yet, but the implementation didn't change and was
tested earlier). I haven't yet tested the new functions.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Convert to sparsemem and remove all the discontigmem code in the
process. This has a few advantages:
- The old numa_memory_lookup_table can go away
- All the arch specific discontigmem magic can go away
We also remove the triple pass of memory properties and instead create a
list of per node extents that we iterate through. A final cleanup would
be to change our lmb code to store extents per node, then we can reuse
that information in the numa code.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Remove ppc64 specific version of nr_cpus_node and use the generic one
provided.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This also makes klimit have the same type on 32-bit as on 64-bit,
namely unsigned long, and defines and initializes it in one place.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch makes the kernel use a different kmem cache for PMD pages
as they are smaller than PTE pages. Avoids waste of memory.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch merges platform codes. systemcfg->platform is no longer used,
systemcfg use in general is deprecated as much as possible (and renamed
_systemcfg before it gets completely moved elsewhere in a future patch),
_machine is now used on ppc64 as well as on ppc32. Platform codes aren't gone
yet but we are getting a step closer. A bunch of asm code in head[_64].S
is also turned into C code.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Early calls to __ioremap() will panic if the hash insertion fails. This
patch makes them return NULL instead. It happens with some pSeries users
who enabled CONFIG_BOOTX_TEXT. The latter is getting an incorrect address
for the frame buffer and the hash insertion fails. With this patch, it
will display an error instead of crashing at boot.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
memmap_init_zone() sets page count to 1. Before 'freeing' the
page, we need to clear the count. This is the same as what is done
in free_all_bootmem_core() for memory discovered at boot time.
Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add the create_section_mapping() routine to create hptes for memory
sections dynamically added after system boot.
Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
For some stupid reason I can't explain (brown paper bag is at hand), I
removed the pfn_valid() check in the code that does the icache/dcache
coherency on POWER4 and later. That causes us to eventually try to
access a nonexistent struct page when hashing in IO pages.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
On ppc64 we end up with a negative value for the data size in the memory
boot message:
Memory: 2035560k/2097152k available (5792k kernel code, 89564k reserved,
18014398509481632k data, 870k bss, 352k init)
It turns out the section ordering of the linker script is different on
ppc32 and ppc64, so just count data as _edata - _sdata which should work
on both.
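The odd figure in the boot message is what an unsigned subtraction produces when the end symbol lies below the start symbol: 18014398509481632 equals 2^54 - 352, i.e. a difference of -352k shifted right by 10 as a 64-bit unsigned value. A small sketch of the arithmetic follows; region_kb() is a hypothetical helper and the addresses in the usage note are invented, since the message does not say which symbols were subtracted.

```c
#include <stdint.h>

/*
 * Size in kilobytes of [start, end), using the same unsigned
 * arithmetic as a boot-time size calculation. If end < start the
 * subtraction wraps modulo 2^64 and the result is a huge number.
 */
static inline uint64_t region_kb(uint64_t start, uint64_t end)
{
	return (end - start) >> 10;
}
```

For example, region_kb(0x1000 + 352 * 1024, 0x1000) yields 18014398509481632, the value printed above, while a correctly ordered pair such as _sdata and _edata gives a sensible size.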
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Oops, some last-minute changes caused the 64K pages patch to break the
ppc32 build. This fixes it.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch, however, should be applied on top of the 64k-page-size patch to
fix some problems with hugepage (some pre-existing, another introduced by
this patch).
The patch fixes a bug in the SLB miss handler for hugepages on ppc64
introduced by the dynamic hugepage patch (commit id
c594adad56) due to a misunderstanding of the
srd instruction's behaviour (mea culpa). The problem arises when a 64-bit
process maps some hugepages in the low 4GB of the address space (unusual).
In this case, as well as the 256M segment in question being marked for
hugepages, other segments at 32G intervals will be incorrectly marked for
hugepages.
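The 32G aliasing follows from how srd treats its shift count: only the low 7 bits are used, and counts of 64 to 127 give zero. The sketch below models that in C. It is an illustration of the aliasing, not the handler's actual assembly; srd_model() and the two lookup helpers are hypothetical names, and the 16-bit bitmap width is an assumption.

```c
#include <stdint.h>

/*
 * Model of the PowerPC srd instruction: the shift count is taken
 * from the low 7 bits only, and counts of 64..127 produce zero.
 */
static inline uint64_t srd_model(uint64_t value, uint64_t count)
{
	unsigned int sh = (unsigned int)(count & 0x7f);

	return sh >= 64 ? 0 : value >> sh;
}

/* Lookup that shifts by the raw segment number: aliases every 128 segments. */
static inline int low_bit_unchecked(uint16_t low_areas, uint64_t addr)
{
	uint64_t segment = addr >> 28;	/* 256MB segment number */

	return (int)(srd_model(low_areas, segment) & 1);
}

/* Lookup that only consults the bitmap for addresses below 4GB. */
static inline int low_bit_checked(uint16_t low_areas, uint64_t addr)
{
	uint64_t segment = addr >> 28;

	if (segment >= 16)
		return 0;
	return (int)((low_areas >> segment) & 1);
}
```

Since 128 segments of 256M make 32G, an address 32G above a marked low segment hits the same bit in the unchecked version.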
In the process, this patch tweaks the semantics of the hugepage bitmaps to
be more sensible. Previously, an address below 4G was marked for hugepages
if the appropriate segment bit in the "low areas" bitmask was set *or* if
the low bit in the "high areas" bitmap was set (which would mark all
addresses below 1TB for hugepage). With this patch, any given address is
governed by a single bitmap. Addresses below 4GB are marked for hugepage
if and only if their bit is set in the "low areas" bitmap (256M
granularity). Addresses between 4GB and 1TB are marked for hugepage iff
the low bit in the "high areas" bitmap is set. Higher addresses are marked
for hugepage iff their bit in the "high areas" bitmap is set (1TB
granularity).
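The new semantics can be written down as a single lookup. This is a sketch only: struct toy_hugepage_masks and toy_in_hugepage_area() are hypothetical stand-ins for the fields in mm_context_t and the real macro, and the 16-bit width of both bitmaps is an assumption.

```c
#include <stdint.h>
#include <stdbool.h>

#define SZ_4GB (1ULL << 32)

/* Assumed layout: two 16-bit bitmaps. */
struct toy_hugepage_masks {
	uint16_t low_areas;	/* one bit per 256MB segment below 4GB */
	uint16_t high_areas;	/* one bit per 1TB area; bit 0 covers 4GB..1TB */
};

/* Each address is governed by exactly one bitmap. */
static inline bool toy_in_hugepage_area(const struct toy_hugepage_masks *m,
					uint64_t addr)
{
	uint64_t area;

	if (addr < SZ_4GB)
		return (m->low_areas >> (addr >> 28)) & 1;

	area = addr >> 40;	/* 0 for 4GB..1TB, then one per 1TB */
	if (area >= 16)
		return false;	/* beyond the bitmap: never shift out of range */
	return (m->high_areas >> area) & 1;
}
```

Note that setting bit 0 of high_areas no longer marks addresses below 4GB, which is the behavioural change described above.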
To avoid conflicts, this patch must be applied on top of BenH's pending
patch for 64k base page size [0]. As such, this patch also addresses a
hugepage problem introduced by that patch. That patch allows hugepages of
1MB in size on hardware which supports it; however, that won't work when
using 4k pages (4 level pagetable), because in that case hugepage PTEs are
stored at the PMD level, and each PMD entry maps 2MB. This patch simply
disallows hugepages in that case (we can do something cleverer to re-enable
them some other day).
Built, booted, and a handful of hugepage related tests passed on POWER5
LPAR (both ARCH=powerpc and ARCH=ppc64).
[0] http://gate.crashing.org/~benh/ppc64-64k-pages.diff
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mostly this involves adding #include <asm/smp.h>, since that defines
things like boot_cpuid[_phys] and [gs]et_hard_smp_processor_id, which
are SMP-related but still needed on UP. This incorporates fixes
posted by Olof Johansson and Heikki Lindholm.
Signed-off-by: Paul Mackerras <paulus@samba.org>
The ancient ppcdebug/PPCDBG mechanism is now only used in two places.
First, in the hash setup code, one of the bits allows the size of the
hash table to be reduced by a factor of 8 - which would be better
accomplished with a command line option for that purpose. The other
was a bunch of bus walking related messages in the iSeries code, which
would seem to be insufficient reason to keep the mechanism.
This patch removes the last traces of this mechanism.
Built and booted on iSeries and pSeries POWER5 LPAR (ARCH=powerpc).
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Add nicer printing of faulting address on unresolvable kernel faults.
Makes life a little easier for those who don't know how to decode our
register contents at oops time.
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Adds a new CONFIG_PPC_64K_PAGES which, when enabled, changes the kernel
base page size to 64K. The resulting kernel still boots on any
hardware. On current machines with 4K page support only, the kernel
will maintain 16 "subpages" for each 64K page transparently.
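The relationship between a 64K base page and its 16 hardware subpages is plain address arithmetic. A minimal sketch, with hypothetical macro and function names rather than the ones the patch uses:

```c
#include <stdint.h>

#define HW_PAGE_SHIFT		12	/* 4K hardware page */
#define BASE_PAGE_SHIFT		16	/* 64K kernel base page */
#define SUBPAGES_PER_PAGE	(1u << (BASE_PAGE_SHIFT - HW_PAGE_SHIFT))

/* Index (0..15) of the 4K subpage that 'addr' falls in within its 64K page. */
static inline unsigned int subpage_index(uint64_t addr)
{
	return (unsigned int)((addr >> HW_PAGE_SHIFT) & (SUBPAGES_PER_PAGE - 1));
}

/* Start address of the 64K page containing 'addr'. */
static inline uint64_t base_page_start(uint64_t addr)
{
	return addr & ~((1ULL << BASE_PAGE_SHIFT) - 1);
}
```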
Note that while real 64K capable HW has been tested, the current patch
will not enable it yet as such hardware is not released yet, and I'm
still verifying with the firmware architects the proper way to get the
information from the newer hypervisors.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
We had a static memory_limit in prom.c, and then another one defined
in setup_64.c and used in numa.c, which resulted in the kernel crashing
when mem=xxx was given on the command line. This puts the declaration
in system.h and the definition in mem.c. This also moves the
definition of tce_alloc_start/end out of setup_64.c.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Change the phys_mem_access_prot() function to take a pfn instead of an
address. This allows mmap64() to work on /dev/mem for addresses above 4G
on 32-bit architectures. We start with a pfn in mmap_mem(), so there's no
need to convert to an address; in fact, it's actively bad, since the
conversion can overflow when the address is above 4G.
Similarly fix the ppc32 page_is_ram() function to avoid a conversion to an
address by directly comparing to max_pfn. Working with max_pfn instead of
high_memory fixes page_is_ram() to give the right answer for highmem pages.
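The overflow is easy to reproduce with 32-bit types. The sketch below uses invented toy_ helpers and assumes 4K pages; it is not the kernel code, only an illustration of why comparing pfns is safer than converting to an address first.

```c
#include <stdint.h>
#include <stdbool.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4K pages */

/* pfn -> address in a 32-bit type: the top bits are lost above 4G. */
static inline uint32_t toy_pfn_to_addr32(uint32_t pfn)
{
	return (uint32_t)(pfn << TOY_PAGE_SHIFT);
}

/* Address-based check: gives the wrong answer once the address has wrapped. */
static inline bool toy_is_ram_by_addr(uint32_t pfn, uint32_t ram_top)
{
	return toy_pfn_to_addr32(pfn) < ram_top;
}

/* pfn-based check: no conversion, so nothing can overflow. */
static inline bool toy_is_ram_by_pfn(uint32_t pfn, uint32_t max_pfn)
{
	return pfn < max_pfn;
}
```

With 768MB of RAM (max_pfn 0x30000), pfn 0x100001 lies just above 4G; its 32-bit address wraps to 0x1000, so the address-based check wrongly reports RAM while the pfn-based check does not.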
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
do_dabr() is not relevant on 40x or Book-E processors, so don't build it.
Signed-off-by: Kumar K. Gala <kumar.gala@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
With ARCH=powerpc we assume the presence of a device tree, so we don't
require any support for the old bi_recs method of passing boot
parameters. Likewise, we've never needed it for ppc64, but we still
had an include/asm-ppc64/bootinfo.h from which nothing was used. This
patch removes that file, and all references to it in arch/ppc64 and
arch/powerpc. A related, unused variable 'boot_mem_size' is also
removed from setup_32.c. The bootinfo stuff remains in ARCH=ppc for
the time being.
Built and booted on Power5 (ARCH=ppc64 and ARCH=powerpc), built for
32-bit powermac (ARCH=powerpc and ARCH=ppc).
Signed-off-by: David Gibson <dwg@au1.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Some minor fixes that are needed if we are building for a book-e
processor.
Signed-off-by: Kumar K. Gala <kumar.gala@freescale.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
We were initializing the btext stuff from prom_init(), thus breaking
the rule that all communication between prom_init() and the rest of
the kernel has to be via the flattened device tree. This removes
the btext initialization calls from prom_init() and initializes it
instead after the device tree is unflattened. It would be nice to
do it earlier, but that needs some more infrastructure to find the
properties we need in the flattened device tree.
Signed-off-by: Paul Mackerras <paulus@samba.org>
We weren't computing the size of the hash table correctly on iSeries
because the relevant code in prom.c was #ifdef CONFIG_PPC_PSERIES.
This moves the code to hash_utils_64.c, makes it unconditional, and
cleans it up a bit.
Signed-off-by: Paul Mackerras <paulus@samba.org>
On ARCH=ppc64 we were getting htab_hash_mask recalculated
to the correct value for our particular machine by accident.
In the merge tree, that code was commented out, so htab_hash_mask
was being corrupted.
We now set ppc64_pft_size instead, which gets htab_hash_mask
calculated correctly for us later. We should put an
ibm,pft-size property in the device tree at some point.
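For reference, the hash mask can be derived from a log2 table size roughly as follows. This is a sketch under assumptions, not the code in the tree: it assumes ppc64_pft_size holds log2 of the hash table size in bytes and that a PTE group occupies 128 bytes (8 entries of 16 bytes); toy_htab_hash_mask() is a hypothetical name.

```c
#include <stdint.h>

#define TOY_PTEG_SHIFT 7	/* assumption: 128-byte PTE groups */

/*
 * Derive the hash mask from log2 of the table size in bytes.
 * Returns 0 for sizes that cannot hold more than one PTE group or
 * that would need an out-of-range shift.
 */
static inline uint64_t toy_htab_hash_mask(unsigned int pft_size)
{
	uint64_t htab_size_bytes, pteg_count;

	if (pft_size <= TOY_PTEG_SHIFT || pft_size >= 64)
		return 0;

	htab_size_bytes = 1ULL << pft_size;
	pteg_count = htab_size_bytes >> TOY_PTEG_SHIFT;
	return pteg_count - 1;
}
```

Under these assumptions a 64MB table (pft_size of 26) has 2^19 groups and a mask of 0x7ffff.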
Also set -mno-minimal-toc in some makefiles.
Allow iSeries to configure PROC_DEVICETREE.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Now that the register names and bit definitions are all in reg.h,
use that instead of processor.h in assembly code in a few places.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This moves the remaining files in arch/ppc64/mm to arch/powerpc/mm,
and arranges that we use them when compiling with ARCH=ppc64.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This doesn't change any code, just renames things so we consistently
have foo_32.c and foo_64.c where we have separate 32- and 64-bit
versions.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This also creates merged versions of do_init_bootmem, paging_init
and mem_init and moves them to arch/powerpc/mm/mem.c. It gets rid
of the mem_pieces stuff.
I made memory_limit a parameter to lmb_enforce_memory_limit rather
than a global referenced by that function. This will require some
small changes to ppc64 if we want to continue building ARCH=ppc64
using the merged lmb.c.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This merges ppc_ksyms.c, puts back the actual do_execve call in
sys_execve, makes init_MMU call find_end_of_memory rather than
ppc_md.find_end_of_memory (every platform has a device tree
with a /memory node now, right?) and fixes some problems with the
mpic initialization on newworld powermacs.
Signed-off-by: Paul Mackerras <paulus@samba.org>
This creates the directory structure under arch/powerpc and a bunch
of Kconfig files. It does a first-cut merge of arch/powerpc/mm,
arch/powerpc/lib and arch/powerpc/platforms/powermac. This is enough
to build a 32-bit powermac kernel with ARCH=powerpc.
For now we are getting some unmerged files from arch/ppc/kernel and
arch/ppc/syslib, or arch/ppc64/kernel. This makes some minor changes
to files in those directories and files outside arch/powerpc.
The boot directory is still not merged. That's going to be interesting.
Signed-off-by: Paul Mackerras <paulus@samba.org>