get_user() and put_user() are inline functions in the meantime
again. Both will generate the mvcos instruction if compiled
with -march=z10 (or greater).
The kernel parameter "uaccess_primary" can only change the behavior
of out-of-line uaccess functions like copy_from_user() to not use
the mvcos instruction, but not for the above named inlined functions.
Therefore it is quite useless and the parameter can be removed.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
__delay is exported by most architectures, and may be used in modules.
Since it is not exported for s390, s390:allmodconfig currently fails
to build with
ERROR: "__delay" [drivers/net/phy/mdio-octeon.ko] undefined!
Fixes: a6d6786452 ("net: mdio-octeon: Modify driver to work on both
ThunderX and Octeon")
Cc: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>
Cc: David Daney <david.daney@cavium.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use the new static_branch_likely() primitive to make sure that the
most likely case is executed without taking an unconditional branch.
This wasn't possible with the old jump label primitives.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20150729064600.GB3953@osiris
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rename two more files which I forgot. Also remove the "asm" from the
swsusp_asm64.S file, since the ".S" suffix already makes it obvious
that this file contains assembler code.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove the 31 bit support in order to reduce maintenance cost and
effectively remove dead code. Since a couple of years there is no
distribution left that comes with a 31 bit kernel.
The 31 bit kernel also has been broken since more than a year before
anybody noticed. In addition I added a removal warning to the kernel
shown at ipl for 5 minutes: a960062e58 ("s390: add 31 bit warning
message") which let everybody know about the plan to remove 31 bit
code. We didn't get any response.
Given that the last 31 bit only machine was introduced in 1999 let's
remove the code.
Anybody with 31 bit user space code can still use the compat mode.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add the compare-and-delay instruction to the spin-lock and rw-lock
retry loops. A CPU executing the compare-and-delay instruction stops
until the lock value has changed. This is done to make the locking
code for contended locks to behave better in regard to the multi-
hreading facility. A thread of a core executing a compare-and-delay
will allow the other threads of a core to get a larger share of the
core resources.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If kprobes is disabled uprobes will not compile.
Fix this by including the correct header files.
Signed-off-by: Jan Willeke <willeke@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move the C functions and definitions related to the idle state handling
to arch/s390/include/asm/idle.h and arch/s390/kernel/idle.c. The function
s390_get_idle_time is renamed to arch_cpu_idle_time and vtime_stop_cpu to
enabled_wait.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch moves common functions from kprobes.c to probes.c.
Thus its possible for uprobes to use them without enabling kprobes.
Signed-off-by: Jan Willeke <willeke@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Make use of the load-and-add, load-and-or and load-and-and instructions
to atomically update the read-write lock without a compare-and-swap loop.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Set the write-lock bit in the out-of-line rwlock code to indicate that
a writer is waiting. Additional readers will no be able to get the lock
until at least one writer got the lock. Additional writers have to wait
for the first writer to release the lock again.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add an owner field to the arch_rwlock_t to be able to pass the timeslice
of a virtual CPU with diagnose 0x9c to the lock owner in case the rwlock
is write-locked. The undirected yield in case the rwlock is acquired
writable but the lock is read-locked is removed.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The out of line _raw_read_lock_wait_flags/_raw_write_lock_wait_flags
functions for the arch_read_lock_flags/arch_write_lock_flags calls
fail to re-enable the interrupts after another unsuccessful try to
get the lock with compare-and-swap. The following wait would be
done with interrupts disabled which is suboptimal.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In case a lock is contended it is better to do a load-and-test first
before trying to get the lock with compare-and-swap. This helps to avoid
unnecessary cache invalidations of the cacheline for the lock if the
CPU has to wait for the lock. For an uncontended lock doing the
compare-and-swap directly is a bit better, if the CPU does not have the
cacheline in its cache yet the compare-and-swap will get it read-write
immediately while a load-and-test would get it read-only first.
Always to the load-and-test first to avoid the cacheline invalidations
for the contended case outweight the potential read-only to read-write
cacheline upgrade for the uncontended case.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On LPAR, when spin_retry is set to <= 0, arch_spin_lock_wait() and
arch_spin_lock_wait_flags() may end up in a while(1) loop w/o doing
any compare and swap operation. To fix this, use do/while instead of
for loop.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Always switch to the kernel ASCE in switch_mm. Load the secondary
space ASCE in finish_arch_post_lock_switch after checking that
any pending page table operations have completed. The primary
ASCE is loaded in entry[64].S. With this the update_primary_asce
call can be removed from the switch_to macro and from the start
of switch_mm function. Remove the load_primary argument from
update_user_asce/clear_user_asce, rename update_user_asce to
set_user_asce and rename update_primary_asce to load_kernel_asce.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use lowcore constant to improve the code generated for spinlocks.
[ Martin Schwidefsky: patch breakdown and code beautification ]
Signed-off-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Improve the spinlock code in several aspects:
- Have _raw_compare_and_swap return true if the operation has been
successful instead of returning the old value.
- Remove the "volatile" from arch_spinlock_t and arch_rwlock_t
- Rename 'owner_cpu' to 'lock'
- Add helper functions arch_spin_trylock_once / arch_spin_tryrelease_once
[ Martin Schwidefsky: patch breakdown and code beautification ]
Signed-off-by: Philipp Hachtmann <phacht@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The whole point of the out-of-line strnlen_user_srst() function was to
avoid corruption of register 0 due to register asm assignment.
However 'somebody' :) forgot to remove the update_primary_asce() function
call, which may clobber register 0 contents.
So let's remove that call and also move the size check to the calling
function.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The current uaccess code uses a page table walk in some circumstances,
e.g. in case of the in atomic futex operations or if running on old
hardware which doesn't support the mvcos instruction.
However it turned out that the page table walk code does not correctly
lock page tables when accessing page table entries.
In other words: a different cpu may invalidate a page table entry while
the current cpu inspects the pte. This may lead to random data corruption.
Adding correct locking however isn't trivial for all uaccess operations.
Especially copy_in_user() is problematic since that requires to hold at
least two locks, but must be protected against ABBA deadlock when a
different cpu also performs a copy_in_user() operation.
So the solution is a different approach where we change address spaces:
User space runs in primary address mode, or access register mode within
vdso code, like it currently already does.
The kernel usually also runs in home space mode, however when accessing
user space the kernel switches to primary or secondary address mode if
the mvcos instruction is not available or if a compare-and-swap (futex)
instruction on a user space address is performed.
KVM however is special, since that requires the kernel to run in home
address space while implicitly accessing user space with the sie
instruction.
So we end up with:
User space:
- runs in primary or access register mode
- cr1 contains the user asce
- cr7 contains the user asce
- cr13 contains the kernel asce
Kernel space:
- runs in home space mode
- cr1 contains the user or kernel asce
-> the kernel asce is loaded when a uaccess requires primary or
secondary address mode
- cr7 contains the user or kernel asce, (changed with set_fs())
- cr13 contains the kernel asce
In case of uaccess the kernel changes to:
- primary space mode in case of a uaccess (copy_to_user) and uses
e.g. the mvcp instruction to access user space. However the kernel
will stay in home space mode if the mvcos instruction is available
- secondary space mode in case of futex atomic operations, so that the
instructions come from primary address space and data from secondary
space
In case of kvm the kernel runs in home space mode, but cr1 gets switched
to contain the gmap asce before the sie instruction gets executed. When
the sie instruction is finished cr1 will be switched back to contain the
user asce.
A context switch between two processes will always load the kernel asce
for the next process in cr1. So the first exit to user space is a bit
more expensive (one extra load control register instruction) than before,
however keeps the code rather simple.
In sum this means there is no need to perform any error prone page table
walks anymore when accessing user space.
The patch seems to be rather large, however it mainly removes the
the page table walk code and restores the previously deleted "standard"
uaccess code, with a couple of changes.
The uaccess without mvcos mode can be enforced with the "uaccess_primary"
kernel parameter.
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix some numbers in the comments describing the layout of the bit maps.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The uaccesspt kernel parameter allows to enforce using the uaccess page
table walk variant. This is mainly for debugging purposes, so this mode
can also be enabled on machines which support the mvcos instruction.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
MACHINE_HAS_MVCOS is used exactly once when the machine is brought up.
There is no need to cache the flag in the machine_flags.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The types 'size_t' and 'unsigned long' have been used randomly for the
uaccess functions. This looks rather confusing.
So let's change all functions to use unsigned long instead and get rid
of size_t in order to have a consistent interface.
The only exception is strncpy_from_user() which uses 'long' since it
may return a signed value (-EFAULT).
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There are only two uaccess variants on s390 left: the version that is used
if the mvcos instruction is available, and the page table walk variant.
So there is no need for expensive indirect function calls.
By default the mvcos variant will be called. If the mvcos instruction is not
available it will call the page table walk variant.
For minimal performance impact the "if (mvcos_is_available)" is implemented
with a jump label, which will be a six byte nop on machines with mvcos.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For some unknown reason the indirect uaccess functions on s390 implement a
different parameter order than what is usual.
e.g.:
unsigned long copy_to_user(void *to, const void *from, unsigned long n);
vs.
size_t (*copy_to_user)(size_t n, void __user * to, const void *from);
Let's get rid of this confusing parameter reordering.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove some dead uaccess extern declarations and also make some functions
static, since they are only used locally.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If get_fs() == USER_DS we better test if current->mm is not zero before
walking page tables.
The page table walk code would try to lock mm->page_table_lock, however
if mm is zero this might crash.
Now it is arguably incorrect trying to access userspace if current->mm
is zero, however we have seen that and s390 would be the only architecture
which would crash in such a case.
So we better make the page table walk code a bit more robust and report
always a fault instead.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When translating a user space address, the address must be checked against
the ASCE limit of the process. If the address is larger than the maximum
address that is reachable with the ASCE, an ASCE type exception must be
generated.
The current code simply ignored the higher order bits. This resulted in an
address wrap around in user space instead of an exception in user space.
Cc: stable@vger.kernel.org # v3.9+
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Simplify the uaccess code by removing the user_mode=home option.
The kernel will now always run in the home space mode.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
find_first_bit_left() and friends have nothing to do with the normal
LSB0 bit numbering for big endian machines used in Linux (least
significant bit has bit number 0).
Instead they use MSB0 bit numbering, where the most signficant bit has
bit number 0. So rename find_first_bit_left() and friends to
find_first_bit_inv(), to avoid any confusion.
Also provide inv versions of set_bit, clear_bit and test_bit.
This also removes the confusing use of e.g. set_bit() in airq.c which
uses a "be_to_le" bit number conversion, which could imply that instead
set_bit_le() could be used. But that is entirely wrong since the _le
bitops variant uses yet another bit numbering scheme.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Just like all other architectures we should use out-of-line find bit
operations, since the inline variant bloat the size of the kernel image.
And also like all other architecures we should only supply optimized
variants of the __ffs, ffs, etc. primitives.
Therefore this patch removes the inlined s390 find bit functions and uses
the generic out-of-line variants instead.
The optimization of the primitives follows with the next patch.
With this patch also the functions find_first_bit_left() and
find_next_bit_left() have been reimplemented, since logically, they are
nothing else but a find_first_bit()/find_next_bit() implementation that
use an inverted __fls() instead of __ffs().
Also the restriction that these functions only work on machines which
support the "flogr" instruction is gone now.
This reduces the size of the kernel image (defconfig, -march=z9-109)
by 144,482 bytes.
Alone the size of the function build_sched_domains() gets reduced from
7 KB to 3,5 KB.
We also git rid of unused functions like find_first_bit_le()...
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The result of the store-clock-fast (STCKF) instruction is a bit fuzzy.
It can happen that the value stored on one CPU is smaller than the value
stored on another CPU, although the order of the stores is the other
way around. This can cause deltas of get_tod_clock() values to become
negative when they should not be.
We need to be more careful with store-clock-fast, this patch partially
reverts git commit e4b7b4238e666682555461fa52eecd74652f36bb "time:
always use stckf instead of stck if available". The get_tod_clock()
function now uses the store-clock-extended (STCKE) instruction.
get_tod_clock_fast() can be used if the fuzziness of store-clock-fast
is acceptable e.g. for wait loops local to a CPU.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Modify the psw_idle waiting logic in entry[64].S to return with
interrupts disabled. This avoids potential issues with udelay
and interrupt loops as interrupts are not reenabled after
clock comparator interrupts.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Improve the encoding of the different pte types and the naming of the
page, segment table and region table bits. Due to the different pte
encoding the hugetlbfs primitives need to be adapted as well. To improve
compatability with common code make the huge ptes use the encoding of
normal ptes. The conversion between the pte and pmd encoding for a huge
pte is done with set_huge_pte_at and huge_ptep_get.
Overall the code is now easier to understand.
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add "fallthrough" comments so nobody wonders if a break statement is missing.
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The help text for this config is duplicated across the x86, parisc, and
s390 Kconfig.debug files. Arnd Bergman noted that the help text was
slightly misleading and should be fixed to state that enabling this
option isn't a problem when using pre 4.4 gcc.
To simplify the rewording, consolidate the text into lib/Kconfig.debug
and modify it there to be more explicit about when you should say N to
this config.
Also, make the text a bit more generic by stating that this option
enables compile time checks so we can cover architectures which emit
warnings vs. ones which emit errors. The details of how an
architecture decided to implement the checks isn't as important as the
concept of compile time checking of copy_from_user() calls.
While we're doing this, remove all the copy_from_user_overflow() code
that's duplicated many times and place it into lib/ so that any
architecture supporting this option can get the function for free.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Helge Deller <deller@gmx.de>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When translating user space addresses to kernel addresses the follow_table()
function had two bugs:
- PROT_NONE mappings could be read accessed via the kernel mapping. That is
e.g. putting a filename into a user page, then protecting the page with
PROT_NONE and afterwards issuing the "open" syscall with a pointer to
the filename would incorrectly succeed.
- when walking the page tables it used the pgd/pud/pmd/pte primitives which
with dynamic page tables give no indication which real level of page tables
is being walked (region2, region3, segment or page table). So in case of an
exception the translation exception code passed to __handle_fault() is not
necessarily correct.
This is not really an issue since __handle_fault() doesn't evaluate the code.
Only in case of e.g. a SIGBUS this code gets passed to user space. If user
space can do something sane with the value is a different question though.
To fix these issues don't use any Linux primitives. Only walk the page tables
like the hardware would do it, however we leave quite some checks away since
we know that we only have full size page tables and each index is within bounds.
In theory this should fix all issues...
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The page table walker variant of clear_user() is supposed to copy the
contents of the empty zero page to user space.
However since 238ec4ef "[S390] zero page cache synonyms" empty_zero_page
is not anymore the page itself but contains the pointer to the empty zero
pages. Therefore the page table walker variant of clear_user() copied
the address of the first empty zero page and afterwards more or less
random data to user space instead of clearing the given user space range.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When the kernel resides in home space and the mvcos instruction is not
available uaccesses for kernel ds happen via simple strnlen() or memcpy()
calls.
This however can break badly, since uaccesses in kernel space may fail as
well, especially if CONFIG_DEBUG_PAGEALLOC is turned on.
To fix this implement strnlen_kernel() and copy_in_kernel() functions
which can only be used by the page table uaccess functions. These two
functions detect invalid memory accesses and return the correct length
of processed data.. Both functions are more or less a copy of the std
variants without sacf calls.
Fixes ipl crashes on 31 bit machines as well on 64 bit machines without
mvcos. Caused by changing the default address space of the kernel being
home space.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The "standard" and page table walk variants of strncpy_from_user() first
check the length of the to be copied string in userspace.
The string is then copied to kernel space and the length returned to the
caller.
However userspace can modify the string at any time while the kernel
checks for the length of the string or copies the string. In result the
returned length of the string is not necessarily correct.
Fix this by copying in a loop which mimics the mvcos variant of
strncpy_from_user(), which handles this correctly.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If the maximum length specified for the to be accessed string for
strncpy_from_user() and strnlen_user() is zero the following incorrect
values would be returned or incorrect memory accesses would happen:
strnlen_user_std() and strnlen_user_pt() incorrectly return "1"
strncpy_from_user_pt() would incorrectly access "dst[maxlen - 1]"
strncpy_from_user_mvcos() would incorrectly return "-EFAULT"
Fix all these oddities by adding early checks.
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Always stay within page boundaries when copying from user within
strlen_user_mvcos()/strncpy_from_user_mvcos(). This allows to
shorten the code a bit and may prevent unnecessary faults, since
we copy quite large amounts of memory to kernel space.
Also directly call the mvcos variants of copy_from_user() to
avoid indirect branches.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The s390 architecture is unique in respect to dirty page detection,
it uses the change bit in the per-page storage key to track page
modifications. All other architectures track dirty bits by means
of page table entries. This property of s390 has caused numerous
problems in the past, e.g. see git commit ef5d437f71
"mm: fix XFS oops due to dirty pages without buffers on s390".
To avoid future issues in regard to per-page dirty bits convert
s390 to a fault based software dirty bit detection mechanism. All
user page table entries which are marked as clean will be hardware
read-only, even if the pte is supposed to be writable. A write by
the user process will trigger a protection fault which will cause
the user pte to be marked as dirty and the hardware read-only bit
is removed.
With this change the dirty bit in the storage key is irrelevant
for Linux as a host, but the storage key is still required for
KVM guests. The effect is that page_test_and_clear_dirty and the
related code can be removed. The referenced bit in the storage
key is still used by the page_test_and_clear_young primitive to
provide page age information.
For page cache pages of mappings with mapping_cap_account_dirty
there will not be any change in behavior as the dirty bit tracking
already uses read-only ptes to control the amount of dirty pages.
Only for swap cache pages and pages of mappings without
mapping_cap_account_dirty there can be additional protection faults.
To avoid an excessive number of additional faults the mk_pte
primitive checks for PageDirty if the pgprot value allows for writes
and pre-dirties the pte. That avoids all additional faults for
tmpfs and shmem pages until these pages are added to the swap cache.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Fix name clash with some common code device drivers and add "tod"
to all tod clock access function names.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Without CONFIG_HUGETLB_PAGE, pmd_huge() will always return 0. So
pmd_large() should be used instead in places where both transparent
huge pages and hugetlbfs pages can occur.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Our memcpy and memcmp variants were implemented by calling the corresponding
gcc builtin variants.
However gcc is free to replace a call to __builtin_memcmp with a call to memcmp
which, when called, will result in an endless recursion within memcmp.
So let's provide asm variants and also fix the variants that are used for
uncompressing the kernel image.
In addition remove all other occurences of builtin function calls.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The s390 page-table walk code, used for user copy and futex, currently
cannot handle huge pages. As far as user copy is concerned, that is
not really a problem because those functions will only be used on old
hardware that has no huge page support. But the futex code will also
use pagetable walk functions on current hardware when user space runs
in primary space mode. So, if a futex sits in a huge page, the futex
operation on it will result in a page fault loop or even data
corruption.
This patch adds the code for resolving huge page mappings in the user
access pagetable walk code on s390.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The current virtual timer interface is inherently per-cpu and hard to
use. The sole user of the interface is appldata which uses it to execute
a function after a specific amount of cputime has been used over all cpus.
Rework the virtual timer interface to hook into the cputime accounting.
This makes the interface independent from the CPU timer interrupts, and
makes the virtual timers global as opposed to per-cpu.
Overall the code is greatly simplified. The downside is that the accuracy
is not as good as the original implementation, but it is still good enough
for appldata.
Reviewed-by: Jan Glauber <jang@linux.vnet.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove the file name from the comment at top of many files. In most
cases the file name was wrong anyway, so it's rather pointless.
Also unify the IBM copyright statement. We did have a lot of sightly
different statements and wanted to change them one after another
whenever a file gets touched. However that never happened. Instead
people start to take the old/"wrong" statements to use as a template
for new files.
So unify all of them in one go.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Replace __s390x__ with CONFIG_64BIT in all places that are not exported
to userspace or guarded with #ifdef __KERNEL__.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Whenever the cpu loads an enabled wait PSW it will appear as idle to the
underlying host system. The code in default_idle calls vtime_stop_cpu
which does the necessary voodoo to get the cpu time accounting right.
The udelay code just loads an enabled wait PSW. To correct this rework
the vtime_stop_cpu/vtime_start_cpu logic and move the difficult parts
to entry[64].S, vtime_stop_cpu can now be called from anywhere and
vtime_start_cpu is gone. The correction of the cpu time during wakeup
from an enabled wait PSW is done with a critical section in entry[64].S.
As vtime_start_cpu is gone, s390_idle_check can be removed as well.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Define struct pcpu and merge some of the NR_CPUS arrays into it, including
__cpu_logical_map, current_set and smp_cpu_state. Split smp related
functions to those operating on physical cpus and the functions operating
on a logical cpu number. Make the functions for physical cpus use a
pointer to a struct pcpu. This hides the knowledge about cpu addresses in
smp.c, entry[64].S and swsusp_asm64.S, thus remove the sigp.h header.
The PSW restart mechanism is used to start secondary cpus, calling a
function on an online cpu, calling a function on the ipl cpu, and for
the nmi signal. Replace the different assembler functions with a
single function restart_int_handler. The new entry point calls a function
whose pointer is stored in the lowcore of the target cpu and it can wait
for the source cpu to stop. This covers all existing use cases.
Overall the code is now simpler and there are ~380 lines less code.
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Split out addressing mode bits from PSW_BASE_BITS, rename PSW_BASE_BITS
to PSW_MASK_BASE, get rid of psw_user32_bits, remove unused function
enabled_wait(), introduce PSW_MASK_USER, and drop PSW_MASK_MERGE macros.
Change psw_kernel_bits / psw_user_bits to contain only the bits that
are always set in the respective mode.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The alignment is missing for various global symbols in s390 assembly code.
With a recent gcc and an instruction like stgrl this can lead to a
specification exception if the instruction uses such a mis-aligned address.
Specify the alignment explicitely and while add it define __ALIGN for s390
and use the ENTRY define to save some lines of code.
Signed-off-by: Jan Glauber <jang@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Implement ndelay() on s390 as well.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Change futex_atomic_op_inuser and futex_atomic_cmpxchg_inatomic
prototypes to use u32 types for the futex as this is the data type the
futex core code uses all over the place.
Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Darren Hart <darren@dvhart.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20110311025058.GD26122@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The cmpxchg_futex_value_locked API was funny in that it returned either
the original, user-exposed futex value OR an error code such as -EFAULT.
This was confusing at best, and could be a source of livelocks in places
that retry the cmpxchg_futex_value_locked after trying to fix the issue
by running fault_in_user_writeable().
This change makes the cmpxchg_futex_value_locked API more similar to the
get_futex_value_locked one, returning an error code and updating the
original value through a reference argument.
Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com> [tile]
Acked-by: Tony Luck <tony.luck@intel.com> [ia64]
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michal Simek <monstr@monstr.eu> [microblaze]
Acked-by: David Howells <dhowells@redhat.com> [frv]
Cc: Darren Hart <darren@dvhart.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20110311024851.GC26122@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The uaccess functions copy_in_user_std and clear_user_std fail to
switch back from secondary space mode to primary space mode with sacf
in case of an unresolvable page fault. We need to make sure that the
switch back to primary mode is done in all cases, otherwise the code
following the uaccess inline assembly will crash.
Reported-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Let local_tick_enable/disable() reprogram the clock comparator so the
function names make semantically more sense.
Also that way the functions are more symmetric since normally each
local_tick_enable() call usually would have a subsequent call to
set_clock_comparator() anyway.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On each machine check all registers are revalidated. The save area for
the clock comparator however only contains the upper most seven bytes
of the former contents, if valid.
Therefore the machine check handler uses a store clock instruction to
get the current time and writes that to the clock comparator register
which in turn will generate an immediate timer interrupt.
However within the lowcore the expected time of the next timer
interrupt is stored. If the interrupt happens before that time the
handler won't be called. In turn the clock comparator won't be
reprogrammed and therefore the interrupt condition stays pending which
causes an interrupt loop until the expected time is reached.
On NOHZ machines this can result in unresponsive machines since the
time of the next expected interrupted can be a couple of days in the
future.
To fix this just revalidate the clock comparator register with the
expected value.
In addition the special handling for udelay must be changed as well.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If there is no in kernel image caller modules will suffer:
ERROR: "copy_from_user_overflow" [net/core/pktgen.ko] undefined!
ERROR: "copy_from_user_overflow" [net/can/can-raw.ko] undefined!
ERROR: "copy_from_user_overflow" [fs/cifs/cifs.ko] undefined!
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch introduces a new function that checks the running status
of a cpu in a hypervisor. This status is not virtualized, so the check
is only correct if running in an LPAR. On acquiring a spinlock, if the
cpu holding the lock is scheduled by the hypervisor, we do a busy wait
on the lock. If it is not scheduled, we yield over to that cpu.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Same as on x86 and sparc, besides the fact that enabling the option
will just emit compile time warnings instead of errors.
Keeps allyesconfig kernels compiling.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Finally move it to the place where it belongs to and make get rid of
it for !CONFIG_SMP.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Name space cleanup for rwlock functions. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
Not strictly necessary for -rt as -rt does not have non sleeping
rwlocks, but it's odd to not have a consistent naming convention.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
Name space cleanup. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
The raw_spin* namespace was taken by lockdep for the architecture
specific implementations. raw_spin_* would be the ideal name space for
the spinlocks which are not converted to sleeping locks in preempt-rt.
Linus suggested to convert the raw_ to arch_ locks and cleanup the
name space instead of using an artifical name like core_spin,
atomic_spin or whatever
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
The pagetable walk usercopy functions have used a modified copy of the
do_exception() function for fault handling. This lead to inconsistencies
with recent changes to do_exception(), e.g. performance counters. This
patch changes the pagetable walk usercopy code to call do_exception()
directly, eliminating the redundancy. A new parameter is added to
do_exception() to specify the fault address.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce user_mode to replace the two variables switch_amode and
s390_noexec. There are three valid combinations of the old values:
1) switch_amode == 0 && s390_noexec == 0
2) switch_amode == 1 && s390_noexec == 0
3) switch_amode == 1 && s390_noexec == 1
They get replaced by
1) user_mode == HOME_SPACE_MODE
2) user_mode == PRIMARY_SPACE_MODE
3) user_mode == SECONDARY_SPACE_MODE
The new kernel parameter user_mode=[primary,secondary,home] lets
you choose the address space mode the user space processes should
use. In addition the CONFIG_S390_SWITCH_AMODE config option
is removed.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch adds an EX_TABLE entry to mvc{p|s|os} usercopy functions that
may be called with KERNEL_DS. In combination with collaborative memory
management, kernel pages marked as unused may trigger an adressing exception
in the usercopy functions. This fixes an unhandled addressing exception bug
where strncpy_from_user() is used with len > strnlen and KERNEL_DS, crossing
a page boundary to an unused page.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Use an own implementation instead of the common code udelay loop.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When udelay() gets called with a delay that would expire before the
next clock event it reprograms the clock comparator.
When the interrupt happens the clock comparator won't be resetted
therefore the interrupt condition doesn't get cleared.
The result is an endless timer interrupt loop until the next clock
event would expire (stored in lowcore).
So udelay() usually would wait much longer for small delays than it
should.
Fix this by disabling the local tick which makes sure that the clock
comparator will be resetted when a timer interrupt happens.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Provide __ucmpdi2() helper function on 31 bit so we don't run
again and again in compile errors like this one:
kernel/built-in.o: In function `T.689':
perf_counter.c:(.text+0x56c86): undefined reference to `__ucmpdi2'
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This allows the callers to now pass down the full set of FAULT_FLAG_xyz
flags to handle_mm_fault(). All callers have been (mechanically)
converted to the new calling convention, there's almost certainly room
for architectures to clean up their code and then add FAULT_FLAG_RETRY
when that support is added.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use builtin variants if gcc 4 or newer is used to compile the kernel.
Generates better code than the asm variants.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move all EXPORT_SYMBOLs to their corresponding definitions.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
pfn_valid() actually checks for a valid struct page and not for a
valid pfn. Using xip mappings w/o struct pages, this will result in
-EFAULT returned by the (page table walk) user copy functions,
even though there is valid memory. Those user copy functions don't
need a struct page, so this patch just removes the pfn_valid() check.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The implementation of __div64_31 for G5 machines is broken. The comments
in __div64_31 are correct, only the code does not do what the comments
say. The part "If the remainder has overflown subtract base and increase
the quotient" is only partially realized, the base is subtracted correctly
but the quotient is only increased if the dividend had the last bit set.
Using the correct instruction fixes the problem.
Cc: stable@kernel.org
Reported-by: Frans Pop <elendil@planet.nl>
Tested-by: Frans Pop <elendil@planet.nl>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move cio's private simple udelay function to lib/delay.c and turn it
into something much more readable. So we have all implementations
at one place.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This fixes a regression that came with 934b2857cc
("[S390] nohz/sclp: disable timer on synchronous waits.").
If udelay() gets called from a disabled context it sets the clock comparator
to a value where it expects the next interrupt. When the interrupt happens
the clock comparator gets not reset and therefore the interrupt condition
doesn't get cleared. The result is an endless timer interrupt loop.
In addition this patch fixes also the following:
rcutorture reveals that our __udelay implementation is still buggy,
since it might schedule tasklets, but prevents their execution:
NOHZ: local_softirq_pending 42
NOHZ: local_softirq_pending 02
NOHZ: local_softirq_pending 142
NOHZ: local_softirq_pending 02
To fix this we make sure that only the clock comparator interrupt
is enabled when the enabled wait psw is loaded.
Also no code gets called anymore which might schedule tasklets.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
sclp_sync_wait wait synchronously for an sclp interrupt and disables
timer interrupts. However on the irq enter paths there is an extra
check if a timer interrupt would be due and calls the timer callback.
This would schedule softirqs in the wrong context.
So introduce local_tick_enable/disable which prevents this.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
arch/s390/lib/uaccess_mvcos.c:166:
warning: 'strnlen_user_mvcos' defined but not used
arch/s390/lib/uaccess_mvcos.c:186:
warning: 'strncpy_from_user_mvcos' defined but not used
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The current uaccess page table walk code assumes at a few places that
any access is a user space access. This is not correct if somebody
has issued a set_fs(KERNEL_DS) in advance.
Add code which checks which address space we are in and with this make
sure we access the correct address space. This way we get also rid of
the dirty
if (!currrent-mm)
return -EFAULT;
hack in futex_atomic_cmpxchg_pt.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
This way we get rid of s390's NO_IDLE_HZ and use the generic dynticks
variant instead. In addition we get high resolution timers for free.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
a0c1e9073e "futex: runtime enable pi and
robust functionality" introduces a test wether futex in atomic stuff
works or not.
It does that by writing to address 0 of the kernel address space. This
will crash on older machines where addressing mode switching is enabled
but where the mvcos instruction is not available. Page table walking is
done by hand and therefore the code tries to access current->mm which
is NULL.
Therefore add an extra check, so we survive the early test.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add missing exception table entry so that the kernel can handle
proctection exceptions as well on the cs instruction. Currently only
specification exceptions are handled correctly.
The missing entry allows user space to crash the kernel.
Cc: stable <stable@kernel.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In s390's spin_lock_irqsave, interrupts remain disabled while
spinning. In other architectures like x86 and powerpc, interrupts are
re-enabled while spinning if IRQ is not masked before spin_lock_irqsave
is called.
The following patch re-enables interrupts through local_irq_restore
while spinning for a lock acquisition.
This can improve system response.
[heiko.carstens@de.ibm.com: removed saving of pc]
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Used to contain the address of the holder of the lock. But since the
spinlock code is not inlined anymore all locks contain the same address
anyway. And since in addtition nobody complained about that for ages
its obviously unused. So remove it.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
is_init() is an ambiguous name for the pid==1 check. Split it into
is_global_init() and is_container_init().
A cgroup init has it's tsk->pid == 1.
A global init also has it's tsk->pid == 1 and it's active pid namespace
is the init_pid_ns. But rather than check the active pid namespace,
compare the task structure with 'init_pid_ns.child_reaper', which is
initialized during boot to the /sbin/init process and never changes.
Changelog:
2.6.22-rc4-mm2-pidns1:
- Use 'init_pid_ns.child_reaper' to determine if a given task is the
global init (/sbin/init) process. This would improve performance
and remove dependence on the task_pid().
2.6.21-mm2-pidns2:
- [Sukadev Bhattiprolu] Changed is_container_init() calls in {powerpc,
ppc,avr32}/traps.c for the _exception() call to is_global_init().
This way, we kill only the cgroup if the cgroup's init has a
bug rather than force a kernel panic.
[akpm@linux-foundation.org: fix comment]
[sukadev@us.ibm.com: Use is_global_init() in arch/m32r/mm/fault.c]
[bunk@stusta.de: kernel/pid.c: remove unused exports]
[sukadev@us.ibm.com: Fix capability.c to work with threaded init]
Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Acked-by: Pavel Emelianov <xemul@openvz.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Herbert Poetzel <herbert@13thfloor.at>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>