Граф коммитов

43 Коммитов

Автор SHA1 Сообщение Дата
Peter Xu 4064b98270 mm: allow VM_FAULT_RETRY for multiple times
The idea comes from a discussion between Linus and Andrea [1].

Before this patch we only allow a page fault to retry once.  We achieved
this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing
handle_mm_fault() the second time.  This was majorly used to avoid
unexpected starvation of the system by looping over forever to handle the
page fault on a single page.  However that should hardly happen, and after
all for each code path to return a VM_FAULT_RETRY we'll first wait for a
condition (during which time we should possibly yield the cpu) to happen
before VM_FAULT_RETRY is really returned.

This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY
flag when we receive VM_FAULT_RETRY.  It means that the page fault handler
now can retry the page fault for multiple times if necessary without the
need to generate another page fault event.  Meanwhile we still keep the
FAULT_FLAG_TRIED flag so page fault handler can still identify whether a
page fault is the first attempt or not.

Then we'll have these combinations of fault flags (only considering
ALLOW_RETRY flag and TRIED flag):

  - ALLOW_RETRY and !TRIED:  this means the page fault allows to
                             retry, and this is the first try

  - ALLOW_RETRY and TRIED:   this means the page fault allows to
                             retry, and this is not the first try

  - !ALLOW_RETRY and !TRIED: this means the page fault does not allow
                             to retry at all

  - !ALLOW_RETRY and TRIED:  this is forbidden and should never be used

In existing code we have multiple places that has taken special care of
the first condition above by checking against (fault_flags &
FAULT_FLAG_ALLOW_RETRY).  This patch introduces a simple helper to detect
the first retry of a page fault by checking against both (fault_flags &
FAULT_FLAG_ALLOW_RETRY) and !(fault_flag & FAULT_FLAG_TRIED) because now
even the 2nd try will have the ALLOW_RETRY set, then use that helper in
all existing special paths.  One example is in __lock_page_or_retry(), now
we'll drop the mmap_sem only in the first attempt of page fault and we'll
keep it in follow up retries, so old locking behavior will be retained.

This will be a nice enhancement for current code [2] at the same time a
supporting material for the future userfaultfd-writeprotect work, since in
that work there will always be an explicit userfault writeprotect retry
for protected pages, and if that cannot resolve the page fault (e.g., when
userfaultfd-writeprotect is used in conjunction with swapped pages) then
we'll possibly need a 3rd retry of the page fault.  It might also benefit
other potential users who will have similar requirement like userfault
write-protection.

GUP code is not touched yet and will be covered in follow up patch.

Please read the thread below for more information.

[1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/
[2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160246.9790-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:30 -07:00
Peter Xu dde1607248 mm: introduce FAULT_FLAG_DEFAULT
Although there're tons of arch-specific page fault handlers, most of them
are still sharing the same initial value of the page fault flags.  Say,
merely all of the page fault handlers would allow the fault to be retried,
and they also allow the fault to respond to SIGKILL.

Let's define a default value for the fault flags to replace those initial
page fault flags that were copied over.  With this, it'll be far easier to
introduce new fault flag that can be used by all the architectures instead
of touching all the archs.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160238.9694-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:29 -07:00
Peter Xu 24a62cf41f arc/mm: use helper fault_signal_pending()
Let ARC to use the new helper fault_signal_pending() by moving the signal
check out of the retry logic as standalone.  This should also helps to
simplify the code a bit.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220155843.9172-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:29 -07:00
Vineet Gupta 6aae3425aa ARC: mm: remove __ARCH_USE_5LEVEL_HACK
Patch series "elide extraneous generated code for folded p4d/pud/pmd", v3.

This series came out of seemingly benign excursion into
understanding/removing __ARCH_USE_5LEVEL_HACK from ARC port showing some
extraneous code being generated despite folded p4d/pud/pmd

| bloat-o-meter2 vmlinux-[AB]*
| add/remove: 0/0 grow/shrink: 3/0 up/down: 130/0 (130)
| function                                     old     new   delta
| free_pgd_range                               548     660    +112
| p4d_clear_bad                                  2      20     +18

The patches here address that

| bloat-o-meter2 vmlinux-[BF]*
| add/remove: 0/2 grow/shrink: 0/1 up/down: 0/-386 (-386)
| function                                     old     new   delta
| pud_clear_bad                                 20       -     -20
| p4d_clear_bad                                 20       -     -20
| free_pgd_range                               660     314    -346

The code savings are not a whole lot, but still worthwhile IMHO.

This patch (of 5):

With paging code made 5-level compliant, this is no longer needed.  ARC
has software page walker with 2 lookup levels (pgd -> pte)

This was expected to be non functional change but ended with slight
code bloat due to needless inclusions of p*d_free_tlb() macros which
will be addressed in further patches.

| bloat-o-meter2 vmlinux-[AB]*
| add/remove: 0/0 grow/shrink: 2/0 up/down: 128/0 (128)
| function                                     old     new   delta
| free_pgd_range                               546     656    +110
| p4d_clear_bad                                  2      20     +18
| Total: Before=4137148, After=4137276, chg 0.000000%

Link: http://lkml.kernel.org/r/20191016162400.14796-2-vgupta@synopsys.com
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-12-01 06:29:19 -08:00
Linus Torvalds 3eb514866f ARC updates for 5.3-rc1
- long due rewrite of do_page_fault
 
  - refactoring of entry/exit code to utilize the double load/store instructions
 
  - hsdk platform updates
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJdLi8cAAoJEGnX8d3iisJeoZkQAJba/3Q9TZrqMjQ2itVRwHbf
 E/TxpHRy2VynQBHgz81LgvZYQ1wxTCu8FLj8Ie4B2sM0HJ9O1ZPCLzwdQXJmpB4D
 LsY7T6rIGl7R17n3oJ0ZQgYmYki4Fxje9a98W/ylwTDpWPilIvWUwTMpcQ86wQ7K
 9izHv+vZ9hVHKtu2svs3WdDy4rPKMNOZkyheUpzIhsQlmuRMQJEG4I1M432L+4q2
 5Q1nl0XHuVOShqtbmpz/fK9/+A6sArP/hCIbT7i0QsktVAsxwhIwTWUWhCl4GzRi
 Aqq9GWZIciHo2+NmAa+nvrzRmGb/DkAoC+iU2C8xfgqm9fbfKSLNGggGvH1S7+6a
 ZVrB9HIhHO53OGjC4ysnBfHQpi8oGvl7M/AVC3Ij3mdu56aIWRPcmSUiqMa4M+Bc
 preXq+3pto/lC5DpIRHjsAQjNGTsfJua7XKiXngCdmLG/B9hbPCux0B1DG8+9OdY
 IV8BDFtWVWHk7YQpuMXtrb+Zxk28SNdtiPgFgHgp7syKn9v1EIVHoC2Gx1v6C425
 HZtA850dI3JOl/cnCZ5U7KxaDCY8UEcNUkkUnjlo1f2VySkaxPdd8Dw+2IEnB1Hl
 7XefU6BnYsg6DooGtOq0YzxjIzBdS4w4LhsgHuqe4rbRAonLff/rtEnnQrnF2Kdq
 hviAkMxhkVn+CwijUtoO
 =hfiN
 -----END PGP SIGNATURE-----

Merge tag 'arc-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC updates from Vineet Gupta:

 - long due rewrite of do_page_fault

 - refactoring of entry/exit code to utilize the double load/store
   instructions

 - hsdk platform updates

* tag 'arc-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: [plat-hsdk]: Enable AXI DW DMAC in defconfig
  ARC: [plat-hsdk]: enable DW SPI controller
  ARC: hide unused function unw_hdr_alloc
  ARC: [haps] Add Virtio support
  ARCv2: entry: simplify return to Delay Slot via interrupt
  ARC: entry: EV_Trap expects r10 (vs. r9) to have exception cause
  ARCv2: entry: rewrite to enable use of double load/stores LDD/STD
  ARCv2: entry: avoid a branch
  ARCv2: entry: push out the Z flag unclobber from common EXCEPTION_PROLOGUE
  ARCv2: entry: comments about hardware auto-save on taken interrupts
  ARC: mm: do_page_fault refactor #8: release mmap_sem sooner
  ARC: mm: do_page_fault refactor #7: fold the various error handling
  ARC: mm: do_page_fault refactor #6: error handlers to use same pattern
  ARC: mm: do_page_fault refactor #5: scoot no_context to end
  ARC: mm: do_page_fault refactor #4: consolidate retry related logic
  ARC: mm: do_page_fault refactor #3: tidyup vma access permission code
  ARC: mm: do_page_fault refactor #2: remove short lived variable
  ARC: mm: do_page_fault refactor #1: remove label @good_area
2019-07-16 15:07:51 -07:00
Linus Torvalds 5ad18b2e60 Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull force_sig() argument change from Eric Biederman:
 "A source of error over the years has been that force_sig has taken a
  task parameter when it is only safe to use force_sig with the current
  task.

  The force_sig function is built for delivering synchronous signals
  such as SIGSEGV where the userspace application caused a synchronous
  fault (such as a page fault) and the kernel responded with a signal.

  Because the name force_sig does not make this clear, and because the
  force_sig takes a task parameter the function force_sig has been
  abused for sending other kinds of signals over the years. Slowly those
  have been fixed when the oopses have been tracked down.

  This set of changes fixes the remaining abusers of force_sig and
  carefully rips out the task parameter from force_sig and friends
  making this kind of error almost impossible in the future"

* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits)
  signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus
  signal: Remove the signal number and task parameters from force_sig_info
  signal: Factor force_sig_info_to_task out of force_sig_info
  signal: Generate the siginfo in force_sig
  signal: Move the computation of force into send_signal and correct it.
  signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal
  signal: Remove the task parameter from force_sig_fault
  signal: Use force_sig_fault_to_task for the two calls that don't deliver to current
  signal: Explicitly call force_sig_fault on current
  signal/unicore32: Remove tsk parameter from __do_user_fault
  signal/arm: Remove tsk parameter from __do_user_fault
  signal/arm: Remove tsk parameter from ptrace_break
  signal/nds32: Remove tsk parameter from send_sigtrap
  signal/riscv: Remove tsk parameter from do_trap
  signal/sh: Remove tsk parameter from force_sig_info_fault
  signal/um: Remove task parameter from send_sigtrap
  signal/x86: Remove task parameter from send_sigtrap
  signal: Remove task parameter from force_sig_mceerr
  signal: Remove task parameter from force_sig
  signal: Remove task parameter from force_sigsegv
  ...
2019-07-08 21:48:15 -07:00
Vineet Gupta 926150db85 ARC: mm: do_page_fault refactor #8: release mmap_sem sooner
In case of successful page fault handling, this patch releases mmap_sem
before updating the perf stat event for major/minor faults. So even
though the contention reduction is NOT super high, it is still an
improvement.

There's an additional code size improvement as we only have 2 up_read()
calls now.

Note to myself:
--------------

1. Given the way it is done, we are forced to move @bad_area label earlier
   causing the various "goto bad_area" cases to hit perf stat code.

 - PERF_COUNT_SW_PAGE_FAULTS is NOW updated for access errors which is what
   arm/arm64 seem to be doing as well (with slightly different code)
 - PERF_COUNT_SW_PAGE_FAULTS_{MAJ,MIN} must NOT be updated for the
   error case which is guarded by now setting @fault initial value
   to VM_FAULT_ERROR which serves both cases when handle_mm_fault()
   returns error or is not called at all.

2. arm/arm64 use two homebrew fault flags VM_FAULT_BAD{MAP,MAPACCESS}
   which I was inclined to add too but seems not needed for ARC

 - given that we have everything is 1 function we can still use goto
 - we setup si_code at the right place (arm* do that in the end)
 - we init fault already to error value which guards entry into perf
   stats event update

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2019-07-01 11:02:22 -07:00
Vineet Gupta 5e91bf5ce9 ARC: mm: do_page_fault refactor #7: fold the various error handling
- single up_read() call vs. 4
 - so much easier on eyes

Technically it seems like @bad_area label moved up, but even in old
regime, it was a special case of delivering SIGSEGV unconditionally
which we now do as well, although with checks.

Also note that @fault needs to be initialized since we can land in
@bad_area (which reads it) without setting it up with return value of
handle_mm_fault() - failing to do so did bite us although as a side
effect of different patch: see [1]

[1]: http://lists.infradead.org/pipermail/linux-snps-arc/2019-May/005803.html

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2019-07-01 11:02:22 -07:00
Vineet Gupta 98cb57ad70 ARC: mm: do_page_fault refactor #6: error handlers to use same pattern
- up_read
 - if !user_mode
 - whatever error handling

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2019-07-01 11:02:22 -07:00
Vineet Gupta d0542c7eac ARC: mm: do_page_fault refactor #5: scoot no_context to end
This is different than the rest of signal handling stuff

No functional change

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2019-07-01 11:02:22 -07:00
Vineet Gupta 02c88d142e ARC: mm: do_page_fault refactor #4: consolidate retry related logic
stats update code can now elide "retry" check and additional level of
indentation since all retry handling is done ahead of it already

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2019-07-01 11:02:22 -07:00
Vineet Gupta 85c5e33763 ARC: mm: do_page_fault refactor #3: tidyup vma access permission code
The coding pattern to NOT intialize variables at declaration time but
rather near code which makes us eof them makes it much easier to grok
the overall logic, specially when the init is not simply 0 or 1

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2019-07-01 11:02:22 -07:00
Vineet Gupta 13e2cc1240 ARC: mm: do_page_fault refactor #2: remove short lived variable
Compiler will do this anyways, still..

No functional change.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2019-07-01 11:02:22 -07:00
Vineet Gupta 450e5b6f65 ARC: mm: do_page_fault refactor #1: remove label @good_area
Invert the condition for stack expansion.
No functional change

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2019-07-01 11:02:22 -07:00
Thomas Gleixner d2912cb15b treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500
Based on 2 normalized pattern(s):

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation

  this program is free software you can redistribute it and or modify
  it under the terms of the gnu general public license version 2 as
  published by the free software foundation #

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 4122 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-19 17:09:55 +02:00
Eric W. Biederman 2e1661d267 signal: Remove the task parameter from force_sig_fault
As synchronous exceptions really only make sense against the current
task (otherwise how are you synchronous) remove the task parameter
from from force_sig_fault to make it explicit that is what is going
on.

The two known exceptions that deliver a synchronous exception to a
stopped ptraced task have already been changed to
force_sig_fault_to_task.

The callers have been changed with the following emacs regular expression
(with obvious variations on the architectures that take more arguments)
to avoid typos:

force_sig_fault[(]\([^,]+\)[,]\([^,]+\)[,]\([^,]+\)[,]\W+current[)]
->
force_sig_fault(\1,\2,\3)

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2019-05-29 09:31:43 -05:00
Eric W. Biederman 351b6825b3 signal: Explicitly call force_sig_fault on current
Update the calls of force_sig_fault that pass in a variable that is
set to current earlier to explicitly use current.

This is to make the next change that removes the task parameter
from force_sig_fault easier to verify.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2019-05-29 09:31:43 -05:00
Eugeniy Paltsev a8c715b4dd ARC: mm: SIGSEGV userspace trying to access kernel virtual memory
As of today if userspace process tries to access a kernel virtual addres
(0x7000_0000 to 0x7ffff_ffff) such that a legit kernel mapping already
exists, that process hangs instead of being killed with SIGSEGV

Fix that by ensuring that do_page_fault() handles kenrel vaddr only if
in kernel mode.

And given this, we can also simplify the code a bit. Now a vmalloc fault
implies kernel mode so its failure (for some reason) can reuse the
@no_context label and we can remove @bad_area_nosemaphore.

Reproduce user test for original problem:

------------------------>8-----------------
 #include <stdlib.h>
 #include <stdint.h>

 int main(int argc, char *argv[])
 {
 	volatile uint32_t temp;

 	temp = *(uint32_t *)(0x70000000);
 }
------------------------>8-----------------

Cc: <stable@vger.kernel.org>
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2019-05-20 10:09:10 -07:00
Vineet Gupta 4d447455e7 ARC: mm: do_page_fault fixes #1: relinquish mmap_sem if signal arrives while handle_mm_fault
do_page_fault() forgot to relinquish mmap_sem if a signal came while
handling handle_mm_fault() - due to say a ctl+c or oom etc.
This would later cause a deadlock by acquiring it twice.

This came to light when running libc testsuite tst-tls3-malloc test but
is likely also the cause for prior seen LTP failures. Using lockdep
clearly showed what the issue was.

| # while true; do ./tst-tls3-malloc ; done
| Didn't expect signal from child: got `Segmentation fault'
| ^C
| ============================================
| WARNING: possible recursive locking detected
| 4.17.0+ #25 Not tainted
| --------------------------------------------
| tst-tls3-malloc/510 is trying to acquire lock:
| 606c7728 (&mm->mmap_sem){++++}, at: __might_fault+0x28/0x5c
|
|but task is already holding lock:
|606c7728 (&mm->mmap_sem){++++}, at: do_page_fault+0x9c/0x2a0
|
| other info that might help us debug this:
|  Possible unsafe locking scenario:
|
|       CPU0
|       ----
|  lock(&mm->mmap_sem);
|  lock(&mm->mmap_sem);
|
| *** DEADLOCK ***
|

------------------------------------------------------------
What the change does is not obvious (note to myself)

prior code was

| do_page_fault
|
|   down_read()		<-- lock taken
|   handle_mm_fault	<-- signal pending as this runs
|   if fatal_signal_pending
|       if VM_FAULT_ERROR
|           up_read
|       if user_mode
|          return	<-- lock still held, this was the BUG

New code

| do_page_fault
|
|   down_read()		<-- lock taken
|   handle_mm_fault	<-- signal pending as this runs
|   if fatal_signal_pending
|       if VM_FAULT_RETRY
|          return       <-- not same case as above, but still OK since
|                           core mm already relinq lock for FAULT_RETRY
|    ...
|
|   < Now falls through for bug case above >
|
|   up_read()		<-- lock relinquished

Cc: stable@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2019-01-17 16:24:39 -08:00
Davidlohr Bueso d8d7d842e8 arch/arc/mm/fault.c: remove caller signal_pending_branch predictions
This is already done for us internally by the signal machinery.

Link: http://lkml.kernel.org/r/20181116002713.8474-4-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-01-04 13:13:48 -08:00
Eugeniy Paltsev 121e38e5ac ARC: mm: fix uninitialised signal code in do_page_fault
Commit 15773ae938 ("signal/arc: Use force_sig_fault where
appropriate") introduced undefined behaviour by leaving si_code
unitiailized and leaking random kernel values to user space.

Fixes: 15773ae938 ("signal/arc: Use force_sig_fault where appropriate")
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2018-11-12 10:38:27 -08:00
Eric W. Biederman 15773ae938 signal/arc: Use force_sig_fault where appropriate
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-09-27 21:59:35 +02:00
Souptick Joarder 50a7ca3c6f mm: convert return type of handle_mm_fault() caller to vm_fault_t
Use new return type vm_fault_t for fault handler.  For now, this is just
documenting that the function returns a VM_FAULT value rather than an
errno.  Once all instances are converted, vm_fault_t will become a
distinct type.

Ref-> commit 1c8f422059 ("mm: change return type to vm_fault_t")

In this patch all the caller of handle_mm_fault() are changed to return
vm_fault_t type.

Link: http://lkml.kernel.org/r/20180617084810.GA6730@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: James Hogan <jhogan@kernel.org>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: David S. Miller <davem@davemloft.net>
Cc: Richard Weinberger <richard@nod.at>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Levin, Alexander (Sasha Levin)" <alexander.levin@verizon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-17 16:20:28 -07:00
Eric W. Biederman 3eb0f5193b signal: Ensure every siginfo we send has all bits initialized
Call clear_siginfo to ensure every stack allocated siginfo is properly
initialized before being passed to the signal sending functions.

Note: It is not safe to depend on C initializers to initialize struct
siginfo on the stack because C is allowed to skip holes when
initializing a structure.

The initialization of struct siginfo in tracehook_report_syscall_exit
was moved from the helper user_single_step_siginfo into
tracehook_report_syscall_exit itself, to make it clear that the local
variable siginfo gets fully initialized.

In a few cases the scope of struct siginfo has been reduced to make it
clear that siginfo siginfo is not used on other paths in the function
in which it is declared.

Instances of using memset to initialize siginfo have been replaced
with calls clear_siginfo for clarity.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-04-25 10:40:51 -05:00
Liav Rehana ddf720f86e ARC: typo fix in mm/fault.c
Signed-off-by: Liav Rehana <liavr@mellanox.com>
Signed-off-by: Noam Camus <noamca@mellanox.com>
Reviewed-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-08-28 15:17:36 -07:00
Ingo Molnar 3f07c01441 sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h>
We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
will have to be picked up from other headers and a couple of .c files.

Create a trivial placeholder <linux/sched/signal.h> file that just
maps to <linux/sched.h> to make this patch obviously correct and
bisectable.

Include the new header in the files that are going to need it.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:29 +01:00
Kirill A. Shutemov dcddffd41d mm: do not pass mm_struct into handle_mm_fault
We always have vma->vm_mm around.

Link: http://lkml.kernel.org/r/1466021202-61880-8-git-send-email-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-07-26 16:19:19 -07:00
Vineet Gupta 336e2136e1 ARC: mm: preps ahead of HIGHMEM support
Before we plug in highmem support, some of code needs to be ready for it
 - copy_user_highpage() needs to be using the kmap_atomic API
 - mk_pte() can't assume page_address()
 - do_page_fault() can't assume VMALLOC_END is end of kernel vaddr space

Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-10-28 19:31:05 +05:30
David Hildenbrand 70ffdb9393 mm/fault, arch: Use pagefault_disable() to check for disabled pagefaults in the handler
Introduce faulthandler_disabled() and use it to check for irq context and
disabled pagefaults (via pagefault_disable()) in the pagefault handlers.

Please note that we keep the in_atomic() checks in place - to detect
whether in irq context (in which case preemption is always properly
disabled).

In contrast, preempt_disable() should never be used to disable pagefaults.
With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt
counter, and therefore the result of in_atomic() differs.
We validate that condition by using might_fault() checks when calling
might_sleep().

Therefore, add a comment to faulthandler_disabled(), describing why this
is needed.

faulthandler_disabled() and pagefault_disable() are defined in
linux/uaccess.h, so let's properly add that include to all relevant files.

This patch is based on a patch from Thomas Gleixner.

Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-7-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 08:39:15 +02:00
Vineet Gupta ceed97ab4f ARC: perf: Enable generic software events
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2015-02-27 10:15:01 +05:30
Guenter Roeck e262eb9381 arc: mm: Fix build failure
Fix misspelled define.

Fixes: 33692f2759 ("vm: add VM_FAULT_SIGSEGV handling support")
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-30 10:31:14 -08:00
Linus Torvalds 33692f2759 vm: add VM_FAULT_SIGSEGV handling support
The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
"you should SIGSEGV" error, because the SIGSEGV case was generally
handled by the caller - usually the architecture fault handler.

That results in lots of duplication - all the architecture fault
handlers end up doing very similar "look up vma, check permissions, do
retries etc" - but it generally works.  However, there are cases where
the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.

In particular, when accessing the stack guard page, libsigsegv expects a
SIGSEGV.  And it usually got one, because the stack growth is handled by
that duplicated architecture fault handler.

However, when the generic VM layer started propagating the error return
from the stack expansion in commit fee7e49d45 ("mm: propagate error
from stack expansion even for guard page"), that now exposed the
existing VM_FAULT_SIGBUS result to user space.  And user space really
expected SIGSEGV, not SIGBUS.

To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
duplicate architecture fault handlers about it.  They all already have
the code to handle SIGSEGV, so it's about just tying that new return
value to the existing code, but it's all a bit annoying.

This is the mindless minimal patch to do this.  A more extensive patch
would be to try to gather up the mostly shared fault handling logic into
one generic helper routine, and long-term we really should do that
cleanup.

Just from this patch, you can generally see that most architectures just
copied (directly or indirectly) the old x86 way of doing things, but in
the meantime that original x86 model has been improved to hold the VM
semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
"newer" things, so it would be a good idea to bring all those
improvements to the generic case and teach other architectures about
them too.

Reported-and-tested-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
Cc: linux-arch@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-29 10:51:32 -08:00
David Rientjes 2a5e95d418 mm, arc: remove obsolete pagefault oom killer comment
Commit 609838cfed ("mm: invoke oom-killer from remaining unconverted page
fault handlers") converted arc to call pagefault_out_of_memory(), so remove
the comment about future conversion.

Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2014-08-07 16:10:59 -07:00
Vineet Gupta 21a63b5604 ARC: Change calling convention of do_page_fault()
switch the args (address, pt_regs) to match with all the other "C"
exception handlers.

This removes the awkwardness in EV_ProtV for page fault vs. unaligned
access.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-11-06 10:41:39 +05:30
Vineet Gupta 9c41f4eeb9 ARC: Incorrect mm reference used in vmalloc fault handler
A vmalloc fault needs to sync up PGD/PTE entry from init_mm to current
task's "active_mm".  ARC vmalloc fault handler however was using mm.

A vmalloc fault for non user task context (actually pre-userland, from
init thread's open for /dev/console) caused the handler to deref NULL mm
(for mm->pgd)

The reasons it worked so far is amazing:

1. By default (!SMP), vmalloc fault handler uses a cached value of PGD.
   In SMP that MMU register is repurposed hence need for mm pointer deref.

2. In pre-3.12 SMP kernel, the problem triggering vmalloc didn't exist in
   pre-userland code path - it was introduced with commit 20bafb3d23
   "n_tty: Move buffers into n_tty_data"

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: stable@vger.kernel.org    #3.10 and 3.11
Cc: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-02 10:27:04 -07:00
Johannes Weiner 759496ba64 arch: mm: pass userspace fault flag to generic fault handler
Unlike global OOM handling, memory cgroup code will invoke the OOM killer
in any OOM situation because it has no way of telling faults occuring in
kernel context - which could be handled more gracefully - from
user-triggered faults.

Pass a flag that identifies faults originating in user space from the
architecture-specific fault handlers to generic code so that memcg OOM
handling can be improved.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12 15:38:01 -07:00
Johannes Weiner 94bce453c7 arch: mm: remove obsolete init OOM protection
The memcg code can trap tasks in the context of the failing allocation
until an OOM situation is resolved.  They can hold all kinds of locks
(fs, mm) at this point, which makes it prone to deadlocking.

This series converts memcg OOM handling into a two step process that is
started in the charge context, but any waiting is done after the fault
stack is fully unwound.

Patches 1-4 prepare architecture handlers to support the new memcg
requirements, but in doing so they also remove old cruft and unify
out-of-memory behavior across architectures.

Patch 5 disables the memcg OOM handling for syscalls, readahead, kernel
faults, because they can gracefully unwind the stack with -ENOMEM.  OOM
handling is restricted to user triggered faults that have no other
option.

Patch 6 reworks memcg's hierarchical OOM locking to make it a little
more obvious wth is going on in there: reduce locked regions, rename
locking functions, reorder and document.

Patch 7 implements the two-part OOM handling such that tasks are never
trapped with the full charge stack in an OOM situation.

This patch:

Back before smart OOM killing, when faulting tasks were killed directly on
allocation failures, the arch-specific fault handlers needed special
protection for the init process.

Now that all fault handlers call into the generic OOM killer (see commit
609838cfed97: "mm: invoke oom-killer from remaining unconverted page
fault handlers"), which already provides init protection, the
arch-specific leftovers can be removed.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: azurIt <azurit@pobox.sk>
Acked-by: Vineet Gupta <vgupta@synopsys.com>	[arch/arc bits]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-12 15:38:01 -07:00
Johannes Weiner 609838cfed mm: invoke oom-killer from remaining unconverted page fault handlers
A few remaining architectures directly kill the page faulting task in an
out of memory situation.  This is usually not a good idea since that
task might not even use a significant amount of memory and so may not be
the optimal victim to resolve the situation.

Since 2.6.29's 1c0fe6e ("mm: invoke oom-killer from page fault") there
is a hook that architecture page fault handlers are supposed to call to
invoke the OOM killer and let it pick the right task to kill.  Convert
the remaining architectures over to this hook.

To have the previous behavior of simply taking out the faulting task the
vm.oom_kill_allocating_task sysctl can be set to 1.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>   [arch/arc bits]
Cc: James Hogan <james.hogan@imgtec.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09 10:33:20 -07:00
Vineet Gupta 38a9ff6d24 ARC: Remove explicit passing around of ECR
With ECR now part of pt_regs

* No need to propagate from lowest asm handlers as arg
* No need to save it in tsk->thread.cause_code
* Avoid bit chopping to access the bit-fields

More code consolidation, cleanup

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-26 15:30:50 +05:30
Vineet Gupta 3e1ae44188 ARC: [mm] Remove @write argument to do_page_fault()
This can be ascertained within do_page_fault() since it gets the full
ECR (Exception Cause Register).

Further, for both the callers of do_page_fault(): Prot-V / D-TLB-Miss,
the cause sub-fields in ECR are same for same type of access, making the
code much more simpler.

D-TLB-Miss [LD] 0x00_21_01_00
Prot-V     [LD] 0x00_23_01_00
                        ^^
D-TLB-Miss [ST] 0x00_21_02_00
Prot-V     [ST] 0x00_23_02_00
                        ^^
D-TLB-Miss [EX] 0x00_21_03_00
Prot-V     [EX] 0x00_23_03_00
                        ^^

This helps code consolidation, which is even better when moving code from
assembler to "C".

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 19:23:20 +05:30
Vineet Gupta da1677b02d ARC: Disintegrate arcregs.h
* Move the various sub-system defines/types into relevant files/functions
  (reduces compilation time)

* move CPU specific stuff out of asm/tlb.h into asm/mmu.h

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-06-22 13:46:42 +05:30
Sachin Kamat 39d0c30d00 ARC: Remove unneeded version.h header include
version.h header file inclusion is not necessary as detected by
versioncheck script.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-04-09 12:21:14 +05:30
Vineet Gupta fbd7053a78 ARC: Page Fault handling
This includes recent changes to make handler "retry" and/or "killable"

The killable (early exit) logic is loosely based on how SH implements it
	return if SIGKILL + either of VM_FAULT_OOM or VM_FAULT_RETRY
which is different from Hexagon implementation which would NOT early
exit for
	SIGKILL + VM_FAULT_OOM + !VM_FAULT_RETRY

credits: Non executable stack support from Simon Spooner

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-02-15 23:15:53 +05:30