do_page_fault() forgot to relinquish mmap_sem if a signal came while
handling handle_mm_fault() - due to say a ctl+c or oom etc.
This would later cause a deadlock by acquiring it twice.
This came to light when running libc testsuite tst-tls3-malloc test but
is likely also the cause for prior seen LTP failures. Using lockdep
clearly showed what the issue was.
| # while true; do ./tst-tls3-malloc ; done
| Didn't expect signal from child: got `Segmentation fault'
| ^C
| ============================================
| WARNING: possible recursive locking detected
| 4.17.0+ #25 Not tainted
| --------------------------------------------
| tst-tls3-malloc/510 is trying to acquire lock:
| 606c7728 (&mm->mmap_sem){++++}, at: __might_fault+0x28/0x5c
|
|but task is already holding lock:
|606c7728 (&mm->mmap_sem){++++}, at: do_page_fault+0x9c/0x2a0
|
| other info that might help us debug this:
| Possible unsafe locking scenario:
|
| CPU0
| ----
| lock(&mm->mmap_sem);
| lock(&mm->mmap_sem);
|
| *** DEADLOCK ***
|
------------------------------------------------------------
What the change does is not obvious (note to myself)
prior code was
| do_page_fault
|
| down_read() <-- lock taken
| handle_mm_fault <-- signal pending as this runs
| if fatal_signal_pending
| if VM_FAULT_ERROR
| up_read
| if user_mode
| return <-- lock still held, this was the BUG
New code
| do_page_fault
|
| down_read() <-- lock taken
| handle_mm_fault <-- signal pending as this runs
| if fatal_signal_pending
| if VM_FAULT_RETRY
| return <-- not same case as above, but still OK since
| core mm already relinq lock for FAULT_RETRY
| ...
|
| < Now falls through for bug case above >
|
| up_read() <-- lock relinquished
Cc: stable@vger.kernel.org
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
In setup_arch_memory we reserve the memory area wherein the kernel
is located. Current implementation may reserve more memory than
it actually required in case of CONFIG_LINUX_LINK_BASE is not
equal to CONFIG_LINUX_RAM_BASE. This happens because we calculate
start of the reserved region relatively to the CONFIG_LINUX_RAM_BASE
and end of the region relatively to the CONFIG_LINUX_RAM_BASE.
For example in case of HSDK board we wasted 256MiB of physical memory:
------------------->8------------------------------
Memory: 770416K/1048576K available (5496K kernel code,
240K rwdata, 1064K rodata, 2200K init, 275K bss,
278160K reserved, 0K cma-reserved)
------------------->8------------------------------
Fix that.
Fixes: 9ed68785f7 ("ARC: mm: Decouple RAM base address from kernel link addr")
Cc: stable@vger.kernel.org #4.14+
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
This is already done for us internally by the signal machinery.
Link: http://lkml.kernel.org/r/20181116002713.8474-4-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull Devicetree updates from Rob Herring:
"The biggest highlight here is the start of using json-schema for DT
bindings. Being able to validate bindings has been discussed for years
with little progress.
- Initial support for DT bindings using json-schema language. This is
the start of converting DT bindings from free-form text to a
structured format.
- Reworking of initrd address initialization. This moves to using the
phys address instead of virt addr in the DT parsing code. This
rework was motivated by CONFIG_DEV_BLK_INITRD causing unnecessary
rebuilding of lots of files.
- Fix stale phandle entries in phandle cache
- DT overlay validation improvements. This exposed several memory
leak bugs which have been fixed.
- Use node name and device_type helper functions in DT code
- Last remaining conversions to using %pOFn printk specifier instead
of device_node.name directly
- Create new common RTC binding doc and move all trivial RTC devices
out of trivial-devices.txt.
- New bindings for Freescale MAG3110 magnetometer, Cadence Sierra
PHY, and Xen shared memory
- Update dtc to upstream version v1.4.7-57-gf267e674d145"
* tag 'devicetree-for-4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (68 commits)
of: __of_detach_node() - remove node from phandle cache
of: of_node_get()/of_node_put() nodes held in phandle cache
gpio-omap.txt: add reg and interrupts properties
dt-bindings: mrvl,intc: fix a trivial typo
dt-bindings: iio: magnetometer: add dt-bindings for freescale mag3110
dt-bindings: Convert trivial-devices.txt to json-schema
dt-bindings: arm: mrvl: amend Browstone compatible string
dt-bindings: arm: Convert Tegra board/soc bindings to json-schema
dt-bindings: arm: Convert ZTE board/soc bindings to json-schema
dt-bindings: arm: Add missing Xilinx boards
dt-bindings: arm: Convert Xilinx board/soc bindings to json-schema
dt-bindings: arm: Convert VIA board/soc bindings to json-schema
dt-bindings: arm: Convert ST STi board/soc bindings to json-schema
dt-bindings: arm: Convert SPEAr board/soc bindings to json-schema
dt-bindings: arm: Convert CSR SiRF board/soc bindings to json-schema
dt-bindings: arm: Convert QCom board/soc bindings to json-schema
dt-bindings: arm: Convert TI nspire board/soc bindings to json-schema
dt-bindings: arm: Convert TI davinci board/soc bindings to json-schema
dt-bindings: arm: Convert Calxeda board/soc bindings to json-schema
dt-bindings: arm: Convert Altera board/soc bindings to json-schema
...
A huge update this time, but a lot of that is just consolidating or
removing code:
- provide a common DMA_MAPPING_ERROR definition and avoid indirect
calls for dma_map_* error checking
- use direct calls for the DMA direct mapping case, avoiding huge
retpoline overhead for high performance workloads
- merge the swiotlb dma_map_ops into dma-direct
- provide a generic remapping DMA consistent allocator for architectures
that have devices that perform DMA that is not cache coherent. Based
on the existing arm64 implementation and also used for csky now.
- improve the dma-debug infrastructure, including dynamic allocation
of entries (Robin Murphy)
- default to providing chaining scatterlist everywhere, with opt-outs
for the few architectures (alpha, parisc, most arm32 variants) that
can't cope with it
- misc sparc32 dma-related cleanups
- remove the dma_mark_clean arch hook used by swiotlb on ia64 and
replace it with the generic noncoherent infrastructure
- fix the return type of dma_set_max_seg_size (Niklas Söderlund)
- move the dummy dma ops for not DMA capable devices from arm64 to
common code (Robin Murphy)
- ensure dma_alloc_coherent returns zeroed memory to avoid kernel data
leaks through userspace. We already did this for most common
architectures, but this ensures we do it everywhere.
dma_zalloc_coherent has been deprecated and can hopefully be
removed after -rc1 with a coccinelle script.
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlwctQgLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYMxgQ//dBpAfS4/J76CdAbYry2zqgcOUU9hIrD6NHiEMWov
ltJxyvEl3LsUmIdEj3aCrYL9jZN0qsnCzn5BVj2c3jDIVgD64fAr7HDf/PbEEfKb
j6/GgEnVLPZV+sQMvhNA5jOzHrkseaqPa4/pNLFZ/l8jnuZ2d+btusDWJpMoVDer
TXVwtIfgeIu0gTygYOShLYXd5qptWKWsZEpbTZOO2sE6+x+ZJX7yQYUxYDTlcOIj
JWVO2l5QNHPc5T9o2at+6L5aNUvnZOxT79sWgyZLn0Kc+FagKAVwfLqUEl0v7foG
8k/xca5/8p3afB1DfrIrtplJqis7cVgdyGxriwuuoO8X4F0nPyWwpGmxsBhrWwwl
xTqC4UorEJ7QwoP6Azopk/vYI2QXIUBLjuCJCuFXZj9+2BGf4IfvBY1S2cLM9qLs
HMcxQonuXJii044KEFS96ePEuiT+igVINweIFBKWcgNCEG0UQtyL6RQ1U5297ipF
JiWZAqD+p9X52UdKS+oKfAiZEekMXn6Xyo97+YCiNpfOo0GP5eEcwhL+JpY4AiRq
apPXtsRy2o1s8yfjdraUIM2Mc2n62vFKb35oUbGCd/QO9piPrFQHl6T0HHcHk4YR
XrUXcHieFZBCYqh7ZVa4RL8Msq1wvGuTL4Dxl43mXdsMoUFRR6eSNWLoAV4IpOLZ
WgA=
=in72
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-4.21' of git://git.infradead.org/users/hch/dma-mapping
Pull DMA mapping updates from Christoph Hellwig:
"A huge update this time, but a lot of that is just consolidating or
removing code:
- provide a common DMA_MAPPING_ERROR definition and avoid indirect
calls for dma_map_* error checking
- use direct calls for the DMA direct mapping case, avoiding huge
retpoline overhead for high performance workloads
- merge the swiotlb dma_map_ops into dma-direct
- provide a generic remapping DMA consistent allocator for
architectures that have devices that perform DMA that is not cache
coherent. Based on the existing arm64 implementation and also used
for csky now.
- improve the dma-debug infrastructure, including dynamic allocation
of entries (Robin Murphy)
- default to providing chaining scatterlist everywhere, with opt-outs
for the few architectures (alpha, parisc, most arm32 variants) that
can't cope with it
- misc sparc32 dma-related cleanups
- remove the dma_mark_clean arch hook used by swiotlb on ia64 and
replace it with the generic noncoherent infrastructure
- fix the return type of dma_set_max_seg_size (Niklas Söderlund)
- move the dummy dma ops for not DMA capable devices from arm64 to
common code (Robin Murphy)
- ensure dma_alloc_coherent returns zeroed memory to avoid kernel
data leaks through userspace. We already did this for most common
architectures, but this ensures we do it everywhere.
dma_zalloc_coherent has been deprecated and can hopefully be
removed after -rc1 with a coccinelle script"
* tag 'dma-mapping-4.21' of git://git.infradead.org/users/hch/dma-mapping: (73 commits)
dma-mapping: fix inverted logic in dma_supported
dma-mapping: deprecate dma_zalloc_coherent
dma-mapping: zero memory returned from dma_alloc_*
sparc/iommu: fix ->map_sg return value
sparc/io-unit: fix ->map_sg return value
arm64: default to the direct mapping in get_arch_dma_ops
PCI: Remove unused attr variable in pci_dma_configure
ia64: only select ARCH_HAS_DMA_COHERENT_TO_PFN if swiotlb is enabled
dma-mapping: bypass indirect calls for dma-direct
vmd: use the proper dma_* APIs instead of direct methods calls
dma-direct: merge swiotlb_dma_ops into the dma_direct code
dma-direct: use dma_direct_map_page to implement dma_direct_map_sg
dma-direct: improve addressability error reporting
swiotlb: remove dma_mark_clean
swiotlb: remove SWIOTLB_MAP_ERROR
ACPI / scan: Refactor _CCA enforcement
dma-mapping: factor out dummy DMA ops
dma-mapping: always build the direct mapping code
dma-mapping: move dma_cache_sync out of line
dma-mapping: move various slow path functions out of line
...
If we want to map memory from the DMA allocator to userspace it must be
zeroed at allocation time to prevent stale data leaks. We already do
this on most common architectures, but some architectures don't do this
yet, fix them up, either by passing GFP_ZERO when we use the normal page
allocator or doing a manual memset otherwise.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Sam Ravnborg <sam@ravnborg.org> [sparc]
Avoid expensive indirect calls in the fast path DMA mapping
operations by directly calling the dma_direct_* ops if we are using
the directly mapped DMA operations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Tony Luck <tony.luck@intel.com>
ARC, ARM, ARM64 and Unicore32 are all capable of parsing the "initrd="
command line parameter to allow specifying the physical address and size
of an initrd. Move that parsing into init/do_mounts_initrd.c such that
we no longer duplicate that logic.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Commit 15773ae938 ("signal/arc: Use force_sig_fault where
appropriate") introduced undefined behaviour by leaving si_code
unitiailized and leaking random kernel values to user space.
Fixes: 15773ae938 ("signal/arc: Use force_sig_fault where appropriate")
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
If IOC was already enabled (due to bootloader) it technically needs to
be reconfigured with aperture base,size corresponding to Linux memory map
which will certainly be different than uboot's. But disabling and
reenabling IOC when DMA might be potentially active is tricky business.
To avoid random memory issues later, just panic here and ask user to
upgrade bootloader to one which doesn't enable IOC
This was actually seen as issue on some of the HSDK board with a version
of uboot which enabled IOC. There were random issues later with starting
of X or peripherals etc.
Also while I'm at it, replace hardcoded bits in ARC_REG_IO_COH_PARTIAL
and ARC_REG_IO_COH_ENABLE registers by definitions.
Inspired by: https://lkml.org/lkml/2018/1/19/557
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Move remaining definitions and declarations from include/linux/bootmem.h
into include/linux/memblock.h and remove the redundant header.
The includes were replaced with the semantic patch below and then
semi-automated removal of duplicated '#include <linux/memblock.h>
@@
@@
- #include <linux/bootmem.h>
+ #include <linux/memblock.h>
[sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
[sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
[sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The alloc_bootmem_low_pages() function allocates PAGE_SIZE aligned regions
from low memory. memblock_alloc_low() with alignment set to PAGE_SIZE does
exactly the same thing.
The conversion is done using the following semantic patch:
@@
expression e;
@@
- alloc_bootmem_low_pages(e)
+ memblock_alloc_low(e, PAGE_SIZE)
Link: http://lkml.kernel.org/r/1536927045-23536-19-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull siginfo updates from Eric Biederman:
"I have been slowly sorting out siginfo and this is the culmination of
that work.
The primary result is in several ways the signal infrastructure has
been made less error prone. The code has been updated so that manually
specifying SEND_SIG_FORCED is never necessary. The conversion to the
new siginfo sending functions is now complete, which makes it
difficult to send a signal without filling in the proper siginfo
fields.
At the tail end of the patchset comes the optimization of decreasing
the size of struct siginfo in the kernel from 128 bytes to about 48
bytes on 64bit. The fundamental observation that enables this is by
definition none of the known ways to use struct siginfo uses the extra
bytes.
This comes at the cost of a small user space observable difference.
For the rare case of siginfo being injected into the kernel only what
can be copied into kernel_siginfo is delivered to the destination, the
rest of the bytes are set to 0. For cases where the signal and the
si_code are known this is safe, because we know those bytes are not
used. For cases where the signal and si_code combination is unknown
the bits that won't fit into struct kernel_siginfo are tested to
verify they are zero, and the send fails if they are not.
I made an extensive search through userspace code and I could not find
anything that would break because of the above change. If it turns out
I did break something it will take just the revert of a single change
to restore kernel_siginfo to the same size as userspace siginfo.
Testing did reveal dependencies on preferring the signo passed to
sigqueueinfo over si->signo, so bit the bullet and added the
complexity necessary to handle that case.
Testing also revealed bad things can happen if a negative signal
number is passed into the system calls. Something no sane application
will do but something a malicious program or a fuzzer might do. So I
have fixed the code that performs the bounds checks to ensure negative
signal numbers are handled"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (80 commits)
signal: Guard against negative signal numbers in copy_siginfo_from_user32
signal: Guard against negative signal numbers in copy_siginfo_from_user
signal: In sigqueueinfo prefer sig not si_signo
signal: Use a smaller struct siginfo in the kernel
signal: Distinguish between kernel_siginfo and siginfo
signal: Introduce copy_siginfo_from_user and use it's return value
signal: Remove the need for __ARCH_SI_PREABLE_SIZE and SI_PAD_SIZE
signal: Fail sigqueueinfo if si_signo != sig
signal/sparc: Move EMT_TAGOVF into the generic siginfo.h
signal/unicore32: Use force_sig_fault where appropriate
signal/unicore32: Generate siginfo in ucs32_notify_die
signal/unicore32: Use send_sig_fault where appropriate
signal/arc: Use force_sig_fault where appropriate
signal/arc: Push siginfo generation into unhandled_exception
signal/ia64: Use force_sig_fault where appropriate
signal/ia64: Use the force_sig(SIGSEGV,...) in ia64_rt_sigreturn
signal/ia64: Use the generic force_sigsegv in setup_frame
signal/arm/kvm: Use send_sig_mceerr
signal/arm: Use send_sig_fault where appropriate
signal/arm: Use force_sig_fault where appropriate
...
The only functional differences (modulo a few missing fixes in the arch
code) is that architectures without coherent caches need a hook to
convert a virtual or dma address into a pfn, given that we don't have
the kernel linear mapping available for the otherwise easy virt_to_page
call. As a side effect we can support mmap of the per-device coherent
area even on architectures not providing the callback, and we make
previous dangerous default methods dma_common_mmap actually save for
non-coherent architectures by rejecting it without the right helper.
In addition to that we need a hook so that some architectures can
override the protection bits when mmaping a dma coherent allocations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Burton <paul.burton@mips.com> # MIPS parts
All the cache maintainance is already stubbed out when not enabled,
but merging the two allows us to nicely handle the case where
cache maintainance is required for some devices, but not others.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Paul Burton <paul.burton@mips.com> # MIPS parts
__GFP_HIGHMEM flag is cleared by upper layer functions
(in include/linux/dma-mapping.h) so we'll never get a
__GFP_HIGHMEM flag in arch_dma_alloc gfp argument.
That's why alloc_pages will never return highmem page
here.
Get rid of highmem pages handling and cleanup arch_dma_alloc
and arch_dma_free functions.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
So far the IOC treatment was global on ARC, being turned on (or off)
for all devices in the system. With this patch, this can now be done
per device using the "dma-coherent" DT property; IOW with this patch
we can use both HW-coherent and regular DMA peripherals simultaneously.
The changes involved are too many so enlisting the summary below:
1. common code calls ARC arch_setup_dma_ops() per device.
2. For coherent dma (IOC) it plugs in generic @dma_direct_ops which
doesn't need any arch specific backend: No need for any explicit
cache flushes or MMU mappings to provide for uncached access
- dma_(map|sync)_single* return early as corresponding dma ops callbacks
are NULL in generic code.
So arch_sync_dma_*() -> dma_cache_*() need not handle the coherent
dma case, hence drop ARC __dma_cache_*_ioc() which were no-op anyways
3. For noncoherent dma (non IOC) generic @dma_noncoherent_ops is used
which in turns calls ARC specific routines
- arch_dma_alloc() no longer checks for @ioc_enable since this is
called only for !IOC case.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: rewrote changelog]
Use new return type vm_fault_t for fault handler. For now, this is just
documenting that the function returns a VM_FAULT value rather than an
errno. Once all instances are converted, vm_fault_t will become a
distinct type.
Ref-> commit 1c8f422059 ("mm: change return type to vm_fault_t")
In this patch all the caller of handle_mm_fault() are changed to return
vm_fault_t type.
Link: http://lkml.kernel.org/r/20180617084810.GA6730@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: James Hogan <jhogan@kernel.org>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: David S. Miller <davem@davemloft.net>
Cc: Richard Weinberger <richard@nod.at>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Levin, Alexander (Sasha Levin)" <alexander.levin@verizon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix type warnings in arch/arc/mm/cache.c.
../arch/arc/mm/cache.c: In function 'flush_anon_page':
../arch/arc/mm/cache.c:1062:55: warning: passing argument 2 of '__flush_dcache_page' makes integer from pointer without a cast [-Wint-conversion]
__flush_dcache_page((phys_addr_t)page_address(page), page_address(page));
^~~~~~~~~~~~~~~~~~
../arch/arc/mm/cache.c:1013:59: note: expected 'long unsigned int' but argument is of type 'void *'
void __flush_dcache_page(phys_addr_t paddr, unsigned long vaddr)
~~~~~~~~~~~~~~^~~~~
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: Elad Kanfi <eladkan@mellanox.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Ofer Levi <oferle@mellanox.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Check that SMP_CACHE_BYTES (and hence ARCH_DMA_MINALIGN) is larger
or equal to any cache line length by comparing it with values
previously read from ARC cache BCR registers.
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
ARC backend for dma_sync_single_for_(device|cpu) was broken as it was
not honoring the @dir argument and simply forcing it based on the call:
- arc_dma_sync_single_for_device(dir) assumed DMA_TO_DEVICE (cache wback)
- arc_dma_sync_single_for_cpu(dir) assumed DMA_FROM_DEVICE (cache inv)
This is not true given the DMA API programming model and has been
discussed here [1] in some detail.
Interestingly while the deficiency has been there forever, it only started
showing up after 4.17 dma common ops rework, commit a8eb92d02d
("arc: fix arc_dma_{map,unmap}_page") which wired up these calls under the
more commonly used dma_map_page API triggering the issue.
[1]: https://lkml.org/lkml/2018/5/18/979
Fixes: commit a8eb92d02d ("arc: fix arc_dma_{map,unmap}_page")
Cc: stable@kernel.org # v4.17+
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: reworked changelog]
Pull siginfo updates from Eric Biederman:
"This set of changes close the known issues with setting si_code to an
invalid value, and with not fully initializing struct siginfo. There
remains work to do on nds32, arc, unicore32, powerpc, arm, arm64, ia64
and x86 to get the code that generates siginfo into a simpler and more
maintainable state. Most of that work involves refactoring the signal
handling code and thus careful code review.
Also not included is the work to shrink the in kernel version of
struct siginfo. That depends on getting the number of places that
directly manipulate struct siginfo under control, as it requires the
introduction of struct kernel_siginfo for the in kernel things.
Overall this set of changes looks like it is making good progress, and
with a little luck I will be wrapping up the siginfo work next
development cycle"
* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
signal/sh: Stop gcc warning about an impossible case in do_divide_error
signal/mips: Report FPE_FLTUNK for undiagnosed floating point exceptions
signal/um: More carefully relay signals in relay_signal.
signal: Extend siginfo_layout with SIL_FAULT_{MCEERR|BNDERR|PKUERR}
signal: Remove unncessary #ifdef SEGV_PKUERR in 32bit compat code
signal/signalfd: Add support for SIGSYS
signal/signalfd: Remove __put_user from signalfd_copyinfo
signal/xtensa: Use force_sig_fault where appropriate
signal/xtensa: Consistenly use SIGBUS in do_unaligned_user
signal/um: Use force_sig_fault where appropriate
signal/sparc: Use force_sig_fault where appropriate
signal/sparc: Use send_sig_fault where appropriate
signal/sh: Use force_sig_fault where appropriate
signal/s390: Use force_sig_fault where appropriate
signal/riscv: Replace do_trap_siginfo with force_sig_fault
signal/riscv: Use force_sig_fault where appropriate
signal/parisc: Use force_sig_fault where appropriate
signal/parisc: Use force_sig_mceerr where appropriate
signal/openrisc: Use force_sig_fault where appropriate
signal/nios2: Use force_sig_fault where appropriate
...
Switch to the generic noncoherent direct mapping implementation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Alexey Brodkin <abrodkin@synopsys.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
These functions should perform the same cache synchronoization as calling
arc_dma_sync_single_for_{cpu,device} in addition to doing any required
address translation or mapping [1]. Ensure they actually do that by calling
arc_dma_sync_single_for_{cpu,device} instead of passing the dir argument
along to _dma_cache_sync.
The now unused _dma_cache_sync function is removed as well.
[1] in fact various drivers rely on that by passing DMA_ATTR_SKIP_CPU_SYNC
to the map/unmap routines and doing the cache synchronization manually.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Alexey Brodkin <abrodkin@synopsys.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
These functions should perform the same functionality as calling
arc_dma_sync_single_for_{cpu,device} on each S/G list element. Ensure
they actually do that by calling arc_dma_sync_single_for_{cpu,device}.
Otherwise we could be passing a different dir argument.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Alexey Brodkin <abrodkin@synopsys.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Remove the indirection through _dma_cache_sync. Also move the functions
up a bit in the source file as we'll need them in more places soon.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Alexey Brodkin <abrodkin@synopsys.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Call clear_siginfo to ensure every stack allocated siginfo is properly
initialized before being passed to the signal sending functions.
Note: It is not safe to depend on C initializers to initialize struct
siginfo on the stack because C is allowed to skip holes when
initializing a structure.
The initialization of struct siginfo in tracehook_report_syscall_exit
was moved from the helper user_single_step_siginfo into
tracehook_report_syscall_exit itself, to make it clear that the local
variable siginfo gets fully initialized.
In a few cases the scope of struct siginfo has been reduced to make it
clear that siginfo siginfo is not used on other paths in the function
in which it is declared.
Instances of using memset to initialize siginfo have been replaced
with calls clear_siginfo for clarity.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Thanks to commit 4b3ef9daa4 ("mm/swap: split swap cache into 64MB
trunks"), after swapoff the address_space associated with the swap
device will be freed. So page_mapping() users which may touch the
address_space need some kind of mechanism to prevent the address_space
from being freed during accessing.
The dcache flushing functions (flush_dcache_page(), etc) in architecture
specific code may access the address_space of swap device for anonymous
pages in swap cache via page_mapping() function. But in some cases
there are no mechanisms to prevent the swap device from being swapoff,
for example,
CPU1 CPU2
__get_user_pages() swapoff()
flush_dcache_page()
mapping = page_mapping()
... exit_swap_address_space()
... kvfree(spaces)
mapping_mapped(mapping)
The address space may be accessed after being freed.
But from cachetlb.txt and Russell King, flush_dcache_page() only care
about file cache pages, for anonymous pages, flush_anon_page() should be
used. The implementation of flush_dcache_page() in all architectures
follows this too. They will check whether page_mapping() is NULL and
whether mapping_mapped() is true to determine whether to flush the
dcache immediately. And they will use interval tree (mapping->i_mmap)
to find all user space mappings. While mapping_mapped() and
mapping->i_mmap isn't used by anonymous pages in swap cache at all.
So, to fix the race between swapoff and flush dcache, __page_mapping()
is add to return the address_space for file cache pages and NULL
otherwise. All page_mapping() invoking in flush dcache functions are
replaced with page_mapping_file().
[akpm@linux-foundation.org: simplify page_mapping_file(), per Mike]
Link: http://lkml.kernel.org/r/20180305083634.15174-1-ying.huang@intel.com
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- MCIP aka ARconnect fixes for SMP builds [Euginey]
- Preventive fix for SLC (L2 cache) flushing [Euginey]
- Kconfig default fix [Ulf Magnusson]
- trailing semicolon fixes [Luis de Bethencourt]
- other assorted minor fixes
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJamHh7AAoJEGnX8d3iisJeM+MQAL+cIkdRd9NJoPPL66IOgwmi
68jUkVQ8PSgR2P/sWvwIRTionOG9siu58Q1ygXpPR9SNnSJlyIYIW4Onuoajs+Ii
TmTwWm8jnKtuPtKMBep8XpYQQ+FRL6sDNn0QLjCnnvqAeE7rTFODSlpBR+jL6ur4
t2h1wHQem6oas1YRwnKrxxMnfV3KP3TkCS2a/lvmeAAt8xi+ll+9OnzVcSCj7pCJ
ZToenfH3UAbhAd5T7gWkJv2v00zbHNzKtpdluSW6WBybP1Ib2IxOxEiUlAvxQgpl
NctvS8Q1uveOPQZHO+fNXsJvf+imP2RdWh5RaOcmm8a4tp8jsR51BScX55usjr/z
ybg4bzHBV3x9YbZkkW9DKyF9eeZ7hWST8nNoebcHNVjboUeD+wgtV8e3Fvc6bIoo
/Xkw28y/mZwCs7mBfu1fNOIIiiZDSgz7YeeqzOPBzEYVcHE4VGINwX+9cLXPOsgz
rmI/Md3buSjrfXPnXTuk1R4fl0WKM5FT/2BJ+IbNU2w6VO1/iKqiBQREH5lBfYAI
BUikxKNwrJ9+zG8EXFUAEi6gn4dxosxMKJ04CTviwGFc1y6yNsmMgizevFpPUyx7
9o3z19hHUZLx94QsGHLI7QJe3Kawamu0tmdJvQBgB/eZ9lj6OdkzY48I2lgGezSw
00e+JPSy6EDi8YrlMki5
=+Bmc
-----END PGP SIGNATURE-----
Merge tag 'arc-4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- MCIP aka ARconnect fixes for SMP builds [Euginey]
- preventive fix for SLC (L2 cache) flushing [Euginey]
- Kconfig default fix [Ulf Magnusson]
- trailing semicolon fixes [Luis de Bethencourt]
- other assorted minor fixes
* tag 'arc-4.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: setup cpu possible mask according to possible-cpus dts property
ARC: mcip: update MCIP debug mask when the new cpu came online
ARC: mcip: halt GFRC counter when ARC cores halt
ARCv2: boot log: fix HS48 release number
arc: dts: use 'atmel' as manufacturer for at24 in axs10x_mb
ARC: Fix malformed ARC_EMUL_UNALIGNED default
ARC: boot log: Fix trailing semicolon
ARC: dw2 unwind: Fix trailing semicolon
ARC: Enable fatal signals on boot for dev platforms
ARCv2: Don't pretend we may set L-bit in STATUS32 with kflag instruction
ARCv2: cache: fix slc_entire_op: flush only instead of flush-n-inv
slc_entire_op with OP_FLUSH command also invalidates it.
This is a preventive fix as the current use of slc_entire_op is only
with OP_FLUSH_N_INV where the invalidate is required.
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
[vgupta: fixed changelog]
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
We always use the stub definitions, so remove the unused other code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
HS48 cpus will have a new MMUv5, although Linux is currently not
explicitly supporting the newer features (so remains at V4).
The existing software/hardware version check is very tight and causes
boot abort. Given that the MMUv5 hardware is backwards compatible,
relax the boot check to allow current kernel support level to work
with new hardware.
Also while at it, move the ancient MMU related code to under ARCompact
builds as baseline MMU for HS cpus is v4.
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
I recently came upon a scenario where I would get a double fault
machine check exception tiriggered by a kernel module.
However the ensuing crash stacktrace (ksym lookup) was not working
correctly.
Turns out that machine check auto-disables MMU while modules are allocated
in kernel vaddr spapce.
This patch re-enables the MMU before start printing the stacktrace
making stacktracing of modules work upon a fatal exception.
Cc: stable@kernel.org
Signed-off-by: Jose Abreu <joabreu@synopsys.com>
Reviewed-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: moved code into low level handler to avoid in 2 places]
[Needed for HSDK]
Currently the first page of system (hence RAM base) is assumed to be
@ CONFIG_LINUX_LINK_BASE, where kernel itself is linked.
However is case of HSDK platform, for reasons explained in that patch,
this is not true. kernel needs to be linked @ 0x9000_0000 while DDR
is still wired at 0x8000_0000. To properly account for this 256M of RAM,
we need to introduce a new option and base page frame accountiing off of
it.
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: renamed CONFIG_KERNEL_RAM_BASE_ADDRESS => CONFIG_LINUX_RAM_BASE
: simplified changelog]
[Needed for HSDK]
- Currently IOC base is hardcoded to 0x8000_0000 which is default value
of LINUX_LINK_BASE, but may not always be the case
- IOC programming model imposes the constraint that IOC aperture size
needs to be aligned to IOC base address, which we were not checking
so far.
Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: reworked the changelog]
Some of the boot printing code had printk() w/o explicit log level.
This patch introduces consistency allowing platforms to switch to less
verbose console logging using cmdline.
NPS400 with 4K CPUs needs to avoid the cpu info printing for faster
bootup.
Signed-off-by: Noam Camus <noamca@mellanox.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Due to a HW bug in NPS400 we get from time to time false TLB miss.
Workaround this by validating each miss.
Signed-off-by: Noam Camus <noamca@mellanox.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
- PAE40 related updates
- SLC errata for region ops
- intc line masking by default
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZmw0DAAoJEGnX8d3iisJeoogP/R78jWBmVIRr7YBvOMEbNZAz
r5dJHnd2jCUtqQ/rmCndwzOqLv2j77RRdH3nlwfM7YaS8LZmK0Dkz9zUGReCalQU
z1sZfBSZuOfodWbDzfXdJVNEGGu8eSG7/s87E6K9UxOuaZJIPNMyU9qGxjDb3BTo
dzgNmT0xgiqZYtv9Y3uciPgkddJLOxE+eMuEpxzbzejLerUUc/jRV8m5qAL8ja9w
NanzLjo7Ec0FiczyYf1DtiONXBVl556IPQoFJtXIbsfZww8kJxFSZ+qemvmFXJOF
cxSfZeRBOCWV9mRW36kGAeKVE9EWqFHFn/UiCfvhTCpoFXPX63Hz+nVRLEE1lhrQ
ZaiQSuu0QgUkWP39ZpAPQjmdBlVxzv9Jsz5Dh72l3C00Hf9yw3jVCsKX/nZLpLM2
pS8pFVnJqttOLX/6wU1JjIQDhPvzqn0V21SwCXBt2DwyXd1zuce82ioPY7K2Uefc
4Unrso+YpIJNh8NlIe7Pvn8kEitNF7MViybofjhKPXlFXT4FqSJIV0Q/iq/L6lh8
RAfJO3GCQQymkB03aVmfRWq+xgCS0v3K2vP50T3+XEyix98ZwH+D4ViaXs9egmLk
EO323ebCKp8AJxICTme5qtmXs+k/CH+KeCzwSc90Mtf1SD3ohvyUeJvA5BGCCyP5
NP5sxiH9cNKwrMIBdvuE
=mOLm
-----END PGP SIGNATURE-----
Merge tag 'arc-4.13-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- PAE40 related updates
- SLC errata for region ops
- intc line masking by default
* tag 'arc-4.13-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
arc: Mask individual IRQ lines during core INTC init
ARCv2: PAE40: set MSB even if !CONFIG_ARC_HAS_PAE40 but PAE exists in SoC
ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses
ARC: dma: implement dma_unmap_page and sg variant
ARCv2: SLC: Make sure busy bit is set properly for region ops
ARC: [plat-sim] Include this platform unconditionally
ARC: [plat-axs10x]: prepare dts files for enabling PAE40 on axs103
ARC: defconfig: Cleanup from old Kconfig options
PAE40 confiuration in hardware extends some of the address registers
for TLB/cache ops to 2 words.
So far kernel was NOT setting the higher word if feature was not enabled
in software which is wrong. Those need to be set to 0 in such case.
Normally this would be done in the cache flush / tlb ops, however since
these registers only exist conditionally, this would have to be
conditional to a flag being set on boot which is expensive/ugly -
specially for the more common case of PAE exists but not in use.
Optimize that by zero'ing them once at boot - nobody will write to
them afterwards
Cc: stable@vger.kernel.org #4.4+
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
It is necessary to explicitly set both SLC_AUX_RGN_START1 and SLC_AUX_RGN_END1
which hold MSB bits of the physical address correspondingly of region start
and end otherwise SLC region operation is executed in unpredictable manner
Without this patch, SLC flushes on HSDK (IOC disabled) were taking
seconds.
Cc: stable@vger.kernel.org #4.4+
Reported-by: Vladimir Kondratiev <vladimir.kondratiev@intel.com>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
[vgupta: PAR40 regs only written if PAE40 exist]
c70c473396 "ARCv2: SLC: Make sure busy bit is set properly on SLC flushing"
fixes problem for entire SLC operation where the problem was initially
caught. But given a nature of the issue it is perfectly possible for
busy bit to be read incorrectly even when region operation was started.
So extending initial fix for regional operation as well.
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: stable@vger.kernel.org #4.10
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Christoph noticed [1] that default DMA pool in current form overload
the DMA coherent infrastructure. In reply, Robin suggested [2] to
split the per-device vs. global pool interfaces, so allocation/release
from default DMA pool is driven by dma ops implementation.
This patch implements Robin's idea and provide interface to
allocate/release/mmap the default (aka global) DMA pool.
To make it clear that existing *_from_coherent routines work on
per-device pool rename them to *_from_dev_coherent.
[1] https://lkml.org/lkml/2017/7/7/370
[2] https://lkml.org/lkml/2017/7/7/431
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Suggested-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Andras Szemzo <sza@esh.hu>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.
This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.
Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.
One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications. For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).
Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.
Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.
Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- AXS10x platform clk updates for I2S, PGU
- Adding region based cache flush operation for ARCv2 cores
- Enforcing PAE40 dependency on HIGHMEM
- ptrace support for additional regs in ARCv2 cores
- Fix build failure in linux-next dut to a header include ordering change
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZEeokAAoJEGnX8d3iisJe/zgP/jpqauxaulNK4BvjJ3pI0+Fn
Pgz5Mgd/S7MzmiDPsb3H0YSWQMlxcSop+EPk8V4jXcXyTuDDaC7NzWLuKau2rjiB
jdbTycsJ4xSS5A8NrnG6AeE8SYAubJmtEwD8JASx+trtLGGHCKdTsRIOQpm5faUn
W2IFaXs9Jof5ZYL5btOzmIOOTxdR6f3Aegq/Zqb6rxqD8ZmtVapjPdIlqQEB/xw4
JSbTH9+HxI1+vpzCuOb1Ba4sglfFG9kH0xxWvMN5HuoYsdXG5KfjoXVsxUIEl/RC
m+bNdDOg5nlCS0U5PngnJnm5beBsAS8BApTTTM3ALNWGrsn+a6/1PraLsIgfa/yI
XgcXp80/rGiFqAA3+fUw+ly1oJb06V9p4SRCOaOTErqcxofS4upbJ7oTu5z0HwMV
HM2aw8vxZ2j4Wqrl5U3fGE9KvifUMJKuhcumpzfcmDsTtklCis4z13nbF0texNF2
tj0yqxYlOSdx4PoVzQG9Audf0sDeKUJ6Of+aGAXW55Zwq+QQJoCCzMSvTEV/xH4P
RUIFkkhCQzshVmUDgHnWAjccrxz/LUrO1zLpwmOD0QWjfJHhyhegrlXGuKxNiRH5
rijMkOIe+Z8yG/UKqW0nvtB9svS0y9vqh3kEG1DJIRnNg7U2Fmk7rB404fGuvh4B
T8uxDhsWP60hMZX1rw6h
=zJ9B
-----END PGP SIGNATURE-----
Merge tag 'arc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC updates from Vineet Gupta:
- AXS10x platform clk updates for I2S, PGU
- add region based cache flush operation for ARCv2 cores
- enforce PAE40 dependency on HIGHMEM
- ptrace support for additional regs in ARCv2 cores
- fix build failure in linux-next dut to a header include ordering
change
* tag 'arc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
Revert "ARCv2: Allow enabling PAE40 w/o HIGHMEM"
ARC: mm: fix build failure in linux-next for UP builds
ARCv2: ptrace: provide regset for accumulator/r30 regs
elf: Add ARCv2 specific core note section
ARCv2: mm: micro-optimize region flush generated code
ARCv2: mm: Merge 2 updates to DC_CTRL for region flush
ARCv2: mm: Implement cache region flush operations
ARC: mm: Move full_page computation into cache version agnostic wrapper
arc: axs10x: Fix ARC PGU default clock frequency
arc: axs10x: Add DT bindings for I2S audio playback