Currently a call to dma_pool_alloc() with a ___GFP_ZERO flag returns a
non-zeroed memory region.
This patchset adds support for the __GFP_ZERO flag to dma_pool_alloc(),
adds 2 wrapper functions for allocing zeroed memory from a pool, and
provides a coccinelle script for finding & replacing instances of
dma_pool_alloc() followed by memset(0) with a single dma_pool_zalloc()
call.
There was some concern that this always calls memset() to zero, instead
of passing __GFP_ZERO into the page allocator.
[https://lkml.org/lkml/2015/7/15/881]
I ran a test on my system to get an idea of how often dma_pool_alloc()
calls into pool_alloc_page().
After Boot: [ 30.119863] alloc_calls:541, page_allocs:7
After an hour: [ 3600.951031] alloc_calls:9566, page_allocs:12
After copying 1GB file onto a USB drive:
[ 4260.657148] alloc_calls:17225, page_allocs:12
It doesn't look like dma_pool_alloc() calls down to the page allocator
very often (at least on my system).
This patch (of 4):
Currently the __GFP_ZERO flag is ignored by dma_pool_alloc().
Make dma_pool_alloc() zero the memory if this flag is set.
Signed-off-by: Sean O. Stalley <sean.stalley@intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Gilles Muller <Gilles.Muller@lip6.fr>
Cc: Nicolas Palix <nicolas.palix@imag.fr>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dma_pool_destroy() does not tolerate a NULL dma_pool pointer argument and
performs a NULL-pointer dereference. This requires additional attention
and effort from developers/reviewers and forces all dma_pool_destroy()
callers to do a NULL check
if (pool)
dma_pool_destroy(pool);
Or, otherwise, be invalid dma_pool_destroy() users.
Tweak dma_pool_destroy() and NULL-check the pointer there.
Proposed by Andrew Morton.
Link: https://lkml.org/lkml/2015/6/8/583
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes the function is_page_busy() return bool rather then an int now
due to this particular function's single return statement only ever
evaulating to either one or zero.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove 3 brace coding style for any arm of this statement
Signed-off-by: Paul McQuade <paulmcquad@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
cat /sys/.../pools followed by removal the device leads to:
|======================================================
|[ INFO: possible circular locking dependency detected ]
|3.17.0-rc4+ #1498 Not tainted
|-------------------------------------------------------
|rmmod/2505 is trying to acquire lock:
| (s_active#28){++++.+}, at: [<c017f754>] kernfs_remove_by_name_ns+0x3c/0x88
|
|but task is already holding lock:
| (pools_lock){+.+.+.}, at: [<c011494c>] dma_pool_destroy+0x18/0x17c
|
|which lock already depends on the new lock.
|the existing dependency chain (in reverse order) is:
|
|-> #1 (pools_lock){+.+.+.}:
| [<c0114ae8>] show_pools+0x30/0xf8
| [<c0313210>] dev_attr_show+0x1c/0x48
| [<c0180e84>] sysfs_kf_seq_show+0x88/0x10c
| [<c017f960>] kernfs_seq_show+0x24/0x28
| [<c013efc4>] seq_read+0x1b8/0x480
| [<c011e820>] vfs_read+0x8c/0x148
| [<c011ea10>] SyS_read+0x40/0x8c
| [<c000e960>] ret_fast_syscall+0x0/0x48
|
|-> #0 (s_active#28){++++.+}:
| [<c017e9ac>] __kernfs_remove+0x258/0x2ec
| [<c017f754>] kernfs_remove_by_name_ns+0x3c/0x88
| [<c0114a7c>] dma_pool_destroy+0x148/0x17c
| [<c03ad288>] hcd_buffer_destroy+0x20/0x34
| [<c03a4780>] usb_remove_hcd+0x110/0x1a4
The problem is the lock order of pools_lock and kernfs_mutex in
dma_pool_destroy() vs show_pools() call path.
This patch breaks out the creation of the sysfs file outside of the
pools_lock mutex. The newly added pools_reg_lock ensures that there is no
race of create vs destroy code path in terms whether or not the sysfs file
has to be deleted (and was it deleted before we try to create a new one)
and what to do if device_create_file() failed.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dma_pool_create() needs to unlock the mutex in error case. The bug was
introduced in the 3.16 by commit cc6b664aa2 ("mm/dmapool.c: remove
redundant NULL check for dev in dma_pool_create()")/
Signed-off-by: Krzysztof Hałasa <khc@piap.pl>
Cc: stable@vger.kernel.org # v3.16
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of calling an additional routine in dmam_pool_destroy() rely on
what dmam_pool_release() is doing.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
"dev" cannot be NULL because it is already checked before calling
dma_pool_create().
If dev ever was NULL, the code would oops in dev_to_node() after enabling
CONFIG_NUMA.
It is possible that some driver is using dev==NULL and has never been run
on a NUMA machine. Such a driver is probably outdated, possibly buggy and
will need some attention if it starts triggering NULL derefs.
Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This can help to catch the case where hardware is writing after dma free.
[akpm@linux-foundation.org: tidy code, fix comment, use sizeof(page->offset), use pr_err()]
Signed-off-by: Matthieu Castet <matthieu.castet@parrot.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dmapool always calls dma_alloc_coherent() with GFP_ATOMIC flag,
regardless the flags provided by the caller. This causes excessive
pruning of emergency memory pools without any good reason. Additionaly,
on ARM architecture any driver which is using dmapools will sooner or
later trigger the following error:
"ERROR: 256 KiB atomic DMA coherent pool is too small!
Please increase it with coherent_pool= kernel parameter!".
Increasing the coherent pool size usually doesn't help much and only
delays such error, because all GFP_ATOMIC DMA allocations are always
served from the special, very limited memory pool.
This patch changes the dmapool code to correctly use gfp flags provided
by the dmapool caller.
Reported-by: Soeren Moch <smoch@web.de>
Reported-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Soeren Moch <smoch@web.de>
Cc: stable@vger.kernel.org
The removal of the implicitly everywhere module.h and its child includes
will reveal this implicit stat.h usage:
mm/dmapool.c:108: error: ‘S_IRUGO’ undeclared here (not in a function)
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
The files changed within are only using the EXPORT_SYMBOL
macro variants. They are not using core modular infrastructure
and hence don't need module.h but only the export.h header.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
devres uses the pointer value as key after it's freed, which is safe but
triggers spurious use-after-free warnings on some static analysis tools.
Rearrange code to avoid such warnings.
Signed-off-by: Maxin B. John <maxin.john@gmail.com>
Reviewed-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As it stands this code will degenerate into a busy-wait if the calling task
has signal_pending().
Cc: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dma_pool_free() scans for the page to free in the pool list holding the
pool lock. Then it releases the lock basically to acquire it immediately
again. Modify the code to only take the lock once.
This will do some additional loops and computations with the lock held in
if memory debugging is activated. If it is not activated the only new
operations with this lock is one if and one substraction.
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Buggy drivers (e.g. fsl_udc) could call dma_pool_alloc from atomic
context with GFP_KERNEL. In most instances, the first pool_alloc_page
call would succeed and the sleeping functions would never be called. This
allowed the buggy drivers to slip through the cracks.
Add a might_sleep_if() checking for __GFP_WAIT in flags.
Signed-off-by: Dima Zavin <dima@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show_pools() walks the page_list of a pool w/o protection against the list
modifications in alloc/free. Take pool->lock to avoid stomping into
nirvana.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Previously it was only enabled for CONFIG_DEBUG_SLAB.
Not hooked into the slub runtime debug configuration, so you currently only
get it with CONFIG_SLUB_DEBUG_ON, not plain CONFIG_SLUB_DEBUG
Acked-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The previous implementation simply refused to allocate more than a
boundary's worth of data from an entire page. Some users didn't know
this, so specified things like SMP_CACHE_BYTES, not realising the
horrible waste of memory that this was. It's fairly easy to correct
this problem, just by ensuring we don't cross a boundary within a page.
This even helps drivers like EHCI (which can't cross a 4k boundary)
on machines with larger page sizes.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
Use a list of free blocks within a page instead of using a bitmap.
Update documentation to reflect this. As well as being a slight
reduction in memory allocation, locked ops and lines of code, it speeds
up a transaction processing benchmark by 0.4%.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
We were missing a copyright statement and license, so add GPLv2, David
Brownell's copyright and my copyright.
The asm/io.h include was superfluous, but we were missing a few other
necessary includes.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Check that 'align' is a power of two, like the API specifies.
Align 'size' to 'align' correctly -- the current code has an off-by-one.
The ALIGN macro in kernel.h doesn't.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: David S. Miller <davem@davemloft.net>
With one trivial change (taking the lock slightly earlier on wakeup
from schedule), all uses of the waitq are under the pool lock, so we
can use the locked (or __) versions of the wait queue functions, and
avoid the extra spinlock.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Acked-by: David S. Miller <davem@davemloft.net>