This reverts commits a50915394f and
d7c3b937bd.
This is a revert of a revert of a revert. In addition, it reverts the
even older i915 change to stop using the __GFP_NO_KSWAPD flag due to the
original commits in linux-next.
It turns out that the original patch really was bogus, and that the
original revert was the correct thing to do after all. We thought we
had fixed the problem, and then reverted the revert, but the problem
really is fundamental: waking up kswapd simply isn't the right thing to
do, and direct reclaim sometimes simply _is_ the right thing to do.
When certain allocations fail, we simply should try some direct reclaim,
and if that fails, fail the allocation. That's the right thing to do
for THP allocations, which can easily fail, and the GPU allocations want
to do that too.
So starting kswapd is sometimes simply wrong, and removing the flag that
said "don't start kswapd" was a mistake. Let's hope we never revisit
this mistake again - and certainly not this many times ;)
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It apepars that this patch was innocent, and we hope that "mm: avoid
waking kswapd for THP allocations when compaction is deferred or
contended" will fix the final kswapd-spinning cause.
Cc: Zdenek Kabelac <zkabelac@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With "mm: vmscan: scale number of pages reclaimed by reclaim/compaction
based on failures" reverted, Zdenek Kabelac reported the following
Hmm, so it's just took longer to hit the problem and observe
kswapd0 spinning on my CPU again - it's not as endless like before -
but still it easily eats minutes - it helps to turn off Firefox
or TB (memory hungry apps) so kswapd0 stops soon - and restart
those apps again. (And I still have like >1GB of cached memory)
kswapd0 R running task 0 30 2 0x00000000
Call Trace:
preempt_schedule+0x42/0x60
_raw_spin_unlock+0x55/0x60
put_super+0x31/0x40
drop_super+0x22/0x30
prune_super+0x149/0x1b0
shrink_slab+0xba/0x510
The sysrq+m indicates the system has no swap so it'll never reclaim
anonymous pages as part of reclaim/compaction. That is one part of the
problem but not the root cause as file-backed pages could also be
reclaimed.
The likely underlying problem is that kswapd is woken up or kept awake
for each THP allocation request in the page allocator slow path.
If compaction fails for the requesting process then compaction will be
deferred for a time and direct reclaim is avoided. However, if there
are a storm of THP requests that are simply rejected, it will still be
the the case that kswapd is awake for a prolonged period of time as
pgdat->kswapd_max_order is updated each time. This is noticed by the
main kswapd() loop and it will not call kswapd_try_to_sleep(). Instead
it will loopp, shrinking a small number of pages and calling
shrink_slab() on each iteration.
The temptation is to supply a patch that checks if kswapd was woken for
THP and if so ignore pgdat->kswapd_max_order but it'll be a hack and not
backed up by proper testing. As 3.7 is very close to release and this
is not a bug we should release with, a safer path is to revert "mm:
remove __GFP_NO_KSWAPD" for now and revisit it with the view to ironing
out the balance_pgdat() logic in general.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Zdenek Kabelac <zkabelac@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When transparent huge pages were introduced, memory compaction and swap
storms were an issue, and the kernel had to be careful to not make THP
allocations cause pageout or compaction.
Now that we have working compaction deferral, kswapd is smart enough to
invoke compaction and the quadratic behaviour around isolate_free_pages
has been fixed, it should be safe to remove __GFP_NO_KSWAPD.
[minchan@kernel.org: Comment fix]
[mgorman@suse.de: Avoid direct reclaim for deferred compaction]
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mtd_read_oob() has some unexpected similarities to mtd_read(). For
instance, when ops->datbuf != NULL, nand_base.c might return max_bitflips;
however, when ops->datbuf == NULL, nand_base's code potentially could
return -EUCLEAN (no in-tree drivers do this yet). In any case where the
driver might return max_bitflips, we should translate this into an
appropriate return code using the bitflip_threshold.
Essentially, mtd_read_oob() duplicates the logic from mtd_read().
This prevents users of mtd_read_oob() from receiving a positive return
value (i.e., from max_bitflips) and interpreting it as an unknown error.
Artem: amend comments.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Reviewed-by: Mike Dunn <mikedunn@newsguy.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
mtd_read_oob() will be expanded a little, so don't leave it in the header
as a static inline function.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The drivers' _read() method, absent an error, returns a non-negative integer
indicating the maximum number of bit errors that were corrected in any one
region comprising an ecc step. MTD returns -EUCLEAN if this is >=
bitflip_threshold, 0 otherwise. If bitflip_threshold is zero, the comparison is
not made since these devices lack ECC and always return zero in the non-error
case (thanks Brian)¹. Note that this is a subtle change to the driver
interface.
This and the preceding patches in this set were tested with ubi on top of the
nandsim and docg4 devices, running the ubi test io_basic from mtd-utils.
¹ http://lists.infradead.org/pipermail/linux-mtd/2012-March/040468.html
Signed-off-by: Mike Dunn <mikedunn@newsguy.com>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Acked-by: Brian Norris <computersforpeace@gmail.com>
Ivan Djelic <ivan.djelic@parrot.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
An element 'bitflip_threshold' is added to struct mtd_info, and also exposed as
a read/write variable in sysfs. This will be used to determine whether or not
mtd_read() returns -EUCLEAN or 0 (absent a hard error). If the driver leaves it
as zero, mtd will set it to a default value of ecc_strength.
This v2 adds the line that propagates bitflip_threshold from the master to the
partitions - thanks Ivan¹.
¹ http://lists.infradead.org/pipermail/linux-mtd/2012-April/040900.html
Signed-off-by: Mike Dunn <mikedunn@newsguy.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
ecc_strength element of struct mtd_info is exposed as a read-only variable in
sysfs.
Signed-off-by: Mike Dunn <mikedunn@newsguy.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Initialization of 'erase_info->fail_addr' to MTD_FAIL_ADDR_UNKNOWN prior
erase operation is duplicated accross several MTD drivers, and also taken
care of by some MTD users as well.
Harmonize it: initialize 'fail_addr' within 'mtd_erase()' interface.
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch changes all the OTP functions like 'mtd_get_fact_prot_info()' and
makes them return zero immediately if the input 'len' parameter is 0. This is
not really needed currently, but most of the other functions do this, and it is
just consistent to do the same in the OTP functions.
This patch also moves the OTP functions from the header file to mtdcore.c
because they become a bit too big for being inlined.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
In many places in drivers we verify for the zero length, but this is very
inconsistent across drivers. This is obviously the right thing to do, though.
This patch moves the check to the MTD API functions instead and removes a lot
of duplication.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Reviewed-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Some MTD drivers return -EINVAL if the 'phys' parameter is not NULL, trying to
convey that they cannot return the physical address. However, this is not very
logical because they still can return the virtual address ('virt'). But some
drivers (lpddr) just ignore the 'phys' parameter instead, which is a more
logical thing to do.
Let's harmonize this and:
1. Always initialize 'virt' and 'phys' to 'NULL' in 'mtd_point()'.
2. Do not return an error if the physical address cannot be found.
So as a result, all drivers will set 'phys' to 'NULL' if it is not supported.
None of the 'mtd_point()' users use 'phys' anyway, so this should not break
anything. I guess we could also just delete this parameter later.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The MTD API function now zero the 'retlen' parameter before calling
the driver's method — do not do this again in drivers. This removes
duplicated '*retlen = 0' assignent from the following methods:
'mtd_point()'
'mtd_read()'
'mtd_write()'
'mtd_writev()'
'mtd_panic_write()'
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Many drivers check whether the partition is R/O and return -EROFS if yes.
Let's stop having duplicated checks and move them to the API functions
instead.
And again a bit of noise - deleted few too sparse newlines, sorry.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Add verification of the offset and length to MTD API functions and verify that
MTD device offset and length are within MTD device size.
The modified API functions are:
'mtd_erase()'
'mtd_point()'
'mtd_unpoint()'
'mtd_get_unmapped_area()'
'mtd_read()'
'mtd_write()'
'mtd_panic_write()'
'mtd_lock()'
'mtd_unlock()'
'mtd_is_locked()'
'mtd_block_isbad()'
'mtd_block_markbad()'
This patch also uninlines these functions and exports in mtdcore.c because they
are not performance-critical and do not have to be inlined.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
We don't need to to check for mtd->resume before calling mtd_resume().
mtd_resume() should take care of that.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch renames all MTD functions by adding a "_" prefix:
mtd->erase -> mtd->_erase
mtd->read_oob -> mtd->_read_oob
...
The reason is that we are re-working the MTD API and from now on it is
an error to use MTD function pointers directly - we have a corresponding
API call for every pointer. By adding a leading "_" we achieve the following:
1. Make sure we convert every direct pointer users
2. A leading "_" suggests that this interface is internal and it becomes
less likely that people will use them directly
3. Make sure all the out-of-tree modules stop compiling and the owners
spot the big API change and amend them.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Fix the following build warning:
drivers/mtd/mtdcore.c: In function ‘mtd_release’:
drivers/mtd/mtdcore.c:110: warning: unused variable ‘mtd’
This happens when neither CONFIG_MTD_CHAR nor CONFIG_MTD_CHAR_MODULE are defined.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Commits 3fe4bae884 and
079c985e7a broke MTD suspend in 2 ways:
1. When the '->suspend' method is not present, we return -EOPNOTSUPP, but
the callers of 'mtd_suspend()' expects 0 instead.
2. Checking of the 'mtd' parameter against NULL has been incorrectly removed
in 'mtd_cls_suspend()'.
This patch fixes the breakages. This has been found, analyzed, reported
and tested by Rafael J. Wysocki <rjw@sisk.pl>.
Note, this patch is not needed in the stable tree because it causes a
regression introduced during the v3.3 merge window.
Reported-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: Rafael J. Wysocki <rjw@sisk.pl>
Tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Just call the 'mtd_suspend()' and 'mtd_resume()' - they will do nothing
if the operation is not defined.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Instead, call the corresponding MTD API function which will return
'-EOPNOTSUPP' if the operation is not supported.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
This patch makes the 'mtd_writev()' function more usable and logical. We first
teach it to fall-back to the 'default_mtd_writev()' function if the MTD driver
does not define its own '->writev()' method. Then we make block2mtd and JFFS2
just 'mtd_writev()' instead of 'default_mtd_writev()' function. This means we
can now stop exporting 'default_mtd_writev()' and instead, export
'mtd_writev()'. This is much cleaner and more logical, as well as allows us to
get read of another direct 'mtd->writev' access in JFFS2.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The mtdcore.c file is a bit inconsistent - some EXPORT_SYMBOL_GPL declarations
follow the corresponding functions, some are placed at the end. This patch
harmonizes the file so that EXPORT_SYMBOL_GPL declarations follow the
corresponding function.
It also removes few extra newlines and trailing white-spaces.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
1. Teach 'mtd_write()' function to return '-EROFS' if the write method
is undefined, and remove the corresponding check from
'default_mtd_writev()'.
2. Do not test 'retlen' for NULL - it cannot be NULL.
3. Few minor coding stile clean-ups.
4. Add a kerneldoc comment
Additionally, minor fixes to the kerneldoc comments of the neighbor function.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
... since it is not needed because the generic 'dev_get_drvdata()' can be
used instead.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The code has the check for parts but it called after kmemdup,
kmemdup(parts, sizeof(*parts) * nr_parts,...)
if (!parts)
return -ENOMEM
In fact, we need check parts before safely using it.
and we also need check the real_parts to make sure kmemdup
allocation sucessfully.
Signed-off-by: Jason Liu <jason.hui@linaro.org>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
Start moving away from the MTD_DEBUG_LEVEL messages. The dynamic
debugging feature is a generic kernel feature that provides more
flexibility.
(See Documentation/dynamic-debug-howto.txt)
Also fix some punctuation, indentation, and capitalization that went
along with the affected lines.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
mtd_device_register() is a limited version of mtd_device_parse_register.
Replace it with macro calling mtd_device_parse_register().
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Encapsulate last MTD partition parser argument into a separate
structure. Currently it holds only 'origin' field for RedBoot parser,
but will be extended in future to contain at least device_node for OF
devices.
Amended commentary to make kerneldoc happy
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Artem Bityutskiy <dedekind1@gmail.com>
Lots (nearly all) mtd drivers contain nearly the similar code that
calls parse_mtd_partitions, provides some platform-default values, if
parsing fails, and registers mtd device.
This is an aim to provide single implementation of this scenario:
mtd_device_parse_register() which will handle all this parsing and
defaults.
Artem: amended comments
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
These symbols are replaced with mtd_device_register() (and removal with
mtd_device_unregister()) for public registration.
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
To prepare for the removal of add_mtd_device and add_mtd_partitions(),
introduce mtd_device_register(). This will create partitions if they
are supplied or register the whole device if there are no partitions.
Once all drivers are converted to use mtd_device_register(),
add_mtd_device() and add_mtd_partitions() will be made internal only.
v2: move kerneldoc to implementation file and fixup some kerneldoc
warnings.
Artem: tweak comments: remove junk tabs, use dots consistently.
Signed-off-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Remove the spare spaces in the head of the lines.
Signed-off-by: Wanlong Gao <wanlong.gao@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
'get_mtd_device_nm()' has a piece of code which equivalent to what
'__get_mtd_device()' does - remove this duplicated code and use
''__get_mtd_device()' instead.
Artem: changed commit message.
Artem: while on it, remove an unnecessary extra empty line
Signed-off-by: Wanlong Gao <wanlong.gao@gmail.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
->read_proc interface is going away, switch to seq_file.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Introduce a common function to handle large, contiguous kmalloc buffer
allocations by exponentially backing off on the size of the requested
kernel transfer buffer until it succeeds or until the requested
transfer buffer size falls below the page size.
This helps ensure the operation can succeed under low-memory, highly-
fragmented situations albeit somewhat more slowly.
Artem: so this patch solves the problem that the kernel tries to kmalloc too
large buffers, which (a) may fail and does fail - people complain about this,
and (b) slows down the system in case of high memory fragmentation, because
the kernel starts dropping caches, writing back, swapping, etc. But we do not
really have to allocate a lot of memory to do the I/O, we may do this even with
as little as one min. I/O unit (NAND page) of RAM. So the idea of this patch is
that if the user asks to read or write a lot, we try to kmalloc a lot, with GFP
flags which make the kernel _not_ drop caches, etc. If we can allocate it - good,
if not - we try to allocate twice as less, and so on, until we reach the min.
I/O unit size, which is our last resort allocation and use the normal
GFP_KERNEL flag.
Artem: re-write the allocation function so that it makes sure the allocated
buffer is aligned to the min. I/O size of the flash.
Signed-off-by: Grant Erickson <marathon96@gmail.com>
Tested-by: Ben Gardiner <bengardiner@nanometrics.ca>
Tested-by: Stefano Babic <sbabic@denx.de>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
The three backing_dev_info symbols are only used in this file and
should be static.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Implicit slab.h inclusion via percpu.h is about to go away. Make sure
gfp.h or slab.h is included as necessary.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
They will be holding dirty inodes and be responsible for flushing
them out, so they need to be setup properly.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Removes one .h and one .c file that are never used outside of
mtdcore.c.
Signed-off-by: Joern Engel <joern@logfs.org>
Edited to remove on leftover debug define.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>