The __SWAB_64_THRU_32__ case of a 64-bit byte swap was depending on the
no-longer-existant ___swab32() method (three underscores). We got rid
of some of the worst indirection and complexity, and now it should just
use the 32-bit swab function that was defined right above it.
Reported-and-tested-by: Nicolas Pitre <nico@cam.org>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This implementation caused problems in userspace which can, and does
define _both_ __LITTLE_ENDIAN and __BIG_ENDIAN.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The first step to make swab.h a regular header that will
include an asm/swab.h with arch overrides.
Avoid the gratuitous differences introduced in the new
linux/swab.h by naming the ___constant_swabXX bits and
__fswabXX bits exactly as found in the old implementation
in byteorder/swab[b].h
Use this new swab.h in byteorder/[big|little]_endian.h and
remove the two old swab headers.
Although the inclusion of asm/byteorder.h looks strange in
linux/swab.h, this will allow each arch to move the actual
arch overrides for the swab bits in an asm file and then
the includes can be cleaned up without requiring a flag day
for all arches at once.
Keep providing __fswabXX in case some userspace was using them
directly, but the revised __swabXX should be used instead in
any new code and will always do constant folding not dependent
on the optimization level, which means the __constant versions
can be phased out in-kernel.
Arches that use the old-style arch macros will lose their
optimized versions until they move to the new style, but at
least they will still compile. Many arches have already moved
and the patches to move the remaining arches are trivial.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (29 commits)
Input: i8042 - add Dell Vostro 1510 to nomux list
Input: gtco - use USB endpoint API
Input: add support for Maple controller as a joystick
Input: atkbd - broaden the Dell DMI signatures
Input: HIL drivers - add MODULE_ALIAS()
Input: map_to_7segment.h - convert to __inline__ for userspace
Input: add support for enhanced rotary controller on pxa930 and pxa935
Input: add support for trackball on pxa930 and pxa935
Input: add da9034 touchscreen support
Input: ads7846 - strict_strtoul takes unsigned long
Input: make some variables and functions static
Input: add tsc2007 based touchscreen driver
Input: psmouse - add module parameters to control OLPC touchpad delays
Input: i8042 - add Gigabyte M912 netbook to noloop exception table
Input: atkbd - Samsung NC10 key repeat fix
Input: atkbd - add keyboard quirk for HP Pavilion ZV6100 laptop
Input: libps2 - handle 0xfc responses from devices
Input: add support for Wacom W8001 penabled serial touchscreen
Input: synaptics - report multi-taps only if supported by the device
Input: add joystick driver for Walkera WK-0701 RC transmitter
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
CRED: Fix regression in cap_capable() as shown up by sys_faccessat() [ver #3]
Revert "CRED: Fix regression in cap_capable() as shown up by sys_faccessat() [ver #2]"
SELinux: shrink sizeof av_inhert selinux_class_perm and context
CRED: Fix regression in cap_capable() as shown up by sys_faccessat() [ver #2]
keys: fix sparse warning by adding __user annotation to cast
smack: Add support for unlabeled network hosts and networks
selinux: Deprecate and schedule the removal of the the compat_net functionality
netlabel: Update kernel configuration API
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: fix section mismatch
sched: fix double kfree in failure path
sched: clean up arch_reinit_sched_domains()
sched: mark sched_create_sysfs_power_savings_entries() as __init
getrusage: RUSAGE_THREAD should return ru_utime and ru_stime
sched: fix sched_slice()
sched_clock: prevent scd->clock from moving backwards, take #2
sched: sched.c declare variables before they get used
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
genirq: provide irq_to_desc() to non-genirq architectures too
* 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcu: fix rcutorture bug
rcu: eliminate synchronize_rcu_xxx macro
rcu: make treercu safe for suspend and resume
rcu: fix rcutree grace-period-latency bug on small systems
futex: catch certain assymetric (get|put)_futex_key calls
futex: make futex_(get|put)_key() calls symmetric
locking, percpu counters: introduce separate lock classes
swiotlb: clean up EXPORT_SYMBOL usage
swiotlb: remove unnecessary declaration
swiotlb: replace architecture-specific swiotlb.h with linux/swiotlb.h
swiotlb: add support for systems with highmem
swiotlb: store phys address in io_tlb_orig_addr array
swiotlb: add hwdev to swiotlb_phys_to_bus() / swiotlb_sg_to_bus()
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ASoC: Fix SND_SOC_ALL_CODECS handling of dual SPI and I2C control buses
ASoC: Use snd_soc_dapm_nc_pin() in at91sam9g20ek
ASoC: TWL4030: Convert the bitfield enums to VALUE_ENUM type
ASoC: New enum type: value_enum
pxa2xx-ac97: switch AC unit to correct state before probing
ASoC: Clocking fixes for davinci-evm.c
ASoC: Mark non-connected TWL4030 pins for pandora
ASoC: OMAP: Select OMAP pin multiplexing when using Nokia N810 ASoC drivers
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: clean up annotations of fc->lock
fuse: fix sparse warning in ioctl
fuse: update interface version
fuse: add fuse_conn->release()
fuse: separate out fuse_conn_init() from new_conn()
fuse: add fuse_ prefix to several functions
fuse: implement poll support
fuse: implement unsolicited notification
fuse: add file kernel handle
fuse: implement ioctl support
fuse: don't let fuse_req->end() put the base reference
fuse: move FUSE_MINOR to miscdevice.h
fuse: style fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (41 commits)
scc_pata: make use of scc_dma_sff_read_status()
ide-dma-sff: factor out ide_dma_sff_write_status()
ide: move read_sff_dma_status() method to 'struct ide_dma_ops'
ide: don't set hwif->dma_ops in init_dma() method
Resurrect IT8172 IDE controller driver
piix: sync ich_laptop[] with ata_piix.c
ide: update warm-plug HOWTO
ide: fix ide_port_scan() to do ACPI setup after initializing request queues
ide: remove now redundant ->cur_dev checks
ide: remove unused ide_hwif_t.sg_mapped field
ide: struct ide_atapi_pc - remove unused fields and update documentation
ide: remove superfluous hwif variable assignment from ide_timer_expiry()
ide: use ide_pci_is_in_compatibility_mode() helper in setup-pci.c
ide: make "paranoia" ->handler check in ide_intr() more strict
ide-cd: convert to ide-atapi facilities
ide-cd: start DMA before sending the actual packet command
ide-cd: wait for DRQ to get set per default
ide: Fix drive's DWORD-IO handling
ide: add port and host iterators
ide: dynamic allocation of device structures
...
No one cares do_coredump()'s return value, and also it seems that it
is also not necessary. So make it void.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: WANG Cong <wangcong@zeuux.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove excess kernel-doc notation from rio header and driver:
Warning(include/linux/rio_drv.h:399): Excess function parameter or struct member 'buffer' description in 'rio_get_inb_message'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide a static debounce configuration mechanism for twl4030 GPIOs,
replacing the previous dynamic one. The single user of that mechanism was
for MMC card detect debouncing.
Boards can provide a bitmask saying which GPIOs to debounce (30 msec).
It's always enabled for pins with the MMC card-detect/VMMCx link active,
so most boards won't need to set the debounce mask.
This is a net code shrink, including runtime footprint.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- the type assigned at mount when no type is given is changed
from 0 to AUTOFS_TYPE_INDIRECT. This was done because 0 and
AUTOFS_TYPE_INDIRECT were being treated implicitly as the same
type.
- previously, an offset mount had it's type set to
AUTOFS_TYPE_DIRECT|AUTOFS_TYPE_OFFSET but the mount control
re-implementation needs to be able distinguish all three types.
So this was changed to make the type setting explicit.
- a type AUTOFS_TYPE_ANY was added for use by the re-implementation
when checking if a given path is a mountpoint. It's not really a
type as we use this to ask if a given path is a mountpoint in the
autofs_dev_ioctl_ismountpoint() function.
- functions to set and test the autofs mount types have been added to
improve readability and make the type usage explicit.
- the mount type is used from user space for the mount control
re-implementtion so, for consistency, all the definitions have
been moved to the user space include file include/linux/auto_fs4.h.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The parameter usage in the device node ioctl code uses arg1 and arg2 as
parameter names. This patch redefines the parameter names to reflect what
they actually are in an effort to make the code more readable.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allows kprobes to probe __exit routine. This adds flags member to struct
kprobe. When module is freed(kprobes hooks module_notifier to get this
event), kprobes which probe the functions in that module are set to "Gone"
flag to the flags member. These "Gone" probes are never be enabled.
Users can check the GONE flag through debugfs.
This also removes mod_refcounted, because we couldn't free a module if
kprobe incremented the refcount of that module.
[akpm@linux-foundation.org: document some locking]
[mhiramat@redhat.com: bugfix: pass aggr_kprobe to arch_remove_kprobe]
[mhiramat@redhat.com: bugfix: release old_p's insn_slot before error return]
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add kprobe_insn_mutex for protecting kprobe_insn_pages hlist, and remove
kprobe_mutex from architecture dependent code.
This allows us to call arch_remove_kprobe() (and free_insn_slot) while
holding kprobe_mutex.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This series of patches allows kprobes to probe module's __init and __exit
functions. This means, you can probe driver initialization and
terminating.
Currently, kprobes can't probe __init function because these functions are
freed after module initialization. And it also can't probe module __exit
functions because kprobe increments reference count of target module and
user can't unload it. this means __exit functions never be called unless
removing probes from the module.
To solve both cases, this series of patches introduces GONE flag and sets
it when the target code is freed(for this purpose, kprobes hooks
MODULE_STATE_* events). This also removes refcount incrementing for
allowing user to unload target module. Users can check which probes are
GONE by debugfs interface. For taking timing of freeing module's .init
text, these also include a patch which adds module's notifier of
MODULE_STATE_LIVE event.
This patch:
Add within_module_core() and within_module_init() for checking whether an
address is in the module .init.text section or .text section, and replace
within() local inline functions in kernel/module.c with them.
kprobes uses these functions to check where the kprobe is inserted.
Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Generalize the old at91rm9200 "bootstrap" bitbanging SPI master driver as
"spi_gpio", so it works with arbitrary GPIOs and can be configured through
platform_data. Such SPI masters support:
- any number of bus instances (bus_num is the platform_device.id)
- any number of chipselects (one GPIO per spi_device)
- all four SPI_MODE values, and SPI_CS_HIGH
- i/o word sizes from 1 to 32 bits;
- devices configured as with any other spi_master controller
When configured using platform_data, this provides relatively low clock
rates. On platforms that support inlined GPIO calls, significantly
improved transfer speeds are also possible with a semi-custom driver.
(It's still painful when accessing flash memory, but less so.)
Sanity checked by using this version to replace both native controllers on
a board with six different SPI slaves, relying on three different
SPI_MODE_* values and both SPI_CS_HIGH settings for correct operation.
[akpm@linux-foundation.org: cleanups]
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Acked-by: Magnus Damm <damm@igel.co.jp>
Tested-by: Magnus Damm <damm@igel.co.jp>
Cc: Torgil Svensson <torgil.svensson@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While looking at reducing the amount of architecture namespace pollution
in the generic kernel, I found that asm/irq.h is included in the vast
majority of compilations on ARM (around 650 files.)
Since asm/irq.h includes a sub-architecture include file on ARM, this
causes a negative impact on the ccache's ability to re-use the build
results from other sub-architectures, so we have a desire to reduce the
dependencies on asm/irq.h.
It turns out that a major cause of this is the needless include of
linux/hardirq.h into asm-generic/local.h. The patch below removes this
include, resulting in some 250 to 300 files (around half) of the kernel
then omitting asm/irq.h.
My test builds still succeed, provided two ARM files are fixed
(arch/arm/kernel/traps.c and arch/arm/mm/fault.c) - so there may be
negative impacts for this on other architectures.
Note that x86 does not include asm/irq.h nor linux/hardirq.h in its
asm/local.h, so this patch can be viewed as bringing the generic version
into line with the x86 version.
[kosaki.motohiro@jp.fujitsu.com: add #include <linux/irqflags.h> to acpi/processor_idle.c]
[adobriyan@gmail.com: fix sparc64]
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For NR_CPUS >= 16 values, FBC_BATCH is 2*NR_CPUS
Considering more and more distros are using high NR_CPUS values, it makes
sense to use a more sensible value for FBC_BATCH, and get rid of NR_CPUS.
A sensible value is 2*num_online_cpus(), with a minimum value of 32 (This
minimum value helps branch prediction in __percpu_counter_add())
We already have a hotcpu notifier, so we can adjust FBC_BATCH dynamically.
We rename FBC_BATCH to percpu_counter_batch since its not a constant
anymore.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide some basic advice about when to use BUG()/BUG_ON(): never, unless
there's really no better option.
This matches my understanding of the standard policy ... which seems not
to be written down so far, outside of LKML messages that I haven't
bookmarked.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Cc: Hans Verkuil <hverkuil@xs4all.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
f_op->poll is the only vfs operation which is not allowed to sleep. It's
because poll and select implementation used task state to synchronize
against wake ups, which doesn't have to be the case anymore as wait/wake
interface can now use custom wake up functions. The non-sleep restriction
can be a bit tricky because ->poll is not called from an atomic context
and the result of accidentally sleeping in ->poll only shows up as
temporary busy looping when the timing is right or rather wrong.
This patch converts poll/select to use custom wake up function and use
separate triggered variable to synchronize against wake up events. The
only added overhead is an extra function call during wake up and
negligible.
This patch removes the one non-sleep exception from vfs locking rules and
is beneficial to userland filesystem implementations like FUSE, 9p or
peculiar fs like spufs as it's very difficult for those to implement
non-sleeping poll method.
While at it, make the following cosmetic changes to make poll.h and
select.c checkpatch friendly.
* s/type * symbol/type *symbol/ : three places in poll.h
* remove blank line before EXPORT_SYMBOL() : two places in select.c
Oleg: spotted missing barrier in poll_schedule_timeout()
Davide: spotted missing write barrier in pollwake()
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Ron Minnich <rminnich@sandia.gov>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Brad Boyer <flar@allandria.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Roland McGrath <roland@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Create a helper macro to divide two numbers and round the result to the
nearest whole number. This is a helper macro for hwmon drivers that want
to convert incoming sysfs values per standard hwmon practice, though the
macro itself can be used by anyone.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The atomic_t type cannot currently be used in some header files because it
would create an include loop with asm/atomic.h. Move the type definition
to linux/types.h to break the loop.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No architectures use CONFIG_OUT_OF_LINE_PFN_TO_PAGE - it can be removed.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
xacct_add_tsk() relies on do_exit()->update_hiwater_xxx() and uses
mm->hiwater_xxx directly, this leads to 2 problems:
- taskstats_user_cmd() can call fill_pid()->xacct_add_tsk() at any
moment before the task exits, so we should check the current values of
rss/vm anyway.
- do_exit()->update_hiwater_xxx() calls are racy. An exiting thread can
be preempted right before mm->hiwater_xxx = new_val, and another thread
can use A_LOT of memory and exit in between. When the first thread
resumes it can be the last thread in the thread group, in that case we
report the wrong hiwater_xxx values which do not take A_LOT into
account.
Introduce get_mm_hiwater_rss() and get_mm_hiwater_vm() helpers and change
xacct_add_tsk() to use them. The first helper will also be used by
rusage->ru_maxrss accounting.
Kill do_exit()->update_hiwater_xxx() calls. Unless we are going to
decrease rss/vm there is no point to update mm->hiwater_xxx, and nobody
can look at this mm_struct when exit_mmap() actually unmaps the memory.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Hugh Dickins <hugh@veritas.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
s_syncing livelock avoidance was breaking data integrity guarantee of
sys_sync, by allowing sys_sync to skip writing or waiting for superblocks
if there is a concurrent sys_sync happening.
This livelock avoidance is much less important now that we don't have the
get_super_to_sync() call after every sb that we sync. This was replaced
by __put_super_and_need_restart.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove WB_SYNC_HOLD. The primary motiviation is the design of my
anti-starvation code for fsync. It requires taking an inode lock over the
sync operation, so we could run into lock ordering problems with multiple
inodes. It is possible to take a single global lock to solve the ordering
problem, but then that would prevent a future nice implementation of "sync
multiple inodes" based on lock order via inode address.
Seems like a backward step to remove this, but actually it is busted
anyway: we can't use the inode lists for data integrity wait: an inode can
be taken off the dirty lists but still be under writeback. In order to
satisfy data integrity semantics, we should wait for it to finish
writeback, but if we only search the dirty lists, we'll miss it.
It would be possible to have a "writeback" list, for sys_sync, I suppose.
But why complicate things by prematurely optimise? For unmounting, we
could avoid the "livelock avoidance" code, which would be easier, but
again premature IMO.
Fixing the existing data integrity problem will come next.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove page_remove_rmap()'s vma arg, which was only for the Eeek message.
And remove the BUG_ON(page_mapcount(page) == 0) from CONFIG_DEBUG_VM's
page_dup_rmap(): we're trying to be more resilient about that than BUGs.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Complete zap_pte_range()'s coverage of bad pagetable entries by calling
print_bad_pte() on a pte_file in a linear vma and on a bad swap entry.
That needs free_swap_and_cache() to tell it, which will also have shown
one of those "swap_free" errors (but with much less information).
Similar checks in fork's copy_one_pte()? No, that would be more noisy
than helpful: we'll see them when parent and child exec or exit.
Where do_nonlinear_fault() calls print_bad_pte(): omit !VM_CAN_NONLINEAR
case, that could only be a bug in sys_remap_file_pages(), not a bad pte.
VM_FAULT_OOM rather than VM_FAULT_SIGBUS? Well, okay, that is consistent
with what happens if do_swap_page() operates a bad swap entry; but don't
we have patches to be more careful about killing when VM_FAULT_OOM?
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Simplify the PAGE_FLAGS checking and clearing when freeing and allocating
a page: check the same flags as before when freeing, clear ALL the flags
(unless PageReserved) when freeing, check ALL flags off when allocating.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Swap allocation has always started from the beginning of the swap area;
but if we're dealing with a solidstate swap device which can only remap
blocks within limited zones, that would sooner wear out the first zone.
Therefore sys_swapon() test whether blk_queue is non-rotational, and if so
randomize the cluster_next starting position for allocation.
If blk_queue is nonrot, note SWP_SOLIDSTATE for later use, and report it
with an "SS" at the right end of the kernel's "Adding ... swap" message
(so that if it's both nonrot and discardable, "SSD" will be shown there).
Perhaps something should be shown in /proc/swaps (swapon -s), but we have
to be more cautious before making any addition to that format.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Joern Engel <joern@logfs.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Donjun Shin <djshin90@gmail.com>
Cc: Tejun Heo <teheo@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When scan_swap_map() finds a free cluster of swap pages to allocate,
discard the old contents of the cluster if the device supports discard.
But don't bother when swap is so fragmented that we allocate single pages.
Be careful about racing allocations made while we're scanning for a
cluster; and hold up allocations made while we're discarding.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Joern Engel <joern@logfs.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Donjun Shin <djshin90@gmail.com>
Cc: Tejun Heo <teheo@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When adding swap, all the old data on swap can be forgotten: sys_swapon()
discard all but the header page of the swap partition (or every extent but
the header of the swap file), to give a solidstate swap device the
opportunity to optimize its wear-levelling.
If that succeeds, note SWP_DISCARDABLE for later use, and report it with a
"D" at the right end of the kernel's "Adding ... swap" message. Perhaps
something should be shown in /proc/swaps (swapon -s), but we have to be
more cautious before making any addition to that format.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Joern Engel <joern@logfs.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Donjun Shin <djshin90@gmail.com>
Cc: Tejun Heo <teheo@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Before making functional changes, rearrange scan_swap_map() to simplify
subsequent diffs. Actually, there is one functional change in there:
leave cluster_nr negative while scanning for a new cluster - resetting it
early increased the likelihood that when we have difficulty finding a free
cluster, another task may come in and try doing exactly the same - just a
waste of cpu.
Before making functional changes, rearrange struct swap_info_struct
slightly: flags will be needed as an unsigned long (for wait_on_bit), next
is a good int to pair with prio, old_block_size is uninteresting so shift
it to the end.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove the SWP_ACTIVE mask: it just obscures the SWP_WRITEOK flag.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sparse output following warnings.
mm/vmalloc.c:1436:6: warning: symbol 'vread' was not declared. Should it be static?
mm/vmalloc.c:1474:6: warning: symbol 'vwrite' was not declared. Should it be static?
However, it is used by /dev/kmem. fixed here.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rik suggests a simplified get_scan_ratio() for !CONFIG_SWAP. Yes, the gcc
optimizer gives us that, when nr_swap_pages is #defined as 0L. Move usual
declaration to swapfile.c: it never belonged in page_alloc.c.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If we add a failing stub for add_to_swap(), then we can remove the #ifdef
CONFIG_SWAP from mm/vmscan.c.
This was intended as a source cleanup, but looking more closely, it turns
out that the !CONFIG_SWAP case was going to keep_locked for an anonymous
page, whereas now it goes to the more suitable activate_locked, like the
CONFIG_SWAP nr_swap_pages 0 case.
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>