This reverts commit b43d17f319.
Dave Jones reports that it causes lockups on his laptop, and his debug
output showed a lot of processes hung waiting for page_writeback (or
more commonly - processes hung waiting for a lock that was held during
that writeback wait).
The page_writeback hint made Ted suggest that Dave look at this commit,
and Dave verified that reverting it makes his problems go away.
Ted says:
"That commit fixes a race which is seen when you write into fallocated
(and hence uninitialized) disk blocks under *very* heavy memory
pressure. Furthermore, although theoretically it could trigger under
normal direct I/O writes, it only seems to trigger if you are issuing
a huge number of AIO writes, such that a just-written page can get
evicted from memory, and then read back into memory, before the
workqueue has a chance to update the extent tree.
This race has been around for a little over a year, and no one noticed
until two months ago; it only happens under fairly exotic conditions,
and in fact even after trying very hard to create a simple repro under
lab conditions, we could only reproduce the problem and confirm the
fix on production servers running MySQL on very fast PCIe-attached
flash devices.
Given that Dave was able to hit this problem pretty quickly, if we
confirm that this commit is at fault, the only reasonable thing to do
is to revert it IMO."
Reported-and-tested-by: Dave Jones <davej@redhat.com>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull more ARM updates from Russell King.
This got a fair number of conflicts with the <asm/system.h> split, but
also with some other sparse-irq and header file include cleanups. They
all looked pretty trivial, though.
* 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (59 commits)
ARM: fix Kconfig warning for HAVE_BPF_JIT
ARM: 7361/1: provide XIP_VIRT_ADDR for no-MMU builds
ARM: 7349/1: integrator: convert to sparse irqs
ARM: 7259/3: net: JIT compiler for packet filters
ARM: 7334/1: add jump label support
ARM: 7333/2: jump label: detect %c support for ARM
ARM: 7338/1: add support for early console output via semihosting
ARM: use set_current_blocked() and block_sigmask()
ARM: exec: remove redundant set_fs(USER_DS)
ARM: 7332/1: extract out code patch function from kprobes
ARM: 7331/1: extract out insn generation code from ftrace
ARM: 7330/1: ftrace: use canonical Thumb-2 wide instruction format
ARM: 7351/1: ftrace: remove useless memory checks
ARM: 7316/1: kexec: EOI active and mask all interrupts in kexec crash path
ARM: Versatile Express: add NO_IOPORT
ARM: get rid of asm/irq.h in asm/prom.h
ARM: 7319/1: Print debug info for SIGBUS in user faults
ARM: 7318/1: gic: refactor irq_start assignment
ARM: 7317/1: irq: avoid NULL check in for_each_irq_desc loop
ARM: 7315/1: perf: add support for the Cortex-A7 PMU
...
Pull cpupower updates from Dominik Brodowski.
* git://git.kernel.org/pub/scm/linux/kernel/git/brodo/cpupowerutils:
cpupower tools: add install target to the debug tools' makefiles
cpupower tools: allow to build debug tools in a separate directory too
cpupower: Fix broken mask values
cpupower tool: allow to build in a separate directory
cpupower tool: makefile: simplify the recipe used to generate cpupower.pot target
cpupower tool: remove use of undefined variables from the clean target of the top makefile
cpupower: Fix linking with --as-needed
cpupower: Remove unneeded code and by that fix a memleak
cpupower: Fix number of idle states
cpupower: Unify cpupower-frequency-* manpages
cpupower: Add cpupower-idle-info manpage
cpupower: AMD fam14h/Ontario monitor can also be used by fam12h cpus
cpupower: Better interface for accessing AMD pci registers
Pull a few PCMCIA updates from Dominik Brodowski.
Fix up trivial conflict (modified code in question had been removed) in
drivers/pcmcia/soc_common.c.
* git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia:
pcmcia at91_cf: fix raw gpio number usage
ARM: pxa: fix error handling in pxa2xx_drv_pcmcia_probe
pcmcia: Convert to DEFINE_PCI_DEVICE_TABLE
pcmcia: convert drivers/pcmcia/* to use module_platform_driver()
pcmcia: irq: Remove IRQF_DISABLED
Pull slave-dmaengine update from Vinod Koul:
"This includes the cookie cleanup by Russell, the addition of context
parameter for dmaengine APIs, more arm dmaengine driver cleanup by
moving code to dmaengine, this time for imx by Javier and pl330 by
Boojin along with the usual driver fixes."
Fix up some fairly trivial conflicts with various other cleanups.
* 'next' of git://git.infradead.org/users/vkoul/slave-dma: (67 commits)
dmaengine: imx: fix the build failure on x86_64
dmaengine: i.MX: Fix merge of cookie branch.
dmaengine: i.MX: Add support for interleaved transfers.
dmaengine: imx-dma: use 'dev_dbg' and 'dev_warn' for messages.
dmaengine: imx-dma: remove 'imx_dmav1_baseaddr' and 'dma_clk'.
dmaengine: imx-dma: remove unused arg of imxdma_sg_next.
dmaengine: imx-dma: remove internal structure.
dmaengine: imx-dma: remove 'resbytes' field of 'internal' structure.
dmaengine: imx-dma: remove 'in_use' field of 'internal' structure.
dmaengine: imx-dma: remove sg member from internal structure.
dmaengine: imx-dma: remove 'imxdma_setup_sg_hw' function.
dmaengine: imx-dma: remove 'imxdma_config_channel_hw' function.
dmaengine: imx-dma: remove 'imxdma_setup_mem2mem_hw' function.
dmaengine: imx-dma: remove dma_mode member of internal structure.
dmaengine: imx-dma: remove data member from internal structure.
dmaengine: imx-dma: merge old dma-v1.c with imx-dma.c
dmaengine: at_hdmac: add slave config operation
dmaengine: add context parameter to prep_slave_sg and prep_dma_cyclic
dmaengine/dma_slave: introduce inline wrappers
dma: imx-sdma: Treat firmware messages as warnings instead of erros
...
Pull nfsd changes from Bruce Fields:
Highlights:
- Benny Halevy and Tigran Mkrtchyan implemented some more 4.1 features,
moving us closer to a complete 4.1 implementation.
- Bernd Schubert fixed a long-standing problem with readdir cookies on
ext2/3/4.
- Jeff Layton performed a long-overdue overhaul of the server reboot
recovery code which will allow us to deprecate the current code (a
rather unusual user of the vfs), and give us some needed flexibility
for further improvements.
- Like the client, we now support numeric uid's and gid's in the
auth_sys case, allowing easier upgrades from NFSv2/v3 to v4.x.
Plus miscellaneous bugfixes and cleanup.
Thanks to everyone!
There are also some delegation fixes waiting on vfs review that I
suppose will have to wait for 3.5. With that done I think we'll finally
turn off the "EXPERIMENTAL" dependency for v4 (though that's mostly
symbolic as it's been on by default in distro's for a while).
And the list of 4.1 todo's should be achievable for 3.5 as well:
http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues
though we may still want a bit more experience with it before turning it
on by default.
* 'for-3.4' of git://linux-nfs.org/~bfields/linux: (55 commits)
nfsd: only register cld pipe notifier when CONFIG_NFSD_V4 is enabled
nfsd4: use auth_unix unconditionally on backchannel
nfsd: fix NULL pointer dereference in cld_pipe_downcall
nfsd4: memory corruption in numeric_name_to_id()
sunrpc: skip portmap calls on sessions backchannel
nfsd4: allow numeric idmapping
nfsd: don't allow legacy client tracker init for anything but init_net
nfsd: add notifier to handle mount/unmount of rpc_pipefs sb
nfsd: add the infrastructure to handle the cld upcall
nfsd: add a header describing upcall to nfsdcld
nfsd: add a per-net-namespace struct for nfsd
sunrpc: create nfsd dir in rpc_pipefs
nfsd: add nfsd4_client_tracking_ops struct and a way to set it
nfsd: convert nfs4_client->cl_cb_flags to a generic flags field
NFSD: Fix nfs4_verifier memory alignment
NFSD: Fix warnings when NFSD_DEBUG is not defined
nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)
nfsd: rename 'int access' to 'int may_flags' in nfsd_open()
ext4: return 32/64-bit dir name hash according to usage type
fs: add new FMODE flags: FMODE_32bithash and FMODE_64bithash
...
Pull arch/tile (really asm-generic) update from Chris Metcalf:
"These are a couple of asm-generic changes that apply to tile."
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
compat: use sys_sendfile64() implementation for sendfile syscall
[PATCH v3] ipc: provide generic compat versions of IPC syscalls
Pull scheduler fixes from Ingo Molnar.
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpusets: Remove an unused variable
sched/rt: Improve pick_next_highest_task_rt()
sched: Fix select_fallback_rq() vs cpu_active/cpu_online
sched/x86/smp: Do not enable IRQs over calibrate_delay()
sched: Fix compiler warning about declared inline after use
MAINTAINERS: Update email address for SCHEDULER and PERF EVENTS
Pull x86 updates from Ingo Molnar.
This touches some non-x86 files due to the sanitized INLINE_SPIN_UNLOCK
config usage.
Fixed up trivial conflicts due to just header include changes (removing
headers due to cpu_idle() merge clashing with the <asm/system.h> split).
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/apic/amd: Be more verbose about LVT offset assignments
x86, tls: Off by one limit check
x86/ioapic: Add io_apic_ops driver layer to allow interception
x86/olpc: Add debugfs interface for EC commands
x86: Merge the x86_32 and x86_64 cpu_idle() functions
x86/kconfig: Remove CONFIG_TR=y from the defconfigs
x86: Stop recursive fault in print_context_stack after stack overflow
x86/io_apic: Move and reenable irq only when CONFIG_GENERIC_PENDING_IRQ=y
x86/apic: Add separate apic_id_valid() functions for selected apic drivers
locking/kconfig: Simplify INLINE_SPIN_UNLOCK usage
x86/kconfig: Update defconfigs
x86: Fix excessive MSR print out when show_msr is not specified
Pull timer core updates from Thomas Gleixner.
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ia64: vsyscall: Add missing paranthesis
alarmtimer: Don't call rtc_timer_init() when CONFIG_RTC_CLASS=n
x86: vdso: Put declaration before code
x86-64: Inline vdso clock_gettime helpers
x86-64: Simplify and optimize vdso clock_gettime monotonic variants
kernel-time: fix s/then/than/ spelling errors
time: remove no_sync_cmos_clock
time: Avoid scary backtraces when warning of > 11% adj
alarmtimer: Make sure we initialize the rtctimer
ntp: Fix leap-second hrtimer livelock
x86, tsc: Skip refined tsc calibration on systems with reliable TSC
rtc: Provide flag for rtc devices that don't support UIE
ia64: vsyscall: Use seqcount instead of seqlock
x86: vdso: Use seqcount instead of seqlock
x86: vdso: Remove bogus locking in update_vsyscall_tz()
time: Remove bogus comments
time: Fix change_clocksource locking
time: x86: Fix race switching from vsyscall to non-vsyscall clock
Fix this build error on ia64:
In file included from include/linux/sched.h:92,
from arch/ia64/kernel/asm-offsets.c:9:
include/linux/llist.h:59:25: error: asm/cmpxchg.h: No such file or directory
make[1]: *** [arch/ia64/kernel/asm-offsets.s] Error 1
Right now we don't seem to need any actual contents for the
asm/cmpxchg.h to make the build work ... so leave the migration of
xchg() and cmpxchg() to this new header file for a future patch.
Also process.c needs <asm/switch_to.h> (for definition of pfm_syst_info).
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull the code to generalize the powerpc VIRQ_DEBUG code from Grant Likely.
That code had been moved into generic irqdomain code, but still had
powerpc-specific code and could only be enabled on powerpc.
* 'irqdomain/merge' of git://git.secretlab.ca/git/linux-2.6:
irqdomain/powerpc: updated defconfigs for VIRQ_DEBUG rename
irqdomain: Remove powerpc dependency from debugfs file
Single fix for a commit from the first batch of patches through Andrew.
* emailed from Andrew Morton <akpm@linux-foundation.org>:
pagemap: remove remaining unneeded spin_lock()
Commit 025c5b2451 ("thp: optimize away unnecessary page table
locking") moves spin_lock() into pmd_trans_huge_lock() in order to avoid
locking unless pmd is for thp. So this spin_lock() is a bug.
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Sterba had put in patches to look for mixed data/metadata groups
with metadata bigger than 4KB. But these ended up in the wrong place
and it wasn't testing the feature flag correctly.
This updates the tests to make sure our sizes are matching
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The debugfs code is really generic for all platforms. This patch removes the
powerpc-specific directory reference and makes it available to all
architectures.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
We were not noticing it because symbol__inc_addr_samples was erroneously
dropping samples that hit the last byte in a function.
Working on a fix for a problem reported by David Miller, Stephane
Eranian and Sorin Dumitru, where addresses < sym->start were causing
problems, I noticed this other problem.
Cc: David Ahern <dsahern@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sorin Dumitru <dumitru.sorin87@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-pqjaq4cr1xs2xen73pjhbav4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
From Tony Lindgren:
"This contains the updated gpio_to_irq patches from Tarun, and a trivial
build fix from Govindraj to #include <asm/system_misc.h> in pm.c.
The DSI mux patch is the same."
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP: pm: fix compilation break
ARM: OMAP: Remove OMAP_GPIO_IRQ macro definition
drivers: input: Fix OMAP_GPIO_IRQ with gpio_to_irq() in ams_delta_serio_exit()
ARM: OMAP: boards: Fix OMAP_GPIO_IRQ usage with gpio_to_irq()
ARM: OMAP2+: Remove __init from DSI mux functions
Pull the intel i915 hibernation memory corruption fix from Dave Airlie:
"I tracked down the misc memory corruption after i915 hibernate to the
blinking fbcon cursor, and realised the i915 driver wasn't doing the
fbdev suspend/resume calls at all. nouveau and radeon have done these
calls for a long time.
This has been fairly well tested and is definitely the main culprit in
hibernate not working."
Yay.
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/i915: suspend fbdev device around suspend/hibernate
* 'for-3.4/fixes-for-rc1-and-v3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/swarren/linux-tegra:
ARM: tegra: Fix device tree AUXDATA for USB/EHCI
During a complex merge for v3.4, one line of the commit
c20b909be9 ("ARM: LPC32xx: Ethernet support") was
reverted wrongly ("lpc-eth.0" -> "lpc-net.0") while the other conflicts were
merged correctly. This patch re-applies the clock name "lpc-eth.0".
Signed-off-by: Roland Stigge <stigge@antcom.de>
Signed-off-by: Olof Johansson <olof@lixom.net>
turbostat -s
cuts down on the amount of output, per user request.
also treak some output whitespace and the man page.
Signed-off-by: Len Brown <len.brown@intel.com>
Fix the compilation break observed on latest mainline caused
by 9f97da78 (Disintegrate asm/system.h for ARM):
arch/arm/mach-omap1/pm.c: In function 'omap_pm_prepare':
arch/arm/mach-omap1/pm.c:587: error: implicit declaration of function 'disable_hlt'
arch/arm/mach-omap1/pm.c: In function 'omap_pm_finish':
arch/arm/mach-omap1/pm.c:624: error: implicit declaration of function 'enable_hlt'
arch/arm/mach-omap1/pm.c: In function 'omap_pm_init':
arch/arm/mach-omap1/pm.c:681: error: 'arm_pm_idle' undeclared (first use in this function)
...
arch/arm/mach-omap2/pm.c: In function 'omap_pm_begin':
arch/arm/mach-omap2/pm.c:239: error: implicit declaration of function 'disable_hlt'
arch/arm/mach-omap2/pm.c: In function 'omap_pm_end':
arch/arm/mach-omap2/pm.c:247: error: implicit declaration of function 'enable_hlt'
Signed-off-by: Govindraj.R <govindraj.raja@ti.com>
Acked-by: Kevin Hilman <khilman@ti.com>
[tony@atomide.com: updated to fix omap1 too]
Signed-off-by: Tony Lindgren <tony@atomide.com>
Fix module parameter data type to eliminate build warnings.
sound/isa/opti9xx/opti92x-ad1848.c:87:1: warning: return from incompatible pointer type
sound/isa/opti9xx/opti92x-ad1848.c:87:1: warning: return from incompatible pointer type
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Fix module parameter data type to eliminate build warning.
sound/oss/msnd_pinnacle.c:1727:1: warning: return from incompatible pointer type
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The cleanup of the dmaengine parameter messup and a tweak to some
callibration values for WM1811.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJPdETyAAoJEBus8iNuMP3d7moQAJdyO48uOt717D+BLP77ri+u
U09DP4TSprqWIS1sSgxJcP3fnQs5nFyqyDyjXkV6P0KjHtK2VE0xAS5uaO0sddS7
6VBqFiyhfl9XbdDieu4qY9qdaeJL3idvo3Gqi+bjlY2Q3gnZeSx+nefW7vG6hyK8
SNXM9Cya66/eNfIt3DklQ34926lnHPN/Oz9WPqNs3c9ROJeoQzOsVSe8PLSP2ZIM
HitNbqZjxZXVC8Jytd56qwLmlIvikrX0UUmROkhuEwYMfa2lnWEGP72OtIT7udKE
G9NB/eaFK2ddB2pVv+lnkx6qm9ZYrGoiiyqfU9QbVplP8VE5sRJmdzaVbMmgDPpm
pYoSgVGs/ajiiC0SFQdAdSdwsngMSpbBopv9ri/HPRkVNM3AznpBg58AbESR5SM9
cc/iLRjTMNCc1Vv5xLdRO2BAYGtFQSZcknwDjZe50eDYv0cEjDG5R3+rvU8Wic15
T0umDIn8J0jv8e1yxN6qDGRicxM5H4uo2ZOfMgKwYRgGm5DgN8mnopOBd97/RXSS
cV8JpYvys6zPlxIsDTQ6fqQW0ywHq5IZD1F/u2o3HKRVi9nd8hM5d9urtOIr/nlZ
M4yLPHCsp+no37pexiHlIbW/X8WdeEXZL7NVRYDmyX6JFsctqTWtso/SnwGOn2J8
lXqw7Msl1fstcHkjb/5G
=XZ+C
-----END PGP SIGNATURE-----
Merge tag 'asoc-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: A few more updates for 3.4
The cleanup of the dmaengine parameter messup and a tweak to some
callibration values for WM1811.
Since all references to OMAP_GPIO_IRQ macro are replaced now
with gpio_to_irq(), this can be removed altogether.
Signed-off-by: Tarun Kanti DebBarma <tarun.kanti@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Even though ams-delta-serio input driver uses gpio_to_irq() in all
relevent places to get irq number, the ams_delta_serio_exit() still
uses OMAP_GPIO_IRQ macro. Fix this.
Signed-off-by: Tarun Kanti DebBarma <tarun.kanti@ti.com>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Tony Lindgren <tony@atomide.com>
The following commits change gpio-omap to use dynamic
IRQ allocation:
25db711 gpio/omap: Fix IRQ handling for SPARSE_IRQ
384ebe1 gpio/omap: Add DT support to GPIO driver
With dynamic allocation of IRQ the usage of OMAP_GPIO_IRQ
is no longer valid. We must be using gpio_to_irq() instead.
Signed-off-by: Tarun Kanti DebBarma <tarun.kanti@ti.com>
[tony@atomide.com: updated comments]
Signed-off-by: Tony Lindgren <tony@atomide.com>
The commit 89812fc81f ("perf tools: Add parser generator for events
parsing") changed event parsing engine but missed the ref-cycles event.
Add it.
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1333016517-10591-1-git-send-email-namhyung.kim@lge.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When we use autodefrag, we forget to update the index which indicates
the last page we've dirty. And we'll set dirty flags on a same set of
pages again and again.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
$ mkfs.btrfs /dev/sdb7
$ mount /dev/sdb7 /mnt/btrfs/ -oautodefrag
$ dd if=/dev/zero of=/mnt/btrfs/foobar bs=4k count=10 oflag=direct 2>/dev/null
$ filefrag -v /mnt/btrfs/foobar
Filesystem type is: 9123683e
File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096)
ext logical physical expected length flags
0 0 3072 10 eof
/mnt/btrfs/foobar: 1 extent found
Now we have a big real extent [0, 40960), but autodefrag will still defrag it.
$ sync
$ filefrag -v /mnt/btrfs/foobar
Filesystem type is: 9123683e
File size of /mnt/btrfs/foobar is 40960 (10 blocks, blocksize 4096)
ext logical physical expected length flags
0 0 3082 10 eof
/mnt/btrfs/foobar: 1 extent found
So if we already find a big real extent, we're ok about that, just skip it.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
If our file's layout is as follows:
| hole | data1 | hole | data2 |
we do not need to defrag this file, because this file has holes and
cannot be merged into one extent.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
$ mkfs.btrfs disk
$ mount disk /mnt -o autodefrag
$ dd if=/dev/zero of=/mnt/foobar bs=4k count=10 2>/dev/null && sync
$ for i in `seq 9 -2 0`; do dd if=/dev/zero of=/mnt/foobar bs=4k count=1 \
seek=$i conv=notrunc 2> /dev/null; done && sync
then we'll get to defrag "foobar" again and again.
So does option "-o autodefrag,compress".
Reasons:
When the cleaner kthread gets to fetch inodes from the defrag tree and defrag
them, it will dirty pages and submit them, this will comes to another DATA COW
where the processing inode will be inserted to the defrag tree again.
This patch sets a rule for COW code, i.e. insert an inode when we're really
going to make some defragments.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
commit 600a45e1d5
(Btrfs: fix deadlock on page lock when doing auto-defragment)
fixes the deadlock on page, but it also introduces another bug.
A page may have been truncated after unlock & lock.
So we need to find it again to get the right one.
And since we've held i_mutex lock, inode size remains unchanged and
we can drop isize overflow checks.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
The bug is from running xfstests 209 with autodefrag.
The race is as follows:
t1 t2(autodefrag)
direct IO
invalidate pagecache
dio(old data) add_inode_defrag
invalidate pagecache
endio
direct IO
invalidate pagecache
run_defrag
readpage(old data)
set page dirty (old data)
dio(new data, rewrite)
invalidate pagecache (*)
endio
t2(autodefrag) will get old data into pagecache via readpage and set
pagecache dirty. Meanwhile, invalidate pagecache(*) will fail due to
dirty flags in pages. So the old data may be flushed into disk by
flush thread, which will lead to data loss.
And so does the case of user defragment progs.
The patch fixes this race by holding i_mutex when we readpage and set page dirty.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
This deadlock comes from xfstests 251.
We'll hold the chunk_mutex throughout the whole of a chunk allocation.
But if we find that we've used up system chunk space, we need to allocate a
new system chunk, but this will lead to a recursion of chunk allocation and end
up with a deadlock on chunk_mutex.
So instead we need to allocate the system chunk first if we find we're in ENOSPC.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
o For space info, the type of space info is useful for debug.
o For transaction handle, its transid is useful.
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Otherwise, we get a warning or error similar to this when building with
CONFIG_NFSD_V4 disabled:
ERROR: "nfsd4_cld_block" [fs/nfsd/nfsd.ko] undefined!
Fix this by wrapping the calls to rpc_pipefs_notifier_register and
..._unregister in another function and providing no-op replacements
when CONFIG_NFSD_V4 is disabled.
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Notify get_robust_list users that the syscall is going away.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@xenotime.net>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: kernel-hardening@lists.openwall.com
Cc: spender@grsecurity.net
Link: http://lkml.kernel.org/r/20120323190855.GA27213@www.outflux.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
It was possible to extract the robust list head address from a setuid
process if it had used set_robust_list(), allowing an ASLR info leak. This
changes the permission checks to be the same as those used for similar
info that comes out of /proc.
Running a setuid program that uses robust futexes would have had:
cred->euid != pcred->euid
cred->euid == pcred->uid
so the old permissions check would allow it. I'm not aware of any setuid
programs that use robust futexes, so this is just a preventative measure.
(This patch is based on changes from grsecurity.)
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: kernel-hardening@lists.openwall.com
Cc: spender@grsecurity.net
Link: http://lkml.kernel.org/r/20120319231253.GA20893@www.outflux.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We respect node affinity of devices already in the irq descriptor
allocation, but we ignore it for the initial interrupt affinity
setup, so the interrupt might be routed to a different node.
Restrict the default affinity mask to the node on which the irq
descriptor is allocated.
[ tglx: Massaged changelog ]
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/1332788538-17425-1-git-send-email-prarit@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The only place irq_finalize_oneshot() is called with force parameter set
is the threaded handler error exit path. But IRQTF_RUNTHREAD is dropped
at this point and irq_wake_thread() is not going to set it again,
since PF_EXITING is set for this thread already. So irq_finalize_oneshot()
will drop the threads bit in threads_oneshot anyway and hence the force
parameter is superfluous.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/20120321162234.GP24806@dhcp-26-207.brq.redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
exit_irq_thread() clears IRQTF_RUNTHREAD flag and drops the thread's bit in
desc->threads_oneshot then. The bit must not be set again in between and it
does not, since irq_wake_thread() sees PF_EXITING flag first and returns.
Due to above the order or checking PF_EXITING and IRQTF_RUNTHREAD flags in
irq_wake_thread() is important. This change just makes it more visible in the
source code.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/20120321162212.GO24806@dhcp-26-207.brq.redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
This adopts a trimmed down version of the MIPS port mangling interface
limited to the I/O swabbing for platforms that can't use little endian
accessors. For platforms with mixed I/O spaces involving PCI it will
still be necessary to enable byte swapping at the host controller level.
Attention needs to be paid to all of host controller endianness, CPU
endianness, and whether I/O accesses are explicitly swapped or not via
SWAP_IO_SPACE. Fortunately the platforms that need this are in the
minority.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>