This patch fixes the following deadlock bug during the recovery.
INFO: task mount:1322 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount D ffffffff81125870 0 1322 1266 0x00000000
ffff8801207e39d8 0000000000000046 ffff88012ab1dee0 0000000000000046
ffff8801207e3a08 ffff880115903f40 ffff8801207e3fd8 ffff8801207e3fd8
ffff8801207e3fd8 ffff880115903f40 ffff8801207e39d8 ffff88012fc94520
Call Trace:
[<ffffffff81125870>] ? __lock_page+0x70/0x70
[<ffffffff816a92d9>] schedule+0x29/0x70
[<ffffffff816a93af>] io_schedule+0x8f/0xd0
[<ffffffff8112587e>] sleep_on_page+0xe/0x20
[<ffffffff816a649a>] __wait_on_bit_lock+0x5a/0xc0
[<ffffffff81125867>] __lock_page+0x67/0x70
[<ffffffff8106c7b0>] ? autoremove_wake_function+0x40/0x40
[<ffffffff81126857>] find_lock_page+0x67/0x80
[<ffffffff8112698f>] find_or_create_page+0x3f/0xb0
[<ffffffffa03901a8>] ? sync_inode_page+0xa8/0xd0 [f2fs]
[<ffffffffa038fdf7>] get_node_page+0x67/0x180 [f2fs]
[<ffffffffa039818b>] recover_fsync_data+0xacb/0xff0 [f2fs]
[<ffffffff816aaa1e>] ? _raw_spin_unlock+0x3e/0x40
[<ffffffffa0389634>] f2fs_fill_super+0x7d4/0x850 [f2fs]
[<ffffffff81184cf9>] mount_bdev+0x1c9/0x210
[<ffffffffa0388e60>] ? validate_superblock+0x180/0x180 [f2fs]
[<ffffffffa0387635>] f2fs_mount+0x15/0x20 [f2fs]
[<ffffffff81185a13>] mount_fs+0x43/0x1b0
[<ffffffff81145ba0>] ? __alloc_percpu+0x10/0x20
[<ffffffff811a0796>] vfs_kern_mount+0x76/0x120
[<ffffffff811a2cb7>] do_mount+0x237/0xa10
[<ffffffff81140b9b>] ? strndup_user+0x5b/0x80
[<ffffffff811a3520>] SyS_mount+0x90/0xe0
[<ffffffff816b3502>] system_call_fastpath+0x16/0x1b
The bug is triggered when check_index_in_prev_nodes tries to get the direct
node page by calling get_node_page.
At this point, if the direct node page is already locked by get_dnode_of_data,
its caller, we got a deadlock condition.
This patch adds additional condition check for the reuse of locked direct node
pages prior to the get_node_page call.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
While an orphan inode has zero link_count, f2fs_gc is able to select the inode
for foreground gc.
- f2fs_gc
- do_garbage_collect
- gc_data_segment
: f2fs_iget is failed
: get_valid_blocks() != 0, so that retry
--> here we got the infinite loop.
This patch resolved this issue.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
If we met an error during the dentry recovery, we should not conduct checkpoint.
Otherwise, some errorneous dentry blocks overwrites the existing blocks that
contain the remaining recovery information.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
The allocated page used by the recovery is not on HIGHMEM, so that we don't
need to use kmap/kunmap.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Few things can be changed in the default mkwrite function
1) Make file_update_time at the start before acquiring any lock
2) the condition page_offset(page) >= i_size_read(inode) should be
changed to page_offset(page) > i_size_read
3) Move wait_on_page_writeback.
Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
We can do this, since now we use a global mutex, f2fs_stat_mutex to protect its
list operations.
Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
[Jaegeuk Kim: add description]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Majianpeng reported a lockdep splat for f2fs. It turns out mutex_lock_all()
acquires an array of locks (in global/local lock style).
Any such operation is always serialized using cp_mutex, therefore there is no
fs_lock[] lock-order issue; tell lockdep about this using the
mutex_lock_nest_lock() primitive.
Reported-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
I found a bug when testing power-off-recovery as follows.
[Bug Scenario]
1. create a file
2. fsync the file
3. reboot w/o any sync
4. try to recover the file
- found its fsync mark
- found its dentry mark
: try to recover its dentry
- get its file name
- get its parent inode number
: here we got zero value
The reason why we get the wrong parent inode number is that we didn't
synchronize the inode page with its newly created inode information perfectly.
Especially, previous f2fs stores fi->i_pino and writes it to the cached
node page in a wrong order, which incurs the zero-valued i_pino during the
recovery.
So, this patch modifies the creation flow to fix the synchronization order of
inode page with its inode.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
If get_dnode_of_data gets a locked node page, let's skip redundant
get_node_page calls.
This is for the futher enhancement.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
During the dentry recovery routine, recover_inode() triggers __f2fs_add_link
with its directory inode.
In the following scenario, a bug is captured.
1. dir = f2fs_iget(pino)
2. __f2fs_add_link(dir, name)
3. iput(dir)
-> f2fs_evict_inode() faces with BUG_ON(atomic_read(fi->dirty_dents))
Kernel BUG at ffffffffa01c0676 [verbose debug info unavailable]
[<ffffffffa01c0676>] f2fs_evict_inode+0x276/0x300 [f2fs]
Call Trace:
[<ffffffff8118ea00>] evict+0xb0/0x1b0
[<ffffffff8118f1c5>] iput+0x105/0x190
[<ffffffffa01d2dac>] recover_fsync_data+0x3bc/0x1070 [f2fs]
[<ffffffff81692e8a>] ? io_schedule+0xaa/0xd0
[<ffffffff81690acb>] ? __wait_on_bit_lock+0x7b/0xc0
[<ffffffff8111a0e7>] ? __lock_page+0x67/0x70
[<ffffffff81165e21>] ? kmem_cache_alloc+0x31/0x140
[<ffffffff8118a502>] ? __d_instantiate+0x92/0xf0
[<ffffffff812a949b>] ? security_d_instantiate+0x1b/0x30
[<ffffffff8118a5b4>] ? d_instantiate+0x54/0x70
This means that we should flush all the dentry pages between iget and iput().
But, during the recovery routine, it is unallowed due to consistency, so we
have to wait the whole recovery process.
And then, write_checkpoint flushes all the dirty dentry blocks, and nicely we
can put the stale dir inodes from the dirty_dir_inode_list.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
The reason of using sbi->por_doing is to alleviate data writes during the
recovery.
The find_fsync_dnodes() produces some dirty dentry pages, so we should
cover it too with sbi->por_doing.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
In get_lock_data_page, if there is a data race between get_dnode_of_data for
node and grab_cache_page for data, f2fs is able to face with the following
BUG_ON(dn.data_blkaddr == NEW_ADDR).
kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/data.c:251!
[<ffffffffa044966c>] get_lock_data_page+0x1ec/0x210 [f2fs]
Call Trace:
[<ffffffffa043b089>] f2fs_readdir+0x89/0x210 [f2fs]
[<ffffffff811a0920>] ? fillonedir+0x100/0x100
[<ffffffff811a0920>] ? fillonedir+0x100/0x100
[<ffffffff811a07f8>] vfs_readdir+0xb8/0xe0
[<ffffffff811a0b4f>] sys_getdents+0x8f/0x110
[<ffffffff816d7999>] system_call_fastpath+0x16/0x1b
This bug is able to be occurred when the block address of the data block is
changed after f2fs_put_dnode().
In order to avoid that, this patch fixes the lock order of node and data
blocks in which the node block lock is covered by the data block lock.
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Currently f2fs recovers the dentry of fsynced files.
When power-off-recovery is conducted, this newly recovered inode should increase
node block count as well as inode block count.
This patch resolves this inconsistency that results in:
1. create a file
2. write data
3. fsync
4. reboot without sync
5. mount and recover the file
6. node block count is 1 and inode block count is 2
: fall into the inconsistent state
7. unlink the file
: trigger the following BUG_ON
------------[ cut here ]------------
kernel BUG at /home/zeus/f2fs_test/src/fs/f2fs/f2fs.h:716!
Call Trace:
[<ffffffffa0344100>] ? get_node_page+0x50/0x1a0 [f2fs]
[<ffffffffa0344bfc>] remove_inode_page+0x8c/0x100 [f2fs]
[<ffffffffa03380f0>] ? f2fs_evict_inode+0x180/0x2d0 [f2fs]
[<ffffffffa033812e>] f2fs_evict_inode+0x1be/0x2d0 [f2fs]
[<ffffffff811c7a67>] evict+0xa7/0x1a0
[<ffffffff811c82b5>] iput+0x105/0x190
[<ffffffff811c2b30>] d_kill+0xe0/0x120
[<ffffffff811c2c57>] dput+0xe7/0x1e0
[<ffffffff811acc3d>] __fput+0x19d/0x2d0
[<ffffffff811acd7e>] ____fput+0xe/0x10
[<ffffffff81070645>] task_work_run+0xb5/0xe0
[<ffffffff81002941>] do_notify_resume+0x71/0xb0
[<ffffffff8175f14a>] int_signal+0x12/0x17
Reported-and-Tested-by: Chris Fries <C.Fries@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
do_smart_update_queue() is called when an operation (semop,
semctl(SETVAL), semctl(SETALL), ...) modified the array. It must check
which of the sleeping tasks can proceed.
do_smart_update_queue() missed a few wakeups:
- if a sleeping complex op was completed, then all per-semaphore queues
must be scanned - not only those that were modified by *sops
- if a sleeping simple op proceeded, then the global queue must be
scanned again
And:
- the test for "|sops == NULL) before scanning the global queue is not
required: If the global queue is empty, then it doesn't need to be
scanned - regardless of the reason for calling do_smart_update_queue()
The patch is not optimized, i.e. even completing a wait-for-zero
operation causes a rescan. This is done to keep the patch as simple as
possible.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Stable fix to prevent an rpc_task wakeup race
- Fix a NFSv4.1 session drain deadlock
- Fix a NFSv4/v4.1 mount regression when not running rpc.gssd
- Ensure auth_gss pipe detection works in namespaces
- Fix SETCLIENTID fallback if rpcsec_gss is not available
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJRomGNAAoJEGcL54qWCgDyzUIQALj4hsOtdcmCCAWu0m+Pr+Gz
/6rpO2xJ5cLWyTYS2J9wmPkU+mb/g/OB/OhANyQQsDbfoW3e27TGaXRB8hUs29vk
LHTkFcrBTi4ohlw2Gyb6LWDF6cyHkC7cpH7tfIjLDff77V/qK1uqo41MtYpqUvy3
41tMoFOhvpwsy7HuKqVyYtNlwxnXrdJ5XvK0ycaQYQ9t8om9Hxu/scCgLfl/qwRr
eVhIesC2/Y1eC46UK7/TWCC7aTLkvi85UQY+fywMkajt/gPzjjh6nuhc45aOT9d7
pc0sXgIs8fLEjC+CRa0ojrziJZCHZY93U1kzA+CotRqPq76nPeFeG/1VhViKKb4C
1AU9IA4ntViUqNDQiGrCxE6ZeMewTOKksrCErwSS16Hv9Gqh4SHq84R/bF6MFgBc
dqQsexUbf/vaWHmuosARgbyc/QcBXDwWwbkq0hXSWbHA8vpc8/Lw1wkp49eyKz0V
0zwJ9xInakXr5/DIC72IKEF0Dg26L+GTXWLmDZVcdpVBp5A403trxUKckcsNa/Xf
FXaGMhyv2ZfV4AohW1Z8klkLiMHt4dr+YB7SQAgR/gb81p3odgKgMz51PfmHxFqs
4nxwRQqWplBTGiLrOFSgaRC50jLqEloUweWC2qf56KYEb9N6M6XDIXhllLIMyWLD
94cvihpKj+/ecKH5kQWF
=B8go
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-3.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Stable fix to prevent an rpc_task wakeup race
- Fix a NFSv4.1 session drain deadlock
- Fix a NFSv4/v4.1 mount regression when not running rpc.gssd
- Ensure auth_gss pipe detection works in namespaces
- Fix SETCLIENTID fallback if rpcsec_gss is not available
* tag 'nfs-for-3.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Fix SETCLIENTID fallback if GSS is not available
SUNRPC: Prevent an rpc_task wakeup race
NFSv4.1 Fix a pNFS session draining deadlock
SUNRPC: Convert auth_gss pipe detection to work in namespaces
SUNRPC: Faster detection if gssd is actually running
SUNRPC: Fix a bug in gss_create_upcall
Pull parisc fixes from Helge Deller:
"This time we made the kernel- and interruption stack allocation
reentrant which fixed some strange kernel crashes (specifically
protection ID traps).
Furthemore this patchset fixes the interrupt stack in UP and SMP
configurations by using native locking instructions. And finally
usage of floating point calculations on parisc were disabled in the
MPILIB."
* 'parisc-for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: fix irq stack on UP and SMP
parisc/superio: Use module_pci_driver to register driver
parisc: make interrupt and interruption stack allocation reentrant
parisc: show number of FPE and unaligned access handler calls in /proc/interrupts
parisc: add additional parisc git tree to MAINTAINERS file
parisc: use PAGE_SHIFT instead of hardcoded value 12 in pacache.S
parisc: add rp5470 entry to machine database
MPILIB: disable usage of floating point registers on parisc
- Fix for corruption with FSX on 512 byte blocksize filesystems
- Fix rounding error in xfs_free_file_space
- Fix use-after-free with extent free intents
- Add several missing KM_NOFS flags to fix lockdep reports
- Several fixes for CRC related code
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
iQIcBAABAgAGBQJRn93mAAoJENaLyazVq6ZOPw8P/1ab7Gr4K7ccpuRvC+JiBPx9
/Eb1l2A68r8rWJq3bEvLM/XgN7F0IESastS+iA8lrNuLWJAHCb+Y37mdbyWymhWO
J6LFg/Mic+n5uGmg6nXrqLJ8epTd3L+V+cCEHLpa/20zEwa5yd8sSTV0P1pfcMRi
S36Uof7boe+s4H/s2z4YX8Y+dkaVn1tN7j4PDs5Zd/wiUq0EKA5le4AkbqIJlCDn
E9mJL1S2t3QwfgQWL+pe8f5jzReGTAxJUQWX2ErajX4vYI7PnR1WTYf7YJB291f0
XPfkjqvaK0WCdipSkgb58t/Rj2IxoHItdryvBHa/rnL4e95aBQMpNyY8btNX28ou
NgcIQnqQB+xlJeHpSyb1xNusamGzWw0CEYsXPcDQbXj1uJjd7/GfqTwtVbB4zOlW
Hua7/6N22vaY1dfQcQFwgi1XcKF9jSVKBVOvy7yHEmdNuT7mFejaG1gVO7NpTIgd
s7R91pQVlYpLTmZHK6ZqZap5OC6kS/YJr9HxVxS3FQhaJmFwGqvf6UUjSWOK1MnJ
obTbbygo2Orvh10lgzDNsd+xRJZ9aFn+UfywSeQGTfm91HvTnEQje3xqctv3zmBy
adrWSXXg/eN07nciNk0zTgHnQZo1v/ZeAFlz0AcSslMMQ7Pq4+Pv4LPtdxMIdrxU
1IZ32GunGTUXqm6GRW6v
=hQCM
-----END PGP SIGNATURE-----
Merge tag 'for-linus-v3.10-rc3' of git://oss.sgi.com/xfs/xfs
Pull xfs fixes from Ben Myers:
"Here are fixes for corruption on 512 byte filesystems, a rounding
error, a use-after-free, some flags to fix lockdep reports, and
several fixes related to CRCs. We have a somewhat larger post -rc1
queue than usual due to fixes related to the CRC feature we merged for
3.10:
- Fix for corruption with FSX on 512 byte blocksize filesystems
- Fix rounding error in xfs_free_file_space
- Fix use-after-free with extent free intents
- Add several missing KM_NOFS flags to fix lockdep reports
- Several fixes for CRC related code"
* tag 'for-linus-v3.10-rc3' of git://oss.sgi.com/xfs/xfs:
xfs: remote attribute lookups require the value length
xfs: xfs_attr_shortform_allfit() does not handle attr3 format.
xfs: xfs_da3_node_read_verify() doesn't handle XFS_ATTR3_LEAF_MAGIC
xfs: fix missing KM_NOFS tags to keep lockdep happy
xfs: Don't reference the EFI after it is freed
xfs: fix rounding in xfs_free_file_space
xfs: fix sub-page blocksize data integrity writes
- Additional CPU ID for the intel_pstate driver from Dirk Brandewie.
- More cpufreq fixes related to ARM big.LITTLE support and locking from
Viresh Kumar.
- VIA C7 cpufreq build fix from Rafał Bilski.
- ACPI power management fix making it possible to use device power
states regardless of the CONFIG_PM setting from Rafael J. Wysocki.
- New ACPI video blacklist item from Bastian Triller.
/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQIcBAABAgAGBQJRoRZjAAoJEKhOf7ml8uNsv9wQAKAMs9J8k6XqgNPisFKetw+K
hzCOsKFOpI0BQKFikgtWjhGre1SyNIRUvLXO7BHFHXYQW6cLvn1jAyJhvl+i4nvT
eOa+vdGd6grWncbhIxeidoyk9hTZ6bdMWlTBvKUz5KpHzvp4YGC2jlvwFwqsJkpg
nQ8Hcbrbhm4vz5h7EmrlYcELBNTi5LQtmnqlxtbn02GX75BFTpkCm5aLZWZNEUrE
Hix8BhN41+hSy+K34ztHFlP5g/s/lIa9dOX1tewqSigkDB/qYYIt2lpdD2icOIOW
qHAtvpZq8/fZOcoZ9KdFqKUjjbuKVavldb+YzGeTLQufOAwb4hgMRvAccdNFMHIW
9tVkp2TcK6K7pAYlXtgEf25ka7ulLWDBd4C662gZfpi+oPKx2BI/6m7J4VoTULeb
30hDMyZXrXWWvStwO05Pyno3W5lG+cn9jytc3hKkaFerb53NHcZHfb0Rih5NhDZD
Ep09IuPE8fOT9KndY2kw/WwoZyJurYCbrgE+G1QyA+hsNPkNhPlGTxdL8vCqxM4K
ZOaQQejpd1bXBSk8Koz8LRyQ38KJByvM64B0EDSP6BQUT+rlbkcvog1bJV+UdpbJ
4TlhrAFlobhRFQBqlIbRqMXFPH31YSm7wVK1eK/gEqNZI935Kd17YSFf8yyi2yli
vBlmPkiPEIJHysps+tvd
=Srt8
-----END PGP SIGNATURE-----
Merge tag 'pm+acpi-3.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
- Additional CPU ID for the intel_pstate driver from Dirk Brandewie.
- More cpufreq fixes related to ARM big.LITTLE support and locking from
Viresh Kumar.
- VIA C7 cpufreq build fix from Rafał Bilski.
- ACPI power management fix making it possible to use device power
states regardless of the CONFIG_PM setting from Rafael J Wysocki.
- New ACPI video blacklist item from Bastian Triller.
* tag 'pm+acpi-3.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / video: Add "Asus UL30A" to ACPI video detect blacklist
cpufreq: arm_big_little_dt: Instantiate as platform_driver
cpufreq: arm_big_little_dt: Register driver only if DT has valid data
cpufreq / e_powersaver: Fix linker error when ACPI processor is a module
cpufreq / intel_pstate: Add additional supported CPU ID
cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT
ACPI / PM: Allow device power states to be used for CONFIG_PM unset
Pull slave-dma fixes from Vinod Koul:
"We have two patches from Andy & Rafael fixing the Lynxpoint dma"
* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
ACPI / LPSS: register clock device for Lynxpoint DMA properly
dma: acpi-dma: parse CSRT to extract additional resources
kcore_vmalloc is in fs/proc/kcore.c and kcore_mem is unused across
the tree. Noticed while grepping the tree for some other kcore stuff.
(score looks pretty unmaintained to me.)
Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Fallouts/wreckage of Cache Flush optimizations / aliasing dcache support
* Fix for an interesting bug where piped input to grep was getting
mysteriously clobbered
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRoICxAAoJEGnX8d3iisJeEU0P/33PB+g7wPgFFYYiNc3lm+uz
KUuVmZd/8mvIpJNwW4zEKObtMFecXShBCL67Qe6CJ/rGOj7xdPyRB5xpZqXOzVzW
4QF98G4u3gz7R+ELhneXAgJ2DRcGHaPvkQf0dW6a1BYQ81Wlz/cXJcNp+4dkSkRS
JIgFQsk8HAY8VLC/8CV+61ajrFkH/eRHaU2qjk+0QPUdsqI1W3N1ZNT0ZpaY4Hhf
S8H/zwN/Ymanu2+DV9zI8R+NzrYgCDVwyOmpakQQFC99+kdyI4o3FL19B9VHvyAs
hXqjbwHQSwjPajrlQyOpPedDLB3qK2xDzPvL940Aa2HW+EoAwOy8Lver6gq2laCc
Q5rn894XEd0HCW/QzJvK/0OeXn5MRerK3HNGWwGT3dqpj4okE70vMh0zs6kGmwX0
XEn3PsifkhZ0+ts0aiQxC5WSp8StrU8wT1iHSk/VTt6qbkq5cXDlJyTdPQ/b09+e
yJgv2Z4nPybP4jc6g46vaEtrz2bm3pHTg7opGzLQOCfYTMQ8vI7QXjTvqMHv4lOt
jDd2xVy8w826LYGeiqWdDMBNs+ff7Nyt/mICos2YSqhzgz6FnC9lLC5VSrl7sAKz
VwdaZeozOGH7USftUqPgOal2djkxDKQsS2pAS2Y85V2d5z/iVsiGshUCMMm1epVJ
u2T13gR5Rx8k3YhNYXgu
=j1po
-----END PGP SIGNATURE-----
Merge tag 'arc-v3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Pull ARC fixes from Vineet Gupta:
- Fallouts/wreckage of Cache Flush optimizations / aliasing dcache
support
- Fix for an interesting bug where piped input to grep was getting
mysteriously clobbered
* tag 'arc-v3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
ARC: lazy dcache flush broke gdb in non-aliasing configs
ARC: Use enough bits for determining page's cache color
ARC: Brown paper bag bug in macro for checking cache color
ARC: copy_(to|from)_user() to honor usermode-access permissions
ARC: [mm] Prevent stray dcache lines after__sync_icache_dcach()
ARC: [TB10x] Remove redundant abilis,simple-pinctrl mechanism
Pull ARM fixes from Russell King:
"Just three this time, all really quite small"
* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
ARM: 7729/1: vfp: ensure VFP_arch is non-zero when VFP is not supported
ARM: 7727/1: remove the .vm_mm value from gate_vma
ARM: 7723/1: crypto: sha1-armv4-large.S: fix SP handling
gdbserver inserting a breakpoint ends up calling copy_user_page() for a
code page. The generic version of which (non-aliasing config) didn't set
the PG_arch_1 bit hence update_mmu_cache() didn't sync dcache/icache for
corresponding dynamic loader code page - causing garbade to be executed.
So now aliasing versions of copy_user_highpage()/clear_page() are made
default. There is no significant overhead since all of special alias
handling code is compiled out for non-aliasing build
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Merge fixes from Andrew Morton:
"A bunch of fixes and one simple fbdev driver which missed the merge
window because people will still talking about it (to no great
effect)."
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (30 commits)
aio: fix kioctx not being freed after cancellation at exit time
mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas
drivers/rtc/rtc-max8998.c: check for pdata presence before dereferencing
ocfs2: goto out_unlock if ocfs2_get_clusters_nocache() failed in ocfs2_fiemap()
random: fix accounting race condition with lockless irq entropy_count update
drivers/char/random.c: fix priming of last_data
mm/memory_hotplug.c: fix printk format warnings
nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary
drivers/block/brd.c: fix brd_lookup_page() race
fbdev: FB_GOLDFISH should depend on HAS_DMA
drivers/rtc/rtc-pl031.c: pass correct pointer to free_irq()
auditfilter.c: fix kernel-doc warnings
aio: fix io_getevents documentation
revert "selftest: add simple test for soft-dirty bit"
drivers/leds/leds-ot200.c: fix error caused by shifted mask
mm/THP: use pmd_populate() to update the pmd with pgtable_t pointer
linux/kernel.h: fix kernel-doc warning
mm compaction: fix of improper cache flush in migration code
rapidio/tsi721: fix bug in MSI interrupt handling
hfs: avoid crash in hfs_bnode_create
...
We didn't have any fixes sent up for -rc2, so this is a slightly larger
batch. A bit all over the place platform-wise; OMAP, at91, marvell,
renesas, sunxi, ux500, etc.
I tried to summarize highlights but there isn't a whole lot to point
out. Lots of little things fixed all over. A couple of defconfig updates
due to new/changing options.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRn+66AAoJEIwa5zzehBx3f/4P/3sqK2z7u5SSa+tpkKYkxezO
MykUOpUc4tuwrKiuUEeXPh89pjIrclQVzKYDqdaXIcezKB7IXFfQSyLNxDzGM7Y3
NrqrURvNpDmUi6F/xP89gXWkbvg2zr563mxSMpkF4G8HTYxvCv7sY0W/PDzb48Qg
q3Efc/AfQsOGM0Zl8WXX9jZBdCNTquZYWd7YEFxCe9oVRGfmNlIYKirPOLnk9MAZ
iIrbfFQG+cOos0NKrjM+tbtQNnPUBQeZdy3MvR7DtXCpKNdxs5EJL9EyHHqGGzPk
Jd1CG3hNrPiuKVhmVSDkELNDYNT7J9+/rya1Mc33pwMrReAIKWUU/sitvsOkKypi
PnTvlJkUv3UM2fOtoF8mOhQLTGdKSWfaF0BfHNmnCqJFDPs8vuQjx7O1WGKKUdC4
SbYZetUnL3AoMrEbza5EoMyqQwpTXiALWp4/6MunLhNyXo0OQvsqexvT6QIrhKyQ
0iExuSSbXDZAw2GXsqzOlCJecxHYG4qkGbf1DmeTW31GRQQ3nhpiu0GVAvHIEAhT
SJGD57P0gbEXMpnSIKLNKIAYlLdFkGx6AhX1/Xb+AL7Jsod+UEwgz2ya4GMI4JI/
0lUe8fPglU3ws8Up1y5FcIq4gjXTsvsEy317zeW/oq2vac0ACKBika2EVukdMD/5
SYr+m40EzQtVdTKuaZ+e
=8tMi
-----END PGP SIGNATURE-----
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"We didn't have any fixes sent up for -rc2, so this is a slightly
larger batch. A bit all over the place platform-wise; OMAP, at91,
marvell, renesas, sunxi, ux500, etc.
I tried to summarize highlights but there isn't a whole lot to point
out. Lots of little things fixed all over. A couple of defconfig
updates due to new/changing options."
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (44 commits)
ARM: at91/sama5: fix incorrect PMC pcr div definition
ARM: at91/dt: fix macb pinctrl_macb_rmii_mii_alt definition
ARM: at91: at91sam9n12: move external irq declatation to DT
ARM: shmobile: marzen: Use error values in usb_power_*
ARM: tegra: defconfig fixes
ARM: nomadik: fix IRQ assignment for SMC ethernet
ARM: vt8500: Add missing NULL terminator in dt_compat
clk: tegra: add ac97 controller clock
clk: tegra: remove USB from clk init table
ARM: dts: mvebu: Fix wrong the address reg value for the L2-cache node
ARM: plat-orion: Fix num_resources and id for ge10 and ge11
ARM: OMAP2+: hwmod: Remove sysc slave idle and auto idle apis
SERIAL: OMAP: Remove the slave idle handling from the driver
ARM: OMAP2+: serial: Remove the un-used slave idle hooks
ARM: OMAP2+: hwmod-data: UART IP needs software control to manage sidle modes
ARM: OMAP2+: hwmod: Add a new flag to handle SIDLE in SWSUP only in active
ARM: OMAP2+: hwmod: Fix sidle programming in _enable_sysc()/_idle_sysc()
arm: mvebu: fix the 'ranges' property to handle PCIe
ARM: mvebu: select ARCH_REQUIRE_GPIOLIB for mvebu platform
ARM: AM33XX: Add missing .clkdm_name to clkdiv32k_ick clock
...
The recent changes overhauling fs/aio.c introduced a bug that results in
the kioctx not being freed when outstanding kiocbs are cancelled at
exit_aio() time. Specifically, a kiocb that is cancelled has its
completion events discarded by batch_complete_aio(), which then fails to
wake up the process stuck in free_ioctx(). Fix this by modifying the
wait_event() condition in free_ioctx() appropriately.
This patch was tested with the cancel operation in the thread based code
posted yesterday.
[akpm@linux-foundation.org: fix build]
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Zach Brown <zab@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A panic can be caused by simply cat'ing /proc/<pid>/smaps while an
application has a VM_PFNMAP range. It happened in-house when a
benchmarker was trying to decipher the memory layout of his program.
/proc/<pid>/smaps and similar walks through a user page table should not
be looking at VM_PFNMAP areas.
Certain tests in walk_page_range() (specifically split_huge_page_pmd())
assume that all the mapped PFN's are backed with page structures. And
this is not usually true for VM_PFNMAP areas. This can result in panics
on kernel page faults when attempting to address those page structures.
There are a half dozen callers of walk_page_range() that walk through a
task's entire page table (as N. Horiguchi pointed out). So rather than
change all of them, this patch changes just walk_page_range() to ignore
VM_PFNMAP areas.
The logic of hugetlb_vma() is moved back into walk_page_range(), as we
want to test any vma in the range.
VM_PFNMAP areas are used by:
- graphics memory manager gpu/drm/drm_gem.c
- global reference unit sgi-gru/grufile.c
- sgi special memory char/mspec.c
- and probably several out-of-tree modules
[akpm@linux-foundation.org: remove now-unused hugetlb_vma() stub]
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Sterba <dsterba@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently the driver can crash with a NULL pointer dereference if no
pdata is provided, despite of successful registration of the MFD part.
This patch fixes the problem by adding a NULL check before dereferencing
the pdata pointer.
Signed-off-by: Tomasz Figa <t.figa@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Sachin Kamat <sachin.kamat@linaro.org>
Reviewed-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Last time we found there is lock/unlock bug in ocfs2_file_aio_write, and
then we did a thorough search for all lock resources in
ocfs2_inode_info, including rw, inode and open lockres and found this
bug. My kernel version is 3.0.13, and it is also in the lastest version
3.9. In ocfs2_fiemap, once ocfs2_get_clusters_nocache failed, it should
goto out_unlock instead of out, because we need release buffer head, up
read alloc sem and unlock inode.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 902c098a36 ("random: use lockless techniques in the interrupt
path") turned IRQ path from being spinlock protected into lockless
cmpxchg-retry update.
That commit removed r->lock serialization between crediting entropy bits
from IRQ context and accounting when extracting entropy on userspace
read path, but didn't turn the r->entropy_count reads/updates in
account() to use cmpxchg as well.
It has been observed, that under certain circumstances this leads to
read() on /dev/urandom to return 0 (EOF), as r->entropy_count gets
corrupted and becomes negative, which in turn results in propagating 0
all the way from account() to the actual read() call.
Convert the accounting code to be the proper lockless counterpart of
what has been partially done by 902c098a36.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Greg KH <greg@kroah.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit ec8f02da9e ("random: prime last_data value per fips
requirements") added priming of last_data per fips requirements.
Unfortuantely, it did so in a way that can lead to multiple threads all
incrementing nbytes, but only one actually doing anything with the extra
data, which leads to some fun random corruption and panics.
The fix is to simply do everything needed to prime last_data in a single
shot, so there's no window for multiple cpus to increment nbytes -- in
fact, we won't even increment or decrement nbytes anymore, we'll just
extract the needed EXTRACT_SIZE one time per pool and then carry on with
the normal routine.
All these changes have been tested across multiple hosts and
architectures where panics were previously encoutered. The code changes
are are strictly limited to areas only touched when when booted in fips
mode.
This change should also go into 3.8-stable, to make the myriads of fips
users on 3.8.x happy.
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stodola <jstodola@redhat.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Matt Mackall <mpm@selenic.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix printk format warnings in mm/memory_hotplug.c by using "%pa":
mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 2 has type 'resource_size_t' [-Wformat]
mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 3 has type 'resource_size_t' [-Wformat]
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
nilfs2: fix issue of nilfs_set_page_dirty for page at EOF boundary
DESCRIPTION:
There are use-cases when NILFS2 file system (formatted with block size
lesser than 4 KB) can be remounted in RO mode because of encountering of
"broken bmap" issue.
The issue was reported by Anthony Doggett <Anthony2486@interfaces.org.uk>:
"The machine I've been trialling nilfs on is running Debian Testing,
Linux version 3.2.0-4-686-pae (debian-kernel@lists.debian.org) (gcc
version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2), but I've
also reproduced it (identically) with Debian Unstable amd64 and Debian
Experimental (using the 3.8-trunk kernel). The problematic partitions
were formatted with "mkfs.nilfs2 -b 1024 -B 8192"."
SYMPTOMS:
(1) System log contains error messages likewise:
[63102.496756] nilfs_direct_assign: invalid pointer: 0
[63102.496786] NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28)
[63102.496798]
[63102.524403] Remounting filesystem read-only
(2) The NILFS2 file system is remounted in RO mode.
REPRODUSING PATH:
(1) Create volume group with name "unencrypted" by means of vgcreate utility.
(2) Run script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>):
----------------[BEGIN SCRIPT]--------------------
VG=unencrypted
lvcreate --size 2G --name ntest $VG
mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
mkdir /var/tmp/n
mkdir /var/tmp/n/ntest
mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
mkdir /var/tmp/n/ntest/thedir
cd /var/tmp/n/ntest/thedir
sleep 2
date
darcs init
sleep 2
dmesg|tail -n 5
date
darcs whatsnew || true
date
sleep 2
dmesg|tail -n 5
----------------[END SCRIPT]--------------------
REPRODUCIBILITY: 100%
INVESTIGATION:
As it was discovered, the issue takes place during segment
construction after executing such sequence of user-space operations:
open("_darcs/index", O_RDWR|O_CREAT|O_NOCTTY, 0666) = 7
fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
ftruncate(7, 60)
The error message "NILFS error (device dm-17): nilfs_bmap_assign: broken
bmap (inode number=28)" takes place because of trying to get block
number for third block of the file with logical offset #3072 bytes. As
it is possible to see from above output, the file has 60 bytes of the
whole size. So, it is enough one block (1 KB in size) allocation for
the whole file. Trying to operate with several blocks instead of one
takes place because of discovering several dirty buffers for this file
in nilfs_segctor_scan_file() method.
The root cause of this issue is in nilfs_set_page_dirty function which
is called just before writing to an mmapped page.
When nilfs_page_mkwrite function handles a page at EOF boundary, it
fills hole blocks only inside EOF through __block_page_mkwrite().
The __block_page_mkwrite() function calls set_page_dirty() after filling
hole blocks, thus nilfs_set_page_dirty function (=
a_ops->set_page_dirty) is called. However, the current implementation
of nilfs_set_page_dirty() wrongly marks all buffers dirty even for page
at EOF boundary.
As a result, buffers outside EOF are inconsistently marked dirty and
queued for write even though they are not mapped with nilfs_get_block
function.
FIX:
This modifies nilfs_set_page_dirty() not to mark hole blocks dirty.
Thanks to Vyacheslav Dubeyko for his effort on analysis and proposals
for this issue.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reported-by: Anthony Doggett <Anthony2486@interfaces.org.uk>
Reported-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The index on the page must be set before it is inserted in the radix
tree. Otherwise there is a small race which can occur during lookup
where the page can be found with the incorrect index. This will trigger
the BUG_ON() in brd_lookup_page().
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Chris Wedgwood <cw@f00f.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If NO_DMA=y:
drivers/built-in.o: In function `goldfish_fb_remove':
drivers/video/goldfishfb.c:301: undefined reference to `dma_free_coherent'
drivers/built-in.o: In function `goldfish_fb_probe':
drivers/video/goldfishfb.c:247: undefined reference to `dma_alloc_coherent'
drivers/video/goldfishfb.c:280: undefined reference to `dma_free_coherent'
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
free_irq() expects the same pointer that was passed to request_irq(),
otherwise the IRQ is not freed.
The issue was found using the following coccinelle script:
<smpl>
@r1@
type T;
T devid;
@@
request_irq(..., devid)
@r2@
type r1.T;
T devid;
position p;
@@
free_irq@p(..., devid)
@@
position p != r2.p;
@@
*free_irq@p(...)
</smpl>
Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: Srinidhi Kasagar <srinidhi.kasagar@stericsson.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix kernel-doc warnings in kernel/auditfilter.c:
Warning(kernel/auditfilter.c:1029): Excess function parameter 'loginuid' description in 'audit_receive_filter'
Warning(kernel/auditfilter.c:1029): Excess function parameter 'sessionid' description in 'audit_receive_filter'
Warning(kernel/auditfilter.c:1029): Excess function parameter 'sid' description in 'audit_receive_filter'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In reviewing man pages, I noticed that io_getevents is documented to
update the timeout that gets passed into the library call. This doesn't
happen in kernel space or in the library (even though it's documented to
do so in both places). Unless there is objection, I'd like to fix the
comments/docs to match the code (I will also update the man page upon
consensus).
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Acked-by: Cyril Hrubis <chrubis@suse.cz>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert commit 58c7be84fe ("selftest: add simple test for soft-dirty
bit"). This is the self test for Pavel's pagemap2 patches which didn't
actually get merged.
Reported-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>