Граф коммитов

309 Коммитов

Автор SHA1 Сообщение Дата
Nikolay Borisov 44d70e194f btrfs: Make copy_items take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:55 +01:00
Nikolay Borisov 4791c8f19c btrfs: Make btrfs_check_ref_name_override take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:55 +01:00
Nikolay Borisov 481b01c0d3 btrfs: Make logged_inode_size take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:54 +01:00
Nikolay Borisov a491abb2e7 btrfs: Make btrfs_del_inode_ref take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:54 +01:00
Nikolay Borisov 49f34d1f96 btrfs: Make btrfs_del_dir_entries_in_log take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:54 +01:00
Nikolay Borisov 9ca5fbfbb9 btrfs: Make btrfs_log_new_name take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:54 +01:00
Nikolay Borisov 0f8939b8ac btrfs: Make btrfs_inode_in_log take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:54 +01:00
Nikolay Borisov 436635571b btrfs: Make btrfs_record_snapshot_destroy take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:54 +01:00
Nikolay Borisov 4176bdbf2d btrfs: Make btrfs_record_unlink_dir take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:53 +01:00
Nikolay Borisov ab1717b2ab btrfs: Make btrfs_must_commit_transaction take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:53 +01:00
Nikolay Borisov 5f4b32e94a btrfs: Make btrfs_commit_inode_delayed_items take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:53 +01:00
Nikolay Borisov aa79021fde btrfs: Make btrfs_commit_inode_delayed_inode take btrfs_inode
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:53 +01:00
Nikolay Borisov 4a0cc7ca6c btrfs: Make btrfs_ino take a struct btrfs_inode
Currently btrfs_ino takes a struct inode and this causes a lot of
internal btrfs functions which consume this ino to take a VFS inode,
rather than btrfs' own struct btrfs_inode. In order to fix this "leak"
of VFS structs into the internals of btrfs first it's necessary to
eliminate all uses of struct inode for the purpose of inode. This patch
does that by using BTRFS_I to convert an inode to btrfs_inode. With
this problem eliminated subsequent patches will start eliminating the
passing of struct inode altogether, eventually resulting in a lot cleaner
code.

Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
[ fix btrfs_get_extent tracepoint prototype ]
Signed-off-by: David Sterba <dsterba@suse.com>
2017-02-14 15:50:51 +01:00
Liu Bo 781feef7e6 Btrfs: fix lockdep warning about log_mutex
While checking INODE_REF/INODE_EXTREF for a corner case, we may acquire a
different inode's log_mutex with holding the current inode's log_mutex, and
lockdep has complained this with a possilble deadlock warning.

Fix this by using mutex_lock_nested() when processing the other inode's
log_mutex.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2017-01-03 15:19:28 +01:00
Chris Mason 5f52a2c512 Merge branch 'for-chris-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.10
Patches queued up by Filipe:

The most important change is still the fix for the extent tree
corruption that happens due to balance when qgroups are enabled (a
regression introduced in 4.7 by a fix for a regression from the last
qgroups rework). This has been hitting SLE and openSUSE users and QA
very badly, where transactions keep getting aborted when running
delayed references leaving the root filesystem in RO mode and nearly
unusable.  There are fixes here that allow us to run xfstests again
with the integrity checker enabled, which has been impossible since 4.8
(apparently I'm the only one running xfstests with the integrity
checker enabled, which is useful to validate dirtied leafs, like
checking if there are keys out of order, etc).  The rest are just some
trivial fixes, most of them tagged for stable, and two cleanups.

Signed-off-by: Chris Mason <clm@fb.com>
2016-12-13 09:14:42 -08:00
Jeff Mahoney 3a45bb207e btrfs: remove root parameter from transaction commit/end routines
Now we only use the root parameter to print the root objectid in
a tracepoint.  We can use the root parameter from the transaction
handle for that.  It's also used to join the transaction with
async commits, so we remove the comment that it's just for checking.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:07:00 +01:00
Jeff Mahoney bf89d38feb btrfs: split btrfs_wait_marked_extents into normal and tree log functions
btrfs_write_and_wait_marked_extents and btrfs_sync_log both call
btrfs_wait_marked_extents, which provides a core loop and then handles
errors differently based on whether it's it's a log root or not.

This means that btrfs_write_and_wait_marked_extents needs to take a root
because btrfs_wait_marked_extents requires one, even though it's only
used to determine whether the root is a log root.  The log root code
won't ever call into the transaction commit code using a log root, so we
can factor out the core loop and provide the error handling appropriate
to each waiter in new routines.  This allows us to eventually remove
the root argument from btrfs_commit_transaction, and as a result,
btrfs_end_transaction.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:07:00 +01:00
Jeff Mahoney 2ff7e61e0d btrfs: take an fs_info directly when the root is not used otherwise
There are loads of functions in btrfs that accept a root parameter
but only use it to obtain an fs_info pointer.  Let's convert those to
just accept an fs_info pointer directly.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:59 +01:00
Jeff Mahoney 0b246afa62 btrfs: root->fs_info cleanup, add fs_info convenience variables
In routines where someptr->fs_info is referenced multiple times, we
introduce a convenience variable.  This makes the code considerably
more readable.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:59 +01:00
Jeff Mahoney da17066c40 btrfs: pull node/sector/stripe sizes out of root and into fs_info
We track the node sizes per-root, but they never vary from the values
in the superblock.  This patch messes with the 80-column style a bit,
but subsequent patches to factor out root->fs_info into a convenience
variable fix it up again.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:58 +01:00
Jeff Mahoney 5b4aacefb8 btrfs: call functions that overwrite their root parameter with fs_info
There are 11 functions that accept a root parameter and immediately
overwrite it.  We can pass those an fs_info pointer instead.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-12-06 16:06:57 +01:00
Robbie Ko 2a7bf53f57 Btrfs: fix tree search logic when replaying directory entry deletes
If a log tree has a layout like the following:

leaf N:
        ...
        item 240 key (282 DIR_LOG_ITEM 0) itemoff 8189 itemsize 8
                dir log end 1275809046
leaf N + 1:
        item 0 key (282 DIR_LOG_ITEM 3936149215) itemoff 16275 itemsize 8
                dir log end 18446744073709551615
        ...

When we pass the value 1275809046 + 1 as the parameter start_ret to the
function tree-log.c:find_dir_range() (done by replay_dir_deletes()), we
end up with path->slots[0] having the value 239 (points to the last item
of leaf N, item 240). Because the dir log item in that position has an
offset value smaller than *start_ret (1275809046 + 1) we need to move on
to the next leaf, however the logic for that is wrong since it compares
the current slot to the number of items in the leaf, which is smaller
and therefore we don't lookup for the next leaf but instead we set the
slot to point to an item that does not exist, at slot 240, and we later
operate on that slot which has unexpected content or in the worst case
can result in an invalid memory access (accessing beyond the last page
of leaf N's extent buffer).

So fix the logic that checks when we need to lookup at the next leaf
by first incrementing the slot and only after to check if that slot
is beyond the last item of the current leaf.

Signed-off-by: Robbie Ko <robbieko@synology.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Fixes: e02119d5a7 (Btrfs: Add a write ahead tree log to optimize synchronous operations)
Cc: stable@vger.kernel.org  # 2.6.29+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
[Modified changelog for clarity and correctness]
2016-11-30 16:56:12 +00:00
Robbie Ko ec125cfb7a Btrfs: fix deadlock caused by fsync when logging directory entries
While logging new directory entries, at tree-log.c:log_new_dir_dentries(),
after we call btrfs_search_forward() we get a leaf with a read lock on it,
and without unlocking that leaf we can end up calling btrfs_iget() to get
an inode pointer. The later (btrfs_iget()) can end up doing a read-only
search on the same tree again, if the inode is not in memory already, which
ends up causing a deadlock if some other task in the meanwhile started a
write search on the tree and is attempting to write lock the same leaf
that btrfs_search_forward() locked while holding write locks on upper
levels of the tree blocking the read search from btrfs_iget(). In this
scenario we get a deadlock.

So fix this by releasing the search path before calling btrfs_iget() at
tree-log.c:log_new_dir_dentries().

Example trace of such deadlock:

[ 4077.478852] kworker/u24:10  D ffff88107fc90640     0 14431      2 0x00000000
[ 4077.486752] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 4077.494346]  ffff880ffa56bad0 0000000000000046 0000000000009000 ffff880ffa56bfd8
[ 4077.502629]  ffff880ffa56bfd8 ffff881016ce21c0 ffffffffa06ecb26 ffff88101a5d6138
[ 4077.510915]  ffff880ebb5173b0 ffff880ffa56baf8 ffff880ebb517410 ffff881016ce21c0
[ 4077.519202] Call Trace:
[ 4077.528752]  [<ffffffffa06ed5ed>] ? btrfs_tree_lock+0xdd/0x2f0 [btrfs]
[ 4077.536049]  [<ffffffff81053680>] ? wake_up_atomic_t+0x30/0x30
[ 4077.542574]  [<ffffffffa068cc1f>] ? btrfs_search_slot+0x79f/0xb10 [btrfs]
[ 4077.550171]  [<ffffffffa06a5073>] ? btrfs_lookup_file_extent+0x33/0x40 [btrfs]
[ 4077.558252]  [<ffffffffa06c600b>] ? __btrfs_drop_extents+0x13b/0xdf0 [btrfs]
[ 4077.566140]  [<ffffffffa06fc9e2>] ? add_delayed_data_ref+0xe2/0x150 [btrfs]
[ 4077.573928]  [<ffffffffa06fd629>] ? btrfs_add_delayed_data_ref+0x149/0x1d0 [btrfs]
[ 4077.582399]  [<ffffffffa06cf3c0>] ? __set_extent_bit+0x4c0/0x5c0 [btrfs]
[ 4077.589896]  [<ffffffffa06b4a64>] ? insert_reserved_file_extent.constprop.75+0xa4/0x320 [btrfs]
[ 4077.599632]  [<ffffffffa06b206d>] ? start_transaction+0x8d/0x470 [btrfs]
[ 4077.607134]  [<ffffffffa06bab57>] ? btrfs_finish_ordered_io+0x2e7/0x600 [btrfs]
[ 4077.615329]  [<ffffffff8104cbc2>] ? process_one_work+0x142/0x3d0
[ 4077.622043]  [<ffffffff8104d729>] ? worker_thread+0x109/0x3b0
[ 4077.628459]  [<ffffffff8104d620>] ? manage_workers.isra.26+0x270/0x270
[ 4077.635759]  [<ffffffff81052b0f>] ? kthread+0xaf/0xc0
[ 4077.641404]  [<ffffffff81052a60>] ? kthread_create_on_node+0x110/0x110
[ 4077.648696]  [<ffffffff814a9ac8>] ? ret_from_fork+0x58/0x90
[ 4077.654926]  [<ffffffff81052a60>] ? kthread_create_on_node+0x110/0x110

[ 4078.358087] kworker/u24:15  D ffff88107fcd0640     0 14436      2 0x00000000
[ 4078.365981] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[ 4078.373574]  ffff880ffa57fad0 0000000000000046 0000000000009000 ffff880ffa57ffd8
[ 4078.381864]  ffff880ffa57ffd8 ffff88103004d0a0 ffffffffa06ecb26 ffff88101a5d6138
[ 4078.390163]  ffff880fbeffc298 ffff880ffa57faf8 ffff880fbeffc2f8 ffff88103004d0a0
[ 4078.398466] Call Trace:
[ 4078.408019]  [<ffffffffa06ed5ed>] ? btrfs_tree_lock+0xdd/0x2f0 [btrfs]
[ 4078.415322]  [<ffffffff81053680>] ? wake_up_atomic_t+0x30/0x30
[ 4078.421844]  [<ffffffffa068cc1f>] ? btrfs_search_slot+0x79f/0xb10 [btrfs]
[ 4078.429438]  [<ffffffffa06a5073>] ? btrfs_lookup_file_extent+0x33/0x40 [btrfs]
[ 4078.437518]  [<ffffffffa06c600b>] ? __btrfs_drop_extents+0x13b/0xdf0 [btrfs]
[ 4078.445404]  [<ffffffffa06fc9e2>] ? add_delayed_data_ref+0xe2/0x150 [btrfs]
[ 4078.453194]  [<ffffffffa06fd629>] ? btrfs_add_delayed_data_ref+0x149/0x1d0 [btrfs]
[ 4078.461663]  [<ffffffffa06cf3c0>] ? __set_extent_bit+0x4c0/0x5c0 [btrfs]
[ 4078.469161]  [<ffffffffa06b4a64>] ? insert_reserved_file_extent.constprop.75+0xa4/0x320 [btrfs]
[ 4078.478893]  [<ffffffffa06b206d>] ? start_transaction+0x8d/0x470 [btrfs]
[ 4078.486388]  [<ffffffffa06bab57>] ? btrfs_finish_ordered_io+0x2e7/0x600 [btrfs]
[ 4078.494561]  [<ffffffff8104cbc2>] ? process_one_work+0x142/0x3d0
[ 4078.501278]  [<ffffffff8104a507>] ? pwq_activate_delayed_work+0x27/0x40
[ 4078.508673]  [<ffffffff8104d729>] ? worker_thread+0x109/0x3b0
[ 4078.515098]  [<ffffffff8104d620>] ? manage_workers.isra.26+0x270/0x270
[ 4078.522396]  [<ffffffff81052b0f>] ? kthread+0xaf/0xc0
[ 4078.528032]  [<ffffffff81052a60>] ? kthread_create_on_node+0x110/0x110
[ 4078.535325]  [<ffffffff814a9ac8>] ? ret_from_fork+0x58/0x90
[ 4078.541552]  [<ffffffff81052a60>] ? kthread_create_on_node+0x110/0x110

[ 4079.355824] user-space-program D ffff88107fd30640     0 32020      1 0x00000000
[ 4079.363716]  ffff880eae8eba10 0000000000000086 0000000000009000 ffff880eae8ebfd8
[ 4079.372003]  ffff880eae8ebfd8 ffff881016c162c0 ffffffffa06ecb26 ffff88101a5d6138
[ 4079.380294]  ffff880fbed4b4c8 ffff880eae8eba38 ffff880fbed4b528 ffff881016c162c0
[ 4079.388586] Call Trace:
[ 4079.398134]  [<ffffffffa06ed595>] ? btrfs_tree_lock+0x85/0x2f0 [btrfs]
[ 4079.405431]  [<ffffffff81053680>] ? wake_up_atomic_t+0x30/0x30
[ 4079.411955]  [<ffffffffa06876fb>] ? btrfs_lock_root_node+0x2b/0x40 [btrfs]
[ 4079.419644]  [<ffffffffa068ce83>] ? btrfs_search_slot+0xa03/0xb10 [btrfs]
[ 4079.427237]  [<ffffffffa06aba52>] ? btrfs_buffer_uptodate+0x52/0x70 [btrfs]
[ 4079.435041]  [<ffffffffa0689b60>] ? generic_bin_search.constprop.38+0x80/0x190 [btrfs]
[ 4079.443897]  [<ffffffffa068ea44>] ? btrfs_insert_empty_items+0x74/0xd0 [btrfs]
[ 4079.451975]  [<ffffffffa072c443>] ? copy_items+0x128/0x850 [btrfs]
[ 4079.458890]  [<ffffffffa072da10>] ? btrfs_log_inode+0x629/0xbf3 [btrfs]
[ 4079.466292]  [<ffffffffa06f34a1>] ? btrfs_log_inode_parent+0xc61/0xf30 [btrfs]
[ 4079.474373]  [<ffffffffa06f45a9>] ? btrfs_log_dentry_safe+0x59/0x80 [btrfs]
[ 4079.482161]  [<ffffffffa06c298d>] ? btrfs_sync_file+0x20d/0x330 [btrfs]
[ 4079.489558]  [<ffffffff8112777c>] ? do_fsync+0x4c/0x80
[ 4079.495300]  [<ffffffff81127a0a>] ? SyS_fdatasync+0xa/0x10
[ 4079.501422]  [<ffffffff814a9b72>] ? system_call_fastpath+0x16/0x1b

[ 4079.508334] user-space-program D ffff88107fc30640     0 32021      1 0x00000004
[ 4079.516226]  ffff880eae8efbf8 0000000000000086 0000000000009000 ffff880eae8effd8
[ 4079.524513]  ffff880eae8effd8 ffff881030279610 ffffffffa06ecb26 ffff88101a5d6138
[ 4079.532802]  ffff880ebb671d88 ffff880eae8efc20 ffff880ebb671de8 ffff881030279610
[ 4079.541092] Call Trace:
[ 4079.550642]  [<ffffffffa06ed595>] ? btrfs_tree_lock+0x85/0x2f0 [btrfs]
[ 4079.557941]  [<ffffffff81053680>] ? wake_up_atomic_t+0x30/0x30
[ 4079.564463]  [<ffffffffa068cc1f>] ? btrfs_search_slot+0x79f/0xb10 [btrfs]
[ 4079.572058]  [<ffffffffa06bb7d8>] ? btrfs_truncate_inode_items+0x168/0xb90 [btrfs]
[ 4079.580526]  [<ffffffffa06b04be>] ? join_transaction.isra.15+0x1e/0x3a0 [btrfs]
[ 4079.588701]  [<ffffffffa06b206d>] ? start_transaction+0x8d/0x470 [btrfs]
[ 4079.596196]  [<ffffffffa0690ac6>] ? block_rsv_add_bytes+0x16/0x50 [btrfs]
[ 4079.603789]  [<ffffffffa06bc2e9>] ? btrfs_truncate+0xe9/0x2e0 [btrfs]
[ 4079.610994]  [<ffffffffa06bd00b>] ? btrfs_setattr+0x30b/0x410 [btrfs]
[ 4079.618197]  [<ffffffff81117c1c>] ? notify_change+0x1dc/0x680
[ 4079.624625]  [<ffffffff8123c8a4>] ? aa_path_perm+0xd4/0x160
[ 4079.630854]  [<ffffffff810f4fcb>] ? do_truncate+0x5b/0x90
[ 4079.636889]  [<ffffffff810f59fa>] ? do_sys_ftruncate.constprop.15+0x10a/0x160
[ 4079.644869]  [<ffffffff8110d87b>] ? SyS_fcntl+0x5b/0x570
[ 4079.650805]  [<ffffffff814a9b72>] ? system_call_fastpath+0x16/0x1b

[ 4080.410607] user-space-program D ffff88107fc70640     0 32028  12639 0x00000004
[ 4080.418489]  ffff880eaeccbbe0 0000000000000086 0000000000009000 ffff880eaeccbfd8
[ 4080.426778]  ffff880eaeccbfd8 ffff880f317ef1e0 ffffffffa06ecb26 ffff88101a5d6138
[ 4080.435067]  ffff880ef7e93928 ffff880f317ef1e0 ffff880eaeccbc08 ffff880f317ef1e0
[ 4080.443353] Call Trace:
[ 4080.452920]  [<ffffffffa06ed15d>] ? btrfs_tree_read_lock+0xdd/0x190 [btrfs]
[ 4080.460703]  [<ffffffff81053680>] ? wake_up_atomic_t+0x30/0x30
[ 4080.467225]  [<ffffffffa06876bb>] ? btrfs_read_lock_root_node+0x2b/0x40 [btrfs]
[ 4080.475400]  [<ffffffffa068cc81>] ? btrfs_search_slot+0x801/0xb10 [btrfs]
[ 4080.482994]  [<ffffffffa06b2df0>] ? btrfs_clean_one_deleted_snapshot+0xe0/0xe0 [btrfs]
[ 4080.491857]  [<ffffffffa06a70a6>] ? btrfs_lookup_inode+0x26/0x90 [btrfs]
[ 4080.499353]  [<ffffffff810ec42f>] ? kmem_cache_alloc+0xaf/0xc0
[ 4080.505879]  [<ffffffffa06bd905>] ? btrfs_iget+0xd5/0x5d0 [btrfs]
[ 4080.512696]  [<ffffffffa06caf04>] ? btrfs_get_token_64+0x104/0x120 [btrfs]
[ 4080.520387]  [<ffffffffa06f341f>] ? btrfs_log_inode_parent+0xbdf/0xf30 [btrfs]
[ 4080.528469]  [<ffffffffa06f45a9>] ? btrfs_log_dentry_safe+0x59/0x80 [btrfs]
[ 4080.536258]  [<ffffffffa06c298d>] ? btrfs_sync_file+0x20d/0x330 [btrfs]
[ 4080.543657]  [<ffffffff8112777c>] ? do_fsync+0x4c/0x80
[ 4080.549399]  [<ffffffff81127a0a>] ? SyS_fdatasync+0xa/0x10
[ 4080.555534]  [<ffffffff814a9b72>] ? system_call_fastpath+0x16/0x1b

Signed-off-by: Robbie Ko <robbieko@synology.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Fixes: 2f2ff0ee5e (Btrfs: fix metadata inconsistencies after directory fsync)
Cc: stable@vger.kernel.org # 4.1+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
[Modified changelog for clarity and correctness]
2016-11-30 13:49:16 +00:00
Qu Wenruo 50b3e040b7 btrfs: qgroup: Rename functions to make it follow reserve,trace,account steps
Rename btrfs_qgroup_insert_dirty_extent(_nolock) to
btrfs_qgroup_trace_extent(_nolock), according to the new
reserve/trace/account naming schema.

Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-and-Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-11-30 13:45:21 +01:00
Linus Torvalds f6167514c8 Merge branch 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "My patch fixes the btrfs list_head abuse that we tracked down during
  Dave Jones' memory corruption investigation. With both Jens and my
  patches in place, I'm no longer able to trigger problems.

  Filipe is fixing a difficult old bug between snapshots, balance and
  send. Dave is cooking a few more for the next rc, but these are tested
  and ready"

* 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: fix races on root_log_ctx lists
  btrfs: fix incremental send failure caused by balance
2016-10-28 10:07:35 -07:00
Chris Mason 570dd45042 btrfs: fix races on root_log_ctx lists
btrfs_remove_all_log_ctxs takes a shortcut where it avoids walking the
list because it knows all of the waiters are patiently waiting for the
commit to finish.

But, there's a small race where btrfs_sync_log can remove itself from
the list if it finds a log commit is already done.  Also, it uses
list_del_init() to remove itself from the list, but there's no way to
know if btrfs_remove_all_log_ctxs has already run, so we don't know for
sure if it is safe to call list_del_init().

This gets rid of all the shortcuts for btrfs_remove_all_log_ctxs(), and
just calls it with the proper locking.

This is part two of the corruption fixed by cbd60aa7cd.  I should have
done this in the first place, but convinced myself the optimizations were
safe.  A 12 hour run of dbench 2048 will eventually trigger a list debug
WARN_ON for the list_del_init() in btrfs_sync_log().

Fixes: d1433debe7
Reported-by: Dave Jones <davej@codemonkey.org.uk>
cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Chris Mason <clm@fb.com>
2016-10-27 10:42:20 -07:00
Linus Torvalds f29135b54b Merge branch 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "This is a big variety of fixes and cleanups.

  Liu Bo continues to fixup fuzzer related problems, and some of Josef's
  cleanups are prep for his bigger extent buffer changes (slated for
  v4.10)"

* 'for-linus-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (39 commits)
  Revert "btrfs: let btrfs_delete_unused_bgs() to clean relocated bgs"
  Btrfs: remove unnecessary btrfs_mark_buffer_dirty in split_leaf
  Btrfs: don't BUG() during drop snapshot
  btrfs: fix btrfs_no_printk stub helper
  Btrfs: memset to avoid stale content in btree leaf
  btrfs: parent_start initialization cleanup
  btrfs: Remove already completed TODO comment
  btrfs: Do not reassign count in btrfs_run_delayed_refs
  btrfs: fix a possible umount deadlock
  Btrfs: fix memory leak in do_walk_down
  btrfs: btrfs_debug should consume fs_info when DEBUG is not defined
  btrfs: convert send's verbose_printk to btrfs_debug
  btrfs: convert pr_* to btrfs_* where possible
  btrfs: convert printk(KERN_* to use pr_* calls
  btrfs: unsplit printed strings
  btrfs: clean the old superblocks before freeing the device
  Btrfs: kill BUG_ON in run_delayed_tree_ref
  Btrfs: don't leak reloc root nodes on error
  btrfs: squash lines for simple wrapper functions
  Btrfs: improve check_node to avoid reading corrupted nodes
  ...
2016-10-11 11:23:06 -07:00
Jeff Mahoney 5d163e0e68 btrfs: unsplit printed strings
CodingStyle chapter 2:
"[...] never break user-visible strings such as printk messages,
because that breaks the ability to grep for them."

This patch unsplits user-visible strings.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-26 18:08:44 +02:00
Josef Bacik afcdd129e0 Btrfs: add a flags field to btrfs_fs_info
We have a lot of random ints in btrfs_fs_info that can be put into flags.  This
is mostly equivalent with the exception of how we deal with quota going on or
off, now instead we set a flag when we are turning it on or off and deal with
that appropriately, rather than just having a pending state that the current
quota_enabled gets set to.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-09-26 17:59:49 +02:00
Miklos Szeredi f031221001 btrfs: use filemap_check_errors()
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Cc: Chris Mason <clm@fb.com>
2016-09-16 12:44:21 +02:00
Chris Mason cbd60aa7cd Btrfs: remove root_log_ctx from ctx list before btrfs_sync_log returns
We use a btrfs_log_ctx structure to pass information into the
tree log commit, and get error values out.  It gets added to a per
log-transaction list which we walk when things go bad.

Commit d1433debe added an optimization to skip waiting for the log
commit, but didn't take root_log_ctx out of the list.  This
patch makes sure we remove things before exiting.

Signed-off-by: Chris Mason <clm@fb.com>
Fixes: d1433debe7
cc: stable@vger.kernel.org # 3.15+
2016-09-06 05:57:25 -07:00
Filipe Manana 28a235931b Btrfs: fix lockdep warning on deadlock against an inode's log mutex
Commit 44f714dae5 ("Btrfs: improve performance on fsync against new
inode after rename/unlink"), which landed in 4.8-rc2, introduced a
possibility for a deadlock due to double locking of an inode's log mutex
by the same task, which lockdep reports with:

[23045.433975] =============================================
[23045.434748] [ INFO: possible recursive locking detected ]
[23045.435426] 4.7.0-rc6-btrfs-next-34+ #1 Not tainted
[23045.436044] ---------------------------------------------
[23045.436044] xfs_io/3688 is trying to acquire lock:
[23045.436044]  (&ei->log_mutex){+.+...}, at: [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]
               but task is already holding lock:
[23045.436044]  (&ei->log_mutex){+.+...}, at: [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]
               other info that might help us debug this:
[23045.436044]  Possible unsafe locking scenario:

[23045.436044]        CPU0
[23045.436044]        ----
[23045.436044]   lock(&ei->log_mutex);
[23045.436044]   lock(&ei->log_mutex);
[23045.436044]
                *** DEADLOCK ***

[23045.436044]  May be due to missing lock nesting notation

[23045.436044] 3 locks held by xfs_io/3688:
[23045.436044]  #0:  (&sb->s_type->i_mutex_key#15){+.+...}, at: [<ffffffffa035f2ae>] btrfs_sync_file+0x14e/0x425 [btrfs]
[23045.436044]  #1:  (sb_internal#2){.+.+.+}, at: [<ffffffff8118446b>] __sb_start_write+0x5f/0xb0
[23045.436044]  #2:  (&ei->log_mutex){+.+...}, at: [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]
               stack backtrace:
[23045.436044] CPU: 4 PID: 3688 Comm: xfs_io Not tainted 4.7.0-rc6-btrfs-next-34+ #1
[23045.436044] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
[23045.436044]  0000000000000000 ffff88022f5f7860 ffffffff8127074d ffffffff82a54b70
[23045.436044]  ffffffff82a54b70 ffff88022f5f7920 ffffffff81092897 ffff880228015d68
[23045.436044]  0000000000000000 ffffffff82a54b70 ffffffff829c3f00 ffff880228015d68
[23045.436044] Call Trace:
[23045.436044]  [<ffffffff8127074d>] dump_stack+0x67/0x90
[23045.436044]  [<ffffffff81092897>] __lock_acquire+0xcbb/0xe4e
[23045.436044]  [<ffffffff8109155f>] ? mark_lock+0x24/0x201
[23045.436044]  [<ffffffff8109179a>] ? mark_held_locks+0x5e/0x74
[23045.436044]  [<ffffffff81092de0>] lock_acquire+0x12f/0x1c3
[23045.436044]  [<ffffffff81092de0>] ? lock_acquire+0x12f/0x1c3
[23045.436044]  [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]  [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]  [<ffffffff814a51a4>] mutex_lock_nested+0x77/0x3a7
[23045.436044]  [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]  [<ffffffffa039705e>] ? btrfs_release_delayed_node+0xb/0xd [btrfs]
[23045.436044]  [<ffffffffa038552d>] btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]  [<ffffffffa038552d>] ? btrfs_log_inode+0x13a/0xc95 [btrfs]
[23045.436044]  [<ffffffff810a0ed1>] ? vprintk_emit+0x453/0x465
[23045.436044]  [<ffffffffa0385a61>] btrfs_log_inode+0x66e/0xc95 [btrfs]
[23045.436044]  [<ffffffffa03c084d>] log_new_dir_dentries+0x26c/0x359 [btrfs]
[23045.436044]  [<ffffffffa03865aa>] btrfs_log_inode_parent+0x4a6/0x628 [btrfs]
[23045.436044]  [<ffffffffa0387552>] btrfs_log_dentry_safe+0x5a/0x75 [btrfs]
[23045.436044]  [<ffffffffa035f464>] btrfs_sync_file+0x304/0x425 [btrfs]
[23045.436044]  [<ffffffff811acaf4>] vfs_fsync_range+0x8c/0x9e
[23045.436044]  [<ffffffff811acb22>] vfs_fsync+0x1c/0x1e
[23045.436044]  [<ffffffff811acc79>] do_fsync+0x31/0x4a
[23045.436044]  [<ffffffff811ace99>] SyS_fsync+0x10/0x14
[23045.436044]  [<ffffffff814a88e5>] entry_SYSCALL_64_fastpath+0x18/0xa8
[23045.436044]  [<ffffffff8108f039>] ? trace_hardirqs_off_caller+0x3f/0xaa

An example reproducer for this is:

   $ mkfs.btrfs -f /dev/sdb
   $ mount /dev/sdb /mnt
   $ mkdir /mnt/dir
   $ touch /mnt/dir/foo
   $ sync
   $ mv /mnt/dir/foo /mnt/dir/bar
   $ touch /mnt/dir/foo
   $ xfs_io -c "fsync" /mnt/dir/bar

This is because while logging the inode of file bar we end up logging its
parent directory (since its inode has an unlink_trans field matching the
current transaction id due to the rename operation), which in turn logs
the inodes for all its new dentries, so that the new inode for the new
file named foo gets logged which in turn triggered another logging attempt
for the inode we are fsync'ing, since that inode had an old name that
corresponds to the name of the new inode.

So fix this by ensuring that when logging the inode for a new dentry that
has a name matching an old name of some other inode, we don't log again
the original inode that we are fsync'ing.

Fixes: 44f714dae5 ("Btrfs: improve performance on fsync against new inode after rename/unlink")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:32 -07:00
Qu Wenruo df2c95f33e btrfs: qgroup: Fix qgroup incorrectness caused by log replay
When doing log replay at mount time(after power loss), qgroup will leak
numbers of replayed data extents.

The cause is almost the same of balance.
So fix it by manually informing qgroup for owner changed extents.

The bug can be detected by btrfs/119 test case.

Cc: Mark Fasheh <mfasheh@suse.de>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Reviewed-and-Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2016-08-25 03:58:23 -07:00
Chris Mason 1083881654 Merge branch 'integration-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.8 2016-08-05 12:25:05 -07:00
Filipe Manana 44f714dae5 Btrfs: improve performance on fsync against new inode after rename/unlink
With commit 56f23fdbb6 ("Btrfs: fix file/data loss caused by fsync after
rename and new inode") we got simple fix for a functional issue when the
following sequence of actions is done:

  at transaction N
  create file A at directory D
  at transaction N + M (where M >= 1)
  move/rename existing file A from directory D to directory E
  create a new file named A at directory D
  fsync the new file
  power fail

The solution was to simply detect such scenario and fallback to a full
transaction commit when we detect it. However this turned out to had a
significant impact on throughput (and a bit on latency too) for benchmarks
using the dbench tool, which simulates real workloads from smbd (Samba)
servers. For example on a test vm (with a debug kernel):

Unpatched:
Throughput 19.1572 MB/sec  32 clients  32 procs  max_latency=1005.229 ms

Patched:
Throughput 23.7015 MB/sec  32 clients  32 procs  max_latency=809.206 ms

The patched results (this patch is applied) are similar to the results of
a kernel with the commit 56f23fdbb6 ("Btrfs: fix file/data loss caused
by fsync after rename and new inode") reverted.

This change avoids the fallback to a transaction commit and instead makes
sure all the names of the conflicting inode (the one that had a name in a
past transaction that matches the name of the new file in the same parent
directory) are logged so that at log replay time we don't lose neither the
new file nor the old file, and the old file gets the name it was renamed
to.

This also ends up avoiding a full transaction commit for a similar case
that involves an unlink instead of a rename of the old file:

  at transaction N
  create file A at directory D
  at transaction N + M (where M >= 1)
  remove file A
  create a new file named A at directory D
  fsync the new file
  power fail

Signed-off-by: Filipe Manana <fdmanana@suse.com>
2016-08-01 07:32:14 +01:00
Jeff Mahoney 66642832f0 btrfs: btrfs_abort_transaction, drop root parameter
__btrfs_abort_transaction doesn't use its root parameter except to
obtain an fs_info pointer.  We can obtain that from trans->root->fs_info
for now and from trans->fs_info in a later patch.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26 13:54:26 +02:00
Jeff Mahoney 3cdde2240d btrfs: btrfs_test_opt and friends should take a btrfs_fs_info
btrfs_test_opt and friends only use the root pointer to access
the fs_info.  Let's pass the fs_info directly in preparation to
eliminate similar patterns all over btrfs.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26 13:53:16 +02:00
Liu Bo fb770ae414 Btrfs: fix read_node_slot to return errors
We use read_node_slot() to read btree node and it has two cases,
a) slot is out of range, which means 'no such entry'
b) we fail to read the block, due to checksum fails or corrupted
   content or not with uptodate flag.
But we're returning NULL in both cases, this makes it return -ENOENT
in case a) and return -EIO in case b), and this fixes its callers
as well as btrfs_search_forward() 's caller to catch the new errors.

The problem is reported by Peter Becker, and I can manage to
hit the same BUG_ON by mounting my fuzz image.

Reported-by: Peter Becker <floyd.net@gmail.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-07-26 13:52:25 +02:00
Linus Torvalds 4c6459f945 Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "The most user visible change here is a fix for our recent superblock
  validation checks that were causing problems on non-4k pagesized
  systems"

* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: btrfs_check_super_valid: Allow 4096 as stripesize
  btrfs: remove build fixup for qgroup_account_snapshot
  btrfs: use new error message helper in qgroup_account_snapshot
  btrfs: avoid blocking open_ctree from cleaner_kthread
  Btrfs: don't BUG_ON() in btrfs_orphan_add
  btrfs: account for non-CoW'd blocks in btrfs_abort_transaction
  Btrfs: check if extent buffer is aligned to sectorsize
  btrfs: Use correct format specifier
2016-06-18 05:57:59 -10:00
Liu Bo c871b0f2fd Btrfs: check if extent buffer is aligned to sectorsize
Thanks to fuzz testing, we can pass an invalid bytenr to extent buffer
via alloc_extent_buffer().  An unaligned eb can have more pages than it
should have, which ends up extent buffer's leak or some corrupted content
in extent buffer.

This adds a warning to let us quickly know what was happening.

Now that alloc_extent_buffer() no more returns NULL, this changes its
caller and callers of its caller to match with the new error
handling.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-06-17 18:32:40 +02:00
Linus Torvalds 559b6d90a0 Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs cleanups and fixes from Chris Mason:
 "We have another round of fixes and a few cleanups.

  I have a fix for short returns from btrfs_copy_from_user, which
  finally nails down a very hard to find regression we added in v4.6.

  Dave is pushing around gfp parameters, mostly to cleanup internal apis
  and make it a little more consistent.

  The rest are smaller fixes, and one speelling fixup patch"

* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (22 commits)
  Btrfs: fix handling of faults from btrfs_copy_from_user
  btrfs: fix string and comment grammatical issues and typos
  btrfs: scrub: Set bbio to NULL before calling btrfs_map_block
  Btrfs: fix unexpected return value of fiemap
  Btrfs: free sys_array eb as soon as possible
  btrfs: sink gfp parameter to convert_extent_bit
  btrfs: make state preallocation more speculative in __set_extent_bit
  btrfs: untangle gotos a bit in convert_extent_bit
  btrfs: untangle gotos a bit in __clear_extent_bit
  btrfs: untangle gotos a bit in __set_extent_bit
  btrfs: sink gfp parameter to set_record_extent_bits
  btrfs: sink gfp parameter to set_extent_new
  btrfs: sink gfp parameter to set_extent_defrag
  btrfs: sink gfp parameter to set_extent_delalloc
  btrfs: sink gfp parameter to clear_extent_dirty
  btrfs: sink gfp parameter to clear_record_extent_bits
  btrfs: sink gfp parameter to clear_extent_bits
  btrfs: sink gfp parameter to set_extent_bits
  btrfs: make find_workspace warn if there are no workspaces
  btrfs: make find_workspace always succeed
  ...
2016-05-27 16:37:36 -07:00
David Sterba 42f31734eb Merge branch 'cleanups-4.7' into for-chris-4.7-20160525 2016-05-25 22:51:03 +02:00
Nicholas D Steeves 0132761017 btrfs: fix string and comment grammatical issues and typos
Signed-off-by: Nicholas D Steeves <nsteeves@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2016-05-25 22:35:14 +02:00
Linus Torvalds 07be1337b9 Merge branch 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "This has our merge window series of cleanups and fixes.  These target
  a wide range of issues, but do include some important fixes for
  qgroups, O_DIRECT, and fsync handling.  Jeff Mahoney moved around a
  few definitions to make them easier for userland to consume.

  Also whiteout support is included now that issues with overlayfs have
  been cleared up.

  I have one more fix pending for page faults during btrfs_copy_from_user,
  but I wanted to get this bulk out the door first"

* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (90 commits)
  btrfs: fix memory leak during RAID 5/6 device replacement
  Btrfs: add semaphore to synchronize direct IO writes with fsync
  Btrfs: fix race between block group relocation and nocow writes
  Btrfs: fix race between fsync and direct IO writes for prealloc extents
  Btrfs: fix number of transaction units for renames with whiteout
  Btrfs: pin logs earlier when doing a rename exchange operation
  Btrfs: unpin logs if rename exchange operation fails
  Btrfs: fix inode leak on failure to setup whiteout inode in rename
  btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT
  Btrfs: pin log earlier when renaming
  Btrfs: unpin log if rename operation fails
  Btrfs: don't do unnecessary delalloc flushes when relocating
  Btrfs: don't wait for unrelated IO to finish before relocation
  Btrfs: fix empty symlink after creating symlink and fsync parent dir
  Btrfs: fix for incorrect directory entries after fsync log replay
  btrfs: build fixup for qgroup_account_snapshot
  btrfs: qgroup: Fix qgroup accounting when creating snapshot
  Btrfs: fix fspath error deallocation
  btrfs: make find_workspace warn if there are no workspaces
  btrfs: make find_workspace always succeed
  ...
2016-05-21 10:49:22 -07:00
Chris Mason c315ef8d9d Merge branch 'for-chris-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.7
Signed-off-by: Chris Mason <clm@fb.com>
2016-05-17 14:43:19 -07:00
Filipe Manana 5f9a8a51d8 Btrfs: add semaphore to synchronize direct IO writes with fsync
Due to the optimization of lockless direct IO writes (the inode's i_mutex
is not held) introduced in commit 38851cc19a ("Btrfs: implement unlocked
dio write"), we started having races between such writes with concurrent
fsync operations that use the fast fsync path. These races were addressed
in the patches titled "Btrfs: fix race between fsync and lockless direct
IO writes" and "Btrfs: fix race between fsync and direct IO writes for
prealloc extents". The races happened because the direct IO path, like
every other write path, does create extent maps followed by the
corresponding ordered extents while the fast fsync path collected first
ordered extents and then it collected extent maps. This made it possible
to log file extent items (based on the collected extent maps) without
waiting for the corresponding ordered extents to complete (get their IO
done). The two fixes mentioned before added a solution that consists of
making the direct IO path create first the ordered extents and then the
extent maps, while the fsync path attempts to collect any new ordered
extents once it collects the extent maps. This was simple and did not
require adding any synchonization primitive to any data structure (struct
btrfs_inode for example) but it makes things more fragile for future
development endeavours and adds an exceptional approach compared to the
other write paths.

This change adds a read-write semaphore to the btrfs inode structure and
makes the direct IO path create the extent maps and the ordered extents
while holding read access on that semaphore, while the fast fsync path
collects extent maps and ordered extents while holding write access on
that semaphore. The logic for direct IO write path is encapsulated in a
new helper function that is used both for cow and nocow direct IO writes.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
2016-05-13 01:59:36 +01:00
Filipe Manana 3f9749f6e9 Btrfs: fix empty symlink after creating symlink and fsync parent dir
If we create a symlink, fsync its parent directory, crash/power fail and
mount the filesystem, we end up with an empty symlink, which not only is
useless it's also not allowed in linux (the man page symlink(2) is well
explicit about that).  So we just need to make sure to fully log an inode
if it's a symlink, to ensure its inline extent gets logged, ensuring the
same behaviour as ext3, ext4, xfs, reiserfs, f2fs, nilfs2, etc.

Example reproducer:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt
  $ mkdir /mnt/testdir
  $ sync
  $ ln -s /mnt/foo /mnt/testdir/bar
  $ xfs_io -c fsync /mnt/testdir
  <power fail>
  $ mount /dev/sdb /mnt
  $ readlink /mnt/testdir/bar
  <empty string>

A test case for fstests follows soon.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
2016-05-13 01:59:12 +01:00
Filipe Manana 657ed1aa48 Btrfs: fix for incorrect directory entries after fsync log replay
If we move a directory to a new parent and later log that parent and don't
explicitly log the old parent, when we replay the log we can end up with
entries for the moved directory in both the old and new parent directories.
Besides being ilegal to have directories with multiple hard links in linux,
it also resulted in the leaving the inode item with a link count of 1.
A similar issue also happens if we move a regular file - after the log tree
is replayed the file has a link in both the old and new parent directories,
when it should be only at the new directory.

Sample reproducer:

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt
  $ mkdir /mnt/x
  $ mkdir /mnt/y
  $ touch /mnt/x/foo
  $ mkdir /mnt/y/z
  $ sync
  $ ln /mnt/x/foo /mnt/x/bar
  $ mv /mnt/y/z /mnt/x/z
  < power fail >
  $ mount /dev/sdc /mnt
  $ ls -1Ri /mnt
  /mnt:
  257 x
  258 y

  /mnt/x:
  259 bar
  259 foo
  260 z

  /mnt/x/z:

  /mnt/y:
  260 z

  /mnt/y/z:

  $ umount /dev/sdc
  $ btrfs check /dev/sdc
  Checking filesystem on /dev/sdc
  UUID: a67e2c4a-a4b4-4fdc-b015-9d9af1e344be
  checking extents
  checking free space cache
  checking fs roots
  root 5 inode 260 errors 2000, link count wrong
        unresolved ref dir 257 index 4 namelen 1 name z filetype 2 errors 0
        unresolved ref dir 258 index 2 namelen 1 name z filetype 2 errors 0
  (...)

Attempting to remove the directory becomes impossible:

  $ mount /dev/sdc /mnt
  $ rmdir /mnt/y/z
  $ ls -lh /mnt/y
  ls: cannot access /mnt/y/z: No such file or directory
  total 0
  d????????? ? ? ? ?            ? z
  $ rmdir /mnt/x/z
  rmdir: failed to remove ‘/mnt/x/z’: Stale file handle
  $ ls -lh /mnt/x
  ls: cannot access /mnt/x/z: Stale file handle
  total 0
  -rw-r--r-- 2 root root 0 Apr  6 18:06 bar
  -rw-r--r-- 2 root root 0 Apr  6 18:06 foo
  d????????? ? ?    ?    ?            ? z

So make sure that on rename we set the last_unlink_trans value for our
inode, even if it's a directory, to the value of the current transaction's
ID and that if the new parent directory is logged that we fallback to a
transaction commit.

A test case for fstests is being submitted as well.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
2016-05-13 01:59:11 +01:00
Al Viro 84695ffee7 Merge getxattr prototype change into work.lookups
The rest of work.xattr stuff isn't needed for this branch
2016-05-02 19:45:47 -04:00
David Sterba 91166212e0 btrfs: sink gfp parameter to clear_extent_bits
Callers pass GFP_NOFS and GFP_KERNEL. No need to pass the flags around.

Signed-off-by: David Sterba <dsterba@suse.com>
2016-04-29 11:01:47 +02:00