WSL2-Linux-Kernel/fs
Filipe Manana 259c4b96d7 btrfs: stop doing unnecessary log updates during a rename
During a rename, we call __btrfs_unlink_inode(), which will call
btrfs_del_inode_ref_in_log() and btrfs_del_dir_entries_in_log(), in order
to remove an inode reference and a directory entry from the log. These
are necessary when __btrfs_unlink_inode() is called from the unlink path,
but not necessary when it's called from a rename context, because:

1) For the btrfs_del_inode_ref_in_log() call, it's pointless to delete the
   inode reference related to the old name, because later in the rename
   path we call btrfs_log_new_name(), which will drop all inode references
   from the log and copy all inode references from the subvolume tree to
   the log tree. So we are doing one unnecessary btree operation which
   adds additional latency and lock contention in case there are other
   tasks accessing the log tree;

2) For the btrfs_del_dir_entries_in_log() call, we are now doing the
   equivalent at btrfs_log_new_name() since the previous patch in the
   series, that has the subject "btrfs: avoid logging all directory
   changes during renames". In fact, having __btrfs_unlink_inode() call
   this function not only adds additional latency and lock contention due
   to the extra btree operation, but also can make btrfs_log_new_name()
   unnecessarily log a range item to track the deletion of the old name,
   since it has no way to known that the directory entry related to the
   old name was previously logged and already deleted by
   __btrfs_unlink_inode() through its call to
   btrfs_del_dir_entries_in_log().

So skip those calls at __btrfs_unlink_inode() when we are doing a rename.
Skipping them also allows us now to reduce the duration of time we are
pinning a log transaction during renames, which is always beneficial as
it's not delaying so much other tasks trying to sync the log tree, in
particular we end up not holding the log transaction pinned while adding
the new name (adding inode ref, directory entry, etc).

This change is part of a patchset comprised of the following patches:

  1/5 btrfs: add helper to delete a dir entry from a log tree
  2/5 btrfs: pass the dentry to btrfs_log_new_name() instead of the inode
  3/5 btrfs: avoid logging all directory changes during renames
  4/5 btrfs: stop doing unnecessary log updates during a rename
  5/5 btrfs: avoid inode logging during rename and link when possible

Just like the previous patch in the series, "btrfs: avoid logging all
directory changes during renames", the following script mimics part of
what a package installation/upgrade with zypper does, which is basically
renaming a lot of files, in some directory under /usr, to a name with a
suffix of "-RPMDELETE":

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/nvme0n1
  MNT=/mnt/nvme0n1

  NUM_FILES=10000

  mkfs.btrfs -f $DEV
  mount $DEV $MNT

  mkdir $MNT/testdir

  for ((i = 1; i <= $NUM_FILES; i++)); do
      echo -n > $MNT/testdir/file_$i
  done

  sync

  # Do some change to testdir and fsync it.
  echo -n > $MNT/testdir/file_$((NUM_FILES + 1))
  xfs_io -c "fsync" $MNT/testdir

  echo "Renaming $NUM_FILES files..."
  start=$(date +%s%N)
  for ((i = 1; i <= $NUM_FILES; i++)); do
      mv $MNT/testdir/file_$i $MNT/testdir/file_$i-RPMDELETE
  done
  end=$(date +%s%N)

  dur=$(( (end - start) / 1000000 ))
  echo "Renames took $dur milliseconds"

  umount $MNT

Testing this change on box a using a non-debug kernel (Debian's default
kernel config) gave the following results:

NUM_FILES=10000, before patchset:                   27399 ms
NUM_FILES=10000, after patches 1/5 to 3/5 applied:   9093 ms (-66.8%)
NUM_FILES=10000, after patches 1/5 to 4/5 applied:   9016 ms (-67.1%)

NUM_FILES=5000, before patchset:                     9241 ms
NUM_FILES=5000, after patches 1/5 to 3/5 applied:    4642 ms (-49.8%)
NUM_FILES=5000, after patches 1/5 to 4/5 applied:    4553 ms (-50.7%)

NUM_FILES=2000, before patchset:                     2550 ms
NUM_FILES=2000, after patches 1/5 to 3/5 applied:    1788 ms (-29.9%)
NUM_FILES=2000, after patches 1/5 to 4/5 applied:    1767 ms (-30.7%)

NUM_FILES=1000, before patchset:                     1088 ms
NUM_FILES=1000, after patches 1/5 to 3/5 applied:     905 ms (-16.9%)
NUM_FILES=1000, after patches 1/5 to 4/5 applied:     883 ms (-18.8%)

The next patch in the series (5/5), also contains dbench results after
applying to whole patchset.

Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1193549
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-03-14 13:13:47 +01:00
..
9p Revert "fs/9p: search open fids first" 2022-01-30 22:13:37 +09:00
adfs
affs
afs afs: Fix potential thrashing in afs writeback 2022-03-11 10:24:37 -08:00
autofs
befs
bfs
btrfs btrfs: stop doing unnecessary log updates during a rename 2022-03-14 13:13:47 +01:00
cachefiles cachefiles: Fix volume coherency attribute 2022-03-11 10:24:37 -08:00
ceph
cifs cifs: fix confusing unneeded warning message on smb2.1 and earlier 2022-02-16 17:16:49 -06:00
coda
configfs configfs: fix a race in configfs_{,un}register_subsystem() 2022-02-22 18:30:28 +01:00
cramfs
crypto
debugfs
devpts
dlm
ecryptfs
efivarfs
efs
erofs erofs: fix ztailpacking on > 4GiB filesystems 2022-03-02 21:58:45 +08:00
exfat
exportfs
ext2
ext4 Various bug fixes for ext4 fast commit and inline data handling. Also 2022-02-06 10:34:45 -08:00
f2fs Fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into the 2022-02-01 11:13:24 -08:00
fat
freevxfs
fscache
fuse fuse: fix pipe buffer lifetime for direct_io 2022-03-07 16:30:44 +01:00
gfs2 gfs2 fixes: 2022-02-11 11:36:32 -08:00
hfs
hfsplus
hostfs
hpfs
hugetlbfs
iomap
isofs
jbd2 Various bug fixes for ext4 fast commit and inline data handling. Also 2022-02-06 10:34:45 -08:00
jffs2
jfs
kernfs
ksmbd ksmbd: add support for key exchange 2022-02-04 00:12:22 -06:00
lockd Notable bug fixes: 2022-02-02 10:14:31 -08:00
minix
netfs
nfs NFS: Do not report writeback errors in nfs_getattr() 2022-02-16 15:15:22 -05:00
nfs_common
nfsd Notable bug fixes: 2022-02-09 09:56:57 -08:00
nilfs2
nls
notify fanotify: Fix stale file descriptor in copy_event_to_user() 2022-02-01 12:52:07 +01:00
ntfs
ntfs3
ocfs2 ocfs2: fix a deadlock when commit trans 2022-01-30 09:56:58 +02:00
omfs
openpromfs
orangefs
overlayfs overlayfs fixes for 5.17-rc3 2022-02-01 11:23:02 -08:00
proc proc: fix documentation and description of pagemap 2022-03-05 11:08:33 -08:00
pstore
qnx4
qnx6
quota quota: make dquot_quota_sync return errors from ->sync_fs 2022-01-30 08:59:47 -08:00
ramfs
reiserfs
romfs
smbfs_common
squashfs
sysfs
sysv
tracefs tracefs: Set the group ownership in apply_options() not parse_options() 2022-02-25 21:05:04 -05:00
ubifs
udf
ufs
unicode Fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into the 2022-02-01 11:13:24 -08:00
vboxsf
verity
xfs Bug fixes for 5.17-rc4: 2022-02-26 09:53:19 -08:00
zonefs
Kconfig ksmbd: add support for key exchange 2022-02-04 00:12:22 -06:00
Kconfig.binfmt
Makefile Fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into the 2022-02-01 11:13:24 -08:00
aio.c
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf.c binfmt_elf fix for v5.17-rc7 2022-03-01 11:31:37 -08:00
binfmt_elf_fdpic.c
binfmt_flat.c
binfmt_misc.c Fix regression due to "fs: move binfmt_misc sysctl to its own file" 2022-02-09 09:50:02 -08:00
binfmt_script.c
buffer.c
char_dev.c
compat_binfmt_elf.c
coredump.c
d_path.c
dax.c
dcache.c
direct-io.c
drop_caches.c
eventfd.c
eventpoll.c
exec.c
fcntl.c
fhandle.c
file.c
file_table.c fs/file_table: fix adding missing kmemleak_not_leak() 2022-02-17 10:23:19 -08:00
filesystems.c
fs-writeback.c
fs_context.c
fs_parser.c
fs_pin.c
fs_struct.c
fs_types.c
fsopen.c
init.c
inode.c
internal.h
io-wq.c
io-wq.h
io_uring.c io_uring: disallow modification of rsrc_data during quiesce 2022-02-22 09:57:32 -07:00
ioctl.c
kernel_read_file.c
libfs.c
locks.c
mbcache.c
mount.h
mpage.c
namei.c
namespace.c fs: add kernel doc for mnt_{hold,unhold}_writers() 2022-02-14 08:35:32 +01:00
no-block.c
nsfs.c
open.c
pipe.c watch_queue: Fix lack of barrier/sync/lock between post and read 2022-03-11 10:17:13 -08:00
pnode.c
pnode.h
posix_acl.c
proc_namespace.c
read_write.c
readdir.c
remap_range.c
select.c
seq_file.c
signalfd.c
splice.c
stack.c
stat.c
statfs.c
super.c vfs: make freeze_super abort when sync_filesystem returns error 2022-01-30 08:59:47 -08:00
sync.c vfs: make sync_filesystem return errors from ->sync_fs 2022-01-30 08:59:47 -08:00
sysctls.c
timerfd.c
userfaultfd.c mm: refactor vm_area_struct::anon_vma_name usage code 2022-03-05 11:08:32 -08:00
utimes.c
xattr.c