WSL2-Linux-Kernel/Documentation
Paolo Bonzini 52ac8b358b KVM: Block memslot updates across range_start() and range_end()
We would like to avoid taking mmu_lock for .invalidate_range_{start,end}()
notifications that are unrelated to KVM.  Because mmu_notifier_count
must be modified while holding mmu_lock for write, and must always
be paired across start->end to stay balanced, lock elision must
happen in both or none.  Therefore, in preparation for this change,
this patch prevents memslot updates across range_start() and range_end().

Note, technically flag-only memslot updates could be allowed in parallel,
but stalling a memslot update for a relatively short amount of time is
not a scalability issue, and this is all more than complex enough.

A long note on the locking: a previous version of the patch used an rwsem
to block the memslot update while the MMU notifier runs, but this resulted
in the following deadlock involving the pseudo-lock tagged as
"mmu_notifier_invalidate_range_start".

   ======================================================
   WARNING: possible circular locking dependency detected
   5.12.0-rc3+ #6 Tainted: G           OE
   ------------------------------------------------------
   qemu-system-x86/3069 is trying to acquire lock:
   ffffffff9c775ca0 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}, at: __mmu_notifier_invalidate_range_end+0x5/0x190

   but task is already holding lock:
   ffffaff7410a9160 (&kvm->mmu_notifier_slots_lock){.+.+}-{3:3}, at: kvm_mmu_notifier_invalidate_range_start+0x36d/0x4f0 [kvm]

   which lock already depends on the new lock.

This corresponds to the following MMU notifier logic:

    invalidate_range_start
      take pseudo lock
      down_read()           (*)
      release pseudo lock
    invalidate_range_end
      take pseudo lock      (**)
      up_read()
      release pseudo lock

At point (*) we take the mmu_notifier_slots_lock inside the pseudo lock;
at point (**) we take the pseudo lock inside the mmu_notifier_slots_lock.

This could cause a deadlock (ignoring for a second that the pseudo lock
is not a lock):

- invalidate_range_start waits on down_read(), because the rwsem is
held by install_new_memslots

- install_new_memslots waits on down_write(), because the rwsem is
held till (another) invalidate_range_end finishes

- invalidate_range_end waits on the pseudo lock, held by
invalidate_range_start.

Removing the fairness of the rwsem breaks the cycle (in lockdep terms,
it would change the *shared* rwsem readers into *shared recursive*
readers), so open-code the wait using a readers count and a
spinlock.  This also allows handling blockable and non-blockable
critical sections in the same way.
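
The open-coded wait described above can be illustrated with a minimal
userspace sketch.  This is not the code from the patch: the names
slots_gate, gate_range_start, gate_range_end, gate_update and slots_gen are
hypothetical, and a pthread mutex and condition variable stand in for the
kernel spinlock and wait/wake_up.  The point it tries to show is that the
range_start side only bumps a counter and never waits for a pending
updater, which is what removes the rwsem fairness from the cycle.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the state guarding memslot updates. */
struct slots_gate {
	pthread_mutex_t lock;	/* plays the role of the spinlock */
	pthread_cond_t idle;	/* plays the role of the wait queue */
	int readers;		/* notifiers between range_start and range_end */
	int slots_gen;		/* stand-in for the published memslots */
};

#define SLOTS_GATE_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

/* range_start side: only counts itself in; it never waits for an updater. */
static void gate_range_start(struct slots_gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->readers++;
	pthread_mutex_unlock(&g->lock);
}

/* range_end side: count out and wake the updater once no reader is left. */
static void gate_range_end(struct slots_gate *g)
{
	pthread_mutex_lock(&g->lock);
	assert(g->readers > 0);	/* start and end must stay balanced */
	if (--g->readers == 0)
		pthread_cond_broadcast(&g->idle);
	pthread_mutex_unlock(&g->lock);
}

/*
 * Updater side: wait until no notifier is in flight, then publish.
 * Readers that keep arriving can delay this indefinitely (no fairness).
 */
static void gate_update(struct slots_gate *g, int new_gen)
{
	pthread_mutex_lock(&g->lock);
	while (g->readers > 0)
		pthread_cond_wait(&g->idle, &g->lock);
	g->slots_gen = new_gen;
	pthread_mutex_unlock(&g->lock);
}
```

Because a new reader is admitted even while an updater is waiting, readers
behave like recursive readers in lockdep terms, at the cost of the
theoretical starvation of the updater.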

Losing the rwsem fairness does theoretically allow MMU notifiers to
block install_new_memslots forever.  Note that mm/mmu_notifier.c's own
retry scheme in mmu_interval_read_begin also uses wait/wake_up
and is likewise not fair.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-08-03 03:44:03 -04:00
..
ABI Networking fixes for 5.14-rc2, including fixes from bpf and netfilter. 2021-07-14 09:24:32 -07:00
PCI pci-v5.14-changes 2021-07-08 12:06:20 -07:00
RCU Merge branch 'core-rcu-2021.07.04' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu 2021-07-04 12:58:33 -07:00
accounting delayacct: Add sysctl to enable at runtime 2021-05-12 11:43:25 +02:00
admin-guide USB / Thunderbolt patches for 5.14-rc1 2021-07-05 14:16:22 -07:00
arm docs: Fix typo in Documentation/arm/marvell.rst 2021-06-04 11:28:36 -06:00
arm64 arm64: Document requirement for access to FEAT_HCX 2021-05-25 19:05:28 +01:00
block for-5.14/block-2021-06-29 2021-06-30 12:12:56 -07:00
bpf Networking changes for 5.14. 2021-06-30 15:51:09 -07:00
cdrom docs: cdrom-standard.rst: get rid of uneeded UTF-8 chars 2021-05-11 11:00:17 -06:00
core-api module: add printk formats to add module build ID to stacktraces 2021-07-08 11:48:22 -07:00
cpu-freq cpufreq: Remove ->resolve_freq() 2021-06-30 19:45:42 +02:00
crypto
dev-tools Documentation: kunit: drop obsolete note about uml_abort for coverage 2021-07-12 13:54:12 -06:00
devicetree ARM: SoC fixes for v5.14 2021-07-17 15:58:24 -07:00
doc-guide docs: doc-guide: avoid using ReST :doc:`foo` markup 2021-06-17 13:24:37 -06:00
driver-api Documentation: Fix intiramfs script name 2021-07-18 23:48:14 +09:00
fault-injection docs: fault-injection: fix non-working usage of negative values 2021-06-14 15:58:22 -06:00
fb
features Documentation/features: Add THREAD_INFO_IN_TASK feature matrix 2021-07-15 06:33:44 -06:00
filesystems Documentation: Fix intiramfs script name 2021-07-18 23:48:14 +09:00
firmware-guide pwm: Changes for v5.14-rc1 2021-07-08 12:18:04 -07:00
firmware_class
fpga Documentation: fpga: dfl: change FPGA indirect article to an 2021-06-09 14:51:25 +02:00
gpu drm/amd/display: Add Freesync video documentation 2021-06-18 17:06:43 -04:00
hid
hwmon hwmon: (pmbus) Add driver for Delta DPS-920AB PSU 2021-06-17 04:21:46 -07:00
i2c Merge branch 'i2c/for-mergewindow' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux 2021-07-04 11:47:18 -07:00
ia64
ide
iio
infiniband
input docs: networking: Replace strncpy() with strscpy() 2021-06-04 11:21:43 -06:00
isdn
kbuild
kernel-hacking docs: kernel-hacking: hacking.rst: avoid using ReST :doc:`foo` markup 2021-06-17 13:24:38 -06:00
leds
litmus-tests
livepatch
locking locking/lockdep,doc: Improve readability of the block matrix 2021-05-31 10:14:54 +02:00
m68k
maintainer
mhi
mips
misc-devices
netlabel
networking Networking fixes for 5.14-rc2, including fixes from bpf and netfilter. 2021-07-14 09:24:32 -07:00
nios2
nvdimm
openrisc
parisc
pcmcia
power PM: runtime: Clarify documentation when callbacks are unassigned 2021-06-11 19:04:07 +02:00
powerpc powerpc/64s/syscall: Use pt_regs.trap to distinguish syscall ABI difference between sc and scv syscalls 2021-05-21 00:58:03 +10:00
process docs: process: submitting-patches.rst: avoid using ReST :doc:`foo` markup 2021-06-17 13:24:38 -06:00
riscv riscv: Ensure BPF_JIT_REGION_START aligned with PMD size 2021-06-18 21:10:05 -07:00
s390 vfio/mdev: Remove CONFIG_VFIO_MDEV_DEVICE 2021-06-21 15:29:25 -06:00
scheduler This was a reasonably active cycle for documentation; this pull includes: 2021-06-28 16:53:05 -07:00
scsi scsi: core: Kill message byte 2021-05-31 22:48:24 -04:00
security This was a reasonably active cycle for documentation; this pull includes: 2021-06-28 16:53:05 -07:00
sh
sound ASoC: Updates for v5.14 2021-07-01 08:36:12 +02:00
sparc
sphinx
sphinx-static
spi spi: pxa2xx: Update documentation to point out that it's outdated 2021-05-18 14:05:36 +01:00
staging
target
timers Documentation: drop optional BOMs 2021-05-10 15:17:34 -06:00
trace Tracing updates for 5.14: 2021-07-03 11:13:22 -07:00
translations docs/zh_CN: add a missing space character 2021-07-15 06:33:44 -06:00
usb USB / Thunderbolt patches for 5.14-rc1 2021-07-05 14:16:22 -07:00
userspace-api Char / Misc driver updates for 5.14-rc1 2021-07-05 13:42:16 -07:00
virt KVM: Block memslot updates across range_start() and range_end() 2021-08-03 03:44:03 -04:00
vm Merge branch 'akpm' (patches from Andrew) 2021-07-02 12:08:10 -07:00
w1 w1: fix build warning in w1_ds2438.rst 2021-05-26 09:11:24 +02:00
watchdog
x86 Fixes and improvements for FPU handling on x86: 2021-07-07 11:12:01 -07:00
xtensa
.gitignore
COPYING-logo
Changes
CodingStyle
Kconfig
Makefile docs: Makefile: Use CONFIG_SHELL not SHELL 2021-06-18 11:26:08 -06:00
SubmittingPatches
arch.rst
asm-annotations.rst
atomic_bitops.txt
atomic_t.txt
conf.py docs: Take a little noise out of the build process 2021-06-17 13:49:18 -06:00
docutils.conf
dontdiff
index.rst
logo.gif
memory-barriers.txt
watch_queue.rst