Lightly tested against qemu.
One thing *not* implemented here is separate mappings
for descriptor/avail/used rings. That's nice to have,
will be done later after we have core support.
This also exposes the PCI layout to userspace, and
adds macros for PCI layout offsets:
QEMU wants it, so why not? Trust, but verify.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Most of initialization is device-independent.
Let's move it to common.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Device VQs were getting freed twice: once in every device's removal
functions, and then again in virtio_pci_legacy_remove(). The ones in
devices are called first, so drop the useless second call.
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Some devices might not implement config space access
(e.g. remoteproc used not to - before 3.9).
virtio/balloon needs config space access so make it
fail gracefully if not there.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The reason we defer kfree until release function is because it's a
general rule for kobjects: kfree of the reference counter itself is only
legal in the release function.
Previous patch didn't make this clear, document this in code.
Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
A struct device which has just been unregistered can live on past the
point at which a driver decides to drop it's initial reference to the
kobject gained on allocation.
This implies that when releasing a virtio device, we can't free a struct
virtio_device until the underlying struct device has been released,
which might not happen immediately on device_unregister().
Unfortunately, this is exactly what virtio pci does:
it has an empty release callback, and frees memory immediately
after unregistering the device.
This causes an easy to reproduce crash if CONFIG_DEBUG_KOBJECT_RELEASE
it enabled.
To fix, free the memory only once we know the device is gone in the release
callback.
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
It turns out we need to add device-specific code
in release callback. Move it to virtio_pci_legacy.c.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Most importantly, this fixes using virtio_pci as a module.
Further, the big virtio 1.0 conversion missed a couple of places. This fixes
them up.
This isn't 100% sparse-clean yet because on many architectures get_user
triggers sparse warnings when used with __bitwise tag (when same tag is on both
pointer and value read).
I posted a patchset to fix it up by adding __force on all
arches that don't already have it (many do), when that's
merged these warnings will go away.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJUkSztAAoJECgfDbjSjVRp5DkH/ibw+0ZaEFP/SXWnw6WONpaG
pzMsrfMG/vxlOfutSUdDqG+oqqU2fSLvFq5qDK6Xk9/emRSwGduz29ZaxGh8J1MZ
/Ojqtu/HSLl+UASTs+fMw49itghoIjmAPBwwMkQjvanfqLclgdz9UxzoCOc4YkO0
PJQA/Vw6blVSP1m0p97PvzZkAiIetI2ixZn2vPJZc8vSkOHtygM9HdXKTv785HbG
ycRbR9B3OBMvq26FIuWeyuY93FnyX2Qtf2bzwSSRdzo7qlsNhVVG7sKyWOOR+1xG
TLmjhyTF57oA2GgZCVfgnFxsfiuIKMumfG0jbABTXmBGgA/ULef7HcF/lzLgdq8=
=32cU
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael S Tsirkin:
"virtio 1.0 related fixes
Most importantly, this fixes using virtio_pci as a module.
Further, the big virtio 1.0 conversion missed a couple of places.
This fixes them up.
This isn't 100% sparse-clean yet because on many architectures
get_user triggers sparse warnings when used with __bitwise tag (when
same tag is on both pointer and value read).
I posted a patchset to fix it up by adding __force on all arches that
don't already have it (many do), when that's merged these warnings
will go away"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_pci: restore module attributes
mic/host: fix up virtio 1.0 APIs
vringh: update for virtio 1.0 APIs
vringh: 64 bit features
tools/virtio: add virtio 1.0 in vringh_test
tools/virtio: add virtio 1.0 in virtio_test
tools/virtio: enable -Werror
tools/virtio: 64 bit features
tools/virtio: fix vringh test
tools/virtio: more stubs
virtio: core support for config generation
virtio_pci: add VIRTIO_PCI_NO_LEGACY
virtio_pci: move probe to common file
virtio_pci_common.h: drop VIRTIO_PCI_NO_LEGACY
virtio_config: fix virtio_cread_bytes
virtio: set VIRTIO_CONFIG_S_FEATURES_OK on restore
bug which doesn't merit cc: stable.
All the exciting stuff went via MST this cycle.
Thanks,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUkPhtAAoJENkgDmzRrbjxnEoP/iXZ2d0pwbSQSSw12wXec7BO
aT698KzTHg2naRqWEhRbE5ujGW9ZR1hbkH3r70R2BlM5wZTwCAEkJB4SRfIu6u0R
fv9GUgZTdSIlN2+yBW1eYHa2QlB8V/jx3P2Vyw3/ejGQ2vC6B7JbNsdCZJoyuH5A
jZojZUrltfEdPVO4JkZ9sSzo64fzyoHGj8YYilH8ygcyZOshAvKvP/gOsFyRzyaj
OPCU1CImBj5i1w2dFSErhKPJp/WQl7La1dKwTVE4lEGKadKDmqnStV0JKuvcRPv6
XB/9vwdRXtFIxRUKDH8Wj7DrkKRCsBh6DJUl+MC+c2TVLNu6v0D/WGOlmio2rvoZ
6CGwNBQg4Ex/J8CkcYsgISn/jzDS+0oaZK0P+IYNJtv5e7D6jPJcDoZ1yPXh1uPn
9ZYZlKa08o5/991f3HM2kJKWDP/OdzgHQH6w2exY01Zsz/6TATLvU7xoPQuHelWe
mRd1kbW4ONIUAzIGPhd5PvtnmJ2L82+tN3VmYu50XAl2oSF8fzvQ/VHl9hgyu/vI
WREA7/40e2DYkhRbu7focdEAFYo0HnuRq6Kwp5zYjbfgyBsiaY7DVvAUlnhwhMES
kSFKejGUP00gofrXdffTYNu5LdKMN8eaVT7pMyxNXlQHSzXe3Vc6oqnpXzP17ii7
ahqmeDZKNUF3GSOHc4IC
=COY0
-----END PGP SIGNATURE-----
Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull virtio updates from Rusty Russell:
"A balloon enhancement, and a minor race-on-module-unload theoretical
bug which doesn't merit cc: stable.
All the exciting stuff went via MST this cycle"
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
virtio_balloon: free some memory from balloon on OOM
virtio_balloon: return the amount of freed memory from leak_balloon()
virtio_blk: fix race at module removal
virtio: Fix comment typo 'CONFIG_S_FAILED'
When the virtio_pci driver was moved into virtio_pci_legacy.c the module
licence and other attributes went AWOL. This patch restores them.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Here's the set of driver core patches for 3.19-rc1.
They are dominated by the removal of the .owner field in platform
drivers. They touch a lot of files, but they are "simple" changes, just
removing a line in a structure.
Other than that, a few minor driver core and debugfs changes. There are
some ath9k patches coming in through this tree that have been acked by
the wireless maintainers as they relied on the debugfs changes.
Everything has been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlSOD20ACgkQMUfUDdst+ylLPACg2QrW1oHhdTMT9WI8jihlHVRM
53kAoLeteByQ3iVwWurwwseRPiWa8+MI
=OVRS
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core update from Greg KH:
"Here's the set of driver core patches for 3.19-rc1.
They are dominated by the removal of the .owner field in platform
drivers. They touch a lot of files, but they are "simple" changes,
just removing a line in a structure.
Other than that, a few minor driver core and debugfs changes. There
are some ath9k patches coming in through this tree that have been
acked by the wireless maintainers as they relied on the debugfs
changes.
Everything has been in linux-next for a while"
* tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits)
Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries"
fs: debugfs: add forward declaration for struct device type
firmware class: Deletion of an unnecessary check before the function call "vunmap"
firmware loader: fix hung task warning dump
devcoredump: provide a one-way disable function
device: Add dev_<level>_once variants
ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries
ath: use seq_file api for ath9k debugfs files
debugfs: add helper function to create device related seq_file
drivers/base: cacheinfo: remove noisy error boot message
Revert "core: platform: add warning if driver has no owner"
drivers: base: support cpu cache information interface to userspace via sysfs
drivers: base: add cpu_device_create to support per-cpu devices
topology: replace custom attribute macros with standard DEVICE_ATTR*
cpumask: factor out show_cpumap into separate helper function
driver core: Fix unbalanced device reference in drivers_probe
driver core: fix race with userland in device_add()
sysfs/kernfs: make read requests on pre-alloc files use the buffer.
sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
fs: sysfs: return EGBIG on write if offset is larger than file size
...
virtio 1.0 devices require that drivers set VIRTIO_CONFIG_S_FEATURES_OK
after finalizing features.
virtio core missed doing this on restore, fix it up.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
kbuild does not seem to like it when we name source
files same as the module.
Let's rename virtio_pci -> virtio_pci_common,
and get rid of #include-ing c files.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Move everything dealing with legacy devices out to virtio_pci_legacy.c.
Expose common code APIs in virtio_pci.h
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
VQ setup is mostly version-specific, add another level of indirection to
split the version-independent code out.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
VQ deletion is mostly version-specific, add another level of indirection
to split the version-independent code out.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
slightly reduce the amount of pointer chasing this needs to do.
More importantly, this will easily generalize to virtio 1.0.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We don't need to go from vq to vq info on
data path, so using direct vq->priv pointer for that
seems like a waste.
Let's build an array of vq infos, then we can use vq->index
for that lookup.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
legacy_only flag is now unused, drop it from core.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
we have blacklisted balloon in core, no need
for a driver flag.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
transports need to be able to detect legacy-only
devices (ATM balloon only) to use legacy path
to drive them.
Add a core API to do just that.
The implementation just blacklists balloon:
not too pretty, but let's not over-engineer.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
We have no plans to support virtio 1.0 in balloon driver. Add an
explicit flag to mark it legacy only.
This will be used by follow-up patches.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtio-blk has some legacy feature bits that modern drivers
must not negotiate, but are needed for old legacy hosts
(that e.g. don't support virtio-scsi).
Allow a separate legacy feature table for such cases.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Now that we use u64 for bits, we can simply & them together.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
For virtio-1, we can theoretically have a more complex virtqueue
layout with avail and used buffers not on a contiguous memory area
with the descriptor table. For now, it's fine for a transport driver
to stay with the old layout: It needs, however, a way to access
the locations of the avail/used rings so it can register them with
the host.
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Use virtioXX_to_cpu and friends for access to
all multibyte structures in memory.
Note: this is intentionally mechanical.
A follow-up patch will split long lines etc.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
At this point, no transports set any of the high 32 feature bits.
Since transports generally can't (yet) cope with such bits, add BUG_ON
checks to make sure they are not set by mistake.
Based on rproc patch by Rusty.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Change u32 to u64, and use BIT_ULL and 1ULL everywhere.
Note: transports are unchanged, and only set low 32 bit.
This guarantees that no transport sets e.g. VERSION_1
by mistake without proper support.
Based on patch by Rusty.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
It seemed like a good idea to use bitmap for features
in struct virtio_device, but it's actually a pain,
and seems to become even more painful when we get more
than 32 feature bits. Just change it to a u32 for now.
Based on patch by Rusty.
Suggested-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Excessive virtio_balloon inflation can cause invocation of OOM-killer,
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless
it is often the case that these control tools does not have enough time
to react on fast changing memory load. As a result OS runs out of memory
and invokes OOM-killer. The balancing of memory by use of the virtio
balloon should not cause the termination of processes while there are
pages in the balloon. Now there is no way for virtio balloon driver to
free some memory at the last moment before some process will be get
killed by OOM-killer.
This does not provide a security breach as balloon itself is running
inside guest OS and is working in the cooperation with the host. Thus
some improvements from guest side should be considered as normal.
To solve the problem, introduce a virtio_balloon callback which is
expected to be called from the oom notifier call chain in out_of_memory()
function. If virtio balloon could release some memory, it will make
the system to return and retry the allocation that forced the out of
memory killer to run.
Allocate virtio feature bit for this: it is not set by default,
the the guest will not deflate virtio balloon on OOM without explicit
permission from host.
Signed-off-by: Raushaniya Maksudova <rmaksudova@parallels.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This value would be useful in the next patch to provide the amount of
the freed memory for OOM killer.
Signed-off-by: Raushaniya Maksudova <rmaksudova@parallels.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
been sitting in MST's tree during my vacation. I changed a function name
and made one trivial change, then they spent two days in linux-next.
Thanks,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUQFBQAAoJENkgDmzRrbjxJRIP/1yCQRElQewxURSmJelyqCdU
0mHYB0R9Mf3tfre1xnofqs2lWeSMc/4ptKHsVR6pupoztSwnz7HsLHfEFvFJh4mj
KsaqYElxkNxTcfyHwLjyJS0/J6tG1tYypXGiimTBS0bvFHL3XZdimVgJ6WvX+gO7
YSaDEX8/EqCERafslS5+gKJlz3drDOnCZCe9y4BDSmsvl2k7bkpSxIn8vsR6jIC0
c5JpUy6QVF+3XA/J932M7yRs+xpqxNoUWiyY3ar9o3CtQAaQB0ZAetSxY6hTfvVc
GlNFzCifdsaQwsl2SVsE2h6tWaRhtMtcGWQuhHThIPyIf8XxhYyBRY2FLo70LMz1
eqtwy6F/Bg/nzUsdee4PZBMeoKHlAEL12RpsEKgfUoLzj16Aqa8ll+Agbglbkw8G
f3d2FwzKAlpY5NwHETC1wYy52PJ3efqksRWuhokmYpxNSbHJS/lsiJOE7272/4Qr
MtXuvRmo22tf34XFd5y7zqWjgZ58eeFOqQWi/K+6ZgpqVOvikjrXXKEuiVdjO0ZD
kTVR/sQKiR+79rzENk80XBhWaMveECNXF1TiZ/3MmURkmEOBRQMxRQ20BX3exvna
AJ/WVA5DcfXZc1yyqknE1NLGrvSBMJENH13x2QPwrqNWAryOOKuF1VKKIwWlDw5j
vtx5nXiJa8YYdxI2TJCN
=JK6x
-----END PGP SIGNATURE-----
Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull virtio updates from Rusty Russell:
"One cc: stable commit, the rest are a series of minor cleanups which
have been sitting in MST's tree during my vacation. I changed a
function name and made one trivial change, then they spent two days in
linux-next"
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (25 commits)
virtio-rng: refactor probe error handling
virtio_scsi: drop scan callback
virtio_balloon: enable VQs early on restore
virtio_scsi: fix race on device removal
virito_scsi: use freezable WQ for events
virtio_net: enable VQs early on restore
virtio_console: enable VQs early on restore
virtio_scsi: enable VQs early on restore
virtio_blk: enable VQs early on restore
virtio_scsi: move kick event out from virtscsi_init
virtio_net: fix use after free on allocation failure
9p/trans_virtio: enable VQs early
virtio_console: enable VQs early
virtio_blk: enable VQs early
virtio_net: enable VQs early
virtio: add API to enable VQs early
virtio_net: minor cleanup
virtio-net: drop config_mutex
virtio_net: drop config_enable
virtio-blk: drop config_mutex
...
virtio spec requires drivers to set DRIVER_OK before using VQs.
This is set automatically after resume returns, virtio balloon
violated this rule by adding bufs, which causes the VQ to be used
directly within restore.
To fix, call virtio_device_ready before using VQ.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Defer config changed notifications that arrive during
probe/scan/freeze/restore.
This will allow drivers to set DRIVER_OK earlier, without worrying about
racing with config change interrupts.
This change will also benefit old hypervisors (before 2009)
that send interrupts without checking DRIVER_OK: previously,
the callback could race with driver-specific initialization.
This will also help simplify drivers.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cosmetic changes)
This is in preparation to extending config changed event handling
in core.
Wrapping these in an API also seems to make for a cleaner code.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Replace duplicated code in all transports with a single wrapper in
virtio.c.
The only functional change is in virtio_mmio.c: if a buggy device sends
us an interrupt before driver is set, we previously returned IRQ_NONE,
now we return IRQ_HANDLED.
As this must not happen in practice, this does not look like a big deal.
See also commit 3fff0179e3
virtio-pci: do not oops on config change if driver not loaded.
for the original motivation behind the driver check.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
On restore, virtio pci does the following:
+ set features
+ init vqs etc - device can be used at this point!
+ set ACKNOWLEDGE,DRIVER and DRIVER_OK status bits
This is in violation of the virtio spec, which
requires the following order:
- ACKNOWLEDGE
- DRIVER
- init vqs
- DRIVER_OK
This behaviour will break with hypervisors that assume spec compliant
behaviour. It seems like a good idea to have this patch applied to
stable branches to reduce the support butden for the hypervisors.
Cc: stable@vger.kernel.org
Cc: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Always mark pages with PageBalloon even if balloon compaction is disabled
and expose this mark in /proc/kpageflags as KPF_BALLOON.
Also this patch adds three counters into /proc/vmstat: "balloon_inflate",
"balloon_deflate" and "balloon_migrate". They accumulate balloon
activity. Current size of balloon is (balloon_inflate - balloon_deflate)
pages.
All generic balloon code now gathered under option CONFIG_MEMORY_BALLOON.
It should be selected by ballooning driver which wants use this feature.
Currently virtio-balloon is the only user.
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now ballooned pages are detected using PageBalloon(). Fake mapping is no
longer required. This patch links ballooned pages to balloon device using
field page->private instead of page->mapping. Also this patch embeds
balloon_dev_info directly into struct virtio_balloon.
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sasha Levin reported KASAN splash inside isolate_migratepages_range().
Problem is in the function __is_movable_balloon_page() which tests
AS_BALLOON_MAP in page->mapping->flags. This function has no protection
against anonymous pages. As result it tried to check address space flags
inside struct anon_vma.
Further investigation shows more problems in current implementation:
* Special branch in __unmap_and_move() never works:
balloon_page_movable() checks page flags and page_count. In
__unmap_and_move() page is locked, reference counter is elevated, thus
balloon_page_movable() always fails. As a result execution goes to the
normal migration path. virtballoon_migratepage() returns
MIGRATEPAGE_BALLOON_SUCCESS instead of MIGRATEPAGE_SUCCESS,
move_to_new_page() thinks this is an error code and assigns
newpage->mapping to NULL. Newly migrated page lose connectivity with
balloon an all ability for further migration.
* lru_lock erroneously required in isolate_migratepages_range() for
isolation ballooned page. This function releases lru_lock periodically,
this makes migration mostly impossible for some pages.
* balloon_page_dequeue have a tight race with balloon_page_isolate:
balloon_page_isolate could be executed in parallel with dequeue between
picking page from list and locking page_lock. Race is rare because they
use trylock_page() for locking.
This patch fixes all of them.
Instead of fake mapping with special flag this patch uses special state of
page->_mapcount: PAGE_BALLOON_MAPCOUNT_VALUE = -256. Buddy allocator uses
PAGE_BUDDY_MAPCOUNT_VALUE = -128 for similar purpose. Storing mark
directly in struct page makes everything safer and easier.
PagePrivate is used to mark pages present in page list (i.e. not
isolated, like PageLRU for normal pages). It replaces special rules for
reference counter and makes balloon migration similar to migration of
normal pages. This flag is protected by page_lock together with link to
the balloon device.
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Link: http://lkml.kernel.org/p/53E6CEAA.9020105@oracle.com
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: <stable@vger.kernel.org> [3.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
virtqueue_add() populates the virtqueue descriptor table from the sgs
given. If it uses an indirect descriptor table, then it puts a single
descriptor in the descriptor table pointing to the kmalloc'ed indirect
table where the sg is populated.
Previously vring_add_indirect() did the allocation and the simple
linear layout. We replace that with alloc_indirect() which allocates
the indirect table then chains it like the normal descriptor table so
we can reuse the core logic.
This slows down pktgen by less than 1/2 a percent (which uses direct
descriptors), as well as vring_bench, but it's far neater.
vring_bench before:
1061485790-1104800648(1.08254e+09+/-6.6e+06)ns
vring_bench after:
1125610268-1183528965(1.14172e+09+/-8e+06)ns
pktgen before:
787781-796334(793165+/-2.4e+03)pps 365-369(367.5+/-1.2)Mb/sec (365530384-369498976(3.68028e+08+/-1.1e+06)bps) errors: 0
pktgen after:
779988-790404(786391+/-2.5e+03)pps 361-366(364.35+/-1.3)Mb/sec (361914432-366747456(3.64885e+08+/-1.2e+06)bps) errors: 0
Now, if we make force indirect descriptors by turning off any_header_sg
in virtio_net.c:
pktgen before:
713773-721062(718374+/-2.1e+03)pps 331-334(332.95+/-0.92)Mb/sec (331190672-334572768(3.33325e+08+/-9.6e+05)bps) errors: 0
pktgen after:
710542-719195(714898+/-2.4e+03)pps 329-333(331.15+/-1.1)Mb/sec (329691488-333706480(3.31713e+08+/-1.1e+06)bps) errors: 0
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
We used to have several callers which just used arrays. They're
gone, so we can use sg_next() everywhere, simplifying the code.
On my laptop, this slowed down vring_bench by 15%:
vring_bench before:
936153354-967745359(9.44739e+08+/-6.1e+06)ns
vring_bench after:
1061485790-1104800648(1.08254e+09+/-6.6e+06)ns
However, a more realistic test using pktgen on a AMD FX(tm)-8320 saw
a few percent improvement:
pktgen before:
767390-792966(785159+/-6.5e+03)pps 356-367(363.75+/-2.9)Mb/sec (356068960-367936224(3.64314e+08+/-3e+06)bps) errors: 0
pktgen after:
787781-796334(793165+/-2.4e+03)pps 365-369(367.5+/-1.2)Mb/sec (365530384-369498976(3.68028e+08+/-1.1e+06)bps) errors: 0
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should prefer `struct pci_device_id` over `DEFINE_PCI_DEVICE_TABLE` to meet
kernel coding style guidelines. This issue was reported by checkpatch.
Signed-off-by: Benoit Taine <benoit.taine@lip6.fr>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Heinz points out that adding buffers to a broken virtqueue (which
should "never happen") still works. Failing allows drivers to detect
and complain about broken devices.
Now drivers are robust, we can add this extra check.
Reported-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
A bad implementation of virtio might cause us to mark the virtqueue
broken: we'll dev_err() in that case, and the device is useless, but
let's not BUG().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When adding or removing 100G from a balloon:
BUG: soft lockup - CPU#0 stuck for 22s! [vballoon:367]
We have a wait_event_interruptible(), but the condition is always true
(more ballooning to do) so we don't ever sleep. We also have a
wait_event() for the host to ack, but that is also always true as QEMU
is synchronous for balloon operations.
Reported-by: Gopesh Kumar Chaudhary <gopchaud@in.ibm.com>
Cc: stable@kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
As result of deprecation of MSI-X/MSI enablement functions
pci_enable_msix() and pci_enable_msi_block() all drivers
using these two interfaces need to be updated to use the
new pci_enable_msi_range() or pci_enable_msi_exact()
and pci_enable_msix_range() or pci_enable_msix_exact()
interfaces.
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
In commit bb478d8b16 virtio_ring: plug kmemleak false positive,
kmemleak_ignore was introduced. This broke compilation of virtio_test:
cc -g -O2 -Wall -I. -I ../../usr/include/ -Wno-pointer-sign
-fno-strict-overflow -fno-strict-aliasing -fno-common -MMD
-U_FORTIFY_SOURCE -c -o virtio_ring.o ../../drivers/virtio/virtio_ring.c
../../drivers/virtio/virtio_ring.c: In function ‘vring_add_indirect’:
../../drivers/virtio/virtio_ring.c:177:2: warning: implicit declaration
of function ‘kmemleak_ignore’ [-Wimplicit-function-declaration]
kmemleak_ignore(desc);
^
cc virtio_test.o virtio_ring.o -o virtio_test
virtio_ring.o: In function `vring_add_indirect':
tools/virtio/../../drivers/virtio/virtio_ring.c:177:
undefined reference to `kmemleak_ignore'
Add a dummy header for tools/virtio, and add #incldue <linux/kmemleak.h>
to drivers/virtio/virtio_ring.c so it is picked up by the userspace
tools.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Mark the function virtballoon_migratepage() as static in
virtio_balloon.c because it is not used outside this file.
This eliminates the following warning in virtio_balloon.c:
drivers/virtio/virtio_balloon.c:372:5: warning: no previous prototype for ‘virtballoon_migratepage’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
According to the virtio spec, the device configuration field
that should be updated after an inflation or deflation
operation is the 'actual' field, not the 'num_pages' one.
Commit 855e0c5288 swapped them
in update_balloon_size(). Fix it.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fixes: 855e0c5288
The driver core clears the driver data to NULL after device_release
or on probe failure. Thus, it is not needed to manually clear the
device driver data to NULL.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
some robustness fixes for broken virtio devices, plus minor tweaks.
[vs last pull request: added the virtio-scsi broken vq escape patch, which
I somehow lost.]
Cheers,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQIcBAABAgAGBQJSgDsJAAoJENkgDmzRrbjxEE4P/jXqZHS/HdlxW9k0BjKKlEIF
PdtCoP3UhWTdskXvy2pD8m6nYn214MEJYUIa4HFlIEZsdxhuexzQHY19Ynkjagyv
57sRsUUm5fYQLIL7IUh2DUD1VU38hUFinno/y333szzvCj9qITDA/QABsiWxK8NO
dq+Lmeixgrhc5yN9iryW+gZV+hekJIZ4LsU5ejSaJucKblzXUH8qIbmSthG7RTYJ
tr4J7xTTXbhxY4CoC5Dpx2hvsFkvzaAIvI4Nr1mDjfq5cR8BaYvnC89U1IbhdAey
p1AbZE58JLrY+Z8K8LBRGV2KjO8qSZ6R47hbZ9nAnodJYB7sZLyj6jUe1q+/htuC
Dh9Xm9O4eW2xNaFk20dYeIF4UU5/HzdsbvG/IlH8x4sm8/K706ocYyAOHlzYUg2T
k7gltrgDzDokMgb2R44gwnr4oaJ2q8Gne6JXswlPEv2eRs6vNnA5Xhc0rEHGkU6C
gYn1vNFN6yx0vf2syG/Ce5pZtMxGpefKQkHzzWdq8FKr1B9s54dDuf2hls7J8A9t
OQT1gE33yURSelf4Kh4k9zWXaWk/Ohv9l2R1cqpALnJ4/+q0fP5t7HdK500S7aax
DxLeFeqvsBw7nlWgsGxQmt+fjITQFHhcDiwst0ehnt6RbDEW7XPIguz0K/gyhxYG
+UNbl/5Gr64jnUX3YCzm
=vY2L
-----END PGP SIGNATURE-----
Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull virtio updates from Rusty Russell:
"Nothing really exciting: some groundwork for changing virtio endian,
and some robustness fixes for broken virtio devices, plus minor
tweaks"
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
virtio_scsi: verify if queue is broken after virtqueue_get_buf()
x86, asmlinkage, lguest: Pass in globals into assembler statement
virtio: mmio: fix signature checking for BE guests
virtio_ring: adapt to notify() returning bool
virtio_net: verify if queue is broken after virtqueue_get_buf()
virtio_console: verify if queue is broken after virtqueue_get_buf()
virtio_blk: verify if queue is broken after virtqueue_get_buf()
virtio_ring: add new function virtqueue_is_broken()
virtio_test: verify if virtqueue_kick() succeeded
virtio_net: verify if virtqueue_kick() succeeded
virtio_ring: let virtqueue_{kick()/notify()} return a bool
virtio_ring: change host notification API
virtio_config: remove virtio_config_val
virtio: use size-based config accessors.
virtio_config: introduce size-based accessors.
virtio_ring: plug kmemleak false positive.
virtio: pm: use CONFIG_PM_SLEEP instead of CONFIG_PM
As virtio-mmio config registers are specified to be little-endian,
using readl() to read the magic value and then memcmp() to check it
fails on BE (as readl() has an implicit swab).
Fix it by encoding the magic value as an integer instead of a string.
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Correct if statement to check for bool returned by notify()
(introduced in 5b1bf7cb67).
Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Add new function virtqueue_is_broken(). Callers of virtqueue_get_buf()
should check for a broken queue.
Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtqueue_{kick()/notify()} should exploit the new host notification API.
If the notify call returned with a negative value the host kick failed
(e.g. a kick triggered after a device was hot-unplugged). In this case
the virtqueue is set to 'broken' and false is returned, otherwise true.
Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Currently a host kick error is silently ignored and not reflected in
the virtqueue of a particular virtio device.
Changing the notify API for guest->host notification seems to be one
prerequisite in order to be able to handle such errors in the context
where the kick is triggered.
This patch changes the notify API. The notify function must return a
bool return value. It returns false if the host notification failed.
Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The dev_attrs field of struct bus_type is going away soon, dev_groups
should be used instead. This converts the virtio bus code to use the
correct field.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: <virtualization@lists.linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This lets the transport do endian conversion if necessary, and insulates
the drivers from the difference.
Most drivers can use the simple helpers virtio_cread() and virtio_cwrite().
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The freeze and restore functions defined in virtio drivers are used
for suspend and hibernate, so CONFIG_PM_SLEEP is more appropriate than
CONFIG_PM. This patch replace all CONFIG_PM with CONFIG_PM_SLEEP for
virtio drivers that implement freeze and restore callbacks.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The virtio_pci_freeze/restore are defined under CONFIG_PM but is used
by SET_SYSTEM_SLEEP_PM_OPS macro, which is defined under
CONFIG_PM_SLEEP. So if CONFIG_PM_SLEEP is not cofigured but
CONFIG_PM_RUNTIME is, the following warning message appeared:
drivers/virtio/virtio_pci.c:770:12: warning: ‘virtio_pci_freeze’ defined but not used [-Wunused-function]
static int virtio_pci_freeze(struct device *dev)
^
drivers/virtio/virtio_pci.c:790:12: warning: ‘virtio_pci_restore’ defined but not used [-Wunused-function]
static int virtio_pci_restore(struct device *dev)
^
Fix it by changing CONFIG_PM to CONFIG_PM_SLEEP.
Signed-off-by: Aaron Lu <aaron.lu@intel.com>
Reviewed-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Pull networking updates from David Miller:
"This is a re-do of the net-next pull request for the current merge
window. The only difference from the one I made the other day is that
this has Eliezer's interface renames and the timeout handling changes
made based upon your feedback, as well as a few bug fixes that have
trickeled in.
Highlights:
1) Low latency device polling, eliminating the cost of interrupt
handling and context switches. Allows direct polling of a network
device from socket operations, such as recvmsg() and poll().
Currently ixgbe, mlx4, and bnx2x support this feature.
Full high level description, performance numbers, and design in
commit 0a4db187a9 ("Merge branch 'll_poll'")
From Eliezer Tamir.
2) With the routing cache removed, ip_check_mc_rcu() gets exercised
more than ever before in the case where we have lots of multicast
addresses. Use a hash table instead of a simple linked list, from
Eric Dumazet.
3) Add driver for Atheros CQA98xx 802.11ac wireless devices, from
Bartosz Markowski, Janusz Dziedzic, Kalle Valo, Marek Kwaczynski,
Marek Puzyniak, Michal Kazior, and Sujith Manoharan.
4) Support reporting the TUN device persist flag to userspace, from
Pavel Emelyanov.
5) Allow controlling network device VF link state using netlink, from
Rony Efraim.
6) Support GRE tunneling in openvswitch, from Pravin B Shelar.
7) Adjust SOCK_MIN_RCVBUF and SOCK_MIN_SNDBUF for modern times, from
Daniel Borkmann and Eric Dumazet.
8) Allow controlling of TCP quickack behavior on a per-route basis,
from Cong Wang.
9) Several bug fixes and improvements to vxlan from Stephen
Hemminger, Pravin B Shelar, and Mike Rapoport. In particular,
support receiving on multiple UDP ports.
10) Major cleanups, particular in the area of debugging and cookie
lifetime handline, to the SCTP protocol code. From Daniel
Borkmann.
11) Allow packets to cross network namespaces when traversing tunnel
devices. From Nicolas Dichtel.
12) Allow monitoring netlink traffic via AF_PACKET sockets, in a
manner akin to how we monitor real network traffic via ptype_all.
From Daniel Borkmann.
13) Several bug fixes and improvements for the new alx device driver,
from Johannes Berg.
14) Fix scalability issues in the netem packet scheduler's time queue,
by using an rbtree. From Eric Dumazet.
15) Several bug fixes in TCP loss recovery handling, from Yuchung
Cheng.
16) Add support for GSO segmentation of MPLS packets, from Simon
Horman.
17) Make network notifiers have a real data type for the opaque
pointer that's passed into them. Use this to properly handle
network device flag changes in arp_netdev_event(). From Jiri
Pirko and Timo Teräs.
18) Convert several drivers over to module_pci_driver(), from Peter
Huewe.
19) tcp_fixup_rcvbuf() can loop 500 times over loopback, just use a
O(1) calculation instead. From Eric Dumazet.
20) Support setting of explicit tunnel peer addresses in ipv6, just
like ipv4. From Nicolas Dichtel.
21) Protect x86 BPF JIT against spraying attacks, from Eric Dumazet.
22) Prevent a single high rate flow from overruning an individual cpu
during RX packet processing via selective flow shedding. From
Willem de Bruijn.
23) Don't use spinlocks in TCP md5 signing fast paths, from Eric
Dumazet.
24) Don't just drop GSO packets which are above the TBF scheduler's
burst limit, chop them up so they are in-bounds instead. Also
from Eric Dumazet.
25) VLAN offloads are missed when configured on top of a bridge, fix
from Vlad Yasevich.
26) Support IPV6 in ping sockets. From Lorenzo Colitti.
27) Receive flow steering targets should be updated at poll() time
too, from David Majnemer.
28) Fix several corner case regressions in PMTU/redirect handling due
to the routing cache removal, from Timo Teräs.
29) We have to be mindful of ipv4 mapped ipv6 sockets in
upd_v6_push_pending_frames(). From Hannes Frederic Sowa.
30) Fix L2TP sequence number handling bugs, from James Chapman."
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1214 commits)
drivers/net: caif: fix wrong rtnl_is_locked() usage
drivers/net: enic: release rtnl_lock on error-path
vhost-net: fix use-after-free in vhost_net_flush
net: mv643xx_eth: do not use port number as platform device id
net: sctp: confirm route during forward progress
virtio_net: fix race in RX VQ processing
virtio: support unlocked queue poll
net/cadence/macb: fix bug/typo in extracting gem_irq_read_clear bit
Documentation: Fix references to defunct linux-net@vger.kernel.org
net/fs: change busy poll time accounting
net: rename low latency sockets functions to busy poll
bridge: fix some kernel warning in multicast timer
sfc: Fix memory leak when discarding scattered packets
sit: fix tunnel update via netlink
dt:net:stmmac: Add dt specific phy reset callback support.
dt:net:stmmac: Add support to dwmac version 3.610 and 3.710
dt:net:stmmac: Allocate platform data only if its NULL.
net:stmmac: fix memleak in the open method
ipv6: rt6_check_neigh should successfully verify neigh if no NUD information are available
net: ipv6: fix wrong ping_v6_sendmsg return value
...
This adds a way to check ring empty state after enable_cb outside any
locks. Will be used by virtio_net.
Note: there's room for more optimization: caller is likely to have a
memory barrier already, which means we might be able to get rid of a
barrier here. Deferring this optimization until we do some
benchmarking.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge first patch-bomb from Andrew Morton:
- various misc bits
- I'm been patchmonkeying ocfs2 for a while, as Joel and Mark have been
distracted. There has been quite a bit of activity.
- About half the MM queue
- Some backlight bits
- Various lib/ updates
- checkpatch updates
- zillions more little rtc patches
- ptrace
- signals
- exec
- procfs
- rapidio
- nbd
- aoe
- pps
- memstick
- tools/testing/selftests updates
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (445 commits)
tools/testing/selftests: don't assume the x bit is set on scripts
selftests: add .gitignore for kcmp
selftests: fix clean target in kcmp Makefile
selftests: add .gitignore for vm
selftests: add hugetlbfstest
self-test: fix make clean
selftests: exit 1 on failure
kernel/resource.c: remove the unneeded assignment in function __find_resource
aio: fix wrong comment in aio_complete()
drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode
drivers/memstick/host/r592.c: convert to module_pci_driver
drivers/memstick/host/jmb38x_ms: convert to module_pci_driver
pps-gpio: add device-tree binding and support
drivers/pps/clients/pps-gpio.c: convert to module_platform_driver
drivers/pps/clients/pps-gpio.c: convert to devm_* helpers
drivers/parport/share.c: use kzalloc
Documentation/accounting/getdelays.c: avoid strncpy in accounting tool
aoe: update internal version number to v83
aoe: update copyright date
aoe: perform I/O completions in parallel
...
Enhance adjust_managed_page_count() to adjust totalhigh_pages for
highmem pages. And change code which directly adjusts totalram_pages to
use adjust_managed_page_count() because it adjusts totalram_pages,
totalhigh_pages and zone->managed_pages altogether in a safe way.
Remove inc_totalhigh_pages() and dec_totalhigh_pages() from xen/balloon
driver bacause adjust_managed_page_count() has already adjusted
totalhigh_pages.
This patch also fixes two bugs:
1) enhances virtio_balloon driver to adjust totalhigh_pages when
reserve/unreserve pages.
2) enhance memory_hotplug.c to adjust totalhigh_pages when hot-removing
memory.
We still need to deal with modifications of totalram_pages in file
arch/powerpc/platforms/pseries/cmm.c, but need help from PPC experts.
[akpm@linux-foundation.org: remove ifdef, per Wanpeng Li, virtio_balloon.c cleanup, per Sergei]
[akpm@linux-foundation.org: export adjust_managed_page_count() to modules, for drivers/virtio/virtio_balloon.c]
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <sworddragon2@aol.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
balloon_page_dequeue() can return NULL. If it does for the first page
being freed then leak_balloon() will create a scatter list with len=0.
Which in turn seems to generate an invalid virtio request.
I didn't get this in practice, I found it by code review. On the other
hand, such an invalid virtio request will cause errors in QEMU and
fill_balloon() also performs the same check implemented by this commit.
This bug was introduced in e2250429.
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org # 3.9
These are specialized versions of virtqueue_add_buf(), which cover
over 80% of cases and are far clearer.
In particular, the scatterlists passed to these functions don't have
to be clean (ie. we ignore end markers).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtio_scsi can really use this, to avoid the current hack of copying
the whole sg array. Some other things get slightly neater, too.
This causes a slowdown in virtqueue_add_buf(), which is implemented as
a wrapper. This is addressed in the next patches.
for i in `seq 50`; do /usr/bin/time -f 'Wall time:%e' ./vringh_test --indirect --eventidx --parallel --fast-vringh; done 2>&1 | stats --trim-outliers:
Before:
Using CPUS 0 and 3
Guest: notified 0, pinged 39009-39063(39062)
Host: notified 39009-39063(39062), pinged 0
Wall time:1.700000-1.950000(1.723542)
After:
Using CPUS 0 and 3
Guest: notified 0, pinged 39062-39063(39063)
Host: notified 39062-39063(39063), pinged 0
Wall time:1.760000-2.220000(1.789167)
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Reviewed-by: Asias He <asias@redhat.com>
Cheers,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJRLEkHAAoJENkgDmzRrbjx6K0P/3o9/iW5hkOPYpu+KV2nr0wG
6RG0uu0DCOb/tckigYwnn5PkS7UEcJu6kDypnEgXfhcNhYiBjoGIAUEQ2jBYyzQm
IYc4oZhxDdqigJL/FHi2zL50mkWacTdBK83udxim3eRkgW9ysBRoBFwqMruVyhZv
474KGS4PNT8pHDOCAlrRS1I9oW2pYTuUidN+SnfVJ8gFSkH0tuHEGoJeGrtDa010
XkiWqiTJq6pDHTM5f4Wwp/WNQ21UNDBlvRahg0nqTZXicQsvujV0Tdb+EnAfXwa+
bssqvO4X2WqlraVK1TJteufhcdhUgt/I1+45p9eLZvOXizk7EBKxFWynE7C878GV
dhpz8i4mc+u6aJlGoHzwvKah0zhwDtqWdbDS+LNQjRmjMK9v45ttoisU+iR1GjLt
JpDVg73K315aWJ9RLsYu7mvZ8JY+qRFkwXqX3lZd+GdohY0DSmGMxMqJG93sLBYi
42vyHMaBd1JY0rDVqpfzlmjnSxX+k05uB0GYB3fO0CXbPxmfXtRKz7/5wpSz0Up+
nTCs6Xa7t5jtG9qpC4cKxyEDEwB9M6SSxi+lhrIBkeDqlFwXAv1Fean4jqqhtFwr
HO6i1mYgy0xCVY7EmequVyWlH6/B5GVB2xTMQuAz3f8pKm+XUr6C/twROYkgoXbV
AOVyRWtnbISDVjtVFn9c
=KcDx
-----END PGP SIGNATURE-----
Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull virtio updates from Rusty Russell:
"All trivial, thanks to the stuff which didn't quite make it time"
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
virtio_console: Initialize guest_connected=true for rproc_serial
virtio: use module_virtio_driver.
virtio: Add module driver macro for virtio drivers.
virtio_console: Use virtio device index to generate port name
virtio: make pci_device_id const
virtio: make config_ops const
virtio-mmio: fix wrong comment about register offset
virtio_console: Let unconnected rproc device receive data.
Use DEVICE_TABLE macro which also makes table const.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It is just a table of function pointers, make it const for cleanliness and security
reasons.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a
while now and is almost always enabled by default. As agreed during the
Linux kernel summit, remove it from any "depends on" lines in Kconfigs.
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
CONFIG_HOTPLUG is going away as an option. As a result, the __dev*
markings need to be removed.
This change removes the use of __devinit, __devexit_p, and __devexit
from these drivers.
Based on patches originally written by Bill Pemberton, but redone by me
in order to handle some of the coding style issues better, by hand.
Cc: Bill Pemberton <wfp5p@virginia.edu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Latinoware 2012.
There's a slightly non-trivial merge in virtio-net, as we cleaned up the
virtio add_buf interface while DaveM accepted the mq virtio-net patches.
You can see my solution in my pending-rebases branch, if that helps, but I
know you love merging:
https://git.kernel.org/?p=linux/kernel/git/rusty/linux.git;a=commit;h=12e4e64fa66a4c812e4855de32abdb4d819526fe
Cheers,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJQz/vKAAoJENkgDmzRrbjx+eYQAK/egj9T8Nnth6mkzdbCFSO7
Bciga2hDiudGCiGojTRGPRSc0VP9LgfvPbY2pxX+R9CfEqR+a8q/rRQhCS79ZwPB
/mJy3HNiCx418HZxgwNtk6vPe0PjJm6SsjbXeB9hB+PQLCbdwA0BjpG6xjF/jitP
noPqhhXreeQgYVxAKoFPvff/Byu2GlNnDdVMQxWRmo8hTKlTCzl0T/7BHRxthhJj
iOrXTFzrT/osPT0zyqlngT03T4wlBvL2Bfw8d/kuRPEZ71dpIctWeH2KzdwXVCrz
hFQGxAz4OWvW3xrNwj7c6O3SWj4VemUMjQqeA/PtRiOEI5gM0Y/Bit47dWL4wM/O
OWUKFHzq4DFs8MmwXBgDDXl5xOjOBH9Ik4FZayn3Y7COT/B8CjFdOC2MdDGmZ9yd
NInumg7FqP+u12g+9Vq8S/b0cfoQm4qFe8VHiPJu+jRmCZglyvLjk7oq/QwW8Gaq
Pkzit1Ey0DWo2KvZ4D/nuXJCuhmzN/AJ10M48lLYZhtOIVg9gsa0xjhfgq4FnvSK
xFCf3rcWnlGIXcOYh/hKU25WaCLzBuqMuSK35A72IujrQOL7OJTk4Oqote3Z3H9B
08XJmyW6SOZdfw17X4Im1jbyuLek///xQJ9Jw/tya7j9lBt8zjJ+FmLPs4mLGEOm
WJv9uZPs+QbIMNky2Lcb
=myDR
-----END PGP SIGNATURE-----
Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull virtio update from Rusty Russell:
"Some nice cleanups, and even a patch my wife did as a "live" demo for
Latinoware 2012.
There's a slightly non-trivial merge in virtio-net, as we cleaned up
the virtio add_buf interface while DaveM accepted the mq virtio-net
patches."
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (27 commits)
virtio_console: Add support for remoteproc serial
virtio_console: Merge struct buffer_token into struct port_buffer
virtio: add drv_to_virtio to make code clearly
virtio: use dev_to_virtio wrapper in virtio
virtio-mmio: Fix irq parsing in command line parameter
virtio_console: Free buffers from out-queue upon close
virtio: Convert dev_printk(KERN_<LEVEL> to dev_<level>(
virtio_console: Use kmalloc instead of kzalloc
virtio_console: Free buffer if splice fails
virtio: tools: make it clear that virtqueue_add_buf() no longer returns > 0
virtio: scsi: make it clear that virtqueue_add_buf() no longer returns > 0
virtio: rpmsg: make it clear that virtqueue_add_buf() no longer returns > 0
virtio: net: make it clear that virtqueue_add_buf() no longer returns > 0
virtio: console: make it clear that virtqueue_add_buf() no longer returns > 0
virtio: make virtqueue_add_buf() returning 0 on success, not capacity.
virtio: console: don't rely on virtqueue_add_buf() returning capacity.
virtio_net: don't rely on virtqueue_add_buf() returning capacity.
virtio-net: remove unused skb_vnet_hdr->num_sg field
virtio-net: correct capacity math on ring full
virtio: move queue_index and num_free fields into core struct virtqueue.
...
Add drv_to_virtio wrapper to get virtio_driver from device_driver.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use dev_to_virtio wrapper in virtio to make code clearly.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When the resource_size_t is 64-bit long, the sscanf() on
the virtio device command line paramter string may return
wrong value because its format was defined as "%u". Fixed
by using an intermediate local value of a known length.
Also added cleaned up the resource creation and added extra
comments to make the parameters parsing easier to follow.
Reported-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
dev_<level> calls take less code than dev_printk(KERN_<LEVEL>
and reducing object size is good.
Convert if (printk_ratelimit()) dev_printk to dev_<level>_ratelimited.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now noone relies on this behavior, we simplify virtqueue_add_buf() so it
return 0 or -errno.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
They're generic concepts, so hoist them. This also avoids accessor
functions (though kept around for merge with DaveM's net tree).
This goes even further than Jason Wang's 17bb6d4088 patch
("virtio-ring: move queue_index to vring_virtqueue") which moved the
queue_index from the specific transport.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use the module_pci_driver() macro to make the code simpler
by eliminating module_init and module_exit calls.
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Memory fragmentation introduced by ballooning might reduce significantly
the number of 2MB contiguous memory blocks that can be used within a guest,
thus imposing performance penalties associated with the reduced number of
transparent huge pages that could be used by the guest workload.
Besides making balloon pages movable at allocation time and introducing
the necessary primitives to perform balloon page migration/compaction,
this patch also introduces the following locking scheme, in order to
enhance the syncronization methods for accessing elements of struct
virtio_balloon, thus providing protection against concurrent access
introduced by parallel memory migration threads.
- balloon_lock (mutex) : synchronizes the access demand to elements of
struct virtio_balloon and its queue operations;
[yongjun_wei@trendmicro.com.cn: fix missing unlock on error in fill_balloon()]
[akpm@linux-foundation.org: avoid having multiple return points in fill_balloon()]
[akpm@linux-foundation.org: fix printk warning]Signed-off-by: Rafael Aquini <aquini@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Virtio wants to release used indices after the corresponding
virtio device has been unregistered. However, virtio does not
hold an extra reference, giving up its last reference with
device_unregister(), making accessing dev->index afterwards
invalid.
I actually saw problems when testing my (not-yet-merged)
virtio-ccw code:
- device_add virtio-net,id=xxx
-> creates device virtio<n> with n>0
- device_del xxx
-> deletes virtio<n>, but calls ida_simple_remove with an
index of 0
- device_add virtio-net,id=xxx
-> tries to add virtio0, which is still in use...
So let's save the index we want to release before calling
device_unregister().
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Cc: stable@kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Virtio devices may attempt to add descriptors to a virtqueue from atomic
context using GFP_ATOMIC allocation. This is problematic because such
allocations can fall outside of the lowmem mapping, causing virt_to_phys
to report bogus physical addresses which are subsequently passed to
userspace via the buffers for the virtual device.
This patch masks out __GFP_HIGH and __GFP_HIGHMEM from the requested
flags when allocating descriptors for a virtqueue. If an atomic
allocation is requested and later fails, we will return -ENOSPC which
will be handled by the driver.
Cc: stable@kernel.org
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If a virtio device reports a QueueNumMax of 0, vring_new_virtqueue()
doesn't check this, and thanks to an unsigned (i < num - 1) loop
guard, scribbles over memory when initialising the free list.
Avoid by not trying to create zero-descriptor queues, as there's no
way to do any I/O with one.
Signed-off-by: Brian Foley <brian.foley@arm.com>
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
vm_setup_vq fails to allow VirtQueues needing only 2 pages of
storage, as it should. Found with a kernel using 64kB pages, but
can be provoked if a virtio device reports QueueNumMax where the
descriptor table and available ring fit in one page, and the used
ring on the second (<= 227 descriptors with 4kB pages and <= 3640
with 64kB pages.)
Signed-off-by: Brian Foley <brian.foley@arm.com>
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Convert a nonnegative error return code to a negative one, as returned
elsewhere in the function.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
(
if@p1 (\(ret < 0\|ret != 0\))
{ ... return ret; }
|
ret@p1 = 0
)
... when != ret = e1
when != &ret
*if(...)
{
... when != ret = e2
when forall
return ret;
}
// </smpl>
Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Because of a sanity check in virtio_dev_remove, a buggy device can crash
kernel. And in case of rproc it's userspace so it's not a good idea.
We are unloading a driver so how bad can it be?
Be less aggressive in handling this error: if it's a driver bug,
warning once should be enough.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Everyone who selects VIRTIO is also made to select VIRTIO_RING; just make
them synonymous, since we removed the indirection layer some time ago.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Trying to enable a virtio driver (eg CONFIG_VIRTIO_BLK) is painful
because it depends on CONFIG_VIRTIO. CONFIG_VIRTIO doesn't tell you
how to turn it on (it's selected from anything which provides a virtio
bus).
This patch at least adds some documentation, visible in menuconfig, as
a hint.
Reported-by: Kent Overstreet <koverstreet@google.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtio network device multiqueue support reserves
vq 3 for future use (useful both for future extensions and to make it
pretty - this way receive vqs have even and transmit - odd numbers).
Make it possible to skip initialization for
specific vq numbers by specifying NULL for name.
Document this usage as well as (existing) NULL callback.
Drivers using this not coded up yet, so I simply tested
with virtio-pci and verified that this patch does
not break existing drivers.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Sometimes, virtio device need to configure irq affinity hint to maximize the
performance. Instead of just exposing the irq of a virtqueue, this patch
introduce an API to set the affinity for a virtqueue.
The api is best-effort, the affinity hint may not be set as expected due to
platform support, irq sharing or irq type. Currently, only pci method were
implemented and we set the affinity according to:
- if device uses INTX, we just ignore the request
- if device has per vq vector, we force the affinity hint
- if the virtqueues share MSI, make the affinity OR over all affinities
requested
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Instead of storing the queue index in transport-specific virtio structs,
this patch moves them to vring_virtqueue and introduces an helper to get
the value. This lets drivers simplify their management and tracing of
virtqueues.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It is not experimental in any vaguely-sane sense.
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Devices should depend on virtio, not select it. It's supposed to be
selected by the particular driver, e.g. VIRTIO_PCI.
Make balloon depend on VIRTIO and EXPERIMENTAL
(to match description).
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch changes virtio-scsi to use a new virtio_driver->scan() callback
so that scsi_scan_host() can be properly invoked once virtio_dev_probe() has
set add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK) to signal active virtio-ring
operation, instead of from within virtscsi_probe().
This fixes a bug where SCSI LUN scanning for both virtio-scsi-raw and
virtio-scsi/tcm_vhost setups was happening before VIRTIO_CONFIG_S_DRIVER_OK
had been set, causing VIRTIO_SCSI_S_BAD_TARGET to occur. This fixes a bug
with virtio-scsi/tcm_vhost where LUN scan was not detecting LUNs.
Tested with virtio-scsi-raw + virtio-scsi/tcm_vhost w/ IBLOCK on 3.5-rc2 code.
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Since ee7cd8981e 'virtio: expose added
descriptors immediately.', in virtio balloon virtqueue_get_buf might
now run concurrently with virtqueue_kick. I audited both and this
seems safe in practice but this is not guaranteed by the API.
Additionally, a spurious interrupt might in theory make
virtqueue_get_buf run in parallel with virtqueue_add_buf, which is
racy.
While we might try to protect against spurious callbacks it's
easier to fix the driver: balloon seems to be the only one
(mis)using the API like this, so let's just fix balloon.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (removed unused var)
This patch adds an option to instantiate guest virtio-mmio devices
basing on a kernel command line (or module) parameter, for example:
virtio_mmio.devices=0x100@0x100b0000:48
Signed-off-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Current index allocation in virtio is based on a monotonically
increasing variable "index". This means we'll run out of numbers
after a while. E.g. someone crazy doing this in host side.
while(1) {
hot-plug a virtio device
hot-unplug the virito devcie
}
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The remove and freeze functions have a lot of shared code; put it into a
common function that gets called by both.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
restore_common() was used when there were different thaw and freeze PM
callbacks implemented. We removed thaw in commit
f38f8387cb.
restore_common() can be removed and virtballoon_restore() can itself do
the restore ops.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
When the balloon module is removed, we deflate the balloon, reclaiming
all the pages that were given to the host. However, we don't update the
config values for the new balloon size, resulting in the host showing
outdated balloon values.
The size update is done after each leak and fill operation, only the
module removal case was left out.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
As reported by David Gibson, current code handles PAGE_SIZE != 4k
completely wrong which can lead to guest memory corruption errors:
- page_to_balloon_pfn is wrong: e.g. on system with 64K page size
it gives the same pfn value for 16 different pages.
- we also need to convert back to linux pfns when we free.
- for each linux page we need to tell host about multiple balloon
pages, but code only adds one pfn to the array.
This patch fixes all that, tested with a 64k ppc64 kernel.
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Tested-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Although virtio config space fields are usually in guest-native endian,
the spec for the virtio balloon device explicitly states that both fields
in its config space are little-endian.
However, the current virtio_balloon driver does not have a suitable endian
swap for the 'num_pages' field, although it does have one for the 'actual'
field. This patch corrects the bug, adding sparse annotation while we're
at it.
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
There's no difference in supporting S3 and S4 for virtio devices: the
vqs have to be re-created as the device has to be assumed to be reset at
restore-time. Since S4 already handles this situation, we can directly
use the same code and callbacks for S3 support.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
restore_common() was shared between restore and thaw callbacks. With
thaw gone, we don't need restore_common() anymore.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
The thaw operation was used by the balloon driver, but after the last
commit there's no reason to have separate thaw and restore callbacks.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
There's no reason stats update after restore can't work. If a host
requested for stats, and before servicing the request, the guest entered
S4, upon restore, the stats request can still be processed and sent off
to the host.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Remove all #inclusions of asm/system.h preparatory to splitting and killing
it. Performed with the following command:
perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`
Signed-off-by: David Howells <dhowells@redhat.com>
commit e562966dba added support for S4 to
the balloon driver. The freeze function did nothing to free the pages,
since reclaiming the pages from the host to immediately give them back
(if S4 was successful) seemed wasteful. Also, if S4 wasn't successful,
the guest would have to re-fill the balloon. On restore, the pages were
supposed to be marked freed and the free page counters were incremented
to reflect the balloon was totally deflated.
However, this wasn't done right. The pages that were earlier taken away
from the guest during a balloon inflation operation were just shown as
used pages after a successful restore from S4. Just a fancy way of
leaking lots of memory.
Instead of trying that, just leak the balloon on freeze and fill it on
restore/thaw paths. This works properly now. The optimisation to not
leak can be added later on after a bit of refactoring of the code.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Use virtio_mb() to make sure the available index to be exposed before
checking the the avail event. Otherwise we may get stale value of
avail event in guest and never kick the host after.
Note: this fixes a bug introduced by ee7cd8981e.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org
Note: this fixes a bug introduced recently in
7b21e34fd1.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Handling balloon hibernate / restore is tricky. If the balloon was
inflated before going into the hibernation state, upon resume, the host
will not have any memory of that. Any pages that were passed on to the
host earlier would most likely be invalid, and the host will have to
re-balloon to the previous value to get in the pre-hibernate state.
So the only sane thing for the guest to do here is to discard all the
pages that were put in the balloon. When to discard the pages is the
next question.
One solution is to deflate the balloon just before writing the image to
the disk (in the freeze() PM callback). However, asking for pages from
the host just to discard them immediately after seems wasteful of
resources. Hence, it makes sense to do this by just fudging our
counters soon after wakeup. This means we don't deflate the balloon
before sleep, and also don't put unnecessary pressure on the host.
This also helps in the thaw case: if the freeze fails for whatever
reason, the balloon should continue to remain in the inflated state.
This was tested by issuing 'swapoff -a' and trying to go into the S4
state. That fails, and the balloon stays inflated, as expected. Both
the host and the guest are happy.
Finally, in the restore() callback, we empty the list of pages that were
previously given off to the host, add the appropriate number of pages to
the totalram_pages counter, reset the num_pages counter to 0, and
all is fine.
As a last step, delete the vqs on the freeze callback to prepare for
hibernation, and re-create them in the restore and thaw callbacks to
resume normal operation.
The kthread doesn't race with any operations here, since it's frozen
before the freeze() call and is thawed after the thaw() and restore()
callbacks, so we're safe with that.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The probe and PM restore functions will share this code.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Handle thaw, restore and freeze notifications from the PM core. Expose
these to individual virtio drivers that can quiesce and resume vq
operations. For drivers not implementing the thaw() method, use the
restore method instead.
These functions also save device-specific data so that the device can be
put in pre-suspend state after resume, and disable and enable the PCI
device in the freeze and resume functions, respectively.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The older PM API doesn't have a way to get notifications on hibernate
events. Switch to the newer one that gives us those notifications.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Under the existing #ifdef DEBUG, check that they don't have more than
1/10 of a second between an add_buf() and a
virtqueue_notify()/virtqueue_kick_prepare() call.
We could get false positives on a really busy system, but good for
development.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
A virtio driver does virtqueue_add_buf() multiple times before finally
calling virtqueue_kick(); previously we only exposed the added buffers
in the virtqueue_kick() call. This means we don't need a memory
barrier in virtqueue_add_buf(), but it reduces concurrency as the
device (ie. host) can't see the buffers until the kick.
In the unusual (but now possible) case where a driver does add_buf()
and get_buf() without doing a kick, we do need to insert one before
our counter wraps. Otherwise we could wrap num_added, and later on
not realize that we have passed the marker where we should have
kicked.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since we know vq->vring.num is a power of 2, modulus is lazy (it's asserted
in vring_new_virtqueue()).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Based on patch by Christoph for virtio_blk speedup:
Split virtqueue_kick to be able to do the actual notification
outside the lock protecting the virtqueue. This patch was
originally done by Stefan Hajnoczi, but I can't find the
original one anymore and had to recreated it from memory.
Pointers to the original or corrections for the commit message
are welcome.
Stefan's patch was here:
a6d06644e3http://www.spinics.net/lists/linux-virtualization/msg14616.html
Third time's the charm!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Remove wrapper functions. This makes the allocation type explicit in
all callers; I used GPF_KERNEL where it seemed obvious, left it at
GFP_ATOMIC otherwise.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The old documentation is left over from when we used a structure with
strategy pointers.
And move the documentation to the C file as per kernel practice.
Though I disagree...
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Trivial changes to remove forgotten junk, format comments, and correct names.
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were cheating with our barriers; using the smp ones rather than the
real device ones. That was fine, until rpmsg came along, which is
used to talk to a real device (a non-SMP CPU).
Unfortunately, just putting back the real barriers (reverting
d57ed95d) causes a performance regression on virtio-pci. In
particular, Amos reports netbench's TCP_RR over virtio_net CPU
utilization increased up to 35% while throughput went down by up to
14%.
By comparison, this branch is in the noise.
Reference: https://lkml.org/lkml/2011/12/11/22
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtio pci device reset actually just does an I/O
write, which in PCI is really posted, that is it
can complete on CPU before the device has received it.
Further, interrupts might have been pending on
another CPU, so device callback might get invoked after reset.
This conflicts with how drivers use reset, which is typically:
reset
unregister
a callback running after reset completed can race with
unregister, potentially leading to use after free bugs.
Fix by flushing out the write, and flushing pending interrupts.
This assumes that device is never reset from
its vq/config callbacks, or in parallel with being
added/removed, document this assumption.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fix this compile error on s390:
CC [M] drivers/virtio/virtio_mmio.o
drivers/virtio/virtio_mmio.c: In function 'vm_get_features':
drivers/virtio/virtio_mmio.c:107:2: error: implicit declaration of function 'writel'
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Pawel Moll <pawel.moll@arm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>