This fixes couple of races which lead to infinite wait of park completion
with the following backtraces:
[20801.303319] Call Trace:
[20801.303321] ? __schedule+0x284/0x650
[20801.303323] schedule+0x33/0xc0
[20801.303324] schedule_timeout+0x1bc/0x210
[20801.303326] ? schedule+0x3d/0xc0
[20801.303327] ? schedule_timeout+0x1bc/0x210
[20801.303329] ? preempt_count_add+0x79/0xb0
[20801.303330] wait_for_completion+0xa5/0x120
[20801.303331] ? wake_up_q+0x70/0x70
[20801.303333] kthread_park+0x48/0x80
[20801.303335] io_finish_async+0x2c/0x70
[20801.303336] io_ring_ctx_wait_and_kill+0x95/0x180
[20801.303338] io_uring_release+0x1c/0x20
[20801.303339] __fput+0xad/0x210
[20801.303341] task_work_run+0x8f/0xb0
[20801.303342] exit_to_usermode_loop+0xa0/0xb0
[20801.303343] do_syscall_64+0xe0/0x100
[20801.303349] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[20801.303380] Call Trace:
[20801.303383] ? __schedule+0x284/0x650
[20801.303384] schedule+0x33/0xc0
[20801.303386] io_sq_thread+0x38a/0x410
[20801.303388] ? __switch_to_asm+0x40/0x70
[20801.303390] ? wait_woken+0x80/0x80
[20801.303392] ? _raw_spin_lock_irqsave+0x17/0x40
[20801.303394] ? io_submit_sqes+0x120/0x120
[20801.303395] kthread+0x112/0x130
[20801.303396] ? kthread_create_on_node+0x60/0x60
[20801.303398] ret_from_fork+0x35/0x40
o kthread_park() waits for park completion, so io_sq_thread() loop
should check kthread_should_park() along with khread_should_stop(),
otherwise if kthread_park() is called before prepare_to_wait()
the following schedule() never returns:
CPU#0 CPU#1
io_sq_thread_stop(): io_sq_thread():
while(!kthread_should_stop() && !ctx->sqo_stop) {
ctx->sqo_stop = 1;
kthread_park()
prepare_to_wait();
if (kthread_should_stop() {
}
schedule(); <<< nobody checks park flag,
<<< so schedule and never return
o if the flag ctx->sqo_stop is observed by the io_sq_thread() loop
it is quite possible, that kthread_should_park() check and the
following kthread_parkme() is never called, because kthread_park()
has not been yet called, but few moments later is is called and
waits there for park completion, which never happens, because
kthread has already exited:
CPU#0 CPU#1
io_sq_thread_stop(): io_sq_thread():
ctx->sqo_stop = 1;
while(!kthread_should_stop() && !ctx->sqo_stop) {
<<< observe sqo_stop and exit the loop
}
if (kthread_should_park())
kthread_parkme(); <<< never called, since was
<<< never parked
kthread_park() <<< waits forever for park completion
In the current patch we quit the loop by only kthread_should_park()
check (kthread_park() is synchronous, so kthread_should_stop() is
never observed), and we abandon ->sqo_stop flag, since it is racy.
At the end of the io_sq_thread() we unconditionally call parmke(),
since we've exited the loop by the park flag.
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This message should better identify the DM device with the integrity
failure.
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The information about tag size should not be printed without debug info
set. Also print device major:minor in the error message to identify the
device instance.
Also use rate limiting and debug level for info about used crypto API
implementaton. This is important because during online reencryption
the existing message saturates syslog (because we are moving hotzone
across the whole device).
Cc: stable@vger.kernel.org
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Pull NVMe fixes from Christoph
* 'nvme-5.2' of git://git.infradead.org/nvme:
nvme: validate cntlid during controller initialisation
nvme: change locking for the per-subsystem controller list
nvme: trace all async notice events
nvme: fix typos in nvme status code values
nvme-fabrics: remove unused argument
nvme-multipath: avoid crash on invalid subsystem cntlid enumeration
nvme-fc: use separate work queue to avoid warning
nvme-rdma: remove redundant reference between ib_device and tagset
nvme-pci: mark expected switch fall-through
nvme-pci: add known admin effects to augument admin effects log page
nvme-pci: init shadow doorbell after each reset
The dm_early_create() function (which deals with "dm-mod.create=" kernel
command line option) calls dm_hash_insert() who gets an extra reference
to the md object.
In case of failure, this reference wasn't being released, causing
dm_destroy() to hang, thus hanging the whole boot process.
Fix this by calling __hash_remove() in the error path.
Fixes: 6bbc923dfc ("dm: add support to directly boot to a mapped device")
Cc: stable@vger.kernel.org
Signed-off-by: Helen Koike <helen.koike@collabora.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Currently, once configured, AFS cells are looked up in the DNS at regular
intervals - which is a waste of resources if those cells aren't being
used. It also leads to a problem where cells preloaded, but not
configured, before the network is brought up end up effectively statically
configured with no VL servers and are unable to get any.
Fix this by not doing the DNS lookup until the first time a cell is
touched. It is waited for if we don't have any cached records yet,
otherwise the DNS lookup to maintain the record is done in the background.
This has the downside that the first time you touch a cell, you now have to
wait for the upcall to do the required DNS lookups rather than them already
being cached.
Further, the record is not replaced if the old record has at least one
server in it and the new record doesn't have any.
Fixes: 0a5143f2f8 ("afs: Implement VL server rotation")
Signed-off-by: David Howells <dhowells@redhat.com>
clock_getres in the vDSO library has to preserve the same behaviour
of posix_get_hrtimer_res().
In particular, posix_get_hrtimer_res() does:
sec = 0;
ns = hrtimer_resolution;
and hrtimer_resolution depends on the enablement of the high
resolution timers that can happen either at compile or at run time.
Fix the nds32 vdso implementation of clock_getres keeping a copy of
hrtimer_resolution in vdso data and using that directly.
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Greentime Hu <greentime@andestech.com>
On x86_64, all returns to usermode go through
prepare_exit_to_usermode(), with the sole exception of do_nmi().
This even includes machine checks -- this was added several years
ago to support MCE recovery. Update the documentation.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jon Masters <jcm@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 04dcbdb805 ("x86/speculation/mds: Clear CPU buffers on exit to user")
Link: http://lkml.kernel.org/r/999fa9e126ba6a48e9d214d2f18dbde5c62ac55c.1557865329.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The double fault ESPFIX path doesn't return to user mode at all --
it returns back to the kernel by simulating a #GP fault.
prepare_exit_to_usermode() will run on the way out of
general_protection before running user code.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jon Masters <jcm@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: 04dcbdb805 ("x86/speculation/mds: Clear CPU buffers on exit to user")
Link: http://lkml.kernel.org/r/ac97612445c0a44ee10374f6ea79c222fe22a5c4.1557865329.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
None of these is used by modules. Nor should they as we have better
highlevel primitives.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Greentime Hu <greentime@andestech.com>
Signed-off-by: Greentime Hu <greentime@andestech.com>
Merge in a few pending fixes from pre-5.1 that didn't get sent in:
MAINTAINERS: update arch/arm/mach-davinci
ARM: dts: ls1021: Fix SGMII PCS link remaining down after PHY disconnect
ARM: dts: imx6q-logicpd: Reduce inrush current on USBH1
ARM: dts: imx6q-logicpd: Reduce inrush current on start
ARM: dts: imx: Fix the AR803X phy-mode
ARM: dts: sun8i: a33: Reintroduce default pinctrl muxing
arm64: dts: allwinner: a64: Rename hpvcc-supply to cpvdd-supply
ARM: sunxi: fix a leaked reference by adding missing of_node_put
ARM: sunxi: fix a leaked reference by adding missing of_node_put
Signed-off-by: Olof Johansson <olof@lixom.net>
Add llseek op for SEEK_DATA and SEEK_HOLE.
Improves xfstests/285,286,436,445,448 and 490
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Copychunk allows source and target to be on the same file.
For details on restrictions see MS-SMB2 3.3.5.15.6
Signed-off-by: Kovtunenko Oleksandr <alexander198961@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
An IOCTL uses up to 2 iovs. The 1st iov is the command itself, the 2nd iov is
optional data for that command. The 1st iov is always allocated on the heap
but the 2nd iov may point to a variable on the stack. This will trigger an
error when passing the 2nd iov for RDMA I/O.
Fix this by allocating a buffer for the 2nd iov.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Ronnie sahlberg <lsahlber@redhat.com>
SMBDirect manages its own ports in the transport layer, there is no need to
check the port to find a connection.
Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie sahlberg <lsahlber@redhat.com>
* Fix a long standing namespace label corruption scenario when
re-provisioning capacity for a namespace.
* Restore the ability of the dax_pmem module to be built-in.
* Harden the build for the 'nfit_test' unit test modules so that the
userspace test harness can ensure all required test modules are
available.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJc3KYfAAoJEB7SkWpmfYgCGtUP/1eZkLk6X2xYYZw6mMKbaVUm
F4f7uOhpFFNonor0EhZVgTXqLEjFE9ux3+kZi0EZkvJhOSWAftICo1mzqLBnDxSW
QiNMOcFeM16Df3a6kwMbrcsjVRMyI63E4qH2puaPcW4sSRVhrMzKcklx+iWtubtk
q/bXx4B4n78E509FMagF3Irt6iWh235YqAXapbh4jSLGs8BJ0LRE8WVnMridCYcD
MV4QEYHWwHh0SlQ7HM/jdGtCwJyaFiHK+G6CDqUqTR7NKFpkpAwKDT2UULkdpzSo
1YUJQfUcdpwp7GUaTvGWL8BDyVklh+pLQtp+lepQfpPbSLPMC6qRC+hZXAuxlX7h
Dj94P9DYZWJk9b0Z4NaqJCjADi/iKIdCHa9dIOPb4XmbXgTLnS15HsG0asBeoTuF
UfDNdOLo8Tz+3dwNvmJ9Avb8ShqYLkwiOfkOeBDGu/+OWTiUrbghbAhhM4iOd8ey
cFYc5MWD0HA4F0f4jw0o0fKQ3qGfhqEabdN1Z54nZ53y+t9MFx2fTAXAq26f+oly
HM3ehus7EiNUS9gjMC55AAPTt5S/S4nu+YiMQUwlRfj2ErkYvrRsQAl7x4QjEWdu
RSfrGCMb37OaMnzFGw49GGJsZqPUeT1O2anxVDVTM+RvMCi6fJY75XRJVMs2A6CG
WNVEIGIQQbHwEyidOAPC
=QM33
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-fixes-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"Just a small collection of fixes this time around.
The new virtio-pmem driver is nearly ready, but some last minute
device-mapper acks and virtio questions made it prudent to await v5.3.
Other major topics that were brewing on the linux-nvdimm mailing list
like sub-section hotplug, and other devm_memremap_pages() reworks will
go upstream through Andrew's tree.
Summary:
- Fix a long standing namespace label corruption scenario when
re-provisioning capacity for a namespace.
- Restore the ability of the dax_pmem module to be built-in.
- Harden the build for the 'nfit_test' unit test modules so that the
userspace test harness can ensure all required test modules are
available"
* tag 'libnvdimm-fixes-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
drivers/dax: Allow to include DEV_DAX_PMEM as builtin
libnvdimm/namespace: Fix label tracking error
tools/testing/nvdimm: add watermarks for dax_pmem* modules
dax/pmem: Fix whitespace in dax_pmem
Core:
* Add over-current health state
* Add standard, adaptive and custom charge types
* Add new properties for start/end charge threshold
New Drivers / Hardware:
* UCS1002 Programmable USB Port Power Controller
* Ingenic JZ47xx Battery Fuel Gauge
* AXP20x USB Power: Add AXP813 support
* AT91 poweroff: Add SAM9X60 support
* OLPC battery: Add XO-1.5 and XO-1.75 support
Misc. Changes:
* syscon-reboot: support mask property
* AXP288 fuel gauge: Blacklist ACEPC T8/T11
- Looks like some vendor thought it's a good idea to
build a desktop system with a fuel gauge, that slowly
"discharges"...
* cpcap-battery: Fix calculation errors
* misc. fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE72YNB0Y/i3JqeVQT2O7X88g7+poFAlzbPpUACgkQ2O7X88g7
+ppU9w/9GDMAHh5LelpuKosuWfdoZMOiMqtyp+GH+Tg4t/cYksTpUFcupKE8sIEU
HG+YHNZdD56rHYz7fF6/SRAWfj1o77+Hr2s7XQlLayReFYuxltPIM+MX+xXpj4Qt
OJcSWnk9233UqfodPAyvC/Tj+I0SgElOUmkhhe5fqNtktQeJgvDO1Gs2oNBZOuMG
+ySTT+8Dba2YbXAHYXYdyzMG1YuDZLbkvSpkYzRBH4CyfDrcTH2zkkfQSu0pAYPk
VwdeWw05yKRNZtWhwS+eUefIXmdu8ZH2BNrYk5PobTeDhhMYx+QzoTuxyhIY+Mbq
I1tabHrIOMy1Xyw0QsbB2/ujrt5SzNv6SLxgKaPvgPSr1uPz3Ogl3+SRziNY3zvN
SmxSedAL5qx/TBTL+rKSKCO66aU8jAdGzvnRfwWcCoQhE+EZF5r0vSn5zIhR2Fxh
fKKph8ZZv7426jPBuXTOurQVRs8daa+DmwHauebq4MNnhftJM1PfTb8SFOwrDTMD
Es4M5BXgn/1RKfqjh0gKTYkbRBCtUhnHUAPmzAKFCbEENc0eC439P3wQ8lP0EzFT
QHpdpPxeMor24HjVldfi0K4hXqNPGEnTlZwq7Asu6NAp0HcgdqIGXiLqQP3/s5ds
gMUqOLNRAywupdpMT7db7JadnVmDRK1sHZnhk4wTAPt4Q6gqcE8=
=qicd
-----END PGP SIGNATURE-----
Merge tag 'for-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply
Pull power supply and reset updates from Sebastian Reichel:
"Core:
- Add over-current health state
- Add standard, adaptive and custom charge types
- Add new properties for start/end charge threshold
New Drivers / Hardware:
- UCS1002 Programmable USB Port Power Controller
- Ingenic JZ47xx Battery Fuel Gauge
- AXP20x USB Power: Add AXP813 support
- AT91 poweroff: Add SAM9X60 support
- OLPC battery: Add XO-1.5 and XO-1.75 support
Misc Changes:
- syscon-reboot: support mask property
- AXP288 fuel gauge: Blacklist ACEPC T8/T11. Looks like some vendor
thought it's a good idea to build a desktop system with a fuel
gauge, that slowly "discharges"...
- cpcap-battery: Fix calculation errors
- misc fixes"
* tag 'for-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply: (54 commits)
power: supply: olpc_battery: force the le/be casts
power: supply: ucs1002: Fix build error without CONFIG_REGULATOR
power: supply: ucs1002: Fix wrong return value checking
power: supply: Add driver for Microchip UCS1002
dt-bindings: power: supply: Add bindings for Microchip UCS1002
power: supply: core: Add POWER_SUPPLY_HEALTH_OVERCURRENT constant
power: supply: core: fix clang -Wunsequenced
power: supply: core: Add missing documentation for CHARGE_CONTROL_* properties
power: supply: core: Add CHARGE_CONTROL_{START_THRESHOLD,END_THRESHOLD} properties
power: supply: core: Add Standard, Adaptive, and Custom charge types
power: supply: axp288_fuel_gauge: Add ACEPC T8 and T11 mini PCs to the blacklist
power: supply: bq27xxx_battery: Notify also about status changes
power: supply: olpc_battery: Have the framework register sysfs files for us
power: supply: olpc_battery: Add OLPC XO 1.75 support
power: supply: olpc_battery: Avoid using platform_info
power: supply: olpc_battery: Use devm_power_supply_register()
power: supply: olpc_battery: Move priv data to a struct
power: supply: olpc_battery: Use DT to get battery version
x86/platform/olpc: Use a correct version when making up a battery node
x86/platform/olpc: Trivial code move in DT fixup
...
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCXNxbogAKCRCAXGG7T9hj
vpyFAQCUWBVb3vHQqqqsboKYA86cJg/t8fjdhw+vFieDcLs7ZwEA4nBDP9JfoHiV
HkDjhD3SEPS3kftsrR1PVGLrv/dIqgo=
=4YnV
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.2b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- some minor cleanups
- two small corrections for Xen on ARM
- two fixes for Xen PVH guest support
- a patch for a new command line option to tune virtual timer handling
* tag 'for-linus-5.2b-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/arm: Use p2m entry with lock protection
xen/arm: Free p2m entry if fail to add it to RB tree
xen/pvh: correctly setup the PV EFI interface for dom0
xen/pvh: set xen_domain_type to HVM in xen_pvh_init
xenbus: drop useless LIST_HEAD in xenbus_write_watch() and xenbus_file_write()
xen-netfront: mark expected switch fall-through
xen: xen-pciback: fix warning Using plain integer as NULL pointer
x86/xen: Add "xen_timer_slop" command line option
Generic kernels feed many operation through the "machvec" logic to get
the correct form of the operation for the current system. "mmiowb()" is
one of those operations.
Although machvec is initialized very early in boot, it isn't early
enough for a recent upstream kernel change that added mmiowb to the
spin_unlock() path.
Statically initialize the mmiowb field of machvec so that we won't die
with a call through a NULL pointer. This should be safe because we do
the real initialization of machvec before bringing up any addtional CPUs
or doing any I/O.
Fixes: 49ca6462fc ("ia64/mmiowb: Add unconditional mmiowb() to arch_spin_unlock()")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Scott Mayhew revived an old api that communicates with a userspace
daemon to manage some on-disk state that's used to track clients across
server reboots. We've been using a usermode_helper upcall for that, but
it's tough to run those with the right namespaces, so a daemon is much
friendlier to container use cases.
Trond fixed nfsd's handling of user credentials in user namespaces. He
also contributed patches that allow containers to support different sets
of NFS protocol versions.
The only remaining container bug I'm aware of is that the NFS reply
cache is shared between all containers. If anyone's aware of other gaps
in our container support, let me know.
The rest of this is miscellaneous bugfixes.
-----BEGIN PGP SIGNATURE-----
iQJJBAABCAAzFiEEYtFWavXG9hZotryuJ5vNeUKO4b4FAlzcWNcVHGJmaWVsZHNA
ZmllbGRzZXMub3JnAAoJECebzXlCjuG+DUEP/0WD3jKNAHFV3M5YQPAI9fz/iCND
Db/A4oWP5qa6JmwmHe61il29QeGqkeFr/NPexgzM3Xw2E39d7RBXBeWyVDuqb0wr
6SCXjXibTsuAHg11nR8Xf0P5Vej3rfGbG6up5lLCIDTEZxVpWoaBJnM8+3bewuCj
XbeiDW54oiMbmDjon3MXqVAIF/z7LjorecJ+Yw5+0Jy7KZ6num9Kt8+fi7qkEfFd
i5Bp9KWgzlTbJUJV4EX3ZKN3zlGkfOvjoo2kP3PODPVMB34W8jSLKkRSA1tDWYZg
43WhBt5OODDlV6zpxSJXehYKIB4Ae469+RRaIL4F+ORRK+AzR0C/GTuOwJiG+P3J
n95DX5WzX74nPOGQJgAvq4JNpZci85jM3jEK1TR2M7KiBDG5Zg+FTsPYVxx5Sgah
Akl/pjLtHQPSdBbFGHn5TsXU+gqWNiKsKa9663tjxLb8ldmJun6JoQGkAEF9UJUn
dzv0UxyHeHAblhSynY+WsUR+Xep9JDo/p5LyFK4if9Sd62KeA1uF/MFhAqpKZF81
mrgRCqW4sD8aVTBNZI06pZzmcZx4TRr2o+Oj5KAXf6Yk6TJRSGfnQscoMMBsTLkw
VK1rBQ/71TpjLHGZZZEx1YJrkVZAMmw2ty4DtK2f9jeKO13bWmUpc6UATzVufHKA
C1rUZXJ5YioDbYDy
=TUdw
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.2' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"This consists mostly of nfsd container work:
Scott Mayhew revived an old api that communicates with a userspace
daemon to manage some on-disk state that's used to track clients
across server reboots. We've been using a usermode_helper upcall for
that, but it's tough to run those with the right namespaces, so a
daemon is much friendlier to container use cases.
Trond fixed nfsd's handling of user credentials in user namespaces. He
also contributed patches that allow containers to support different
sets of NFS protocol versions.
The only remaining container bug I'm aware of is that the NFS reply
cache is shared between all containers. If anyone's aware of other
gaps in our container support, let me know.
The rest of this is miscellaneous bugfixes"
* tag 'nfsd-5.2' of git://linux-nfs.org/~bfields/linux: (23 commits)
nfsd: update callback done processing
locks: move checks from locks_free_lock() to locks_release_private()
nfsd: fh_drop_write in nfsd_unlink
nfsd: allow fh_want_write to be called twice
nfsd: knfsd must use the container user namespace
SUNRPC: rsi_parse() should use the current user namespace
SUNRPC: Fix the server AUTH_UNIX userspace mappings
lockd: Pass the user cred from knfsd when starting the lockd server
SUNRPC: Temporary sockets should inherit the cred from their parent
SUNRPC: Cache the process user cred in the RPC server listener
nfsd: Allow containers to set supported nfs versions
nfsd: Add custom rpcbind callbacks for knfsd
SUNRPC: Allow further customisation of RPC program registration
SUNRPC: Clean up generic dispatcher code
SUNRPC: Add a callback to initialise server requests
SUNRPC/nfs: Fix return value for nfs4_callback_compound()
nfsd: handle legacy client tracking records sent by nfsdcld
nfsd: re-order client tracking method selection
nfsd: keep a tally of RECLAIM_COMPLETE operations when using nfsdcld
nfsd: un-deprecate nfsdcld
...
- Fix the low refresh rate register in adv7511
- A handful of msm fixes that fell out of 5.1 bringup on SDM845
- Fix spinlock initialization in pl111
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEfxcpfMSgdnQMs+QqlvcN/ahKBwoFAlzccqQACgkQlvcN/ahK
Bwojowf/fzcPku4y1L8IzPIoQ75iYr7VC4jhHiTN1DxHHVQUCCMpdPk9EPMJUPY3
OeMmOqirtOL4+QK+5z/3Zp8ERJ+NYz93Hgkc3GNlNGwt12odr1mavRWNmhPRNx0t
Ghj3nFDRI9MuK6rK3Hzc+Kap5vTFwD3bAEaCt3eRGlnf0yTzqwer969GsaKCeMeo
XqSI3ZtJq2T6bxpvwgKbRNHWxnEGtW/wI7newfz0I3BIVIbciTPz564kNmXiba8R
mC8CG2WB860hxT4AZkFoKTP8IalfskL9XPrLKOmpYocPxPDSwBz6lHQzOJlvk52d
OjQrcmD3URs5j3xSVmMvoPrT05fTYQ==
=dZYN
-----END PGP SIGNATURE-----
Merge tag 'drm-misc-next-fixes-2019-05-15' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
- A couple new panfrost fixes
- Fix the low refresh rate register in adv7511
- A handful of msm fixes that fell out of 5.1 bringup on SDM845
- Fix spinlock initialization in pl111
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20190515201729.GA89093@art_vandelay
- Handle meta data in GRUB_MENU
- Add variable to cusomize what return value the reboot code should return.
- Add support for grub2bls boot loader
- Show name and test iteration number in error message sent in mail
- Minor fixes and clean ups
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXNxRZxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qrVCAP4rgnMgMsPOt0cMtKty1Z3uA6njfrZc
UU1gNeHEvKr1MQEAhYy4N5FCigBygALEczmUIYwrzVq3luNPTwgUeUIH3AY=
=BTC7
-----END PGP SIGNATURE-----
Merge tag 'ktest-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
Pull more ktest updates from Steven Rostedt:
- Add support for grub2bls boot loader
- Show name and test iteration number in error message sent in mail
- Minor fixes and clean ups
* tag 'ktest-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest: update sample.conf for grub2bls
ktest: remove get_grub2_index
ktest: pass KERNEL_VERSION to POST_KTEST
ktest: introduce grub2bls REBOOT_TYPE option
ktest: cleanup get_grub_index
ktest: introduce _get_grub_index
- Removing of non-DYNAMIC_FTRACE from 32bit x86
- Removing of mcount support from x86
- Emulating a call from int3 on x86_64, fixes live kernel patching
- Consolidated Tracing Error logs file
Minor updates:
- Removal of klp_check_compiler_support()
- kdb ftrace dumping output changes
- Accessing and creating ftrace instances from inside the kernel
- Clean up of #define if macro
- Introduction of TRACE_EVENT_NOP() to disable trace events based on config
options
And other minor fixes and clean ups
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXNxMZxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qq4PAP44kP6VbwL8CHyI2A3xuJ6Hwxd+2Z2r
ip66RtzyJ+2iCgEA2QCuWUlEt2bLpF9a8IQ4N9tWenSeW2i7gunPb+tioQw=
=RVQo
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
"The major changes in this tracing update includes:
- Removal of non-DYNAMIC_FTRACE from 32bit x86
- Removal of mcount support from x86
- Emulating a call from int3 on x86_64, fixes live kernel patching
- Consolidated Tracing Error logs file
Minor updates:
- Removal of klp_check_compiler_support()
- kdb ftrace dumping output changes
- Accessing and creating ftrace instances from inside the kernel
- Clean up of #define if macro
- Introduction of TRACE_EVENT_NOP() to disable trace events based on
config options
And other minor fixes and clean ups"
* tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits)
x86: Hide the int3_emulate_call/jmp functions from UML
livepatch: Remove klp_check_compiler_support()
ftrace/x86: Remove mcount support
ftrace/x86_32: Remove support for non DYNAMIC_FTRACE
tracing: Simplify "if" macro code
tracing: Fix documentation about disabling options using trace_options
tracing: Replace kzalloc with kcalloc
tracing: Fix partial reading of trace event's id file
tracing: Allow RCU to run between postponed startup tests
tracing: Fix white space issues in parse_pred() function
tracing: Eliminate const char[] auto variables
ring-buffer: Fix mispelling of Calculate
tracing: probeevent: Fix to make the type of $comm string
tracing: probeevent: Do not accumulate on ret variable
tracing: uprobes: Re-enable $comm support for uprobe events
ftrace/x86_64: Emulate call function while updating in breakpoint handler
x86_64: Allow breakpoints to emulate call instructions
x86_64: Add gap to int3 to allow for call emulation
tracing: kdb: Allow ftdump to skip all but the last few entries
tracing: Add trace_total_entries() / trace_total_entries_cpu()
...
This reverts commit e8c24bbda7.
GCC 4.7, which is still permitted, emits code using the original
syntax. This means we end up with lots of assembler warnings when
building with a currently-supported version of gcc.
Revert the commit (with fixups to keep the follow-on -mauto-it
change) to avoid these warnings.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
- guest SVE support
- guest Pointer Authentication support
- Better discrimination of perf counters between host and guests
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAlzMM9kVHG1hcmMuenlu
Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDEp8P/iqZvvZlLdlnWQwluWh237c28kAo
zELO0L7Wl+OJ66v2hzM+NPBi5kv/9pSv7AoKNLv3398YmKFt0n7yUB+MHi0BC9xi
ZEp4etCOiVcqcWWeDiAXLdR9OQlb7IDBDc56s4V9HQgK3sEb4u8aEJIy/nDBVniv
GVLMh1EOsrviIYso6UVxI1X7lPQevpCS0kv9/llhhzEj8QDxnQThjDuW3wrAyhQi
F9XNVjAMW8rft7vvok9cxT4v+TR1HgUajquoSrjXuonWHgKnC9tSH/dHILNK8Zij
5OApojGlZQrXIa5Sk3JOhGahVVY9Y+ewsw58J5bJxd0/xrKXnWk/Lann7NE+UcBf
RJMHfanIO/+JJRzHhagejK7pqnYXD1PWBwF8z3Hefs1IVw4eBvPBGuhIULJ6+eSP
+3JCwiOiwshG43gZlGmHcgvhPdeX4r/BlopWV9+0X/gAjcU1+3+ZG6J3jeAcC1Kx
i481dSzlZ7Ar7VWDCk7WgcmDvUwHXtxq0HbqzQjPBO04kkakjdPZZrZIX3+Qhlem
GpkPVb2z5h5KTk9Fx03ZXxPVdiOQh1UmNC8jlsYZPWcJVTLkySs7HWXZJe+WTs4Z
NLuen/eA4/NCon+UA6XdIG5Ddn/J39UuF1lCApHPHn576rwz+HmqpcN59XiU6y4h
XHIxzajFcXNpn802
=fjph
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-for-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm updates for 5.2
- guest SVE support
- guest Pointer Authentication support
- Better discrimination of perf counters between host and guests
Conflicts:
include/uapi/linux/kvm.h
- Fix a bug, fix a spelling mistake, remove some useless code.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJc2kTEAAoJEJ2a6ncsY3GfS88IAImcIlKXMvzSKtHFxGpRap17
9LTZs5MQAUZHVMFJXmrJLBgogtGxUw53aX53woeyerytZsoGU4+YzwgLhk4XBEzA
5Kt5ahlxu82sa2ThH1zyLlNWFXiTECgD5ErNTdavLbNlaKE8YG160+65/mSyixGz
vs5wLSYGv/37no1ay6PIZ3DtwqdrYq5nJbuG+ZsaamUHPJOGprqHqg0gaTJ877NZ
yQDUS7OVuEJ1pdUUK/elP+cnlqR9smaP5OUNsXYMHWJgPJMjc27/thBJy93iS1kk
/zKQ8AFmxqoaePnR7ymTbqurfFFHBiSavUmyWopSQppNHCf4DDE8XjLs9MXKez8=
=Lco4
-----END PGP SIGNATURE-----
Merge tag 'kvm-ppc-next-5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
PPC KVM update for 5.2
* Support for guests to access the new POWER9 XIVE interrupt controller
hardware directly, reducing interrupt latency and overhead for guests.
* In-kernel implementation of the H_PAGE_INIT hypercall.
* Reduce memory usage of sparsely-populated IOMMU tables.
* Several bug fixes.
Second PPC KVM update for 5.2
* Fix a bug, fix a spelling mistake, remove some useless code.
Makes au1000-eth work again, tested on DB1500.
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Linux-MIPS <linux-mips@linux-mips.org>
The RDPMC-exiting control is dependent on the existence of the RDPMC
instruction itself, i.e. is not tied to the "Architectural Performance
Monitoring" feature. For all intents and purposes, the control exists
on all CPUs with VMX support since RDPMC also exists on all VCPUs with
VMX supported. Per Intel's SDM:
The RDPMC instruction was introduced into the IA-32 Architecture in
the Pentium Pro processor and the Pentium processor with MMX technology.
The earlier Pentium processors have performance-monitoring counters, but
they must be read with the RDMSR instruction.
Because RDPMC-exiting always exists, KVM requires the control and refuses
to load if it's not available. As a result, hiding the PMU from a guest
breaks nested virtualization if the guest attemts to use KVM.
While it's not explicitly stated in the RDPMC pseudocode, the VM-Exit
check for RDPMC-exiting follows standard fault vs. VM-Exit prioritization
for privileged instructions, e.g. occurs after the CPL/CR0.PE/CR4.PCE
checks, but before the counter referenced in ECX is checked for validity.
In other words, the original KVM behavior of injecting a #GP was correct,
and the KVM unit test needs to be adjusted accordingly, e.g. eat the #GP
when the unit test guest (L3 in this case) executes RDPMC without
RDPMC-exiting set in the unit test host (L2).
This reverts commit e51bfdb687.
Fixes: e51bfdb687 ("KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU")
Reported-by: David Hill <hilld@binarystorm.net>
Cc: Saar Amar <saaramar@microsoft.com>
Cc: Mihai Carabas <mihai.carabas@oracle.com>
Cc: Jim Mattson <jmattson@google.com>
Cc: Liran Alon <liran.alon@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Currently KVM sets 5 most significant bits of physical address bits
reported by CPUID (boot_cpu_data.x86_phys_bits) for nonpresent or
reserved bits SPTE to mitigate L1TF attack from guest when using shadow
MMU. However for some particular Intel CPUs the physical address bits
of internal cache is greater than physical address bits reported by
CPUID.
Use the kernel's existing boot_cpu_data.x86_cache_bits to determine the
five most significant bits. Doing so improves KVM's L1TF mitigation in
the unlikely scenario that system RAM overlaps the high order bits of
the "real" physical address space as reported by CPUID. This aligns with
the kernel's warnings regarding L1TF mitigation, e.g. in the above
scenario the kernel won't warn the user about lack of L1TF mitigation
if x86_cache_bits is greater than x86_phys_bits.
Also initialize shadow_nonpresent_or_rsvd_mask explicitly to make it
consistent with other 'shadow_{xxx}_mask', and opportunistically add a
WARN once if KVM's L1TF mitigation cannot be applied on a system that
is marked as being susceptible to L1TF.
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
If L1 is using an MSR bitmap, unconditionally merge the MSR bitmaps from
L0 and L1 for MSR_{KERNEL,}_{FS,GS}_BASE. KVM unconditionally exposes
MSRs L1. If KVM is also running in L1 then it's highly likely L1 is
also exposing the MSRs to L2, i.e. KVM doesn't need to intercept L2
accesses.
Based on code from Jintack Lim.
Cc: Jintack Lim <jintack@xxxxxxxxxxxxxxx>
Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
dev_pm_domain_attach_by_name() can return NULL, so we should check for
that case when we're about to dereference gxpd.
Fixes: 9325d4266a ("drm/msm/gpu: Attach to the GPU GX power domain")
Cc: Jordan Crouse <jcrouse@codeaurora.org>
Cc: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jordan Crouse <jcrouse@codeauorora.org>
Signed-off-by: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20190515170104.155525-1-sean@poorly.run
UBIFS stores inode numbers as LE64 integers.
We have to convert them to host oder, otherwise
BE hosts won't be able to use the integer correctly.
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: 9ca2d73264 ("ubifs: Limit number of xattrs per inode")
Signed-off-by: Richard Weinberger <richard@nod.at>
CONFIG_UBIFS_FS_ENCRYPTION is gone, fscrypt is now
controlled via CONFIG_FS_ENCRYPTION.
This problem slipped into the tree because of a mis-merge on
my side.
Reported-by: Eric Biggers <ebiggers@kernel.org>
Fixes: eea2c05d92 ("ubifs: Remove #ifdef around CONFIG_FS_ENCRYPTION")
Signed-off-by: Richard Weinberger <richard@nod.at>
Fix gcc build error while CONFIG_UBIFS_FS_XATTR
is not set
fs/ubifs/dir.o: In function `ubifs_unlink':
dir.c:(.text+0x260): undefined reference to `ubifs_purge_xattrs'
fs/ubifs/dir.o: In function `do_rename':
dir.c:(.text+0x1edc): undefined reference to `ubifs_purge_xattrs'
fs/ubifs/dir.o: In function `ubifs_rmdir':
dir.c:(.text+0x2638): undefined reference to `ubifs_purge_xattrs'
Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 9ca2d73264 ("ubifs: Limit number of xattrs per inode")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
We always pass in 0 for the cqe flags argument, since the support for
"this read hit page cache" hint was dropped.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The available registers for --int-regs and --user-regs may be different,
e.g. XMM registers.
Split parse_regs into two dedicated functions for --int-regs and
--user-regs respectively.
Modify the warning message. "--user-regs=?" should be applied to show
the available registers for --user-regs.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/1557865174-56264-1-git-send-email-kan.liang@linux.intel.com
[ Changed docs as suggested by Ravi and agreed by Kan ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The Cortex-A57 and Cortex-A72 both support all ARMv8 recommended events
up to the RC_ST_SPEC (0x91) event with the exception of:
- L1D_CACHE_REFILL_INNER (0x44)
- L1D_CACHE_REFILL_OUTER (0x45)
- L1D_TLB_RD (0x4E)
- L1D_TLB_WR (0x4F)
- L2D_TLB_REFILL_RD (0x5C)
- L2D_TLB_REFILL_WR (0x5D)
- L2D_TLB_RD (0x5E)
- L2D_TLB_WR (0x5F)
- STREX_SPEC (0x6F)
Create an appropriate JSON file for mapping those events and update the
mapfile.csv for matching the Cortex-A57 and Cortex-A72 MIDR to that
file.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean V Kelley <seanvk.dev@oregontracks.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org (moderated list:arm pmu profiling and debugging)
Link: http://lkml.kernel.org/r/20190513202522.9050-4-f.fainelli@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Broadcom's Brahma-B53 CPUs support the same type of events that the
Cortex-A53 supports, recognize its CPUID and map it to the cortex-a53
events.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean V Kelley <seanvk.dev@oregontracks.org>
Cc: bcm-kernel-feedback-list@broadcom.com
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-arm-kernel@lists.infradead.org (moderated list
Link: http://lkml.kernel.org/r/20190513202522.9050-3-f.fainelli@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
ARM64's implementation of get_cpuidr_str() masks out the revision bits
[3:0] while reading the CPU identifier, there is no need for the
[[:xdigit:]] wildcard.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean V Kelley <seanvk.dev@oregontracks.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org (moderated list:arm pmu profiling and debugging)
Link: http://lkml.kernel.org/r/20190513202522.9050-2-f.fainelli@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The shell tests should not redirect useful output to /dev/null, as that
is done automatically by 'perf test' in non verbose mode, so remove that
from the zstd comp/decomp test, fixing up verbose mode.
Before:
$ perf test zstd
68: Zstd perf.data compression/decompression : Ok
$ perf test -v zstd
68: Zstd perf.data compression/decompression :
--- start ---
test child forked, pid 11956
-z, --compression-level[=<n>]
Collecting compressed record file:
Checking compressed events stats:
test child finished with 0
---- end ----
Zstd perf.data compression/decompression: Ok
$
Now:
$ perf test zstd
68: Zstd perf.data compression/decompression : Ok
$ perf test -v zstd
68: Zstd perf.data compression/decompression :
--- start ---
test child forked, pid 12695
Collecting compressed record file:
0+500 records in
72+1 records out
37361 bytes (37 kB, 36 KiB) copied, 9.83796 s, 3.8 kB/s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB /tmp/perf.data.rzq, compressed (original 0.004 MB, ratio is 3.679) ]
Checking compressed events stats:
# compressed : Zstd, level = 1, ratio = 4
COMPRESSED events: 3
test child finished with 0
---- end ----
Zstd perf.data compression/decompression: Ok
$
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/n/tip-tp96618ds42zic94nlh0msz3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Introduce a basic integration test for Zstd based record
compression/decompression using 'perf record' and 'perf report'.
Committer notes:
Reduce a bit the freq (from 25 kHz to 5 kHz) and the number of /dev/null
records read (from 1000 to 500), reducing the time it takes to something
more in line with the time existing 'perf test' entries take to run.
With that in place:
$ time perf test zstd
68: Zstd perf.data compression/decompression : Ok
real 0m10.376s
user 0m0.105s
sys 0m0.440s
$ grep "model name" /proc/cpuinfo | head -1
model name : Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
$
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/dc007ae4-104a-2b7c-316e-275929025f0d@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Initialized decompression part of Zstd based API so COMPRESSED records
would be decompressed into the resulting output data file.
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/c27d7500-ecdd-3569-cab5-8f70bbed5ea4@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
zstd_init(, comp_level = 0) initializes decompression part of API only
hat now consists of zstd_decompress_stream() function.
The perf.data PERF_RECORD_COMPRESSED records are decompressed using
zstd_decompress_stream() function into a linked list of mmaped memory
regions of mmap_comp_len size (struct decomp).
After decompression of one COMPRESSED record its content is iterated and
fetched for usual processing. The mmaped memory regions with
decompressed events are kept in the linked list till the tool process
termination.
When dumping raw records (e.g., perf report -D --header) file offsets of
events from compressed records are printed as zero.
Committer notes:
Since now we have support for processing PERF_RECORD_COMPRESSED, we see
none, in raw form, like we saw in the previous patch commiter notes,
they were decompressed into the usual PERF_RECORD_{FORK,MMAP,COMM,etc}
records, we only see the stats for those PERF_RECORD_COMPRESSED events,
and since I used the file generated in the commiter notes for the
previous patch, there they are, 2 compressed records:
$ perf report --header-only | grep cmdline
# cmdline : /home/acme/bin/perf record -z2 sleep 1
$ perf report -D | grep COMPRESS
COMPRESSED events: 2
COMPRESSED events: 0
$ perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 15 of event 'cycles:u'
# Event count (approx.): 962227
#
# Overhead Command Shared Object Symbol
# ........ ....... ................ ...........................
#
46.99% sleep libc-2.28.so [.] _dl_addr
29.24% sleep [unknown] [k] 0xffffffffaea00a67
16.45% sleep libc-2.28.so [.] __GI__IO_un_link.part.1
5.92% sleep ld-2.28.so [.] _dl_setup_hash
1.40% sleep libc-2.28.so [.] __nanosleep
0.00% sleep [unknown] [k] 0xffffffffaea00163
#
# (Tip: To see callchains in a more compact form: perf report -g folded)
#
$
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/304b0a59-942c-3fe1-da02-aa749f87108b@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Implemented -z,--compression_level[=<n>] option that enables compression
of mmaped kernel data buffers content in runtime during perf record mode
collection. Default option value is 1 (fastest compression).
Compression overhead has been measured for serial and AIO streaming when
profiling matrix multiplication workload:
-------------------------------------------------------------
| SERIAL | AIO-1 |
----------------------------------------------------------------|
|-z | OVH(x) | ratio(x) size(MiB) | OVH(x) | ratio(x) size(MiB) |
|---------------------------------------------------------------|
| 0 | 1,00 | 1,000 179,424 | 1,00 | 1,000 187,527 |
| 1 | 1,04 | 8,427 181,148 | 1,01 | 8,474 188,562 |
| 2 | 1,07 | 8,055 186,953 | 1,03 | 7,912 191,773 |
| 3 | 1,04 | 8,283 181,908 | 1,03 | 8,220 191,078 |
| 5 | 1,09 | 8,101 187,705 | 1,05 | 7,780 190,065 |
| 8 | 1,05 | 9,217 179,191 | 1,12 | 6,111 193,024 |
-----------------------------------------------------------------
OVH = (Execution time with -z N) / (Execution time with -z 0)
ratio - compression ratio
size - number of bytes that was compressed
size ~= trace size x ratio
Committer notes:
Testing it I noticed that it failed to disable build id processing when
compression is enabled, and as we'd have to uncompress everything to
look for the PERF_RECORD_{MMAP,SAMPLE,etc} to figure out which build ids
to read from DSOs, we better disable build id processing when
compression is enabled, logging with pr_debug() when doing so:
Original patch:
# perf record -z2
^C[ perf record: Woken up 1 times to write data ]
0x1746e0 [0x76]: failed to process type: 81 [Invalid argument]
[ perf record: Captured and wrote 1.568 MB perf.data, compressed (original 0.452 MB, ratio is 3.995) ]
#
After auto-disabling build id processing when compression is enabled:
$ perf record -z2 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data, compressed (original 0.001 MB, ratio is 2.292) ]
$ perf record -v -z2 sleep 1
Compression enabled, disabling build id collection at the end of the session.
<SNIP extra -v pr_debug() messages>
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data, compressed (original 0.001 MB, ratio is 2.305) ]
$
Also, with parts of the patch originally after this one moved to just
before this one we get:
$ perf record -z2 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data, compressed (original 0.001 MB, ratio is 2.371) ]
$ perf report -D | grep COMPRESS
0 0x1b8 [0x155]: PERF_RECORD_COMPRESSED: unhandled!
0 0x30d [0x80]: PERF_RECORD_COMPRESSED: unhandled!
COMPRESSED events: 2
COMPRESSED events: 0
$
I.e. when faced with PERF_RECORD_COMPRESSED that we still have no code
to process, we just show it as not being handled, skip them and
continue, while before we had:
$ perf report -D | grep COMPRESS
0x1b8 [0x169]: failed to process type: 81 [Invalid argument]
Error:
failed to process sample
0 0x1b8 [0x169]: PERF_RECORD_COMPRESSED
$
Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/9ff06518-ae63-a908-e44d-5d9e56dd66d9@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>