The "used length" reported by calling vhost_add_used() must be the
number of bytes written by the device (using "in" buffers).
In vhost_vsock_handle_tx_kick() the device only reads the guest
buffers (they are all "out" buffers), without writing anything,
so we must pass 0 as "used length" to comply virtio spec.
Fixes: 433fc58e6b ("VSOCK: Introduce vhost_vsock.ko")
Cc: stable@vger.kernel.org
Reported-by: Halil Pasic <pasic@linux.ibm.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://lore.kernel.org/r/20211122163525.294024-2-sgarzare@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
This reverts commit 939779f515.
Attempts to validate length in the core did not work out: there turn out
to exist multiple broken devices, and in particular legacy devices are
known to be broken in this respect.
We have ideas for handling this better in the next version but for now
let's revert to a known good state to make sure drivers work for people.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This reverts commit 816625c136.
Attempts to validate length in the core did not work out.
We'll drop them, so revert the dependent changes in drivers.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This reverts commit a40392edf1.
Attempts to validate length in the core did not work out.
We'll drop them, so revert the dependent changes in drivers.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This reverts commit c57911ebfb.
Attempts to validate length in the core did not work out. We'll drop
them for now, so revert the dependent changes in drivers.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Update the bios scratch register when updating the backlight
level. Some platforms apparently read this scratch register
and do additional operations in their hotkey handlers.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1518
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The overclocking interface currently appends data to a
string. Revert back to using sprintf().
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1774
Fixes: 6db0c87a0a ("amdgpu/pm: Replace hwmgr smu usage of sprintf with sysfs_emit")
Acked-by: Evan Quan <evan.quan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
There is no need for additional programming, keep the default settings.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
fix some byteorder issues about amdgpu discovery.
This will result in running errors on the big end system. (e.g:MIPS)
Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If xnack is on, VM retry fault interrupt send to IH ring1, and ring1
will be full quickly. IH cannot receive other interrupts, this causes
deadlock if migrating buffer using sdma and waiting for sdma done
while handling retry fault.
Remove VMC from IH storm client, enable ring1 write pointer
overflow, then IH will drop retry fault interrupts and be able to receive
other interrupts while driver is handling retry fault.
IH ring1 write pointer doesn't writeback to memory by IH, and ring1
write pointer recorded by self-irq is not updated, so always read
the latest ring1 write pointer from register.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
By default this timestamp is 32 bit counter. It gets overflowed in
around 10 minutes.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
unmap range always increase atomic svms->drain_pagefaults to simplify
both parent range and child range unmap, page fault handle ignores the
retry fault if svms->drain_pagefaults is set to speed up interrupt
handling. svm_range_drain_retry_fault restart draining if another
range unmap from cpu.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
VMA may be removed before unmap notifier callback, and deferred list
work remove range, return success for this special case as we are
handling stale retry fault.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
kfd_process_wq_release drain retry fault to ensure no retry fault comes
after removing kfd process from the hash table, otherwise svm page fault
handler will fail to recover the fault and dump GPU vm fault log.
Refactor deferred list work to get_task_mm and take mmap write lock
to handle all ranges, and avoid mm is gone while inserting mmu notifier.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Otherwise when IH process restart, count is zero, the loop will
not exit to wake_up_all after processing AMDGPU_IH_MAX_NUM_IVS
interrupts.
Cc: stable@vger.kernel.org
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Renoir and newer gfx9 APUs have new TSC register that is
not part of the gfxoff tile, so it can be read without
needing to disable gfx off.
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Apply the same check we do for dGPUs for APUs as well.
Acked-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes: 9f4f2c1a35 ("drm/amd/amdgpu: fix the kfd pre_reset sequence in sriov")
For sriov XGMI configuration, the host driver will handle the hive reset,
so in guest side, the reset_sriov only be called once on one device. This will
make kfd post_reset unblanced with kfd pre_reset since kfd pre_reset already
been moved out of reset_sriov function. Move kfd post_reset out of reset_sriov
function to make them balance .
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
Due to pass the wrong parameter down to the enable_stream_gating(),
it would cause the DSC of the removing stream would not be PG.
[HOW]
To pass the correct parameter down th the enable_stream_gating().
Reviewed-by: Anthony Koo <Anthony.Koo@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Yi-Ling Chen <Yi-Ling.Chen2@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
A warning appears in the log on GPU reset for
link_enc_cfg_link_encs_assign for the following condition:
ASSERT(state->res_ctx.link_enc_cfg_ctx.link_enc_assignments[i].valid == false);
This is not expected behavior and may result in link encoders being
incorrectly assigned.
[How]
The dc->current_state is backed up into dm->cached_dc_state before
we commit 0 streams.
DC will clear link encoder assignments on the real state but the
changes won't propagate over to the copy we made before the
0 streams commit.
DC expects that link encoder assignments are *not* valid
when committing a state, so as a workaround it needs to be cleared
before passing it back into DC.
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
We're only setting the flags on stream[0]'s planes so this logic fails
if we have more than one stream in the state.
This can cause a page flip timeout with multiple displays in the
configuration.
[How]
Index into the stream_status array using the stream index - it's a 1:1
mapping.
Fixes: cdaae8371a ("drm/amd/display: Handle GPU reset for DC block")
Reviewed-by: Harry Wentland <Harry.Wentland@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
The HW interrupt gets disabled after GPU reset so we don't receive
notifications for HPD or AUX from DMUB - leading to timeout and
black screen with (or without) DPIA links connected.
[How]
Re-enable the interrupt after GPU reset like we do for the other
DC interrupts.
Fixes: 81927e2808 ("drm/amd/display: Support for DMUB AUX")
Reviewed-by: Jude Shih <Jude.Shih@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_amdkfd_gpuvm_free_memory_of_gpu drop dmabuf reference increased in
amdgpu_gem_prime_export.
amdgpu_bo_destroy drop dmabuf reference increased in
amdgpu_gem_prime_import.
So remove this extra dma_buf_put to avoid double free.
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
Tested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Disable HDP register remapping on SRIOV and set rmmio_remap.reg_offset
to the fixed address of the VF register for hdp_v*_flush_hdp.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Bokun Zhang <bokun.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Change iomap_read_inline_data to return 0 or an error code; this
simplifies the callers. Add a description.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[djwong: document the return value of iomap_read_inline_data explicitly]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
With the remove of xfs_dqrele_all_inodes, xfs_inew_wait and all the
infrastructure used to wake the XFS_INEW bit waitqueue is unused.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 777eb1fa85 ("xfs: remove xfs_dqrele_all_inodes")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
When testing xfstests xfs/126 on lastest upstream kernel, it will hang on some machine.
Adding a getxattr operation after xattr corrupted, I can reproduce it 100%.
The deadlock as below:
[983.923403] task:setfattr state:D stack: 0 pid:17639 ppid: 14687 flags:0x00000080
[ 983.923405] Call Trace:
[ 983.923410] __schedule+0x2c4/0x700
[ 983.923412] schedule+0x37/0xa0
[ 983.923414] schedule_timeout+0x274/0x300
[ 983.923416] __down+0x9b/0xf0
[ 983.923451] ? xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs]
[ 983.923453] down+0x3b/0x50
[ 983.923471] xfs_buf_lock+0x33/0xf0 [xfs]
[ 983.923490] xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs]
[ 983.923508] xfs_buf_get_map+0x4c/0x320 [xfs]
[ 983.923525] xfs_buf_read_map+0x53/0x310 [xfs]
[ 983.923541] ? xfs_da_read_buf+0xcf/0x120 [xfs]
[ 983.923560] xfs_trans_read_buf_map+0x1cf/0x360 [xfs]
[ 983.923575] ? xfs_da_read_buf+0xcf/0x120 [xfs]
[ 983.923590] xfs_da_read_buf+0xcf/0x120 [xfs]
[ 983.923606] xfs_da3_node_read+0x1f/0x40 [xfs]
[ 983.923621] xfs_da3_node_lookup_int+0x69/0x4a0 [xfs]
[ 983.923624] ? kmem_cache_alloc+0x12e/0x270
[ 983.923637] xfs_attr_node_hasname+0x6e/0xa0 [xfs]
[ 983.923651] xfs_has_attr+0x6e/0xd0 [xfs]
[ 983.923664] xfs_attr_set+0x273/0x320 [xfs]
[ 983.923683] xfs_xattr_set+0x87/0xd0 [xfs]
[ 983.923686] __vfs_removexattr+0x4d/0x60
[ 983.923688] __vfs_removexattr_locked+0xac/0x130
[ 983.923689] vfs_removexattr+0x4e/0xf0
[ 983.923690] removexattr+0x4d/0x80
[ 983.923693] ? __check_object_size+0xa8/0x16b
[ 983.923695] ? strncpy_from_user+0x47/0x1a0
[ 983.923696] ? getname_flags+0x6a/0x1e0
[ 983.923697] ? _cond_resched+0x15/0x30
[ 983.923699] ? __sb_start_write+0x1e/0x70
[ 983.923700] ? mnt_want_write+0x28/0x50
[ 983.923701] path_removexattr+0x9b/0xb0
[ 983.923702] __x64_sys_removexattr+0x17/0x20
[ 983.923704] do_syscall_64+0x5b/0x1a0
[ 983.923705] entry_SYSCALL_64_after_hwframe+0x65/0xca
[ 983.923707] RIP: 0033:0x7f080f10ee1b
When getxattr calls xfs_attr_node_get function, xfs_da3_node_lookup_int fails with EFSCORRUPTED in
xfs_attr_node_hasname because we have use blocktrash to random it in xfs/126. So it
free state in internal and xfs_attr_node_get doesn't do xfs_buf_trans release job.
Then subsequent removexattr will hang because of it.
This bug was introduced by kernel commit 07120f1abd ("xfs: Add xfs_has_attr and subroutines").
It adds xfs_attr_node_hasname helper and said caller will be responsible for freeing the state
in this case. But xfs_attr_node_hasname will free state itself instead of caller if
xfs_da3_node_lookup_int fails.
Fix this bug by moving the step of free state into caller.
Also, use "goto error/out" instead of returning error directly in xfs_attr_node_addname_find_attr and
xfs_attr_node_removename_setup function because we should free state ourselves.
Fixes: 07120f1abd ("xfs: Add xfs_has_attr and subroutines")
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Increase the FRAME_WARN value to avoid some new warnings which showed up
in the Linux kernel test project, revert a patch which moved the _stext
symbol and thus tiggered errors in the hardened usercopy checks, and
introduce an extru_safe() assembler macro to overcome possible unsafe
usage of the extru asm statement on 64-bit PA2.0 machines.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQS86RI+GtKfB8BJu973ErUQojoPXwUCYZ5GgwAKCRD3ErUQojoP
X5adAPwMFlxhTrkflWqSP35G/AjQNSzxplaj75fBWRGRJWTXKAD/cMpePFj//CBw
n5wOh6eDvBDQUElctHOCFMVZgHyrXA0=
=5LTo
-----END PGP SIGNATURE-----
Merge tag 'for-5.16/parisc-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"Increase the FRAME_WARN value to avoid some new warnings which showed
up in the Linux kernel test project, revert a patch which moved the
_stext symbol and thus tiggered errors in the hardened usercopy
checks, and introduce an extru_safe() assembler macro to overcome
possible unsafe usage of the extru asm statement on 64-bit PA2.0
machines"
* tag 'for-5.16/parisc-5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
Revert "parisc: Fix backtrace to always include init funtion names"
parisc: Convert PTE lookup to use extru_safe() macro
parisc: Fix extraction of hash lock bits in syscall.S
parisc: Provide an extru_safe() macro to extract unsigned bits
parisc: Increase FRAME_WARN to 2048 bytes on parisc
uprobe_perf_open() processes a list of probes, but due to a missing
setting of the uprobe to be processed, the loop processes the head probe
instead of the added probes.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCYZ5VIhQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qsI4AP4yOJ+r/2b7cTWOpnZcs5grnFfKD46t
LRXXOyWc9CNfNwEA/pXFW3IJ0hCKuMaHAJbPFuyIKufGBJxPEae0htc0SgI=
=HDiu
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fix from Steven Rostedt:
"Fix wrong uprobe variable in iterator
uprobe_perf_open() processes a list of probes, but due to a missing
setting of the uprobe to be processed, the loop processes the head
probe instead of the added probes"
* tag 'trace-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/uprobe: Fix uprobe_perf_open probes iteration
Pull HID fixes from Jiri Kosina:
- fix for Intel-ISH driver to make sure it gets aoutoloaded only on
matching devices and not universally (Thomas Weißschuh)
- fix for Wacom driver reporting invalid contact under certain
circumstances (Jason Gerecke)
- probing fix for ft260 dirver (Michael Zaidman)
- fix for generic keycode remapping (Thomas Weißschuh)
- fix for division by zero in hid-magicmouse (Claudia Pellegrino)
- other tiny assorted fixes and new device IDs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: multitouch: Fix Iiyama ProLite T1931SAW (0eef:0001 again!)
HID: nintendo: eliminate dead datastructures in !CONFIG_NINTENDO_FF case
HID: magicmouse: prevent division by 0 on scroll
HID: thrustmaster: fix sparse warnings
HID: Ignore battery for Elan touchscreen on HP Envy X360 15-eu0xxx
HID: input: set usage type to key on keycode remap
HID: input: Fix parsing of HID_CP_CONSUMER_CONTROL fields
HID: ft260: fix i2c probing for hwmon devices
Revert "HID: hid-asus.c: Maps key 0x35 (display off) to KEY_SCREENLOCK"
HID: intel-ish-hid: fix module device-id handling
mod_devicetable: fix kdocs for ishtp_device_id
HID: wacom: Use "Confidence" flag to prevent reporting invalid contacts
HID: nintendo: unlock on error in joycon_leds_create()
platform/x86: isthp_eclite: only load for matching devices
platform/chrome: chros_ec_ishtp: only load for matching devices
HID: intel-ish-hid: hid-client: only load for matching devices
HID: intel-ish-hid: fw-loader: only load for matching devices
HID: intel-ish-hid: use constants for modaliases
HID: intel-ish-hid: add support for MODULE_DEVICE_TABLE()
If the xenstore page hasn't been allocated properly, reading the value
of the related hvm_param (HVM_PARAM_STORE_PFN) won't actually return
error. Instead, it will succeed and return zero. Instead of attempting
to xen_remap a bad guest physical address, detect this condition and
return early.
Note that although a guest physical address of zero for
HVM_PARAM_STORE_PFN is theoretically possible, it is not a good choice
and zero has never been validly used in that capacity.
Also recognize all bits set as an invalid value.
For 32-bit Linux, any pfn above ULONG_MAX would get truncated. Pfns
above ULONG_MAX should never be passed by the Xen tools to HVM guests
anyway, so check for this condition and return early.
Cc: stable@vger.kernel.org
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20211123210748.1910236-1-sstabellini@kernel.org
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
snapshot_write() is inappropriately limiting the amount of data that can
be written in cases where a partial page has already been written. For
example, one would expect to be able to write 1 byte, then 4095 bytes to
the snapshot device, and have both of those complete fully (since now
we're aligned to a page again). But what ends up happening is we write 1
byte, then 4094/4095 bytes complete successfully.
The reason is that simple_write_to_buffer()'s second argument is the
total size of the buffer, not the size of the buffer minus the offset.
Since simple_write_to_buffer() accounts for the offset in its
implementation, snapshot_write() can just pass the full page size
directly down.
Signed-off-by: Evan Green <evgreen@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 39fbef4b0f ("PM: hibernate: Get block device exclusively in
swsusp_check()") changed the opening mode of the block device to
(FMODE_READ | FMODE_EXCL).
In the corresponding calls to swsusp_close(), the mode is still just
FMODE_READ which triggers the warning in blkdev_flush_mapping() on
resume from hibernate.
So, use the mode (FMODE_READ | FMODE_EXCL) also when closing the
device.
Fixes: 39fbef4b0f ("PM: hibernate: Get block device exclusively in swsusp_check()")
Signed-off-by: Thomas Zeitlhofer <thomas.zeitlhofer+lkml@ze-it.at>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Check cpc_desc against NULL in cppc_get_perf(), so it doesn't crash
down the road if cpc_desc is NULL.
Fixes: 0654cf05d1 ("ACPI: CPPC: Introduce cppc_get_nominal_perf()")
Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: 5.15+ <stable@vger.kernel.org> # 5.15+
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Update NC-SI command handler (both standard and OEM) to take into
account of payload paddings in allocating skb (in case of payload
size is not 32-bit aligned).
The checksum field follows payload field, without taking payload
padding into account can cause checksum being truncated, leading to
dropped packets.
Fixes: fb4ee67529 ("net/ncsi: Add NCSI OEM command support")
Signed-off-by: Kumar Thangavel <thangavel.k@hcl.com>
Acked-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
To hot unplug a CPU, the idle task on that CPU calls a few layers of C
code before finally leaving the kernel. When KASAN is in use, poisoned
shadow is left around for each of the active stack frames, and when
shadow call stacks are in use. When shadow call stacks (SCS) are in use
the task's saved SCS SP is left pointing at an arbitrary point within
the task's shadow call stack.
When a CPU is offlined than onlined back into the kernel, this stale
state can adversely affect execution. Stale KASAN shadow can alias new
stackframes and result in bogus KASAN warnings. A stale SCS SP is
effectively a memory leak, and prevents a portion of the shadow call
stack being used. Across a number of hotplug cycles the idle task's
entire shadow call stack can become unusable.
We previously fixed the KASAN issue in commit:
e1b77c9298 ("sched/kasan: remove stale KASAN poison after hotplug")
... by removing any stale KASAN stack poison immediately prior to
onlining a CPU.
Subsequently in commit:
f1a0a376ca ("sched/core: Initialize the idle task with preemption disabled")
... the refactoring left the KASAN and SCS cleanup in one-time idle
thread initialization code rather than something invoked prior to each
CPU being onlined, breaking both as above.
We fixed SCS (but not KASAN) in commit:
63acd42c0d ("sched/scs: Reset the shadow stack when idle_task_exit")
... but as this runs in the context of the idle task being offlined it's
potentially fragile.
To fix these consistently and more robustly, reset the SCS SP and KASAN
shadow of a CPU's idle task immediately before we online that CPU in
bringup_cpu(). This ensures the idle task always has a consistent state
when it is running, and removes the need to so so when exiting an idle
task.
Whenever any thread is created, dup_task_struct() will give the task a
stack which is free of KASAN shadow, and initialize the task's SCS SP,
so there's no need to specially initialize either for idle thread within
init_idle(), as this was only necessary to handle hotplug cycles.
I've tested this on arm64 with:
* gcc 11.1.0, defconfig +KASAN_INLINE, KASAN_STACK
* clang 12.0.0, defconfig +KASAN_INLINE, KASAN_STACK, SHADOW_CALL_STACK
... offlining and onlining CPUS with:
| while true; do
| for C in /sys/devices/system/cpu/cpu*/online; do
| echo 0 > $C;
| echo 1 > $C;
| done
| done
Fixes: f1a0a376ca ("sched/core: Initialize the idle task with preemption disabled")
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
Link: https://lore.kernel.org/lkml/20211115113310.35693-1-mark.rutland@arm.com/
Fix:
WARNING: modpost: vmlinux.o(.text.unlikely+0x64d0): Section mismatch in reference \
from the function prepare_command_line() to the variable .init.data:command_line
The function prepare_command_line() references
the variable __initdata command_line.
This is often because prepare_command_line lacks a __initdata
annotation or the annotation of command_line is wrong.
Apparently some toolchains do different inlining decisions.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/YZySgpmBcNNM2qca@zn.tnic
Since the commit c118c7303a ("powerpc/32: Fix vmap stack - Do not
activate MMU before reading task struct") a vmap stack overflow
results in a hard lockup. This is because emergency_ctx is still
addressed with its virtual address allthough data MMU is not active
anymore at that time.
Fix it by using a physical address instead.
Fixes: c118c7303a ("powerpc/32: Fix vmap stack - Do not activate MMU before reading task struct")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/ce30364fb7ccda489272af4a1612b6aa147e1d23.1637227521.git.christophe.leroy@csgroup.eu
The POWER9 ERAT flush instruction is a SLBIA with IH=7, which is a
reserved value on POWER7/8. On POWER8 this invalidates the SLB entries
above index 0, similarly to SLBIA IH=0.
If the SLB entries are invalidated, and then the guest is bypassed, the
host SLB does not get re-loaded, so the bolted entries above 0 will be
lost. This can result in kernel stack access causing a SLB fault.
Kernel stack access causing a SLB fault was responsible for the infamous
mega bug (search "Fix SLB reload bug"). Although since commit
48e7b76957 ("powerpc/64s/hash: Convert SLB miss handlers to C") that
starts using the kernel stack in the SLB miss handler, it might only
result in an infinite loop of SLB faults. In any case it's a bug.
Fix this by only executing the instruction on >= POWER9 where IH=7 is
defined not to invalidate the SLB. POWER7/8 don't require this ERAT
flush.
Fixes: 5008711259 ("KVM: PPC: Book3S HV: Invalidate ERAT when flushing guest TLB entries")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211119031627.577853-1-npiggin@gmail.com
As Vincent reports in:
https://lore.kernel.org/r/20211118163417.21617-1-vincent.whitchurch@axis.com
The put_user() in schedule_tail() can get stuck in a livelock, similar
to a problem recently fixed on riscv in commit:
285a76bb2c ("riscv: evaluate put_user() arg before enabling user access")
In __raw_put_user() we have a critical section between
uaccess_ttbr0_enable() and uaccess_ttbr0_disable() where we cannot
safely call into the scheduler without having taken an exception, as
schedule() and other scheduling functions will not save/restore the
TTBR0 state. If either of the `x` or `ptr` arguments to __raw_put_user()
contain a blocking call, we may call into the scheduler within the
critical section. This can result in two problems:
1) The access within the critical section will occur without the
required TTBR0 tables installed. This will fault, and where the
required tables permit access, the access will be retried without the
required tables, resulting in a livelock.
2) When TTBR0 SW PAN is in use, check_and_switch_context() does not
modify TTBR0, leaving a stale value installed. The mappings of the
blocked task will erroneously be accessible to regular accesses in
the context of the new task. Additionally, if the tables are
subsequently freed, local TLB maintenance required to reuse the ASID
may be lost, potentially resulting in TLB corruption (e.g. in the
presence of CnP).
The same issue exists for __raw_get_user() in the critical section
between uaccess_ttbr0_enable() and uaccess_ttbr0_disable().
A similar issue exists for __get_kernel_nofault() and
__put_kernel_nofault() for the critical section between
__uaccess_enable_tco_async() and __uaccess_disable_tco_async(), as the
TCO state is not context-switched by direct calls into the scheduler.
Here the TCO state may be lost from the context of the current task,
resulting in unexpected asynchronous tag check faults. It may also be
leaked to another task, suppressing expected tag check faults.
To fix all of these cases, we must ensure that we do not directly call
into the scheduler in their respective critical sections. This patch
reworks __raw_put_user(), __raw_get_user(), __get_kernel_nofault(), and
__put_kernel_nofault(), ensuring that parameters are evaluated outside
of the critical sections. To make this requirement clear, comments are
added describing the problem, and line spaces added to separate the
critical sections from other portions of the macros.
For __raw_get_user() and __raw_put_user() the `err` parameter is
conditionally assigned to, and we must currently evaluate this in the
critical section. This behaviour is relied upon by the signal code,
which uses chains of put_user_error() and get_user_error(), checking the
return value at the end. In all cases, the `err` parameter is a plain
int rather than a more complex expression with a blocking call, so this
is safe.
In future we should try to clean up the `err` usage to remove the
potential for this to be a problem.
Aside from the changes to time of evaluation, there should be no
functional change as a result of this patch.
Reported-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Link: https://lore.kernel.org/r/20211118163417.21617-1-vincent.whitchurch@axis.com
Fixes: f253d827f3 ("arm64: uaccess: refactor __{get,put}_user")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20211122125820.55286-1-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Add missing 'tu' variable initialization in the probes loop,
otherwise the head 'tu' is used instead of added probes.
Link: https://lkml.kernel.org/r/20211123142801.182530-1-jolsa@kernel.org
Cc: stable@vger.kernel.org
Fixes: 99c9a923e9 ("tracing/uprobe: Fix double perf_event linking on multiprobe uprobe")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmGdTV8UHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPN5Q/+O4ZtB40NcUOHOtJebrlXDdfTRpqJ
xLkVlAdOLF1v9PuigJ85AW9zyBn8cbb6mQxHeiVJaFpRCUO6xcsi32cg9y+3oYfj
rAGREsSD3JULvQTbist5e93BVbj+7J8+tgXwwUde38HqTViAAiZUQeWvZ32/Ussy
9P+PNUeWHy5X9y76YEBNEdvhPcSVH63nFbq2CC4WPlfe5vlaU2231L3Hwia/LWEt
oJsRDTUQ0iLBAfj0RH+UPZrci0L1F9HajP0MmaBazof4HBK/OnO0h18p7B8z1DdL
wfCxNi5ovlAyMtU8hbmzuGAr1HtXJu7Lj+ee5E4mb/3KfiG4/rDG9gvDHTDsri/n
S28eYKE1B0qpA4+KUvxu/7O9T8EBfb8jM4JQmcU5BOQvDyNbBHgOa6shG1fUBvva
wdlO6JREXA4mFjph91PSLWOLFPmzFgS/FBRYKej1OAOX2VCPN2z+UwixE3jfPzxk
YFUn34tz99/6xzgzWFMi4uPBU3TIOmXm7ZfeSjYoFKeWHjNkEihax96zgIHweSFY
HyEJYjNtXyiY/0kJRD/EAV7ORqpIsG5eiL3b22rgeoyTncW1kUykroUozG/tzAnA
xn9/6DIyofQdtdsgXThIPddJd+zEuJu9jweSPClzerHP45ILgdDcuG4pA91BG/vp
Vbxmkjb8xdxZu5A=
=3GqX
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20211123' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull SELinux fix from Paul Moore:
"A fix to make sure things are handled correctly when an allocation
fails"
* tag 'selinux-pr-20211123' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: fix NULL-pointer dereference when hashtab allocation fails
The Xen pv sound driver is not essential for booting. Set the respective
flag.
[boris: replace semicolon with comma]
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20211022064800.14978-6-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
The Xen pvcalls device is not essential for booting. Set the respective
flag.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20211022064800.14978-5-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
The Xen pv console driver is not essential for boot. Set the respective
flag.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20211022064800.14978-4-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>