Commit graph

10813 commits

Author SHA1 Message Date
Mitchell Levy 4fa7bc1bbd Merge fix/xsaves-lbr/5.15 into v5.15
* commit '46b414261e8193c1118924e0c62b773ad1747aff': (1884 commits)
  x86/fpu: Avoid writing LBR bit to IA32_XSS unless supported
  Linux 5.15.167
  udp: fix receiving fraglist GSO packets
  memcg: protect concurrent access to mem_cgroup_idr
  btrfs: fix race between direct IO write and fsync when using same fd
  net, sunrpc: Remap EPERM in case of connection failure in xs_tcp_setup_socket
  x86/mm: Fix PTI for i386 some more
  net: drop bad gso csum_start and offset in virtio_net_hdr
  gso: fix dodgy bit handling for GSO_UDP_L4
  net: change maximum number of UDP segments to 128
  net: more strict VIRTIO_NET_HDR_GSO_UDP_L4 validation
  gpio: rockchip: fix OF node leak in probe()
  drm/i915/fence: Mark debug_fence_free() with __maybe_unused
  drm/i915/fence: Mark debug_fence_init_onstack() with __maybe_unused
  ASoC: sunxi: sun4i-i2s: fix LRCLK polarity in i2s mode
  nvmet-tcp: fix kernel crash if commands allocation fails
  arm64: acpi: Harden get_cpu_for_acpi_id() against missing CPU entry
  arm64: acpi: Move get_cpu_for_acpi_id() to a header
  ACPI: processor: Fix memory leaks in error paths of processor_add()
  ACPI: processor: Return an error if acpi_processor_get_info() fails in processor_add()
  ...
2024-10-10 15:55:58 -07:00
Paolo Pisati cca17211c8 m68k: amiga: Turn off Warp1260 interrupts during boot
commit 1d8491d3e726984343dd8c3cdbe2f2b47cfdd928 upstream.

On an Amiga 1200 equipped with a Warp1260 accelerator, an interrupt
storm coming from the accelerator board causes the machine to crash in
local_irq_enable() or auto_irq_enable().  Disabling interrupts for the
Warp1260 in amiga_parse_bootinfo() fixes the problem.

Link: https://lore.kernel.org/r/ZkjwzVwYeQtyAPrL@amaterasu.local
Cc: stable <stable@kernel.org>
Signed-off-by: Paolo Pisati <p.pisati@gmail.com>
Reviewed-by: Michael Schmitz <schmitzmic@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/20240601153254.186225-1-p.pisati@gmail.com
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-08-19 05:45:13 +02:00
Pablo Neira Ayuso 8ad0ec7f36 netfilter: nf_tables: rise cap on SELinux secmark context
[ Upstream commit e29630247be24c3987e2b048f8e152771b32d38b ]

The secmark context is artificially limited to 256 bytes; raise it to 4 Kbytes.

Fixes: fb96194545 ("netfilter: nf_tables: add SECMARK support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-08-19 05:44:58 +02:00
Iouri Tarassov 2d38986289 drivers: hv: dxgkrnl: Implement known escapes
Implement an escape to build a test command buffer.
Implement other known escapes.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:11 +00:00
Iouri Tarassov 10f168f9c2 drivers: hv: dxgkrnl: Implement D3DDKMTIsFeatureEnabled API
D3DKMTIsFeatureEnabled is used to query if a particular feature is
supported by the given adapter.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:10 +00:00
Iouri Tarassov d05528a083 drivers: hv: dxgkrnl: Implement the D3DKMTEnumProcesses API
D3DKMTEnumProcesses is used to enumerate the PIDs of all processes
that have opened the /dev/dxg device.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:10 +00:00
Iouri Tarassov 61a4538209 drivers: hv: dxgkrnl: Added implementation for D3DKMTInvalidateCache
D3DKMTInvalidateCache is called by user mode drivers when the device
doesn't support cache coherent access to compute device allocations.
It needs to be called after an allocation has been accessed by the CPU
and is about to be accessed by the device, and vice versa.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:10 +00:00
Iouri Tarassov 6fc4a21466 drivers: hv: dxgkrnl: Implement D3DKMTWaitSyncFile
Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:10 +00:00
Iouri Tarassov 329f7fa954 drivers: hv: dxgkrnl: Creation of dxgsyncfile objects
Implement the ioctl to create a dxgsyncfile object
(LX_DXCREATESYNCFILE). This object is a wrapper around a monitored
fence sync object and a fence value.

dxgsyncfile is built on top of the Linux sync_file object and
provides a way for the user mode to synchronize with the execution
of the device DMA packets.

The ioctl creates a dxgsyncfile object for the given GPU synchronization
object and a fence value. A file descriptor of the sync_file object
is returned to the caller. The caller could wait for the object by using
poll(). When the underlying GPU synchronization object is signaled on
the host, the host sends a message to the virtual machine and the
sync_file object is signaled.
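On the caller side, the wait described above needs nothing dxgkrnl-specific, because a sync_file descriptor reports POLLIN once its fence has signaled. The sketch below assumes `fd` was obtained from LX_DXCREATESYNCFILE (the ioctl's argument layout is not shown here). The helper name `wait_sync_fd` is hypothetical.

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical helper: wait until a sync_file fd signals.
 * Returns 1 if signaled, 0 on timeout, -1 on error (errno set). */
int wait_sync_fd(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    for (;;) {
        int ret = poll(&pfd, 1, timeout_ms);

        if (ret < 0 && errno == EINTR)
            continue;            /* interrupted by a signal: retry (full timeout again) */
        if (ret < 0)
            return -1;
        if (ret == 0)
            return 0;            /* timed out, fence not signaled yet */
        if (pfd.revents & (POLLERR | POLLNVAL)) {
            errno = EINVAL;      /* bad or closed descriptor */
            return -1;
        }
        return (pfd.revents & POLLIN) ? 1 : 0;
    }
}
```

A blocking wait passes `timeout_ms = -1`. Passing `0` turns the call into a non-blocking "has it signaled yet?" check.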

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:09 +00:00
Iouri Tarassov e2c32b38d1 drivers: hv: dxgkrnl: Manage compute device virtual addresses
Implement ioctls to manage compute device virtual addresses (VA):
  - LX_DXRESERVEGPUVIRTUALADDRESS,
  - LX_DXFREEGPUVIRTUALADDRESS,
  - LX_DXMAPGPUVIRTUALADDRESS,
  - LX_DXUPDATEGPUVIRTUALADDRESS.

Compute devices access memory by using virtual addresses.
Each process has a dedicated VA space. The video memory manager
on the host is responsible for updating device page tables
before submitting a DMA buffer for execution.

The LX_DXRESERVEGPUVIRTUALADDRESS ioctl reserves a portion of the
process compute device VA space.

The LX_DXMAPGPUVIRTUALADDRESS ioctl reserves a portion of the process
compute device VA space and maps it to the given compute device
allocation.

The LX_DXFREEGPUVIRTUALADDRESS frees the previously reserved portion
of the compute device VA space.

The LX_DXUPDATEGPUVIRTUALADDRESS ioctl adds operations to modify the
compute device VA space to a compute device execution context. It
allows the operations to be queued and synchronized with execution
of other compute device DMA buffers.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:09 +00:00
Iouri Tarassov a2b48aede5 drivers: hv: dxgkrnl: Manage residency of allocations
Implement ioctls to manage residency of compute device allocations:
  - LX_DXMAKERESIDENT,
  - LX_DXEVICT.

An allocation is "resident" when the compute device is set up to
access it. This means that the allocation is in local device
memory or in non-pageable system memory.

The current design does not support on-demand compute device page
faulting. An allocation must be resident before the compute device
is allowed to access it.

The LX_DXMAKERESIDENT ioctl instructs the video memory manager to
make the given allocations resident. The operation is submitted to
a paging queue (dxgpagingqueue). When the ioctl returns a "pending"
status, a monitored fence sync object can be used to synchronize
with the completion of the operation.

The LX_DXEVICT ioctl instructs the video memory manager to evict
the given allocations from device accessible memory.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:09 +00:00
Iouri Tarassov cb8161abde drivers: hv: dxgkrnl: Ioctls to manage scheduling priority
Implement ioctls to manage compute device scheduling priority:
  - LX_DXGETCONTEXTINPROCESSSCHEDULINGPRIORITY
  - LX_DXGETCONTEXTSCHEDULINGPRIORITY
  - LX_DXSETCONTEXTINPROCESSSCHEDULINGPRIORITY
  - LX_DXSETCONTEXTSCHEDULINGPRIORITY

Each compute device execution context has an assigned scheduling
priority. It is used by the compute device scheduler on the host to
pick contexts for execution. There is a global priority and a
priority within a process.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:09 +00:00
Iouri Tarassov 15f5e8e6c0 drivers: hv: dxgkrnl: Offer and reclaim allocations
Implement ioctls to offer and reclaim compute device allocations:
  - LX_DXOFFERALLOCATIONS,
  - LX_DXRECLAIMALLOCATIONS2

When a user mode driver (UMD) does not need to access an allocation,
it can "offer" it by issuing the LX_DXOFFERALLOCATIONS ioctl.  This
means that the allocation is not in use and its local device memory
could be evicted. The freed space could be given to another allocation.
When the allocation is needed again, the UMD can attempt to "reclaim"
the allocation by issuing the LX_DXRECLAIMALLOCATIONS2 ioctl. If the
allocation has not been evicted yet, the reclaim operation succeeds and no
other action is required. If the reclaim operation fails, the caller
must restore the content of the allocation before it can be used by
the device.
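The offer/reclaim contract can be pictured as a three-state life cycle. The sketch below is a hypothetical model, not the driver's code. Its states, names and the `contents` field are illustrative. It also does not model the real ioctls' arguments. It only captures the rule stated above: contents survive a reclaim unless the host evicted the offered allocation in the meantime.

```c
#include <stdbool.h>

/* Hypothetical model of an allocation's offer/reclaim life cycle. */
enum offer_state { ALLOC_IN_USE, ALLOC_OFFERED, ALLOC_DISCARDED };

struct model_alloc {
    enum offer_state state;
    int contents;          /* stands in for the allocation's data */
};

/* LX_DXOFFERALLOCATIONS-like: the UMD does not need the data right now. */
void offer_alloc(struct model_alloc *a)
{
    if (a->state == ALLOC_IN_USE)
        a->state = ALLOC_OFFERED;
}

/* Host memory pressure: only offered allocations may lose their contents. */
void host_evict_offered(struct model_alloc *a)
{
    if (a->state == ALLOC_OFFERED) {
        a->state = ALLOC_DISCARDED;
        a->contents = 0;
    }
}

/* LX_DXRECLAIMALLOCATIONS2-like: returns true if the contents survived;
 * false means the caller must restore them before the device uses them. */
bool reclaim_alloc(struct model_alloc *a)
{
    bool kept = a->state != ALLOC_DISCARDED;

    a->state = ALLOC_IN_USE;
    return kept;
}
```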

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:09 +00:00
Iouri Tarassov 92b6a85b1a drivers: hv: dxgkrnl: Ioctls to query statistics and clock calibration
Implement ioctls to query statistics from the VGPU device
(LX_DXQUERYSTATISTICS) and to query clock calibration
(LX_DXQUERYCLOCKCALIBRATION).

The LX_DXQUERYSTATISTICS ioctl is used to query various statistics from
the compute device on the host.

The LX_DXQUERYCLOCKCALIBRATION ioctl queries the compute device clock
and is used for performance monitoring.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:09 +00:00
Iouri Tarassov 39d7838ac1 drivers: hv: dxgkrnl: Ioctl to put device to error state
Implement the ioctl to put the virtual compute device into the error
state (LX_DXMARKDEVICEASERROR).

This ioctl is used by the user mode driver when it detects an
unrecoverable error condition.

When a compute device is put into the error state, all subsequent
ioctl calls to the device fail.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:09 +00:00
Iouri Tarassov ad1c37783f drivers: hv: dxgkrnl: The escape ioctl
Implement the escape ioctl (LX_DXESCAPE).

This ioctl is used to send/receive private data between the user mode
compute device driver (guest) and the kernel mode compute device
driver (host). It allows the user mode driver to extend the virtual
compute device API.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:08 +00:00
Iouri Tarassov c61c38dc6a drivers: hv: dxgkrnl: Query video memory information
Implement the ioctl to query video memory information from the host
(LX_DXQUERYVIDEOMEMORYINFO).

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:08 +00:00
Iouri Tarassov bcd35de6f4 drivers: hv: dxgkrnl: Flush heap transitions
Implement the ioctl to flush heap transitions
(LX_DXFLUSHHEAPTRANSITIONS).

The ioctl is used to ensure that the video memory manager on the host
flushes all internal operations.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:08 +00:00
Iouri Tarassov 77bf29aa37 drivers: hv: dxgkrnl: Manage device allocation properties
Implement ioctls to manage properties of a compute device allocation:
  - LX_DXUPDATEALLOCPROPERTY,
  - LX_DXSETALLOCATIONPRIORITY,
  - LX_DXGETALLOCATIONPRIORITY,
  - LX_DXQUERYALLOCATIONRESIDENCY,
  - LX_DXCHANGEVIDEOMEMORYRESERVATION.

The LX_DXUPDATEALLOCPROPERTY ioctl requests the host to update
various properties of a compute device allocation.

The LX_DXSETALLOCATIONPRIORITY and LX_DXGETALLOCATIONPRIORITY ioctls
are used to set/get allocation priority, which defines the
importance of the allocation to be in the local device memory.

The LX_DXQUERYALLOCATIONRESIDENCY ioctl queries if the allocation
is located in the compute device accessible memory.

The LX_DXCHANGEVIDEOMEMORYRESERVATION ioctl changes compute device
memory reservation of an allocation.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:08 +00:00
Iouri Tarassov f403f70856 drivers: hv: dxgkrnl: Map(unmap) CPU address to device allocation
Implement ioctls to map/unmap CPU virtual addresses to compute device
allocations - LX_DXLOCK2 and LX_DXUNLOCK2.

The LX_DXLOCK2 ioctl maps a CPU virtual address to a compute device
allocation. The allocation could be located in system memory or local
device memory on the host. When the device allocation is created
from the guest system memory (existing sysmem allocation), the
allocation CPU address is known and is returned to the caller.
For other CPU-visible allocations, the code flow is as follows:
1. A VM bus message is sent to the host to map the allocation.
2. The host allocates a portion of the guest IO space and maps it
   to the allocation backing store. The IO space address of the
   allocation is returned to the guest.
3. The guest allocates a CPU virtual address and maps it to the IO
   space (see the dxg_map_iospace function).
4. The CPU VA is returned to the caller.

cpu_address_mapped and cpu_address_refcount are used to track how
many times an allocation was mapped.

The LX_DXUNLOCK2 ioctl unmaps a CPU virtual address from a compute
device allocation.
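The bookkeeping behind cpu_address_mapped and cpu_address_refcount can be modelled as a plain reference count. This is a hypothetical sketch, not the driver's code. It rests on two assumptions. First, the expensive steps 1-3 above run only on the first LX_DXLOCK2. Second, the mapping is released when the last LX_DXUNLOCK2 drops the count to zero. Struct and function names are illustrative.

```c
#include <stdbool.h>

/* Hypothetical per-allocation mapping state; the first two field names
 * follow the commit text, the call counters exist only for illustration. */
struct alloc_map_state {
    bool cpu_address_mapped;           /* a CPU VA currently exists */
    unsigned int cpu_address_refcount; /* outstanding lock calls */
    int host_map_calls;                /* times the slow map path ran */
    int host_unmap_calls;              /* times the mapping was torn down */
};

/* LX_DXLOCK2-like: only the first lock pays for the VM bus round trip
 * and the IO space mapping. */
void lock_alloc(struct alloc_map_state *s)
{
    if (!s->cpu_address_mapped) {
        s->host_map_calls++;
        s->cpu_address_mapped = true;
    }
    s->cpu_address_refcount++;
}

/* LX_DXUNLOCK2-like: the last unlock releases the mapping (assumed).
 * Returns false on an unbalanced unlock. */
bool unlock_alloc(struct alloc_map_state *s)
{
    if (s->cpu_address_refcount == 0)
        return false;
    if (--s->cpu_address_refcount == 0) {
        s->host_unmap_calls++;
        s->cpu_address_mapped = false;
    }
    return true;
}
```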

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:08 +00:00
Iouri Tarassov 3c3a6d1ee1 drivers: hv: dxgkrnl: Query the dxgdevice state
Implement the ioctl to query the dxgdevice state - LX_DXGETDEVICESTATE.
The IOCTL is used to query the state of the given dxgdevice object (active,
error, etc.).

Queries of the dxgdevice execution state can be very frequent.
The following method is used to avoid sending a synchronous VM
bus message to the host for every call:
- When a dxgdevice is created, a pointer to dxgglobal->device_state_counter
  is sent to the host
- Every time the device state on the host is changed, the host will send
  an asynchronous message to the guest (DXGK_VMBCOMMAND_SETGUESTDATA) and
  the guest will increment the device_state_counter value.
- The dxgdevice object has an execution_state_counter member, which is equal
  to the dxgglobal->device_state_counter value at the time when
  LX_DXGETDEVICESTATE was last processed.
- If execution_state_counter differs from device_state_counter, the
  dxgk_vmbcommand_getdevicestate VM bus message is sent to the host.
  Otherwise, the cached value is returned to the caller.
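The counter comparison above can be sketched as follows. This is a model, not the driver's code. `query_host_state` is a hypothetical stand-in for the synchronous getdevicestate round trip. The `host_queries == 0` test, which forces the very first call to reach the host, is our assumption about how the cache starts out. The counter is read *before* querying, so a host notification that races with the query leaves the two counters unequal and the next call asks again.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Models dxgglobal->device_state_counter: incremented each time the host
 * sends a DXGK_VMBCOMMAND_SETGUESTDATA notification. */
static _Atomic uint64_t device_state_counter;

/* Per-dxgdevice cache; execution_state_counter follows the commit text. */
struct dev_state_cache {
    uint64_t execution_state_counter; /* counter value at the last real query */
    int cached_state;                 /* state returned by that query */
    int host_queries;                 /* synchronous round trips performed */
};

/* Hypothetical stand-in for the synchronous VM bus query. */
static int fake_host_state;
static int query_host_state(void) { return fake_host_state; }

void on_host_state_changed(void)
{
    atomic_fetch_add(&device_state_counter, 1);
}

int get_device_state(struct dev_state_cache *c)
{
    /* Snapshot first, so a change during the query is not lost. */
    uint64_t now = atomic_load(&device_state_counter);

    if (c->host_queries == 0 || now != c->execution_state_counter) {
        c->cached_state = query_host_state();
        c->execution_state_counter = now;
        c->host_queries++;
    }
    return c->cached_state;   /* cheap path: no VM bus message */
}
```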

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:08 +00:00
Iouri Tarassov 359c7b1ac2 drivers: hv: dxgkrnl: Share objects with the host
Implement the LX_DXSHAREOBJECTWITHHOST ioctl.
This ioctl is used to create a Windows NT handle on the host
for the given shared object (resource or sync object). The NT
handle is returned to the caller. The caller could share the NT
handle with a host application, which needs to access the object.
The host application can open the shared resource using the NT
handle. This way the guest and the host have access to the same
object.

Fix incorrect handling of error results from copy_from_user().

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:08 +00:00
Iouri Tarassov e22f5ce9f2 drivers: hv: dxgkrnl: Submit execution commands to the compute device
Implements ioctls for submission of compute device buffers for execution:
  - LX_DXSUBMITCOMMAND
    The ioctl is used to submit a command buffer to the device,
    working in the "packet scheduling" mode.

  - LX_DXSUBMITCOMMANDTOHWQUEUE
    The ioctl is used to submit a command buffer to the device,
    working in the "hardware scheduling" mode.

To improve performance, both ioctls use asynchronous VM bus messages
to communicate with the host, as these are high-frequency operations.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:08 +00:00
Iouri Tarassov aa067375a7 drivers: hv: dxgkrnl: Creation of paging queue objects.
Implement ioctls for creation/destruction of the paging queue objects:
  - LX_DXCREATEPAGINGQUEUE,
  - LX_DXDESTROYPAGINGQUEUE

Paging queue objects (dxgpagingqueue) contain operations that
handle residency of device-accessible allocations. An allocation is
resident when the device has access to it. For example, the allocation
resides in local device memory, or device page tables point to system
memory that has been made non-pageable.

Each paging queue has an associated monitored fence sync object, which
is used to detect when a paging operation is completed.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:08 +00:00
Iouri Tarassov cd03c649d3 drivers: hv: dxgkrnl: Sharing of sync objects
Implement creation of shared sync objects and the ioctl for sharing
dxgsyncobject objects between processes in the virtual machine.

Sync objects are shared using file descriptor (FD) handles.
The name "NT handle" is used to be compatible with the Windows implementation.

An FD handle is created by the LX_DXSHAREOBJECTS ioctl. The created FD
handle could be sent to another process using any Linux API.
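One common way to hand such an FD to another process is SCM_RIGHTS ancillary data over an AF_UNIX socket. The sketch shows only that standard technique. It assumes the FD returned by LX_DXSHAREOBJECTS behaves like any other descriptor. The helper names `send_fd` and `recv_fd` are hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Control buffer sized and aligned for exactly one int (one fd). */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send fd over a connected AF_UNIX socket. Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char byte = 0;  /* stream sockets need at least one byte of real data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd. Returns the new descriptor, which refers to the same
 * open file as the sender's, or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;
    int fd = -1;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS &&
        cmsg->cmsg_len == CMSG_LEN(sizeof(int)))
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiving process would then pass the received descriptor to LX_DXOPENSYNCOBJECTFROMNTHANDLE2 to obtain its own d3dkmthandle.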

To use a shared sync object in other ioctls, the object needs to be
opened using its FD handle. A sync object is opened by the
LX_DXOPENSYNCOBJECTFROMNTHANDLE2 ioctl, which returns a d3dkmthandle
value.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:08 +00:00
Iouri Tarassov 3f4a94d21a drivers: hv: dxgkrnl: Sharing of dxgresource objects
Implement creation of shared resources and ioctls for sharing
dxgresource objects between processes in the virtual machine.

A dxgresource object is a collection of dxgallocation objects.
The driver API allows addition/removal of allocations to a resource,
but has limitations on addition/removal of allocations to a shared
resource. When a resource is "sealed", addition/removal of allocations
is not allowed.

Resources are shared using file descriptor (FD) handles. The name
"NT handle" is used to be compatible with the Windows implementation.

An FD handle is created by the LX_DXSHAREOBJECTS ioctl. The given FD
handle could be sent to another process using any Linux API.

To use a shared resource object in other ioctls, the object needs to be
opened using its FD handle. A resource object is opened by the
LX_DXOPENRESOURCEFROMNTHANDLE ioctl. This ioctl returns a d3dkmthandle
value, which can be used to reference the resource object.

The LX_DXQUERYRESOURCEINFOFROMNTHANDLE ioctl is used to query private
driver data of a shared resource object. This private data needs to be
used to actually open the object using the LX_DXOPENRESOURCEFROMNTHANDLE
ioctl.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:08 +00:00
Iouri Tarassov 4e561ebc06 drivers: hv: dxgkrnl: Operations using sync objects
Implement ioctls to submit operations with compute device
sync objects:
  - the LX_DXSIGNALSYNCHRONIZATIONOBJECT ioctl.
    The ioctl is used to submit a signal to a sync object.
  - the LX_DXWAITFORSYNCHRONIZATIONOBJECT ioctl.
    The ioctl is used to submit a wait for a sync object.
  - the LX_DXSIGNALSYNCHRONIZATIONOBJECTFROMCPU ioctl.
    The ioctl is used to signal a monitored fence sync object
    from a CPU thread.
  - the LX_DXSIGNALSYNCHRONIZATIONOBJECTFROMGPU ioctl.
    The ioctl is used to submit a signal to a monitored fence
    sync object.
  - the LX_DXSIGNALSYNCHRONIZATIONOBJECTFROMGPU2 ioctl.
    The ioctl is used to submit a signal to a monitored fence
    sync object.
  - the LX_DXWAITFORSYNCHRONIZATIONOBJECTFROMGPU ioctl.
    The ioctl is used to submit a wait for a monitored fence
    sync object.

Compute device synchronization objects are used to synchronize
execution of DMA buffers between different execution contexts.
Operations with sync objects include "signal" and "wait". A wait
for a sync object is satisfied when the sync object is signaled.

A signal operation could be submitted to a compute device context or
the sync object could be signaled by a CPU thread.

To improve performance, submitting operations to the host is done
asynchronously when the host supports it.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:07 +00:00
Iouri Tarassov 6afd35c73a drivers: hv: dxgkrnl: Creation of compute device sync objects
Implement ioctls to create and destroy compute device sync objects:
 - the LX_DXCREATESYNCHRONIZATIONOBJECT ioctl,
 - the LX_DXDESTROYSYNCHRONIZATIONOBJECT ioctl.

Compute device synchronization objects are used to synchronize
execution of compute device commands, which are queued to
different execution contexts (dxgcontext objects).

There are several types of sync objects (mutex, monitored
fence, CPU event, fence). A "signal" or a "wait" operation
could be queued to an execution context.

Monitored fence sync objects are particularly important.
A monitored fence object has a fence value, which could be
monitored by the compute device or by CPU. Therefore, a CPU
virtual address is allocated during object creation to allow
an application to read the fence value. dxg_map_iospace and
dxg_unmap_iospace implement creation of the CPU virtual address.
This is done as follows:
- The host allocates a portion of the guest IO space, which is mapped
  to the actual fence value memory on the host
- The host returns the guest IO space address to the guest
- The guest allocates a CPU virtual address and updates page tables
  to point to the IO space address
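Once that CPU mapping exists, an application can check a monitored fence without a system call. The sketch below is a minimal model built on two assumptions. First, the mapped location holds a 64-bit fence value that the host only ever increases. Second, a wait for value V is satisfied once the fence is at least V. The helper names are hypothetical, and a real client would normally block in the kernel rather than spin.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed model: fence_cpu_va is the CPU VA created via dxg_map_iospace,
 * exposing a monotonically increasing 64-bit fence value. */
bool fence_reached(_Atomic uint64_t *fence_cpu_va, uint64_t wait_value)
{
    /* Acquire ordering so data produced before the signal is visible. */
    return atomic_load_explicit(fence_cpu_va, memory_order_acquire) >= wait_value;
}

/* Bounded busy-poll, for illustration only. */
bool fence_spin_wait(_Atomic uint64_t *fence_cpu_va, uint64_t wait_value,
                     unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++)
        if (fence_reached(fence_cpu_va, wait_value))
            return true;
    return false;
}
```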

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:07 +00:00
Iouri Tarassov ce7a1f172d drivers: hv: dxgkrnl: Creation of compute device allocations and resources
Implement ioctls to create and destroy virtual compute device
allocations (dxgallocation) and resources (dxgresource):
  - the LX_DXCREATEALLOCATION ioctl,
  - the LX_DXDESTROYALLOCATION2 ioctl.

Compute device allocations (dxgallocation objects) represent memory
allocations that can be accessed by the device. Allocations can
be created around existing system memory (provided by an application)
or around memory allocated by dxgkrnl on the host.

Compute device resources (dxgresource objects) represent containers of
compute device allocations. Allocations can be dynamically added to
and removed from a resource.

Each allocation/resource has associated driver private data, which
is provided during creation.

Each created resource or allocation has a handle (d3dkmthandle),
which is used to reference the corresponding object in other ioctls.

A dxgallocation can be resident (meaning that it is accessible by
the compute device) or evicted. When an allocation is evicted,
its content is stored in the backing store in system memory.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:07 +00:00
Iouri Tarassov 8668f2837c drivers: hv: dxgkrnl: Creation of dxgcontext objects
Implement ioctls for creation/destruction of dxgcontext
objects:
  - the LX_DXCREATECONTEXTVIRTUAL ioctl
  - the LX_DXDESTROYCONTEXT ioctl.

A dxgcontext object represents a compute device execution thread.
Compute device DMA buffers and synchronization operations are
submitted for execution to a dxgcontext. dxgcontext objects
belong to a dxgdevice object.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:07 +00:00
Iouri Tarassov 615677695d drivers: hv: dxgkrnl: Creation of dxgdevice objects
Implement ioctls for creation and destruction of dxgdevice
objects:
 - the LX_DXCREATEDEVICE ioctl
 - the LX_DXDESTROYDEVICE ioctl

A dxgdevice object represents a container of other virtual
compute device objects (allocations, sync objects, contexts,
etc.). It belongs to a dxgadapter object.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:07 +00:00
Iouri Tarassov 56538daeb2 drivers: hv: dxgkrnl: Opening of /dev/dxg device and dxgprocess creation
- Implement opening of the device (/dev/dxg) file object and creation of
dxgprocess objects.

- Add VM bus messages to create and destroy the host side of a dxgprocess
object.

- Implement the handle manager, which manages d3dkmthandle handles
for the internal process objects. The handles are used by a user mode
client to reference dxgkrnl objects.

A dxgprocess is created for each process that opens /dev/dxg.
dxgprocess is reference counted, so the existing dxgprocess object is reused
for a process that opens the device object multiple times.
dxgprocess is destroyed when the file object is released.

A corresponding dxgprocess object is created on the host for every
dxgprocess object in the guest.

When a dxgkrnl object is created, in most cases the corresponding
object is created in the host. The VM references the host objects by
handles (d3dkmthandle). d3dkmthandle values for a host object and
the corresponding VM object are the same. A host handle is allocated
first and its value is assigned to the guest object.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:07 +00:00
Iouri Tarassov 1676b11742 drivers: hv: dxgkrnl: Add VMBus message support, initialize VMBus channels.
Implement support for sending/receiving VMBus messages between
the host and the guest.

Initialize the VMBus channels and notify the host about IO space
settings of the VMBus global channel.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:07 +00:00
Iouri Tarassov 334ce7fe44 drivers: hv: dxgkrnl: Driver initialization and loading
- Create skeleton and add basic functionality for the Hyper-V
compute device driver (dxgkrnl).

- Register for PCI and VMBus driver notifications and handle
initialization of VMBus channels.

- Connect the dxgkrnl module to the drivers/hv/ Makefile and Kconfig

- Create a MAINTAINERS entry

A VMBus channel is a communication interface between the Hyper-V guest
and the host. There are two types of VMBus channels used in the driver:
  - the global channel
  - per virtual compute device channel

A PCI device is created for each virtual compute device projected
by the host. The device vendor is PCI_VENDOR_ID_MICROSOFT and the device
ID is PCI_DEVICE_ID_VIRTUAL_RENDER. dxg_pci_probe_device handles
the arrival of such devices. The PCI config space of the virtual compute
device contains the LUID of the corresponding virtual compute device VMBus
channel. This is how the compute device adapter objects are
linked to VMBus channels.

VMBus interface version is exchanged by reading/writing the PCI config
space of the virtual compute device.

The IO space is used to handle CPU accessible compute device
allocations. Hyper-V allocates IO space for the global VMBus channel.

Signed-off-by: Iouri Tarassov <iourit@linux.microsoft.com>
2024-07-09 23:40:07 +00:00
Arnd Bergmann 16c0403b7d syscalls: fix compat_sys_io_pgetevents_time64 usage
commit d3882564a77c21eb746ba5364f3fa89b88de3d61 upstream.

Using sys_io_pgetevents() as the entry point for compat mode tasks
works almost correctly, but misses the sign extension for the min_nr
and nr arguments.

This was addressed on parisc by switching to
compat_sys_io_pgetevents_time64() in commit 6431e92fc8 ("parisc:
io_pgetevents_time64() needs compat syscall in 32-bit compat mode"),
as well as by using more sophisticated system call wrappers on x86 and
s390. However, arm64, mips, powerpc, sparc and riscv still have the
same bug.

Change all of them over to use compat_sys_io_pgetevents_time64()
like parisc already does. This was clearly the intention when the
function was originally added, but it got hooked up incorrectly in
the tables.
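The missing sign extension is easy to see in isolation. For illustration, assume a compat task's 32-bit argument arrives in a 64-bit register whose upper half is zero. A negative `long` such as min_nr = -1 is then read by the native entry point as 4294967295, unless something sign-extends it. A compat entry point taking compat_long_t arguments does that sign extension. The helper names are hypothetical.

```c
#include <stdint.h>

/* The low 32 bits taken as-is: what the native 64-bit entry point sees
 * when the upper half of the register is zero. */
int64_t zero_extended(uint32_t reg)
{
    return (int64_t)reg;
}

/* The low 32 bits reinterpreted as a signed 32-bit value first, as a
 * compat_long_t parameter effectively does. (uint32_t -> int32_t is
 * implementation-defined in ISO C; GCC and Clang wrap modulo 2^32.) */
int64_t sign_extended(uint32_t reg)
{
    return (int64_t)(int32_t)reg;
}
```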

Cc: stable@vger.kernel.org
Fixes: 48166e6ea4 ("y2038: add 64-bit time_t syscalls to all 32-bit architectures")
Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-05 09:14:50 +02:00
Matthias Goergens f571c8ab18 hugetlb_encode.h: fix undefined behaviour (34 << 26)
commit 710bb68c2e upstream.

Left-shifting past the size of your datatype is undefined behaviour in C.
The literal 34 gets the type `int`, and that one is not big enough to be
left shifted by 26 bits.

An `unsigned` is long enough (on any machine that has at least 32 bits for
their ints.)

For uniformity, we mark all the literals as unsigned.  But it's only
really needed for HUGETLB_FLAG_ENCODE_16GB.
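The arithmetic behind the fix: 16 GiB is 2^34, so the encoded value is 34 << 26 = 2281701376. That exceeds INT_MAX (2147483647) but fits in a 32-bit `unsigned int`. The shift and mask below mirror the uapi hugetlb_encode.h layout as we understand it (log2 of the page size stored in bits 26..31). `hugetlb_page_size` is a hypothetical decode helper, not part of the header.

```c
#include <stdint.h>

#define HUGETLB_FLAG_ENCODE_SHIFT 26
#define HUGETLB_FLAG_ENCODE_MASK  0x3f

/* With the U suffix the shift happens in unsigned int, so no overflow. */
#define HUGETLB_FLAG_ENCODE_2MB  (21U << HUGETLB_FLAG_ENCODE_SHIFT)
#define HUGETLB_FLAG_ENCODE_16GB (34U << HUGETLB_FLAG_ENCODE_SHIFT)

/* Hypothetical helper: decode the page size in bytes from a flags word. */
static inline uint64_t hugetlb_page_size(unsigned int flags)
{
    unsigned int log2_size =
        (flags >> HUGETLB_FLAG_ENCODE_SHIFT) & HUGETLB_FLAG_ENCODE_MASK;

    return UINT64_C(1) << log2_size;
}
```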

Thanks to Randy Dunlap for an initial review and suggestion.

Link: https://lkml.kernel.org/r/20220905031904.150925-1-matthias.goergens@gmail.com
Signed-off-by: Matthias Goergens <matthias.goergens@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-05 09:14:23 +02:00
Anton Protopopov f654b258e9 bpf: Pack struct bpf_fib_lookup
[ Upstream commit f91717007217d975aa975ddabd91ae1a107b9bff ]

The struct bpf_fib_lookup is supposed to be of size 64. A recent commit
59b418c7063d ("bpf: Add a check for struct bpf_fib_lookup size") added
a static assertion to check this property so that future changes to the
structure will not accidentally break this assumption.

As it immediately turned out, on some 32-bit arm systems, when AEABI=n,
the total size of the structure was equal to 68, see [1]. This happened
because the bpf_fib_lookup structure contains a union of two 16-bit
fields:

    union {
            __u16 tot_len;
            __u16 mtu_result;
    };

which was supposed to compile to a 16-bit-aligned 16-bit field. On the
aforementioned setups it was instead both aligned and padded to 32-bits.

Declare this inner union as __attribute__((packed, aligned(2))) such
that it always is of size 2 and is aligned to 16 bits.
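The effect of the attribute can be checked on a cut-down stand-in for the structure. `struct demo_lookup` is hypothetical and keeps only the leading fields of bpf_fib_lookup. The union carries the same `__attribute__((packed, aligned(2)))` as the fix, so it stays 2 bytes at offset 6 and `ifindex` lands at offset 8 regardless of how the ABI would otherwise pad unions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down stand-in for the first fields of bpf_fib_lookup. */
struct demo_lookup {
    uint8_t  family;
    uint8_t  l4_protocol;
    uint16_t sport;
    uint16_t dport;
    union {                 /* forced to 2-byte size and 2-byte alignment */
        uint16_t tot_len;
        uint16_t mtu_result;
    } __attribute__((packed, aligned(2)));
    uint32_t ifindex;
};

/* Compile-time guard in the spirit of the size check mentioned above. */
_Static_assert(sizeof(struct demo_lookup) == 12, "layout must not grow");
```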

  [1] https://lore.kernel.org/all/CA+G9fYtsoP51f-oP_Sp5MOq-Ffv8La2RztNpwvE6+R1VtFiLrw@mail.gmail.com/#t

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Fixes: e1850ea9bd ("bpf: bpf_fib_lookup return MTU value as output when looked up")
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240403123303.1452184-1-aspsk@isovalent.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-06-16 13:39:18 +02:00
Gergo Koteles 18c51d97a2 Input: allocate keycode for Display refresh rate toggle
[ Upstream commit cfeb98b95fff25c442f78a6f616c627bc48a26b7 ]

Newer Lenovo Yogas and Legions with 60Hz/90Hz displays send a WMI event
when Fn + R is pressed. It is intended to switch between the
two refresh rates.

Allocate a new KEY_REFRESH_RATE_TOGGLE keycode for it.

Signed-off-by: Gergo Koteles <soyer@irl.hu>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Link: https://lore.kernel.org/r/15a5d08c84cf4d7b820de34ebbcf8ae2502fb3ca.1710065750.git.soyer@irl.hu
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-04-13 13:01:46 +02:00
Amir Goldstein b65b2d4187 fanotify: introduce FAN_MARK_IGNORE
[ Upstream commit e252f2ed1c ]

This flag is a new way to configure the ignore mask, which allows adding and
removing the event flags FAN_ONDIR and FAN_EVENT_ON_CHILD in the ignore mask.

The legacy FAN_MARK_IGNORED_MASK flag would always ignore events on
directories and would ignore events on children depending on whether
the FAN_EVENT_ON_CHILD flag was set in the (non ignored) mask.

FAN_MARK_IGNORE can be used to ignore events on children without setting
FAN_EVENT_ON_CHILD in the mark's mask and will not ignore events on
directories unconditionally, only when FAN_ONDIR is set in ignore mask.

The new behavior is non-downgradable.  After calling fanotify_mark() with
FAN_MARK_IGNORE once, calling fanotify_mark() with FAN_MARK_IGNORED_MASK
on the same object will return EEXIST error.

Setting the event flags with FAN_MARK_IGNORE on a non-dir inode mark
has no meaning and will return ENOTDIR error.

The meaning of FAN_MARK_IGNORED_SURV_MODIFY is preserved with the new
FAN_MARK_IGNORE flag, but with a few semantic differences:

1. FAN_MARK_IGNORED_SURV_MODIFY is required for filesystem and mount
   marks and on an inode mark on a directory. Omitting this flag
   will return EINVAL or EISDIR error.

2. An ignore mask on a non-directory inode that survives modify could
   never be downgraded to an ignore mask that does not survive modify.
   With the new FAN_MARK_IGNORE semantics we make that rule explicit:
   trying to update a surviving ignore mask without the flag
   FAN_MARK_IGNORED_SURV_MODIFY will return an EEXIST error.

The convenience macro FAN_MARK_IGNORE_SURV is added for
(FAN_MARK_IGNORE | FAN_MARK_IGNORED_SURV_MODIFY), because the
common case should use short constant names.
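The rules above can be summarised as a small validation function. This is a simplified userspace model written for illustration, not the kernel's implementation. The mark kinds, struct existing_mark and check_ignore_update() are hypothetical, and the order of checks may differ from fanotify_mark(). The flag values mirror include/uapi/linux/fanotify.h but are redefined under local names so the sketch builds with headers that predate FAN_MARK_IGNORE.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirror include/uapi/linux/fanotify.h (local names). */
#define M_IGNORED_MASK        0x00000020u /* FAN_MARK_IGNORED_MASK */
#define M_IGNORED_SURV_MODIFY 0x00000040u /* FAN_MARK_IGNORED_SURV_MODIFY */
#define M_IGNORE              0x00000400u /* FAN_MARK_IGNORE */
#define M_IGNORE_SURV (M_IGNORE | M_IGNORED_SURV_MODIFY) /* FAN_MARK_IGNORE_SURV */
#define E_EVENT_ON_CHILD      0x08000000u /* FAN_EVENT_ON_CHILD */
#define E_ONDIR               0x40000000u /* FAN_ONDIR */

enum mark_kind { MARK_INODE, MARK_MOUNT, MARK_FILESYSTEM };

/* Hypothetical summary of the mark already attached to the object. */
struct existing_mark {
    bool present;
    bool set_with_ignore;  /* last set via FAN_MARK_IGNORE */
    bool ignore_survives;  /* ignore mask survives modify */
};

/* Returns 0 if the update is allowed, else a negative errno. */
static int check_ignore_update(enum mark_kind kind, bool is_dir,
                               uint32_t flags, uint32_t ignore_mask,
                               const struct existing_mark *old)
{
    if (!(flags & M_IGNORE)) {
        /* Legacy API cannot downgrade a FAN_MARK_IGNORE mark. */
        if ((flags & M_IGNORED_MASK) && old->present && old->set_with_ignore)
            return -EEXIST;
        return 0; /* other legacy rules are out of scope here */
    }

    /* Directory event flags are meaningless on a non-dir inode mark. */
    if (kind == MARK_INODE && !is_dir &&
        (ignore_mask & (E_ONDIR | E_EVENT_ON_CHILD)))
        return -ENOTDIR;

    if (!(flags & M_IGNORED_SURV_MODIFY)) {
        if (kind != MARK_INODE)
            return -EINVAL;   /* filesystem and mount marks */
        if (is_dir)
            return -EISDIR;   /* directory inode marks */
        if (old->present && old->ignore_survives)
            return -EEXIST;   /* no downgrade of a surviving mask */
    }
    return 0;
}
```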

Link: https://lore.kernel.org/r/20220629144210.2983229-4-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-10 16:19:07 +02:00
Amir Goldstein 7fcef3285a fanotify: implement "evictable" inode marks
[ Upstream commit 7d5e005d98 ]

When an inode mark is created with flag FAN_MARK_EVICTABLE, it will not
pin the marked inode to the inode cache, so when the inode is evicted from
the cache due to memory pressure, the mark will be lost.

When an inode mark with flag FAN_MARK_EVICTABLE is updated without using
this flag, the marked inode is pinned to the inode cache.

When an inode mark is updated with flag FAN_MARK_EVICTABLE but an
existing mark already has the inode pinned, the mark update fails with
error EEXIST.

Evictable inode marks can be used to set up inode marks with an ignored
mask to suppress events from uninteresting files or directories in a lazy
manner, upon receiving the first event, without having to iterate all
the uninteresting files or directories beforehand.

The evictable inode mark feature allows performing this lazy mark setup
without exhausting the system memory with pinned inodes.

This change does not enable the feature yet.
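The pinning rules described above can be modelled as a tiny state machine. This is a hypothetical sketch: struct inode_mark, mark_inode() and evict_inode() are illustrative names, not kernel functions, and the model encodes only the three update rules stated in this message plus the loss of unpinned marks on eviction.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-inode mark state. */
struct inode_mark {
    bool exists;
    bool pins_inode;  /* mark holds the inode in the inode cache */
};

/* Add or update a mark; 'evictable' means FAN_MARK_EVICTABLE was passed.
 * Returns 0 or -EEXIST. */
static int mark_inode(struct inode_mark *m, bool evictable)
{
    if (!m->exists) {
        m->exists = true;
        m->pins_inode = !evictable;  /* new evictable mark: not pinned */
        return 0;
    }
    if (evictable && m->pins_inode)
        return -EEXIST;              /* an existing pin cannot be dropped */
    if (!evictable)
        m->pins_inode = true;        /* update without the flag pins */
    return 0;
}

/* Simulated memory pressure: only an unpinned inode can be evicted,
 * and its mark is lost with it. */
static void evict_inode(struct inode_mark *m)
{
    if (!m->pins_inode)
        m->exists = false;
}
```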

Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxiRDpuS=2uA6+ZUM7yG9vVU-u212tkunBmSnP_u=mkv=Q@mail.gmail.com/
Link: https://lore.kernel.org/r/20220422120327.3459282-15-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-10 16:19:03 +02:00
Amir Goldstein a187e777d7 fanotify: report old and/or new parent+name in FAN_RENAME event
[ Upstream commit 7326e382c2 ]

In the special case of FAN_RENAME event, we report old or new or both
old and new parent+name.

A single info record will be reported if either the old or new dir
is watched and two records will be reported if both old and new dir
(or their filesystem) are watched.

The old and new parent+name are reported using new info record types
FAN_EVENT_INFO_TYPE_{OLD,NEW}_DFID_NAME, so if a single info record
is reported, it is clear to the application which dir entry the
fid+name info refers to.
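On the receiving side, an application can tell the two records apart by their info_type. A minimal sketch, assuming the caller has already skipped the struct fanotify_event_metadata and passes only the trailing info records; count_rename_records() is a hypothetical helper. The fallback constant values are assumed from include/uapi/linux/fanotify.h and only apply to headers that lack them.

```c
#include <stddef.h>
#include <string.h>
#include <linux/fanotify.h>

#ifndef FAN_EVENT_INFO_TYPE_OLD_DFID_NAME
#define FAN_EVENT_INFO_TYPE_OLD_DFID_NAME 10
#endif
#ifndef FAN_EVENT_INFO_TYPE_NEW_DFID_NAME
#define FAN_EVENT_INFO_TYPE_NEW_DFID_NAME 12
#endif

/* Walk the info records of one event and count old/new dir entries. */
static void count_rename_records(const unsigned char *buf, size_t len,
                                 int *n_old, int *n_new)
{
    size_t off = 0;

    *n_old = *n_new = 0;
    while (off + sizeof(struct fanotify_event_info_header) <= len) {
        struct fanotify_event_info_header hdr;

        memcpy(&hdr, buf + off, sizeof(hdr));  /* avoid unaligned reads */
        if (hdr.len < sizeof(hdr) || off + hdr.len > len)
            break;                             /* malformed record */
        if (hdr.info_type == FAN_EVENT_INFO_TYPE_OLD_DFID_NAME)
            (*n_old)++;
        else if (hdr.info_type == FAN_EVENT_INFO_TYPE_NEW_DFID_NAME)
            (*n_new)++;
        off += hdr.len;
    }
}
```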

Link: https://lore.kernel.org/r/20211129201537.1932819-11-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-10 16:18:55 +02:00
Amir Goldstein 9acb63f955 fanotify: record old and new parent and name in FAN_RENAME event
[ Upstream commit 3982534ba5 ]

In the special case of FAN_RENAME event, we record both the old
and new parent and name.

Link: https://lore.kernel.org/r/20211129201537.1932819-9-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-10 16:18:55 +02:00
Amir Goldstein 8bd3d40ea3 fanotify: introduce group flag FAN_REPORT_TARGET_FID
[ Upstream commit d61fd650e9 ]

FAN_REPORT_FID is ambiguous in that it reports the fid of the child for
some events and the fid of the parent for create/delete/move events.

The new FAN_REPORT_TARGET_FID flag is an implicit request to report
the fid of the target object of the operation (a.k.a. the child inode)
also in create/delete/move events in addition to the fid of the parent
and the name of the child.

To reduce the test matrix for uninteresting use cases, the new
FAN_REPORT_TARGET_FID flag requires both FAN_REPORT_NAME and
FAN_REPORT_FID.  The convenience macro FAN_REPORT_DFID_NAME_TARGET
combines FAN_REPORT_TARGET_FID with all the required flags.
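The flag dependency amounts to a one-line check. This hypothetical sketch models only the requirement stated above. The kernel performs further checks in fanotify_init() (for example, FAN_REPORT_NAME itself requires FAN_REPORT_DIR_FID), and those are omitted. The local R_* names mirror the FAN_REPORT_* values in include/uapi/linux/fanotify.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirrors of the fanotify_init() reporting flags. */
#define R_FID        0x00000200u /* FAN_REPORT_FID */
#define R_DIR_FID    0x00000400u /* FAN_REPORT_DIR_FID */
#define R_NAME       0x00000800u /* FAN_REPORT_NAME */
#define R_TARGET_FID 0x00001000u /* FAN_REPORT_TARGET_FID */
#define R_DFID_NAME  (R_DIR_FID | R_NAME)                   /* FAN_REPORT_DFID_NAME */
#define R_DFID_NAME_TARGET (R_DFID_NAME | R_FID | R_TARGET_FID) /* FAN_REPORT_DFID_NAME_TARGET */

/* FAN_REPORT_TARGET_FID is accepted only with FAN_REPORT_NAME and
 * FAN_REPORT_FID also set. */
static bool target_fid_flags_ok(uint32_t flags)
{
    if (!(flags & R_TARGET_FID))
        return true;
    return (flags & (R_NAME | R_FID)) == (R_NAME | R_FID);
}
```

The convenience macro passes because it already carries every required bit, which is why applications are expected to use it rather than assembling the flags by hand.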

Link: https://lore.kernel.org/r/20211129201537.1932819-4-amir73il@gmail.com
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-10 16:18:54 +02:00
NeilBrown f829bb3a06 NFSD: move filehandle format declarations out of "uapi".
[ Upstream commit ef5825e3cf ]

A small part of the declarations concerning the filehandle format is
currently in the "uapi" include directory:
   include/uapi/linux/nfsd/nfsfh.h

There is a lot more to the filehandle format, including "enum fid_type"
and "enum nfsd_fsid" which are not exported via "uapi".

This small part of the filehandle definition is of minimal use outside
of the kernel, and I can find no evidence that any other code is using
it. Certainly nfs-utils and wireshark (the most likely candidates) do not
use these declarations.

So move it out of "uapi" by copying the content from
  include/uapi/linux/nfsd/nfsfh.h
into
  fs/nfsd/nfsfh.h

A few unnecessary "#include" directives are not copied, and neither is
the #define of fh_auth, which is annotated as being for userspace only.

The copyright claims in the uapi file are identical to those in the nfsd
file, so there is no need to copy those.

The "__u32" style integer types are only needed in "uapi".  In
kernel-only code we can use the more familiar "u32" style.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-10 16:18:53 +02:00
Gabriel Krisman Bertazi c7c013dff4 fanotify: Emit generic error info for error event
[ Upstream commit 130a3c7421 ]

The error info is a record sent to users on FAN_FS_ERROR events
documenting the type of error.  It also carries an error count,
documenting how many errors were observed since the last reporting.
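Reading the error info record in userspace might look like the sketch below. It assumes the layout of struct fanotify_event_info_error: the common info header followed by a signed error and an unsigned error_count. struct error_record is a local, hypothetical mirror of that layout so the example builds without recent headers, and read_error_record() is an illustrative helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed mirror of struct fanotify_event_info_error. */
struct error_record {
    uint8_t  info_type;    /* FAN_EVENT_INFO_TYPE_ERROR */
    uint8_t  pad;
    uint16_t len;          /* record length in bytes */
    int32_t  error;        /* type of error */
    uint32_t error_count;  /* errors seen since the last report */
};

#define INFO_TYPE_ERROR 5  /* assumed value of FAN_EVENT_INFO_TYPE_ERROR */

/* Copy out error and count; false if not an error record or truncated. */
static bool read_error_record(const unsigned char *rec, size_t avail,
                              int32_t *error, uint32_t *count)
{
    struct error_record r;

    if (avail < sizeof(r))
        return false;
    memcpy(&r, rec, sizeof(r));  /* records may be unaligned */
    if (r.info_type != INFO_TYPE_ERROR || r.len < sizeof(r))
        return false;
    *error = r.error;
    *count = r.error_count;
    return true;
}
```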

Link: https://lore.kernel.org/r/20211025192746.66445-28-krisman@collabora.com
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-10 16:18:52 +02:00
Gabriel Krisman Bertazi 11280c7181 fanotify: Reserve UAPI bits for FAN_FS_ERROR
[ Upstream commit 8d11a4f43e ]

FAN_FS_ERROR allows reporting of event type FS_ERROR to userspace, which
is a mechanism to report file system wide problems via fanotify.  This
commit preallocates userspace-visible bits to match the FS_ERROR event.

Link: https://lore.kernel.org/r/20211025192746.66445-19-krisman@collabora.com
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2024-04-10 16:18:51 +02:00
Martynas Pumputis 68dbe92d67 bpf: Derive source IP addr via bpf_*_fib_lookup()
commit dab4e1f06cabb6834de14264394ccab197007302 upstream.

Extend the bpf_fib_lookup() helper by making it return the source
IPv4/IPv6 address if the BPF_FIB_LOOKUP_SRC flag is set.

For example, the following snippet can be used to derive the desired
source IP address:

    struct bpf_fib_lookup p = {
        .family   = AF_INET,
        .ifindex  = skb->ifindex,
        .ipv4_dst = ip4->daddr,
    };
    long ret;

    ret = bpf_fib_lookup(skb, &p, sizeof(p),
            BPF_FIB_LOOKUP_SRC | BPF_FIB_LOOKUP_SKIP_NEIGH);
    if (ret != BPF_FIB_LKUP_RET_SUCCESS)
        return TC_ACT_SHOT;

    /* p.ipv4_src now contains the source address */

The inability to derive the proper source address may cause malfunctions
in BPF-based dataplanes for hosts containing netdevs with more than one
routable IP address or for multi-homed hosts.

For example, Cilium implements packet masquerading in BPF. If an
egressing netdev to which Cilium's BPF prog is attached has
multiple IP addresses, then only one hardcoded IP address can be used for
masquerading. This breaks connectivity if any other IP address should have
been selected instead, for example, when both a public and a private address
are attached to the same egress interface.

The change was tested with Cilium [1].

Nikolay Aleksandrov helped to figure out the IPv6 addr selection.

[1]: https://github.com/cilium/cilium/pull/28283

Signed-off-by: Martynas Pumputis <m@lambda.lt>
Link: https://lore.kernel.org/r/20231007081415.33502-2-m@lambda.lt
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06 14:38:50 +00:00
Louis DeLosSantos 39b4ee40d2 bpf: Add table ID to bpf_fib_lookup BPF helper
commit 8ad77e72ca upstream.

Add ability to specify routing table ID to the `bpf_fib_lookup` BPF
helper.

A new field `tbid` is added to `struct bpf_fib_lookup`, the parameter
structure of the `bpf_fib_lookup` BPF helper.

When the helper is called with the `BPF_FIB_LOOKUP_DIRECT` and
`BPF_FIB_LOOKUP_TBID` flags the `tbid` field in `struct bpf_fib_lookup`
will be used as the table ID for the fib lookup.

If the `tbid` does not exist the fib lookup will fail with
`BPF_FIB_LKUP_RET_NOT_FWDED`.

The `tbid` field shares a union with the VLAN-related output fields
in `struct bpf_fib_lookup` and is zeroed immediately after use.
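The table-selection rule can be modelled in userspace. This is a hypothetical sketch: fib_table_for() and TABLE_VIA_RULES are illustrative, and the flag bits mirror BPF_FIB_LOOKUP_DIRECT and BPF_FIB_LOOKUP_TBID in include/uapi/linux/bpf.h. It assumes the device is not part of a VRF; an l3mdev device changes the default table used with BPF_FIB_LOOKUP_DIRECT.

```c
#include <stdint.h>

/* Local mirrors of the uapi flag bits and RT_TABLE_MAIN. */
#define LOOKUP_DIRECT   (1u << 0)  /* BPF_FIB_LOOKUP_DIRECT */
#define LOOKUP_TBID     (1u << 3)  /* BPF_FIB_LOOKUP_TBID */
#define TABLE_MAIN      254u       /* RT_TABLE_MAIN */
#define TABLE_VIA_RULES 0u         /* hypothetical marker: fib rules decide */

/* Which routing table a lookup consults for the given flags. */
static uint32_t fib_table_for(uint32_t flags, uint32_t tbid)
{
    if (!(flags & LOOKUP_DIRECT))
        return TABLE_VIA_RULES;  /* policy routing; tbid is ignored */
    if (flags & LOOKUP_TBID)
        return tbid;             /* caller-chosen table */
    return TABLE_MAIN;           /* DIRECT alone: main table */
}
```

TBID only takes effect together with DIRECT, matching the description above; a non-existent table would then make the real helper return BPF_FIB_LKUP_RET_NOT_FWDED, which this model does not cover.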

This functionality is useful in containerized environments.

For instance, if a CNI wants to dictate the next-hop for traffic leaving
a container it can create a container-specific routing table and perform
a fib lookup against this table in a "host-net-namespace-side" TC program.

This also enables `ip rule`-like behavior at the TC
layer, allowing an eBPF program to pick a routing table based on some
aspect of the sk_buff.

As a concrete use case, this feature will be used in Cilium's SRv6 L3VPN
datapath.

When egress traffic leaves a Pod, an eBPF program attached by Cilium will
determine which VRF the egress traffic should target, and then perform a
FIB lookup in a specific table representing this VRF's FIB.

Signed-off-by: Louis DeLosSantos <louis.delos.devel@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230505-bpf-add-tbid-fib-lookup-v2-1-0a31c22c748c@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06 14:38:50 +00:00
Martin KaFai Lau 75ca92271d bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
commit 31de4105f0 upstream.

bpf_fib_lookup() also looks up the neigh table.
This was done before bpf_redirect_neigh() was added.

In use cases that do not manage the neigh table
and need bpf_fib_lookup() only to look up a fib entry to
decide whether to redirect, the bpf prog can
rely solely on bpf_redirect_neigh() to look up the
neigh. That also keeps the neigh entries fresh and connected.

This patch adds a bpf_fib_lookup flag, SKIP_NEIGH, to avoid
the double neigh lookup when the bpf prog always calls
bpf_redirect_neigh() to do the neigh lookup. The params->smac
output is also skipped when SKIP_NEIGH is set because
bpf_redirect_neigh() will figure out the smac as well.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217205515.3583372-1-martin.lau@linux.dev
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-06 14:38:50 +00:00
Justin Iurman 28bbdb4e19 uapi: in6: replace temporary label with rfc9486
[ Upstream commit 6a2008641920a9c6fe1abbeb9acbec463215d505 ]

Not really a fix per se, but IPV6_TLV_IOAM is still tagged as "TEMPORARY
IANA allocation for IOAM", while RFC 9486 has been available for some time
now. Just update the reference.

Fixes: 9ee11f0fff ("ipv6: ioam: Data plane support for Pre-allocated Trace")
Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240226124921.9097-1-justin.iurman@uliege.be
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-06 14:38:45 +00:00