Граф коммитов

1059909 Коммитов

Автор SHA1 Сообщение Дата
Greg Kroah-Hartman 0ac467447d UIO: use default_groups in kobj_type
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field.  Move the UIO code to use default_groups field which has been the
preferred way since aa30f47cf6 ("kobject: Add support for default
attribute groups to kobj_type") so that we can soon get rid of the
obsolete default_attrs field.

Link: https://lore.kernel.org/r/20211228131319.249324-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-29 10:54:50 +01:00
Alexander Usyskin 43aa323e31 mei: cleanup status before client dma setup call
The upper layer may retry call to mei_cl_dma_alloc_and_map(),
in that case the client status may be non-zero after the previous call
and the wait condition will be true immediately.
Set cl->status to zero to allow waiting for an actual result
from the firmware.

Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Link: https://lore.kernel.org/r/20211223094705.204624-2-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-27 10:30:48 +01:00
Alexander Usyskin 38be5687da mei: add POWERING_DOWN into device state print
The POWERING_DOWN state string was missing from
the device states list, add it.

Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Link: https://lore.kernel.org/r/20211223094705.204624-1-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-27 10:30:48 +01:00
Greg Kroah-Hartman 651425fb24 This tag contains habanalabs driver changes for v5.17:
- Support reset-during-reset. In case the f/w notifies the driver
   that the f/w is going to reset the device, the driver should
   support that even if it is in the middle of doing another
   reset
 
 - Support events from f/w that arrive during device resets.
   These events would be ignored which is bad as critical errors
   would not be reported and treated by the driver.
 
 - Don't kill processes that hold the control device open during
   hard-reset of the device. The control device operations can't
   crash if done during hard-reset. And usually, only monitoring
   applications are using the control device, so killing them
   defies their purpose.
 
 - Fix handling of hwmon nodes when working with legacy f/w
 
 - Change the compute context pointer to be boolean. This pointer
   was abused by multiple code paths that wanted fast access to
   the compute context structure.
 
 - Add uapi to fetch historical errors. This is necessary as errors
   sometimes result in hard-reset where the user application is
   being terminated.
 
 - Optimize GAUDI's MMU cache invalidation.
 
 - Add support for loading the latest f/w.
 
 - Add uapi to fetch HBM replacement and pending rows information.
 
 - Multiple bug fixes to the reset code.
 
 - Multiple bug fixes for Multi-CS ioctl code.
 
 - Multiple bug fixes for wait-for-interrupt ioctl code.
 
 - Many small bug fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEE7TEboABC71LctBLFZR1NuKta54AFAmHJcoMACgkQZR1NuKta
 54BXBwf+NYkQ8w3TUR2JSkPs3/pk3XFL+OAXq9NBn2bGJ0qoj8VUz6VkcphJ3vAq
 3PyLLxeVs8dk+cVePKEnBj1krED5fFBaxIQbp+g1VjS2LsMiHm2Nwi0iDkLWqo1i
 7jwFgATzmnsRvEPhhwke7/Gd+rZ9eCfJ2NuE5Hbb2+JkzIaOCo6ZUZns9diSVDsq
 X9zaIQSnYxMd4XEhE8CP4BChJO87i+Ik56ZbqrtKhpLz3kkV+Re5S6dnsEluyGuZ
 FXmUXyu5I8NbsGmmHZ9aWrCm/DmhKsF8/UWhKD5i/ECvpQ+U4nO2ii9XiCeuaDwQ
 yhsYe1LbqMlroJVkIqTnbBbrSSP6bA==
 =anio
 -----END PGP SIGNATURE-----

Merge tag 'misc-habanalabs-next-2021-12-27' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux into char-misc-next

Oded writes:

This tag contains habanalabs driver changes for v5.17:

- Support reset-during-reset. In case the f/w notifies the driver
  that the f/w is going to reset the device, the driver should
  support that even if it is in the middle of doing another
  reset

- Support events from f/w that arrive during device resets.
  These events would be ignored which is bad as critical errors
  would not be reported and treated by the driver.

- Don't kill processes that hold the control device open during
  hard-reset of the device. The control device operations can't
  crash if done during hard-reset. And usually, only monitoring
  applications are using the control device, so killing them
  defies their purpose.

- Fix handling of hwmon nodes when working with legacy f/w

- Change the compute context pointer to be boolean. This pointer
  was abused by multiple code paths that wanted fast access to
  the compute context structure.

- Add uapi to fetch historical errors. This is necessary as errors
  sometimes result in hard-reset where the user application is
  being terminated.

- Optimize GAUDI's MMU cache invalidation.

- Add support for loading the latest f/w.

- Add uapi to fetch HBM replacement and pending rows information.

- Multiple bug fixes to the reset code.

- Multiple bug fixes for Multi-CS ioctl code.

- Multiple bug fixes for wait-for-interrupt ioctl code.

- Many small bug fixes and cleanups.

* tag 'misc-habanalabs-next-2021-12-27' of https://git.kernel.org/pub/scm/linux/kernel/git/ogabbay/linux: (70 commits)
  habanalabs: support hard-reset scheduling during soft-reset
  habanalabs: add a lock to protect multiple reset variables
  habanalabs: refactor reset information variables
  habanalabs: handle skip multi-CS if handling not done
  habanalabs: add CPU-CP packet for engine core ASID cfg
  habanalabs: replace some -ENOTTY with -EINVAL
  habanalabs: fix comments according to kernel-doc
  habanalabs: fix endianness when reading cpld version
  habanalabs: change wait_for_interrupt implementation
  habanalabs: prevent wait if CS in multi-CS list completed
  habanalabs: modify cpu boot status error print
  habanalabs: clean MMU headers definitions
  habanalabs: expose soft reset sysfs nodes for inference ASIC
  habanalabs: sysfs support for two infineon versions
  habanalabs: keep control device alive during hard reset
  habanalabs: fix hwmon handling for legacy f/w
  habanalabs: add current PI value to cpu packets
  habanalabs: remove in_debug check in device open
  habanalabs: return correct clock throttling period
  habanalabs: wait again for multi-CS if no CS completed
  ...
2021-12-27 10:28:13 +01:00
Greg Kroah-Hartman 372c73b469 Update extcon next for v5.17
Detailed description for this pull request:
 1. Remove duplicate code in extcon_set_state_sync() in extcon core
 2. Fix non-kernel-doc comment for extcon-usb-gpio.c
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEsSpuqBtbWtRe4rLGnM3fLN7rz1MFAmHFaOAWHGN3MDAuY2hv
 aUBzYW1zdW5nLmNvbQAKCRCczd8s3uvPU/sMD/9BPPoxUTV/he+qwRJb3AKGOgcx
 87WOU4j/psL0dJdevxyGLHa7m6x8a8SQZDom4xo//ZkriPizZqI36VpBSHAXkdU6
 LZhEs426aRTL9ZyP4TY2wHj7f8/DvEf3DZndWrorbMIVd0eIrz3zFNLAZNXpHHxY
 ZL/ljGPbTtcIdMb3k2PXiIJtU6JyfQXfN1cqXemF1vXLBgvVZzjVocG6fYyOnSo+
 4xfVStUBpk1h9CuJyvUZ+/VUr5D6SIiYFrzD73hLh0vsk2Et4GcZJEq0e3SS9155
 ycPWPVvhH+qzAWajKfwUkzORHDf/VBApzzV1qMLSmgbU50X56GRQ76xLto8ogGJe
 GhB3hg02iUvdEnDhspUXpxxB9knYQKu+L2UGC7m53EKU1moKZYvHmKdZuKIIoauW
 v1T7ou1oqbhYD/XM6B7rNXla3tzOy3WNgcs8XaDwRaqyz/A598ZcsUhnWo/Hx2vl
 ptVm/4CciderTuj1Z42aN/eruMGzOK3AsYOZKf4hd1zAl03ktC7OE+iWF7oLvMza
 Vz4B1uBMTVxjTUojNc1u9XR+2V7HM+2HUrEQNykvdttYAFLhiNzkNr4B+BU4N7HA
 mnG9fQLJCqvHPjhYc7IqvRpBaG77JbMiZtiqusjAHNANqn4zeABcMmMlYZ4PyI18
 LsHKHyj7BwgrDGQStQ==
 =HQLV
 -----END PGP SIGNATURE-----

Merge tag 'extcon-next-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon into char-misc-next

Chanwoo writes:

Update extcon next for v5.17

Detailed description for this pull request:
1. Remove duplicate code in extcon_set_state_sync() in extcon core
2. Fix non-kernel-doc comment for extcon-usb-gpio.c

* tag 'extcon-next-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/chanwoo/extcon:
  extcon: Deduplicate code in extcon_set_state_sync()
  extcon: usb-gpio: fix a non-kernel-doc comment
2021-12-27 10:27:01 +01:00
Greg Kroah-Hartman 1bc4deedc2 interconnect changes for 5.17
Here are the interconnect changes for the 5.17-rc1 merge window
 consisting of new drivers, minor changes and fixes.
 
 New drivers:
  - New driver for MSM8996 platforms
  - New driver for SC7280 EPSS L3 hardware
  - New driver for QCM2290 platforms
  - New driver for SM8450 platforms
 
 Driver changes:
  - dt-bindings: interconnect: Combine SDM660 bindings into RPM schema
  - icc-rpm: Add support for bus power domain
  - icc-rpm: Use NOC_QOS_MODE_INVALID for qos_mode check
  - icc-rpm: Define ICC device type
  - icc-rpm: Add QNOC type QoS support
  - icc-rpm: Support child NoC device probe
  - icc-rpm: Prevent integer overflow in rate
  - icc-rpmh: Add BCMs to commit list in pre_aggregate
 
 Signed-off-by: Georgi Djakov <djakov@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJhxGkPAAoJEIDQzArG2BZjsRwP/3nr3s4b60PZ9hoxSxcBmmoD
 OAwd1tIeyXGxkeKHeJDdDj92hvgdvZwpl/H+wLnzy4VZmNqrIkREl23KaSlLeJE7
 P4U0ZUj7DYqXhR6+D1MiWa8+2rzQ/kqbDiXsFPEQL5Wa2/rGVgq95D/znz5Szv6f
 EtZUDBBtzR9s56LlL9+pprApWRz4+bf4YfX6AsMuKaKedjqzEKxLov2+wC+6zGoV
 2vkFQODxntsoZ95cwRz6GVBAZle/6O/NiSc6ndx/MpPhIQpsr2F0Er1nsgERxDum
 EsC3JVl+7aoHTym7npFp/JT04YBpHC3l7SnJff0EDws7MSiBjkp1WNIlYwP5juuA
 PZ3ziLtSzy36Vi0OVpykZSdwbeS9lNuz58CI74fIzcfGA1H1q3s8K4JHS7WMcgsQ
 EgghbCaiREZKBhZa6gMn4d6cufTpqAImDYae8gD788ziw7x+Pr3P6RgWbn+PE7mI
 odlx3OKwnCIsDPTj+5B5rXFKkfzV7grX1HI0sUZDIJ8Cv4Qs1kcLWP6Sg4ZafOMj
 ZcwYoTo9dMy81aDfEKIlpXR2ILxcb4vJTNTSdmFsPmNX+01X8rPlk/uCPPx9nClW
 acpqJLfCMKKbSwoZwrkWZQo2XCJBq/jan5dg8Q/fMLsqDORUtmnkK+hzR75mHHhP
 IydRk18TN1TQUhnNiFcd
 =dXxk
 -----END PGP SIGNATURE-----

Merge tag 'icc-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc into char-misc-next

Georgi writes:

interconnect changes for 5.17

Here are the interconnect changes for the 5.17-rc1 merge window
consisting of new drivers, minor changes and fixes.

New drivers:
 - New driver for MSM8996 platforms
 - New driver for SC7280 EPSS L3 hardware
 - New driver for QCM2290 platforms
 - New driver for SM8450 platforms

Driver changes:
 - dt-bindings: interconnect: Combine SDM660 bindings into RPM schema
 - icc-rpm: Add support for bus power domain
 - icc-rpm: Use NOC_QOS_MODE_INVALID for qos_mode check
 - icc-rpm: Define ICC device type
 - icc-rpm: Add QNOC type QoS support
 - icc-rpm: Support child NoC device probe
 - icc-rpm: Prevent integer overflow in rate
 - icc-rpmh: Add BCMs to commit list in pre_aggregate

Signed-off-by: Georgi Djakov <djakov@kernel.org>

* tag 'icc-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/djakov/icc:
  interconnect: qcom: Add QCM2290 driver support
  dt-bindings: interconnect: Add Qualcomm QCM2290 NoC support
  interconnect: icc-rpm: Support child NoC device probe
  interconnect: icc-rpm: Add QNOC type QoS support
  interconnect: icc-rpm: Define ICC device type
  interconnect: qcom: Add SM8450 interconnect provider driver
  dt-bindings: interconnect: Add Qualcomm SM8450 DT bindings
  interconnect: qcom: rpm: Prevent integer overflow in rate
  interconnect: icc-rpm: Use NOC_QOS_MODE_INVALID for qos_mode check
  interconnect: qcom: icc-rpmh: Add BCMs to commit list in pre_aggregate
  interconnect: qcom: Add MSM8996 interconnect provider driver
  dt-bindings: interconnect: Add Qualcomm MSM8996 DT bindings
  interconnect: icc-rpm: Add support for bus power domain
  dt-bindings: interconnect: Combine SDM660 bindings into RPM schema
  interconnect: qcom: Add EPSS L3 support on SC7280
  dt-bindings: interconnect: Add EPSS L3 DT binding on SC7280
2021-12-27 10:23:35 +01:00
Ofir Bitton ce80098db2 habanalabs: support hard-reset scheduling during soft-reset
As hard-reset can be requested during soft-reset, driver must allow
it or else critical events received during soft-reset will be
ignored.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 14:42:31 +02:00
Ofir Bitton 42eb2872e0 habanalabs: add a lock to protect multiple reset variables
Atomic operations during reset are replaced by a spinlock in order
to have the ability to protect more than a single variable.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 14:42:11 +02:00
Ofir Bitton eb13529191 habanalabs: refactor reset information variables
Unify variables related to device reset, which will help us to
add some new reset functionality in future patches.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 14:41:28 +02:00
Ohad Sharabi 60bf3bfb5a habanalabs: handle skip multi-CS if handling not done
This patch fixes issue in which we have timeout for multi-CS although
the CS in the list actually completed.

Example scenario (the two threads marked as WAIT for the thread that
handles the wait_for_multi_cs and CMPL as the thread that signal
completion for both CS and multi-CS):
1. Submit CS with sequence X
2. [WAIT]: call wait_for_multi_cs with single CS X
3. [CMPL]: CS X do invoke complete_all for both CS and multi-CS
           (multi_cs_completion_done still false)
4. [WAIT]: enter poll_fences, reinit the completion and find the CS
           as completed when asking on the fence but multi_cs_done is
	   still false it returns that no CS actually completed
5. [CMPL]: set multi_cs_handling_done as true
6. [WAIT]: wait for completion but no CS to awake the wait context
           and hence wait till timeout

Solution: if CS detected as completed in poll_fences but multi_cs_done
          is still false invoke complete_all to the multi-CS completion
	  and so it will not go to sleep in wait_for_completion but
	  rather will have a "second chance" to wait for
	  multi_cs_completion_done.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 14:40:25 +02:00
Tomer Tayar f297a0e9fe habanalabs: add CPU-CP packet for engine core ASID cfg
In some cases the driver cannot configure ASID of some engines due to
the security level of the relevant registers.
For this a new CPU-CP packet is introduced, which will allow the driver
to ask the F/W to do this configuration instead.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 14:39:53 +02:00
Oded Gabbay 519f4ed0a0 habanalabs: replace some -ENOTTY with -EINVAL
-ENOTTY is returned in case of error in the ioctl arguments themselves,
such as function that doesn't exists.

In all other cases, where the error is in the arguments of the custom
data structures that we define that are passed in the various ioctls,
we need to return -EINVAL.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:10 +02:00
Ofir Bitton 0a63ac769b habanalabs: fix comments according to kernel-doc
Fix missing fields, descriptions not according to kernel-doc style.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:10 +02:00
Ofir Bitton a7224c2116 habanalabs: fix endianness when reading cpld version
Current sysfs implementation does not take endianness into
consideration when dumping the cpld version.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
farah kassabri b9d31cada7 habanalabs: change wait_for_interrupt implementation
Currently the cq counters are allocated in userspace memory,
and mapped by the driver to the device address space.

A new requirement that is part of new future API related to this one,
requires that cq counters will be allocated in kernel memory.

We leverage the existing cb_create API with KERNEL_MAPPED flag set to
allocate this memory.

That way we gain two things:
1. The memory cannot be freed while in use since it's protected
by refcount in driver.

2. No need to wake up the user thread upon each interrupt from CQ,
because the kernel has direct access to the counter. Therefore,
it can make comparison with the target value in the interrupt
handler and wake up the user thread only if the counter reaches the
target value. This is instead of waking the thread up to copy counter
value from user then go sleep again if target value wasn't reached.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Ohad Sharabi e2558f0f84 habanalabs: prevent wait if CS in multi-CS list completed
By the original design we assumed that if we "miss" multi CS completion
it is of no severe consequence as we'll just call wait_for_multi_cs
again.

Sequence of events for such scenario:
1. user submit CS with sequence N
2. user calls wait for multi-CS with only CS #N in the list
3. the multi CS call starts with poll of the CSs but find that none
   completed (while CS #N did not completed yet)
4. now, multi CS #N complete but multi CS CTX was not yet created for
   the above multi-CS. so, attempt to complete multi-CS fails (as no
   multi CS CTX exist)
5. wait_for_multi_cs call now does init_wait_multi_cs_completion (and
   for this create the multi-CS CTX)
6. wait_for_multi_cs wits on completion but will not get one as CS #N
   already completed

To fix the issue we initialize the multi-CS CTX prior polling the
fences.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Ofir Bitton 86c00b2c36 habanalabs: modify cpu boot status error print
As BTL can be replaced by ROM we should modify relevant error print.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Ohad Sharabi d636a932b3 habanalabs: clean MMU headers definitions
During the MMU development the MMU header files were left with unclean
definitions:

- MMU "version specific" definitions that were left in the mmu_general
  file
- unused definitions

This patch attempts, where possible, to keep definitions that can serve
multiple MMU versions (but that are not tightly bound with specific MMU
arch) in the mmu_general header file (e.g. different definitions for
number of HOPs).

Otherwise, move MMU version specific definitions (e.g. HOPs masks and
shifts) to the specific MMU version file.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Ofir Bitton 9993f27de1 habanalabs: expose soft reset sysfs nodes for inference ASIC
As we allow soft-reset to be performed only on inference devices,
having the sysfs nodes may cause a confusion. Hence, we remove those
nodes on training ASICs.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Ofir Bitton b5c92b8882 habanalabs: sysfs support for two infineon versions
Currently sysfs support dumping a single infineon version, in
future asics we will have two infineon versions.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Dani Liberman 707c125286 habanalabs: keep control device alive during hard reset
Need to allow user retrieve data during reset and afterwards without
the need to reopen the device.
Did it by seperating the user peocesses list into two lists:
1. fpriv_list which contains list of user processes that opened
   the device (currently only one).
2. fpriv_ctrl_list which contains list of user processes that opened
   the control device. This processes in this list shall not be
   killed during reset, only when the device is suddenly removed from
   PCI chain.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Oded Gabbay bb099a8051 habanalabs: fix hwmon handling for legacy f/w
In legacy f/w that use old hwmon.h file, the values of the hwmon
enums are different than the values that are in newer kernels (5.6
and above).

Therefore, to support working with those f/w, we need to do some
fixup before registering with the hwmon subsystem and also when
calling the functions that communicate with the f/w to retrieve
sensors information.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Ofir Bitton 9acdc21b0b habanalabs: add current PI value to cpu packets
In order to increase cpucp messaging reliability we will add
the current PI value to the descriptor sent to F/W.
F/W will wait for the PI value as an indication of a valid packet.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Oded Gabbay 7363805b8a habanalabs: remove in_debug check in device open
The driver supports only a single user anyway, so there is no point
in checking whether we are in_debug state when a user tries to open
the device, because if we are in_debug, it means a user is already
using the device.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Ofir Bitton 7c623ef732 habanalabs: return correct clock throttling period
Current clock throttling period returned from driver was wrong due
to wrong time comparison.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Ohad Sharabi b02220536c habanalabs: wait again for multi-CS if no CS completed
The original multi-CS design assumption that stream masters are used
exclusively (i.e. multi-CS with set of stream master QIDs will not get
completed by CS not from the multi-CS set) is inaccurate.

Thus multi-CS behavior is now modified not to treat such case as an
error.

Instead, if we have multi-CS completion but we detect that no CS from
the list is actually completed we will do another multi-CS wait (with
modified timeout).

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Oded Gabbay 5b90e59d55 habanalabs: remove compute context pointer
It was an error to save the compute context's pointer in the device
structure, as it allowed its use without proper ref-cnt.

Change the variable to a flag that only indicates whether there is
an active compute context. Code that needs the pointer will now
be forced to use proper internal APIs to get the pointer.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Oded Gabbay 4337b50b5f habanalabs: add helper to get compute context
There are multiple places where the code needs to get the context's
pointer and increment its ref cnt. This is the proper way instead
of using the compute context pointer in the device structure.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Oded Gabbay 6798676f7e habanalabs: fix etr asid configuration
Pass the user's context pointer into the etr configuration function
to extract its ASID.

Using the compute_ctx pointer is an error as it is just an indication
of whether a user has opened the compute device.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Oded Gabbay 357ff3dc9a habanalabs: save ctx inside encaps signal
Compute context pointer in hdev shouldn't be used for fetching the
context's pointer.

If an object needs the context's pointer, it should get it while
incrementing its kref, and when the object is released, put it.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Oded Gabbay a4dd2ecf36 habanalabs: remove redundant check on ctx_fini
The driver supports only a single context. Therefore, no need to check
if the user context that is closed is the compute context. The user
context, if exists, is always the compute context.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Oded Gabbay fee187fe46 habanalabs: free signal handle on failure
Fix a bug where in case of failure to allocate idr, the handle's
memory wasn't freed as part of the error handling code.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Tomer Tayar b166465452 habanalabs: add missing kernel-doc comments for hl_device fields
Add missing kernel-doc comments for the "last_error" and
"stream_master_qid_arr" fields of the "hl_device" structure".

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:08 +02:00
Tomer Tayar d214636be8 habanalabs: pass reset flags to reset thread
The reset flags used by the reset thread are currently a mix of
hard-coded values and a specific flag which is passed from the context
that initiates the reset.
To make it easier to pass more flags in future from this context to the
reset thread, modify it to pass all the original reset flags to the
thread.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:07 +02:00
Dani Liberman 2487f4a281 habanalabs: enable access to info ioctl during hard reset
Because info ioctl is used to retrieve data, some of its opcodes may be
used during hard reset.
Other ioctls should be blocked while device is not operational.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:07 +02:00
Dani Liberman 1880f7acd7 habanalabs: add SOB information to signal submission uAPI
For debug purpose, add SOB address and SOB initial counter value
before current submission to uAPI output.

Using SOB address and initial counter, user can calculate how much of
the submmision has been completed.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:07 +02:00
Ohad Sharabi 4fac990f60 habanalabs: skip read fw errors if dynamic descriptor invalid
Reporting FW errors involves reading of the error registers.

In case we have a corrupted FW descriptor we cannot do that since the
dynamic scratchpad is potentially corrupted as well and may cause kernel
crush when attempting access to a corrupted register offset.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:07 +02:00
Ofir Bitton 3416d4b59b habanalabs: handle events during soft-reset
Driver should handle events during soft-reset as F/W is not
going through reset and it keeps sending events towards host.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:07 +02:00
Ofir Bitton b13bef2041 habanalabs: change misleading IRQ warning during reset
Currently we dump the physical IRQ line index in host if an event
is received during reset. This ID is confusing as it means nothing
to the user.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:07 +02:00
Tomer Tayar 75a5c44d14 habanalabs: add power information type to POWER_GET packet
In new f/w versions, it is required to explicitly indicate the power
information type when querying the F/W for power info.
When getting the current power level it should be set to power_input.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:07 +02:00
Ofir Bitton 4119433445 habanalabs: add more info ioctls support during reset
Some info ioctls can be served even if the device is disabled or
in reset. Hence, we enable more info ioctls during reset, as these
ioctls do not require any H/W nor F/W communication.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:07 +02:00
Dani Liberman 3beaf903a3 habanalabs: fix race condition in multi CS completion
Race example scenario:
1. User have 2 threads that waits on multi CS:
   - thread_0 waits on QID 0 and uses multi CS context 0.
   - thread_1 waits on QID 1 and uses multi CS context 1.
2. thread_1 got completion and release multi CS context 1.
3. CS related to multi CS of thread_0 starts executing
   complete_multi_cs function, the first iteration of the loop
   completes the multi CS of thread_0, hence multi CS context 0
   is released.
4. thread_1 waits on QID 1 and uses multi CS context 0.
5. thread_0 waits on QID 0 and uses multi CS context 1.
6. The second iterattion of the loop (from step 3) starts, which
   means, start checking multi CS context 1:
   - multi CS contetxt is being used by thread_0 waiting on QID 0.
   - The fence of the CS (still CS from step 3) has QID map the same
     as the multi CS context 1.
   - multi CS context 1 (thread_0) gets completion on CS that triggered
     already thread_0 (with multi CS context 0) and is no longer
     being waited on.

Fixed by exiting the loop in complete_multi_cs after getting completion

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:07 +02:00
Ofir Bitton cad9eb4a8d habanalabs: move device boot warnings to the correct location
As device boot warnings clears the indication from the error mask,
they must be located together before the unknown error validation.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:07 +02:00
Oded Gabbay 9eade72e72 habanalabs/gaudi: return EPERM on non hard-reset
GAUDI supports only hard-reset. Therefore, this function should
return an error of operation not permitted.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:06 +02:00
Oded Gabbay 6c1bad35e6 habanalabs: rename late init after reset function
The ASIC-specific soft_reset_late_init() is now called after either
soft-reset or reset-upon-device-release. Therefore, it needs a more
appropriate name.

No need to split it to two functions, as an ASIC either supports
soft-reset or reset-upon-device-release.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:06 +02:00
Oded Gabbay 60e0431f41 habanalabs: fix soft reset accounting
Reset upon device release is not a soft-reset from user/system point
of view. As such, we shouldn't count that reset in the statistics we
gather and expose to the monitoring applications.

We also shouldn't print soft-reset when doing the reset upon device
release.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:06 +02:00
Rajaravi Krishna Katta d8eb50f31c habanalabs: Move frequency change thread to goya_late_init
Changing the frequency automatically is only done in Goya. In future
ASICs this is done inside the firmware. Therefore, move the common code
into the Goya specific files.

Main changes as part of the commit are:
    1. The thread for setting frequency is moved from device_late_init
       to goya_late_init
    2. hl_device_set_frequency is removed from hl_device_open as it is
       not relevant for other ASICs and for Goya it is taken care by
       the thread
    3. hl_device_set_frequency is renamed as goya_set_frequency

Signed-off-by: Rajaravi Krishna Katta <rkatta@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:06 +02:00
Oded Gabbay ab440d3e39 habanalabs: abort reset on invalid request
Hard-reset is mutually exclusive with reset-on-device-release.
Therefore, if such a request arrives to the reset function, abort
the reset and return an error to the callee.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:06 +02:00
Ofir Bitton a1b838adb0 habanalabs: fix possible deadlock in cache invl failure
Currently there is a deadlock in driver in scenarios where MMU
cache invalidation fails. The issue is basically device reset
being performed without releasing the MMU mutex.
The solution is to skip device reset as it is not necessary.
In addition we introduce a slight code refactor that prints the
invalidation error from a single location.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:06 +02:00
Ohad Sharabi 6f61e47a68 habanalabs: skip PLL freq fetch
Getting the used PLL index with which to send the CPUPU packet relies on
the CPUCP info packet.

In case CPU queues are not enabled getting the PLL index will issue an
error and in some ASICs will also fail the driver load.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:06 +02:00