When the f/w runs in secured mode, it can reset the ASIC when certain
events occur. In unsecured mode, the driver asks the f/w to reset the
ASIC for those events.
We need to perform the entire reset procedure but without accessing the
ASIC. i.e. without halting the engines and without sending messages
to the f/w.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
During the integration, the multi-CS requirements were refined:
- The multi CS call shall wait on "per-ASIC" predefined stream masters
instead of set of streams.
- Stream masters are set of QIDs used by the upper SW layers (synapse)
for completion (must be an external/HW queue).
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Currently there is no validity check for event ID received from F/W,
Thus exposing driver to memory overrun.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In order to better support variants of the same ASIC
the set_pci_regions function is now an ASIC function which
allows each ASIC to implement it internally, thus keeping
all definitions static to the file.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Add the server type property to the hl_info_hw_ip_info structure
that is exposed to the user via the INFO IOCTL.
This is needed by the userspace s/w stack to know the connections map
of the internal links that connect the ASIC among themselves inside the
server.
The F/W will tell us, as part of the NIC information, the server type
that the GAUDI is located in.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
This commit is the second part of the encapsulated signals feature.
It contains the driver support for submission of cs with encapsulated
signals and the wait for them.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
The signaling from within encapsulated OP capability is merged into the
existing stream architecture, such that one can trigger multiple
signaling from an encapsulated op, according to the time the event
was done in the graph execution and avoid the need to wait for the
whole encapsulated OP execution to be complete before the stream can
signal.
This commit implements only the reserve/unreserve part.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Currently the SOB reset was in fence release function which happens
only at the CS wraparound during the CS allocation time.
In order to support the new encapsulated signals reservation feature,
we need to move the SOB reset to an earlier phase because this SOB
could reach it's max value very fast using the signal reservation.
Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
When user sends multiple CSs, waiting for each CS is not efficient
as it involves many user-kernel context switches.
In order to address this issue we add support to "wait on multiple CSs"
using a new uAPI which can wait on maximum of 32 CSs. The new uAPI is
defined using a new flag - WAIT_FOR_MULTI_CS - in the wait_for_cs IOCTL.
The input parameters for this uAPI will be:
@seq: user pointer to an array of up to 32 CS's sequence numbers.
@seq_array_len: length of sequence array.
@timeout_us: timeout for waiting for any CS.
The output paramateres for this API will be:
@status: multi CS ioctl completion status (dedicated status was added as
well).
@flags: bitmap of output flags of the CS.
@cs_completion_map: bitmap for multi CS, if CS sequence that was placed
in index N in input seq array has completed- the N-th
bit in cs_completion_map will be 1, otherwise it will
be 0.
@timestamp_nsec: timestamp of the first completed CS
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Each ASIC can have a different offset to add to a host dma address,
to enable the ASIC to access that host memory.
The usage for this can be common code so add this to the asic
property structure.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
This function will be used for more mmap operations than just
mmaping CBs.
Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
With the infrastructure in place, monitors and fences dump shall be
implemented.
Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
To improve the user's ability to debug the case where a workload that
is part of executing training/inference of a topology is getting stuck,
we need to add a 'core dump' each time a CS times-out. The 'core dump'
shall contain all relevant Sync Manager information and corresponding
fence values.
The most recent dumps shall be accessible via debugfs, under
'state_dump' node. Reading from the node will provide the oldest dump
available. Writing an integer value X will discard X dumps, starting
with the oldest one, i.e. subsequent read will now return newer
dumps.
Signed-off-by: Yuri Nudelman <ynudelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
We introduce a new type of reset which is reset upon device release.
This reset is very similar to soft reset except the fact it is
performed only upon device release and not upon user sysfs request
nor TDR.
The purpose of this reset is to make sure the device is returned to
IDLE state after the current user has finished working with the device.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
fix the following smatch warnings:
goya_pin_memory_before_cs()
warn: '&userptr->job_node' not removed from list
gaudi_pin_memory_before_cs()
warn: '&userptr->job_node' not removed from list
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In current code, for dynamic f/w loading flow, DRAM scrambling is
enabled post Linux fit image is loaded to the card. This can cause the
device CPU to go into reset state.
The correct sequence should be:
1. Load boot fit image
2. Enable scrambling
3. Load Linux fit image
This commit aligns the DRAM scrambling enabling with the static f/w load
flow.
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Using negative logic (i.e. fw_security_disabled) is confusing.
Modify the flag to use positive logic (fw_security_enabled).
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Due to new security restrictions, GIC controller can no
longer be accessed from user/kernel.
To monitor that, a new status bit will be read from preboot
caps, indicating whether direct access to GIC is blocked.
In case it is blocked, driver will use scratchpad registers
instead of using GIC interface on two main scenarios:
The first of which LKD triggers interrupts to F/W through GIC,
and the second of when LKD configures all engines/QMANs
to write to GIC when they want to report an error.
From F/W perspective, it will poll on all SPs, and once IRQ
number is retrieved, SP register is cleared, and it will perform the
write to the GIC to trigger the IRQ handler.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Maintain both STS1 and ERR1 registers used for status communication
with F/W.
Those are not maintained as we currently have less than 31
statuses/error defined and so LKD did not refer to those register.
The reason to read them now is to try to support future f/w versions
with current driver.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
When setting "DMA mask from FW" we are reading PSOC_GLOBAL_CONF register
which is allowed only once FW has done it's iATU configuration.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Implementing dynamic linux image load to the device.
This patch also implements the FW communication steps during the
boot-fit.
This patch also enables the dynamic protocol based on the compatibility
flag.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Implementing dynamic boot fit image load to the device.
Note that some necessary adjustment were added to the static loader as
well so that both loaders can co-exist.
as this is not the final FW load stage the dynamic FW load is still
forced to be non functional.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
LKD has interfaces in which it receives device address.
For instance the debugfs_read/write variants receives device address for
CFG/SRAM/DRAM for read/write and need to translate to the mapped PCI BAR
address.
In addition, the dynamic FW load protocol dictates that the address to
which the LKD will copy the image for the next FW component will be
received as a device address and can be placed either in SRAM or DRAM.
We need to distinguish those regions as the access methods to those
regions are different (in DRAM we possibly need to set the BAR base).
Looking forward this code will be used to remove duplicated code in the
debugfs_read/write that search the memory region for the input device
address.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
First stage of the dynamic FW load protocol is to reset the protocol to
avoid residues from former load cycles.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Instead of using multiple ASIC specific copies of functions to read the
FW version use single common one that gets ASIC specific arguments.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Use mmu cache range invalidation instead of entire cache invalidation
because it yields better performance.
In GOYA and GAUDI, always use entire cache invalidation because these
ASICs don't support range invalidation.
Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Replace multiple arguments to init device CPU function by passing
firmware loader managing structure that is initialized per ASIC with
the loader parameters.
In addition, the FW loader management structure is now part of the
habanalabs device, this way the loader parameters will be able to be
communicated across various boot stages.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Currently the user cannot interpret the PLL information based on index
as its exposed as an integer.
This commit exposes ASIC specific PLL indexes and maps it to a generic
FW compatible index.
Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
As part of the securing GAUDI, the F/W will configure the PCI iATU
regions. If the driver identifies a secured PCI ID, it will know to
skip iATU configuration in a very early stage.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
As F/ security indication must be available before driver approaches
PCI bus, F/W security should be derived from PCI id rather than be
fetched during boot handshake with F/W.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In order to minimize hard coded values between F/W and the driver, we
send msi-x indexes dynamically to the F/W.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
When trying to debug program, the user often needs to
dump large parts of the device's DRAM, which can reach to tens of GBs.
Because reading from the device's internal memory through the PCI BAR
is extremely slow, the debug can take hours.
Instead, we can provide the user to copy data through one of the DMA
engines. This will make the operation much faster.
Currently, only GAUDI is supported.
In GAUDI, we need to find a PCI DMA engine that is IDLE and set the
DMA as secured to be able to bypass our MMU as we currently don't
map the temporary buffer to the MMU.
Example bash one-line to dump entire HBM to file (~2 minutes):
for (( i=0x0; i < 0x800000000; i+=0x8000000 )); do \
printf '0x%x\n' $i | sudo tee /sys/kernel/debug/habanalabs/hl0/addr ; \
echo 0x8000000 | sudo tee /sys/kernel/debug/habanalabs/hl0/dma_size ; \
sudo cat /sys/kernel/debug/habanalabs/hl0/data_dma >> hbm.txt ; done
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
The new approach is based on the notion that the relative
current power consumption is in relation of proportionality
to device's true utilization.
Utilization info ranges between [0,100]%
Currently, dc_power values are hard-coded.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In order to use minimum of hard coded values common to LKD and F/W
a dynamic method to work with PLLs is introduced in this patch.
Formerly asic specific PLL numbering is now common for all asics.
To be backward compatible a bit in dev status is defined, if the bit is
not set LKD will keep working with old PLL numbering.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
In order to have a better debuggability we allow debugfs access
to user mmu mapped host memory. Non-user host memory access will be
rejected.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
if reset is due to heartbeat, device CPU is no responsive in which
case no point sending PCI disable message to it.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
As there are incorrect assumptions in which some of the
initialization and data path flows cannot sleep, most allocations
are being done using GFP_ATOMIC.
We modify the code to use GFP_ATOMIC only when realy needed, as
sleepable flow should use GFP_KERNEL.
In addition add a fallback to allocate memory using GFP_KERNEL,
once ATOMIC allocation fails.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
For simplicity, use a single bringup flag indicating which FW
binaries should loaded to device.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
As the F/wW is the first to detect out of sync event, a new event is
added to notify the driver on such event. In which case the driver
performs hard reset.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
When user gives us a block address to get its ID to mmap it, he also
needs to get from us the block size to pass to the driver in the mmap
function.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
when reading CPU_BOOT_DEV_STS0 reg after FW reports SRAM AVAILABLE the
value in the register might not yet be updated by FW.
to overcome this issue another "up-to-date" read of this register is
done at the end of CPU queues init.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Only after the initialization of the device is done, the driver is
ready to receive events from the F/W. The driver can't handle events
before that because of races so it will ignore events. In case of
a fatal event, the driver won't know about it and the device will be
operational although it shouldn't be.
Same logic should be applied after hard-reset.
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
User must be aware of the available CQs when it needs to use them.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Current messaging communictaion protocol with cpucp can get out
of sync due to coherency issues. In order to improve the protocol
reliability, we modify the protocol to expect a different
acknowledgment for every packet sent to cpucp.
Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
ETR should always be non-secured as it is used by the users to record
profiling/trace data.
This patch fixes the configuration to match those requirements.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Currently this API uses single 64 bits mask for engines idle indication.
Recently, it was observed that more bits are needed for some ASICs.
This patch modifies the use of the idle mask and the idle_extensions
mask.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>