Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next
Pull sparc updates from David Miller:
 "Of note is the addition of a driver for the Data Analytics
  Accelerator, and some small cleanups"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next:
  oradax: Fix return value check in dax_attach()
  sparc: vDSO: remove an extra tab
  sparc64: drop unneeded compat include
  sparc64: Oracle DAX driver
  sparc64: Oracle DAX infrastructure
Commit ba49097e1d
@@ -0,0 +1,429 @@
Oracle Data Analytics Accelerator (DAX)
---------------------------------------

DAX is a coprocessor which resides on the SPARC M7 (DAX1) and M8
(DAX2) processor chips, and has direct access to the CPU's L3 caches
as well as physical memory. It can perform several operations on data
streams with various input and output formats. A driver provides a
transport mechanism and has limited knowledge of the various opcodes
and data formats. A user space library provides high level services
and translates these into low level commands which are then passed
into the driver and subsequently the Hypervisor and the coprocessor.
The library is the recommended way for applications to use the
coprocessor, and the driver interface is not intended for general use.
This document describes the general flow of the driver, its
structures, and its programmatic interface. It also provides example
code sufficient to write user or kernel applications that use DAX
functionality.

The user library is open source and available at:
    https://oss.oracle.com/git/gitweb.cgi?p=libdax.git

The Hypervisor interface to the coprocessor is described in detail in
the accompanying document, dax-hv-api.txt, which is a plain text
excerpt of the (Oracle internal) "UltraSPARC Virtual Machine
Specification" version 3.0.20+15, dated 2017-09-25.

High Level Overview
-------------------

A coprocessor request is described by a Command Control Block
(CCB). The CCB contains an opcode and various parameters. The opcode
specifies what operation is to be done, and the parameters specify
options, flags, sizes, and addresses. The CCB (or an array of CCBs)
is passed to the Hypervisor, which handles queueing and scheduling of
requests to the available coprocessor execution units. The returned
status code indicates whether the request was submitted successfully
or there was an error. One of the addresses given in each CCB is a
pointer to a "completion area", which is a 128-byte memory block that
is written by the coprocessor to provide execution status. No
interrupt is generated upon completion; the completion area must be
polled by software to find out when a transaction has finished, but
the M7 and later processors provide a mechanism to pause the virtual
processor until the completion status has been updated by the
coprocessor. This is done using the monitored load and mwait
instructions, which are described in more detail later. The DAX
coprocessor was designed so that after a request is submitted, the
kernel is no longer involved in the processing of it. The polling is
done at the user level, which results in almost zero latency between
completion of a request and resumption of execution of the requesting
thread.

Addressing Memory
-----------------

The kernel does not have access to physical memory in the Sun4v
architecture, as there is an additional level of memory virtualization
present. This intermediate level is called "real" memory, and the
kernel treats this as if it were physical. The Hypervisor handles the
translations between real memory and physical so that each logical
domain (LDOM) can have a partition of physical memory that is isolated
from that of other LDOMs. When the kernel sets up a virtual mapping,
it specifies a virtual address and the real address to which it should
be mapped.

The DAX coprocessor can only operate on physical memory, so before a
request can be fed to the coprocessor, all the addresses in a CCB must
be converted into physical addresses. The kernel cannot do this since
it has no visibility into physical addresses. So a CCB may contain
either the virtual or real addresses of the buffers or a combination
of them. An "address type" field is available for each address that
may be given in the CCB. In all cases, the Hypervisor will translate
all the addresses to physical before dispatching to hardware. Address
translations are performed using the context of the process initiating
the request.

The Driver API
--------------

An application makes requests to the driver via the write() system
call, and gets results (if any) via read(). The completion areas are
made accessible via mmap(), and are read-only for the application.

The request may either be an immediate command or an array of CCBs to
be submitted to the hardware.

Each open instance of the device is exclusive to the thread that
opened it, and must be used by that thread for all subsequent
operations. The driver open function creates a new context for the
thread and initializes it for use. This context contains pointers and
values used internally by the driver to keep track of submitted
requests. The completion area buffer is also allocated, and this is
large enough to contain the completion areas for many concurrent
requests. When the device is closed, any outstanding transactions are
flushed and the context is cleaned up.

On a DAX1 system (M7), the device will be called "oradax1", while on a
DAX2 system (M8) it will be "oradax2". If an application requires one
or the other, it should simply attempt to open the appropriate
device. Only one of the devices will exist on any given system, so the
name can be used to determine what the platform supports.

The immediate commands are CCB_DEQUEUE, CCB_KILL, and CCB_INFO. For
all of these, success is indicated by a return value from write()
equal to the number of bytes given in the call. Otherwise -1 is
returned and errno is set.

CCB_DEQUEUE

Tells the driver to clean up resources associated with past
requests. Since no interrupt is generated upon the completion of a
request, the driver must be told when it may reclaim resources. No
further status information is returned, so the user should not
subsequently call read().

CCB_KILL

Kills a CCB during execution. The CCB is guaranteed to not continue
executing once this call returns successfully. On success, read() must
be called to retrieve the result of the action.

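The write()-then-read() flow of an immediate command can be sketched as follows. This is only an illustration: the structure layouts and constants are local copies of the oradax.h definitions so that the fragment compiles without the kernel header, and the helper names dax_kill() and dax_kill_action_name() are hypothetical, not part of the driver API.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Local copies of the oradax.h definitions, so the sketch is self-contained. */
#define CCB_KILL		0
#define DAX_KILL_COMPLETED	0
#define DAX_KILL_DEQUEUED	1
#define DAX_KILL_KILLED		2
#define DAX_KILL_NOTFOUND	3

struct dax_command {
	uint16_t command;	/* CCB_KILL/INFO/DEQUEUE */
	uint16_t ca_offset;	/* offset into mmapped completion area */
};

struct ccb_kill_result {
	uint16_t action;	/* action taken to kill ccb */
};

/*
 * Hypothetical helper: kill the CCB whose completion area lies at
 * ca_offset. Returns 0 and fills *res on success, -1 on failure
 * (errno is set by write() or read()).
 */
static int dax_kill(int fd, uint16_t ca_offset, struct ccb_kill_result *res)
{
	struct dax_command cmd = { .command = CCB_KILL, .ca_offset = ca_offset };

	assert(res != NULL);
	/* Success is a return value equal to the number of bytes written. */
	if (write(fd, &cmd, sizeof(cmd)) != (ssize_t)sizeof(cmd))
		return -1;
	/* On success the result of the action must be fetched with read(). */
	if (read(fd, res, sizeof(*res)) != (ssize_t)sizeof(*res))
		return -1;
	return 0;
}

/* Hypothetical helper: printable name for a DAX_KILL_* action. */
static const char *dax_kill_action_name(uint16_t action)
{
	switch (action) {
	case DAX_KILL_COMPLETED:	return "completed";
	case DAX_KILL_DEQUEUED:		return "dequeued";
	case DAX_KILL_KILLED:		return "killed";
	case DAX_KILL_NOTFOUND:		return "notfound";
	default:			return "unknown";
	}
}
```

A caller would pass the descriptor of the opened DAX device and then inspect res->action.
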
CCB_INFO

Retrieves information about a currently executing CCB. Note that some
Hypervisors might return 'notfound' when the CCB is in 'inprogress'
state. To ensure a CCB in the 'notfound' state will never be executed,
CCB_KILL must be invoked on that CCB. Upon success, read() must be
called to retrieve the details of the action.

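One way an application might interpret the CCB_INFO result is sketched below. The structure and state values are local copies of the oradax.h definitions; the helpers dax_ccb_state_name() and dax_ccb_state_is_ambiguous() are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the oradax.h definitions. */
#define DAX_CCB_COMPLETED	0
#define DAX_CCB_ENQUEUED	1
#define DAX_CCB_INPROGRESS	2
#define DAX_CCB_NOTFOUND	3

struct ccb_info_result {
	uint16_t state;		/* state of enqueued ccb */
	uint16_t inst_num;	/* dax instance number of enqueued ccb */
	uint16_t q_num;		/* queue number of enqueued ccb */
	uint16_t q_pos;		/* ccb position in queue */
};

/* Hypothetical helper: printable name for a DAX_CCB_* state. */
static const char *dax_ccb_state_name(uint16_t state)
{
	switch (state) {
	case DAX_CCB_COMPLETED:		return "completed";
	case DAX_CCB_ENQUEUED:		return "enqueued";
	case DAX_CCB_INPROGRESS:	return "inprogress";
	case DAX_CCB_NOTFOUND:		return "notfound";
	default:			return "unknown";
	}
}

/*
 * Hypothetical helper: 'notfound' may be reported while the CCB is
 * really in progress, so a caller that needs a guarantee that the CCB
 * will never run should follow up with CCB_KILL.
 */
static bool dax_ccb_state_is_ambiguous(const struct ccb_info_result *info)
{
	assert(info != NULL);
	return info->state == DAX_CCB_NOTFOUND;
}
```
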
Submission of an array of CCBs for execution

A write() whose length is a multiple of the CCB size is treated as a
submit operation. The file offset is treated as the index of the
completion area to use, and may be set via lseek() or using the
pwrite() system call. If -1 is returned then errno is set to indicate
the error. Otherwise, the return value is the length of the array that
was actually accepted by the coprocessor. If the accepted length is
equal to the requested length, then the submission was completely
successful and there is no further status needed; hence, the user
should not subsequently call read(). Partial acceptance of the CCB
array is indicated by a return value less than the requested length,
and read() must be called to retrieve further status information. The
status will reflect the error caused by the first CCB that was not
accepted, and status_data will provide additional data in some cases.

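The three outcomes of a submission (error, full acceptance, partial acceptance) can be handled as in the sketch below. It assumes the 64-byte CCB size and the DAX_MAX_CCBS limit from oradax.h, copies struct ccb_exec_result locally, and uses a hypothetical helper name, dax_submit().

```c
#define _XOPEN_SOURCE 700	/* for pwrite() */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define DAX_CCB_SIZE	64	/* a CCB is 8 64-bit words */
#define DAX_MAX_CCBS	15	/* from oradax.h */

struct ccb_exec_result {
	uint64_t status_data;	/* additional status data (e.g. bad VA) */
	uint32_t status;	/* one of DAX_SUBMIT_* */
};

/*
 * Hypothetical helper: submit nccbs CCBs using completion area index
 * ca_index. Returns the number of CCBs accepted, or -1 with errno set.
 * When fewer CCBs than requested are accepted, *res is filled in with
 * the status of the first CCB that was not accepted.
 */
static int dax_submit(int fd, const void *ccbs, size_t nccbs,
		      off_t ca_index, struct ccb_exec_result *res)
{
	ssize_t want, got;

	assert(ccbs != NULL && res != NULL);
	if (nccbs == 0 || nccbs > DAX_MAX_CCBS) {
		errno = EINVAL;
		return -1;
	}

	want = (ssize_t)(nccbs * DAX_CCB_SIZE);
	got = pwrite(fd, ccbs, (size_t)want, ca_index);
	if (got < 0)
		return -1;

	/* Partial acceptance: further status must be fetched via read(). */
	if (got < want && read(fd, res, sizeof(*res)) != (ssize_t)sizeof(*res))
		return -1;

	return (int)(got / DAX_CCB_SIZE);
}
```

A return value equal to nccbs means no read() was issued and *res is untouched.
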
MMAP

The mmap() function provides access to the completion area allocated
in the driver. Note that the completion area is not writeable by the
user process, and the mmap call must not specify PROT_WRITE.

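Locating one completion area inside the mapped buffer might look like the sketch below. It assumes that the mapping is a packed array of 128-byte completion areas, which follows from the sizes given in this document and in oradax.h but is not stated explicitly; DAX_CA_SIZE, DAX_CA_COUNT and dax_ca_at() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DAX_MMAP_LEN	(16 * 1024)	/* from oradax.h */
#define DAX_CA_SIZE	128		/* completion area size in bytes */
#define DAX_CA_COUNT	(DAX_MMAP_LEN / DAX_CA_SIZE)

/*
 * Hypothetical helper: address of completion area number 'index'
 * within the read-only mapping that starts at 'base'. The first byte
 * at the returned address is the command status.
 */
static const volatile uint8_t *dax_ca_at(const volatile void *base,
					 size_t index)
{
	assert(base != NULL && index < DAX_CA_COUNT);
	return (const volatile uint8_t *)base + index * DAX_CA_SIZE;
}
```

The base pointer would be the value returned by mmap() with PROT_READ only.
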
Completion of a Request
-----------------------

The first byte in each completion area is the command status which is
updated by the coprocessor hardware. Software may take advantage of
new M7/M8 processor capabilities to efficiently poll this status byte.
First, a "monitored load" is achieved via a Load from Alternate Space
(ldxa, lduba, etc.) with ASI 0x84 (ASI_MONITOR_PRIMARY). Second, a
"monitored wait" is achieved via the mwait instruction (a write to
%asr28). This instruction is like pause in that it suspends execution
of the virtual processor for the given number of nanoseconds, but in
addition will terminate early when one of several events occurs. If the
block of data containing the monitored location is modified, then the
mwait terminates. This causes software to resume execution immediately
(without a context switch or kernel to user transition) after a
transaction completes. Thus the latency between transaction completion
and resumption of execution may be just a few nanoseconds.

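A bounded version of the monitored load plus mwait loop could be wrapped as shown below. The SPARC branch uses the two instructions described above; the fallback branch (a plain load and nanosleep()) is an assumption added only so that the sketch builds on other architectures, and dax_wait() is a hypothetical name.

```c
#define _XOPEN_SOURCE 700	/* for nanosleep() */
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: poll the status byte of a completion area until
 * it becomes non-zero or max_spins iterations have passed. Returns the
 * last status read (0 means the command is still in progress).
 */
static uint8_t dax_wait(const volatile uint8_t *ca, unsigned long max_spins)
{
	unsigned long status = 0;

	assert(ca != NULL);
	while (max_spins--) {
#if defined(__sparc__) && defined(__arch64__)
		/* Monitored load: ASI 0x84 is ASI_MONITOR_PRIMARY. */
		__asm__ __volatile__("lduba [%1] 0x84, %0\n"
				     : "=r" (status)
				     : "r" (ca));
#else
		status = *ca;	/* plain load, no monitor available */
#endif
		if (status)
			break;
#if defined(__sparc__) && defined(__arch64__)
		/* mwait: pause for up to 1000 ns or until the line changes. */
		__asm__ __volatile__("wr %%g0, 1000, %%asr28\n" ::);
#else
		nanosleep(&(struct timespec){ .tv_nsec = 1000 }, NULL);
#endif
	}
	return (uint8_t)status;
}
```

Bounding the loop lets the caller fall back to CCB_INFO or CCB_KILL when a request takes longer than expected.
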
Application Life Cycle of a DAX Submission
------------------------------------------

 - open dax device
 - call mmap() to get the completion area address
 - allocate a CCB and fill in the opcode, flags, parameters, addresses, etc.
 - submit CCB via write() or pwrite()
 - go into a loop executing monitored load + monitored wait and
   terminate when the command status indicates the request is complete
   (CCB_KILL or CCB_INFO may be used any time as necessary)
 - perform a CCB_DEQUEUE
 - call munmap() for completion area
 - close the dax device

Memory Constraints
------------------

The DAX hardware operates only on physical addresses. Therefore, it is
not aware of virtual memory mappings and the discontiguities that may
exist in the physical memory that a virtual buffer maps to. There is
no I/O TLB or any scatter/gather mechanism. All buffers, whether input
or output, must reside in a physically contiguous region of memory.

The Hypervisor translates all addresses within a CCB to physical
before handing off the CCB to DAX. The Hypervisor determines the
virtual page size for each virtual address given, and uses this to
program a size limit for each address. This prevents the coprocessor
from reading or writing beyond the bound of the virtual page, even
though it is accessing physical memory directly. A simpler way of
saying this is that a DAX operation will never "cross" a virtual page
boundary. If an 8k virtual page is used, then the data is strictly
limited to 8k. If a user's buffer is larger than 8k, then a larger
page size must be used, or the transaction size will be truncated to
8k.

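An application can check the page-boundary rule before building a CCB. The sketch below is one way to do it; dax_within_one_page() is a hypothetical helper, and the caller is assumed to know the page size that backs the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if the buffer [addr, addr + len) lies
 * entirely within one page of page_size bytes. page_size must be a
 * power of two, and addr + len is assumed not to wrap around.
 */
static bool dax_within_one_page(uintptr_t addr, size_t len, size_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	if (len == 0)
		return true;
	/* First and last byte must fall in the same page. */
	return (addr / page_size) == ((addr + len - 1) / page_size);
}
```

For an 8k page, a page-aligned 8192-byte buffer passes the check, while the same buffer shifted by one byte does not.
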
Huge pages. A user may allocate huge pages using standard interfaces.
Memory buffers residing on huge pages may be used to achieve much
larger DAX transaction sizes, but the rules must still be followed,
and no transaction will cross a page boundary, even a huge page. A
major caveat is that Linux on SPARC presents 8MB as one of the huge
page sizes. SPARC does not actually provide an 8MB hardware page size,
and this size is synthesized by pasting together two 4MB pages. The
reasons for this are historical, and it creates an issue because only
half of this 8MB page can actually be used for any given buffer in a
DAX request, and it must be either the first half or the second half;
it cannot be a 4MB chunk in the middle, since that crosses a
(hardware) page boundary. Note that this entire issue may be hidden by
higher level libraries.

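The consequence for buffers placed in such a synthesized huge page can be expressed as a limit on the transfer length. In the sketch below, dax_max_len_at() is a hypothetical helper, and the caller passes the hardware page size (4MB here) rather than the huge page size reported by Linux.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: largest transfer, in bytes, that can start at
 * addr without crossing a hardware page boundary. hw_page_size must be
 * a power of two.
 */
static size_t dax_max_len_at(uintptr_t addr, size_t hw_page_size)
{
	assert(hw_page_size != 0 &&
	       (hw_page_size & (hw_page_size - 1)) == 0);
	return hw_page_size - (size_t)(addr & (hw_page_size - 1));
}
```

A buffer starting 3MB into an 8MB huge page can therefore carry at most 1MB in a single DAX request, while one starting exactly at the 4MB midpoint can carry the full 4MB.
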
CCB Structure
-------------

A CCB is an array of 8 64-bit words. Several of these words provide
command opcodes, parameters, flags, etc., and the rest are addresses
for the completion area, output buffer, and various inputs:

	struct ccb {
		u64 control;
		u64 completion;
		u64 input0;
		u64 access;
		u64 input1;
		u64 op_data;
		u64 output;
		u64 table;
	};

See libdax/common/sys/dax1/dax1_ccb.h for a detailed description of
each of these fields, and see dax-hv-api.txt for a complete description
of the Hypervisor API available to the guest OS (i.e., the Linux kernel).

The first word (control) is examined by the driver for the following:
 - CCB version, which must be consistent with hardware version
 - Opcode, which must be one of the documented allowable commands
 - Address types, which must be set to "virtual" for all the addresses
   given by the user, thereby ensuring that the application can
   only access memory that it owns

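In user space the same layout can be declared with fixed-width types and checked at compile time. The sketch below also allocates a zeroed, 64-byte-aligned CCB array; the alignment is a conservative choice suggested by the EBADALIGN rule of the ccb_submit hypercall, not a documented requirement of the driver's write() interface, and ccb_array_alloc() is a hypothetical helper.

```c
#define _XOPEN_SOURCE 700	/* for posix_memalign() */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct ccb {
	uint64_t control;
	uint64_t completion;
	uint64_t input0;
	uint64_t access;
	uint64_t input1;
	uint64_t op_data;
	uint64_t output;
	uint64_t table;
};

static_assert(sizeof(struct ccb) == 64, "a CCB is 8 64-bit words");

/*
 * Hypothetical helper: allocate n zeroed CCBs aligned to 64 bytes.
 * Returns NULL on failure; the caller releases the array with free().
 */
static struct ccb *ccb_array_alloc(size_t n)
{
	void *mem = NULL;

	assert(n > 0);
	if (posix_memalign(&mem, 64, n * sizeof(struct ccb)) != 0)
		return NULL;
	memset(mem, 0, n * sizeof(struct ccb));
	return mem;
}
```
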
Example Code
------------

The DAX is accessible to both user and kernel code. The kernel code
can make hypercalls directly while the user code must use wrappers
provided by the driver. The setup of the CCB is nearly identical for
both; the only difference is in preparation of the completion area. An
example of user code is given now, with kernel code afterwards.

In order to program using the driver API, the file
arch/sparc/include/uapi/asm/oradax.h must be included.

First, the proper device must be opened. For M7 it will be
/dev/oradax1 and for M8 it will be /dev/oradax2. The simplest
procedure is to attempt to open both, as only one will succeed:

	fd = open("/dev/oradax1", O_RDWR);
	if (fd < 0)
		fd = open("/dev/oradax2", O_RDWR);
	if (fd < 0)
		/* No DAX found */

Next, the completion area must be mapped:

	completion_area = mmap(NULL, DAX_MMAP_LEN, PROT_READ, MAP_SHARED, fd, 0);

All input and output buffers must be fully contained in one hardware
page, since as explained above, the DAX is strictly constrained by
virtual page boundaries. In addition, the output buffer must be
64-byte aligned and its size must be a multiple of 64 bytes because
the coprocessor writes in units of cache lines.

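The output buffer rule can be enforced with two small helpers, sketched here under the assumption that the caller knows how many bytes of result it expects. The names dax_output_size() and dax_output_ok() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DAX_OUTPUT_ALIGN	64	/* output is written in 64-byte units */

/* Hypothetical helper: round a result size up to a multiple of 64 bytes. */
static size_t dax_output_size(size_t nbytes)
{
	return (nbytes + DAX_OUTPUT_ALIGN - 1) & ~(size_t)(DAX_OUTPUT_ALIGN - 1);
}

/*
 * Hypothetical helper: true if buf/size satisfy the alignment and
 * size-multiple rule for an output buffer.
 */
static bool dax_output_ok(const void *buf, size_t size)
{
	assert(buf != NULL);
	return size != 0 &&
	       (uintptr_t)buf % DAX_OUTPUT_ALIGN == 0 &&
	       size % DAX_OUTPUT_ALIGN == 0;
}
```

A 65-byte result therefore needs a 128-byte output buffer.
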
This example demonstrates the DAX Scan command, which takes as input a
vector and a match value, and produces a bitmap as the output. For
each input element that matches the value, the corresponding bit is
set in the output.

In this example, the input vector consists of a series of single bits,
and the match value is 0. So each 0 bit in the input will produce a 1
in the output, and vice versa, which produces an output bitmap which
is the input bitmap inverted.

For details of all the parameters and bits used in this CCB, please
refer to section 36.2.1.3 of the DAX Hypervisor API document, which
describes the Scan command in detail.

	ccb->control =       /* Table 36.1, CCB Header Format */
		  (2L << 48)     /* command = Scan Value */
		| (3L << 40)     /* output address type = primary virtual */
		| (3L << 34)     /* primary input address type = primary virtual */
		             /* Section 36.2.1, Query CCB Command Formats */
		| (1 << 28)     /* 36.2.1.1.1 primary input format = fixed width bit packed */
		| (0 << 23)     /* 36.2.1.1.2 primary input element size = 0 (1 bit) */
		| (8 << 10)     /* 36.2.1.1.6 output format = bit vector */
		| (0 <<  5)     /* 36.2.1.3 First scan criteria size = 0 (1 byte) */
		| (31 << 0);    /* 36.2.1.3 Disable second scan criteria */

	ccb->completion = 0;    /* Completion area address, to be filled in by driver */

	ccb->input0 = (unsigned long) input; /* primary input address */

	ccb->access =       /* Section 36.2.1.2, Data Access Control */
		  (2 << 24)    /* Primary input length format = bits */
		| (nbits - 1); /* number of bits in primary input stream, minus 1 */

	ccb->input1 = 0;       /* secondary input address, unused */

	ccb->op_data = 0;      /* scan criteria (value to be matched) */

	ccb->output = (unsigned long) output;	/* output address */

	ccb->table = 0;	       /* table address, unused */

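The control and access words above can also be built with explicit 64-bit arithmetic, which avoids depending on the width of int and long. The sketch uses exactly the field values and shifts of the example; the function names are hypothetical, and the upper bound on nbits is an assumption inferred from the length-format field starting at bit 24.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: control word for the Scan Value example. */
static uint64_t scan_control_word(void)
{
	return (UINT64_C(2)  << 48)	/* command = Scan Value */
	     | (UINT64_C(3)  << 40)	/* output address type = primary virtual */
	     | (UINT64_C(3)  << 34)	/* primary input address type = primary virtual */
	     | (UINT64_C(1)  << 28)	/* primary input format = fixed width bit packed */
	     | (UINT64_C(0)  << 23)	/* primary input element size = 0 (1 bit) */
	     | (UINT64_C(8)  << 10)	/* output format = bit vector */
	     | (UINT64_C(0)  << 5)	/* first scan criteria size = 0 (1 byte) */
	     | (UINT64_C(31) << 0);	/* second scan criteria disabled */
}

/* Hypothetical helper: access word for an input stream of nbits bits. */
static uint64_t scan_access_word(uint32_t nbits)
{
	/* Assumption: the count must stay below bit 24. */
	assert(nbits >= 1 && nbits <= (UINT32_C(1) << 24));
	return (UINT64_C(2) << 24)	/* primary input length format = bits */
	     | (uint64_t)(nbits - 1);	/* number of bits, minus 1 */
}
```
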
The CCB submission is a write() or pwrite() system call to the
driver. If the call fails, then a read() must be used to retrieve the
status:

	if (pwrite(fd, ccb, 64, 0) != 64) {
		struct ccb_exec_result status;
		read(fd, &status, sizeof(status));
		/* bail out */
	}

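When bailing out, it can help to log the status symbolically. The sketch below maps the DAX_SUBMIT_* values defined in oradax.h to their names; the table and the helper dax_submit_status_name() are hypothetical conveniences, not part of the driver API.

```c
#include <assert.h>
#include <stdint.h>

/* Names of the DAX_SUBMIT_* codes, indexed by their oradax.h values. */
static const char *const dax_submit_names[] = {
	[0]  = "DAX_SUBMIT_OK",
	[1]  = "DAX_SUBMIT_ERR_RETRY",
	[2]  = "DAX_SUBMIT_ERR_WOULDBLOCK",
	[3]  = "DAX_SUBMIT_ERR_BUSY",
	[4]  = "DAX_SUBMIT_ERR_THR_INIT",
	[5]  = "DAX_SUBMIT_ERR_ARG_INVAL",
	[6]  = "DAX_SUBMIT_ERR_CCB_INVAL",
	[7]  = "DAX_SUBMIT_ERR_NO_CA_AVAIL",
	[8]  = "DAX_SUBMIT_ERR_CCB_ARR_MMU_MISS",
	[9]  = "DAX_SUBMIT_ERR_NOMAP",
	[10] = "DAX_SUBMIT_ERR_NOACCESS",
	[11] = "DAX_SUBMIT_ERR_TOOMANY",
	[12] = "DAX_SUBMIT_ERR_UNAVAIL",
	[13] = "DAX_SUBMIT_ERR_INTERNAL",
};

static_assert(sizeof(dax_submit_names) / sizeof(dax_submit_names[0]) == 14,
	      "one name per DAX_SUBMIT_* code");

/* Hypothetical helper: printable name for ccb_exec_result.status. */
static const char *dax_submit_status_name(uint32_t status)
{
	if (status >= sizeof(dax_submit_names) / sizeof(dax_submit_names[0]))
		return "unknown";
	return dax_submit_names[status];
}
```
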
After a successful submission of the CCB, the completion area may be
polled to determine when the DAX is finished. Detailed information on
the contents of the completion area can be found in section 36.2.2 of
the DAX HV API document.

	while (1) {
		/* Monitored Load */
		__asm__ __volatile__("lduba [%1] 0x84, %0\n"
				     : "=r" (status)
				     : "r"  (completion_area));

		if (status)	     /* 0 indicates command in progress */
			break;

		/* MWAIT */
		__asm__ __volatile__("wr %%g0, 1000, %%asr28\n" ::);    /* 1000 ns */
	}

A completion area status of 1 indicates successful completion of the
CCB and validity of the output bitmap, which may be used immediately.
All other non-zero values indicate error conditions which are
described in section 36.2.2.

	if (completion_area[0] != 1) {	/* section 36.2.2, 1 = command ran and succeeded */
		/* completion_area[0] contains the completion status */
		/* completion_area[1] contains an error code, see 36.2.2 */
	}

After the completion area has been processed, the driver must be
notified that it can release any resources associated with the
request. This is done via the dequeue operation:

	struct dax_command cmd = { 0 };	/* ca_offset is left as zero */
	cmd.command = CCB_DEQUEUE;
	if (write(fd, &cmd, sizeof(cmd)) != sizeof(cmd)) {
		/* bail out */
	}

Finally, normal program cleanup should be done, i.e., unmapping the
completion area, closing the dax device, freeing memory, etc.

[Kernel example]

The only difference in using the DAX in kernel code is the treatment
of the completion area. Unlike user applications which mmap the
completion area allocated by the driver, kernel code must allocate its
own memory to use for the completion area, and this address and its
type must be given in the CCB:

	ccb->control |=      /* Table 36.1, CCB Header Format */
		(3L << 32);     /* completion area address type = primary virtual */

	ccb->completion = (unsigned long) completion_area;   /* Completion area address */

The dax submit hypercall is made directly. The flags used in the
ccb_submit call are documented in the DAX HV API in section 36.3.1.

	#include <asm/hypervisor.h>

	hv_rv = sun4v_ccb_submit((unsigned long)ccb, 64,
				 HV_CCB_QUERY_CMD |
				 HV_CCB_ARG0_PRIVILEGED | HV_CCB_ARG0_TYPE_PRIMARY |
				 HV_CCB_VA_PRIVILEGED,
				 0, &bytes_accepted, &status_data);

	if (hv_rv != HV_EOK) {
		/* hv_rv is an error code, status_data contains */
		/* potential additional status, see 36.3.1.1 */
	}

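The flags argument of the call above is a plain bit mask. The sketch below re-creates the relevant definitions from the hypervisor.h hunk in user space, with local BIT() and GENMASK() macros standing in for the kernel ones, so that the resulting value can be inspected. The function name dax_kernel_submit_flags() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-ins for the kernel's BIT() and GENMASK() macros. */
#define BIT(n)		(UINT64_C(1) << (n))
#define GENMASK(h, l)	((~UINT64_C(0) << (l)) & (~UINT64_C(0) >> (63 - (h))))

/* Flag definitions as they appear in the hypervisor.h hunk. */
#define HV_CCB_QUERY_CMD		BIT(1)
#define HV_CCB_ARG0_TYPE_PRIMARY	BIT(4)
#define HV_CCB_ARG0_TYPE_SECONDARY	BIT(5)
#define HV_CCB_ARG0_TYPE_NUCLEUS	GENMASK(5, 4)
#define HV_CCB_ARG0_PRIVILEGED		BIT(6)
#define HV_CCB_VA_PRIVILEGED		BIT(14)

/* The two-bit ARG0 type field: nucleus sets both bits. */
static_assert(HV_CCB_ARG0_TYPE_NUCLEUS ==
	      (HV_CCB_ARG0_TYPE_PRIMARY | HV_CCB_ARG0_TYPE_SECONDARY),
	      "nucleus is both ARG0 type bits");

/* Hypothetical helper: the flags word used in the kernel example. */
static uint64_t dax_kernel_submit_flags(void)
{
	return HV_CCB_QUERY_CMD |
	       HV_CCB_ARG0_PRIVILEGED | HV_CCB_ARG0_TYPE_PRIMARY |
	       HV_CCB_VA_PRIVILEGED;
}
```
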
After the submission, the completion area polling code is identical to
that in user land:

	while (1) {
		/* Monitored Load */
		__asm__ __volatile__("lduba [%1] 0x84, %0\n"
				     : "=r" (status)
				     : "r"  (completion_area));

		if (status)	     /* 0 indicates command in progress */
			break;

		/* MWAIT */
		__asm__ __volatile__("wr %%g0, 1000, %%asr28\n" ::);    /* 1000 ns */
	}

	if (completion_area[0] != 1) {	/* section 36.2.2, 1 = command ran and succeeded */
		/* completion_area[0] contains the completion status */
		/* completion_area[1] contains an error code, see 36.2.2 */
	}

The output bitmap is ready for consumption immediately after the
completion status indicates success.

@@ -76,6 +76,10 @@
#define HV_ETOOMANY 15 /* Too many items specified */
#define HV_ECHANNEL 16 /* Invalid LDC channel */
#define HV_EBUSY 17 /* Resource busy */
#define HV_EUNAVAILABLE 23 /* Resource or operation not
                            * currently available, but may
                            * become available in the future
                            */

/* mach_exit()
 * TRAP: HV_FAST_TRAP

@@ -941,6 +945,139 @@ unsigned long sun4v_mmu_map_perm_addr(unsigned long vaddr,
 */
#define HV_FAST_MEM_SYNC 0x32

/* Coprocessor services
 *
 * M7 and later processors provide an on-chip coprocessor which
 * accelerates database operations, and is known internally as
 * DAX.
 */

/* ccb_submit()
 * TRAP: HV_FAST_TRAP
 * FUNCTION: HV_CCB_SUBMIT
 * ARG0: address of CCB array
 * ARG1: size (in bytes) of CCB array being submitted
 * ARG2: flags
 * ARG3: reserved
 * RET0: status (success or error code)
 * RET1: size (in bytes) of CCB array that was accepted (might be less
 *       than arg1)
 * RET2: status data
 *       if status == ENOMAP or ENOACCESS, identifies the VA in question
 *       if status == EUNAVAILABLE, unavailable code
 * RET3: reserved
 *
 * ERRORS: EOK          successful submission (check size)
 *         EWOULDBLOCK  could not finish submissions, try again
 *         EBADALIGN    array not 64B aligned or size not 64B multiple
 *         ENORADDR     invalid RA for array or in CCB
 *         ENOMAP       could not translate address (see status data)
 *         EINVAL       invalid ccb or arguments
 *         ETOOMANY     too many ccbs with all-or-nothing flag
 *         ENOACCESS    guest has no access to submit ccbs or address
 *                      in CCB does not have correct permissions (check
 *                      status data)
 *         EUNAVAILABLE ccb operation could not be performed at this
 *                      time (check status data)
 *                      Status data codes:
 *                              0 - exact CCB could not be executed
 *                              1 - CCB opcode cannot be executed
 *                              2 - CCB version cannot be executed
 *                              3 - vcpu cannot execute CCBs
 *                              4 - no CCBs can be executed
 */

#define HV_CCB_SUBMIT 0x34
#ifndef __ASSEMBLY__
unsigned long sun4v_ccb_submit(unsigned long ccb_buf,
			       unsigned long len,
			       unsigned long flags,
			       unsigned long reserved,
			       void *submitted_len,
			       void *status_data);
#endif

/* flags (ARG2) */
#define HV_CCB_QUERY_CMD BIT(1)
#define HV_CCB_ARG0_TYPE_REAL 0UL
#define HV_CCB_ARG0_TYPE_PRIMARY BIT(4)
#define HV_CCB_ARG0_TYPE_SECONDARY BIT(5)
#define HV_CCB_ARG0_TYPE_NUCLEUS GENMASK(5, 4)
#define HV_CCB_ARG0_PRIVILEGED BIT(6)
#define HV_CCB_ALL_OR_NOTHING BIT(7)
#define HV_CCB_QUEUE_INFO BIT(8)
#define HV_CCB_VA_REJECT 0UL
#define HV_CCB_VA_SECONDARY BIT(13)
#define HV_CCB_VA_NUCLEUS GENMASK(13, 12)
#define HV_CCB_VA_PRIVILEGED BIT(14)
#define HV_CCB_VA_READ_ADI_DISABLE BIT(15) /* DAX2 only */

/* ccb_info()
 * TRAP: HV_FAST_TRAP
 * FUNCTION: HV_CCB_INFO
 * ARG0: real address of CCB completion area
 * RET0: status (success or error code)
 * RET1: info array
 *       - RET1[0]: CCB state
 *       - RET1[1]: dax unit
 *       - RET1[2]: queue number
 *       - RET1[3]: queue position
 *
 * ERRORS: EOK          operation successful
 *         EBADALIGN    address not 64B aligned
 *         ENORADDR     RA in address not valid
 *         EINVAL       CA not valid
 *         EWOULDBLOCK  info not available for this CCB currently, try
 *                      again
 *         ENOACCESS    guest cannot use dax
 */

#define HV_CCB_INFO 0x35
#ifndef __ASSEMBLY__
unsigned long sun4v_ccb_info(unsigned long ca,
			     void *info_arr);
#endif

/* info array byte offsets (RET1) */
#define CCB_INFO_OFFSET_CCB_STATE 0
#define CCB_INFO_OFFSET_DAX_UNIT 2
#define CCB_INFO_OFFSET_QUEUE_NUM 4
#define CCB_INFO_OFFSET_QUEUE_POS 6

/* CCB state (RET1[0]) */
#define HV_CCB_STATE_COMPLETED 0
#define HV_CCB_STATE_ENQUEUED 1
#define HV_CCB_STATE_INPROGRESS 2
#define HV_CCB_STATE_NOTFOUND 3

/* ccb_kill()
 * TRAP: HV_FAST_TRAP
 * FUNCTION: HV_CCB_KILL
 * ARG0: real address of CCB completion area
 * RET0: status (success or error code)
 * RET1: CCB kill status
 *
 * ERRORS: EOK          operation successful
 *         EBADALIGN    address not 64B aligned
 *         ENORADDR     RA in address not valid
 *         EINVAL       CA not valid
 *         EWOULDBLOCK  kill not available for this CCB currently, try
 *                      again
 *         ENOACCESS    guest cannot use dax
 */

#define HV_CCB_KILL 0x36
#ifndef __ASSEMBLY__
unsigned long sun4v_ccb_kill(unsigned long ca,
			     void *kill_status);
#endif

/* CCB kill status (RET1) */
#define HV_CCB_KILL_COMPLETED 0
#define HV_CCB_KILL_DEQUEUED 1
#define HV_CCB_KILL_KILLED 2
#define HV_CCB_KILL_NOTFOUND 3

/* Time of day services.
 *
 * The hypervisor maintains the time of day on a per-domain basis.

@@ -3355,6 +3492,7 @@ unsigned long sun4v_m7_set_perfreg(unsigned long reg_num,
#define HV_GRP_SDIO_ERR 0x0109
#define HV_GRP_REBOOT_DATA 0x0110
#define HV_GRP_ATU 0x0111
#define HV_GRP_DAX 0x0113
#define HV_GRP_M7_PERF 0x0114
#define HV_GRP_NIAG_PERF 0x0200
#define HV_GRP_FIRE_PERF 0x0201

@@ -0,0 +1,91 @@
/*
 * Copyright (c) 2017, Oracle and/or its affiliates. All rights reserved.
 *
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program. If not, see <http://www.gnu.org/licenses/>.
 */

/*
 * Oracle DAX driver API definitions
 */

#ifndef _ORADAX_H
#define _ORADAX_H

#include <linux/types.h>

#define CCB_KILL 0
#define CCB_INFO 1
#define CCB_DEQUEUE 2

struct dax_command {
	__u16 command;		/* CCB_KILL/INFO/DEQUEUE */
	__u16 ca_offset;	/* offset into mmapped completion area */
};

struct ccb_kill_result {
	__u16 action;		/* action taken to kill ccb */
};

struct ccb_info_result {
	__u16 state;		/* state of enqueued ccb */
	__u16 inst_num;		/* dax instance number of enqueued ccb */
	__u16 q_num;		/* queue number of enqueued ccb */
	__u16 q_pos;		/* ccb position in queue */
};

struct ccb_exec_result {
	__u64	status_data;	/* additional status data (e.g. bad VA) */
	__u32	status;		/* one of DAX_SUBMIT_* */
};

union ccb_result {
	struct ccb_exec_result exec;
	struct ccb_info_result info;
	struct ccb_kill_result kill;
};

#define DAX_MMAP_LEN (16 * 1024)
|
||||||
|
#define DAX_MAX_CCBS 15
|
||||||
|
#define DAX_CCB_BUF_MAXLEN (DAX_MAX_CCBS * 64)
|
||||||
|
#define DAX_NAME "oradax"
|
||||||
|
|
||||||
|
/* CCB_EXEC status */
|
||||||
|
#define DAX_SUBMIT_OK 0
|
||||||
|
#define DAX_SUBMIT_ERR_RETRY 1
|
||||||
|
#define DAX_SUBMIT_ERR_WOULDBLOCK 2
|
||||||
|
#define DAX_SUBMIT_ERR_BUSY 3
|
||||||
|
#define DAX_SUBMIT_ERR_THR_INIT 4
|
||||||
|
#define DAX_SUBMIT_ERR_ARG_INVAL 5
|
||||||
|
#define DAX_SUBMIT_ERR_CCB_INVAL 6
|
||||||
|
#define DAX_SUBMIT_ERR_NO_CA_AVAIL 7
|
||||||
|
#define DAX_SUBMIT_ERR_CCB_ARR_MMU_MISS 8
|
||||||
|
#define DAX_SUBMIT_ERR_NOMAP 9
|
||||||
|
#define DAX_SUBMIT_ERR_NOACCESS 10
|
||||||
|
#define DAX_SUBMIT_ERR_TOOMANY 11
|
||||||
|
#define DAX_SUBMIT_ERR_UNAVAIL 12
|
||||||
|
#define DAX_SUBMIT_ERR_INTERNAL 13
|
||||||
|
|
||||||
|
/* CCB_INFO states - must match HV_CCB_STATE_* definitions */
|
||||||
|
#define DAX_CCB_COMPLETED 0
|
||||||
|
#define DAX_CCB_ENQUEUED 1
|
||||||
|
#define DAX_CCB_INPROGRESS 2
|
||||||
|
#define DAX_CCB_NOTFOUND 3
|
||||||
|
|
||||||
|
/* CCB_KILL actions - must match HV_CCB_KILL_* definitions */
|
||||||
|
#define DAX_KILL_COMPLETED 0
|
||||||
|
#define DAX_KILL_DEQUEUED 1
|
||||||
|
#define DAX_KILL_KILLED 2
|
||||||
|
#define DAX_KILL_NOTFOUND 3
|
||||||
|
|
||||||
|
#endif /* _ORADAX_H */
|
|
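The status codes and buffer limits defined in this header can be exercised from plain C. The following is a minimal sketch, not driver code: it mirrors the constants locally with `<stdint.h>` types so it builds without the kernel uapi headers. The helpers `dax_submit_status_name()` and `dax_ccb_buf_len_ok()` are hypothetical names introduced for illustration. The rule that a CCB buffer length must be a non-zero multiple of 64 bytes is an assumption inferred from `DAX_CCB_BUF_MAXLEN (DAX_MAX_CCBS * 64)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of values from the header in this diff. */
#define DAX_MAX_CCBS 15
#define DAX_CCB_BUF_MAXLEN (DAX_MAX_CCBS * 64)

#define DAX_SUBMIT_OK 0
#define DAX_SUBMIT_ERR_RETRY 1
#define DAX_SUBMIT_ERR_WOULDBLOCK 2
#define DAX_SUBMIT_ERR_BUSY 3
#define DAX_SUBMIT_ERR_THR_INIT 4
#define DAX_SUBMIT_ERR_ARG_INVAL 5
#define DAX_SUBMIT_ERR_CCB_INVAL 6
#define DAX_SUBMIT_ERR_NO_CA_AVAIL 7
#define DAX_SUBMIT_ERR_CCB_ARR_MMU_MISS 8
#define DAX_SUBMIT_ERR_NOMAP 9
#define DAX_SUBMIT_ERR_NOACCESS 10
#define DAX_SUBMIT_ERR_TOOMANY 11
#define DAX_SUBMIT_ERR_UNAVAIL 12
#define DAX_SUBMIT_ERR_INTERNAL 13

/* Hypothetical helper: map a ccb_exec_result.status value to its macro name. */
static const char *dax_submit_status_name(uint32_t status)
{
	static const char *const names[] = {
		[DAX_SUBMIT_OK] = "DAX_SUBMIT_OK",
		[DAX_SUBMIT_ERR_RETRY] = "DAX_SUBMIT_ERR_RETRY",
		[DAX_SUBMIT_ERR_WOULDBLOCK] = "DAX_SUBMIT_ERR_WOULDBLOCK",
		[DAX_SUBMIT_ERR_BUSY] = "DAX_SUBMIT_ERR_BUSY",
		[DAX_SUBMIT_ERR_THR_INIT] = "DAX_SUBMIT_ERR_THR_INIT",
		[DAX_SUBMIT_ERR_ARG_INVAL] = "DAX_SUBMIT_ERR_ARG_INVAL",
		[DAX_SUBMIT_ERR_CCB_INVAL] = "DAX_SUBMIT_ERR_CCB_INVAL",
		[DAX_SUBMIT_ERR_NO_CA_AVAIL] = "DAX_SUBMIT_ERR_NO_CA_AVAIL",
		[DAX_SUBMIT_ERR_CCB_ARR_MMU_MISS] = "DAX_SUBMIT_ERR_CCB_ARR_MMU_MISS",
		[DAX_SUBMIT_ERR_NOMAP] = "DAX_SUBMIT_ERR_NOMAP",
		[DAX_SUBMIT_ERR_NOACCESS] = "DAX_SUBMIT_ERR_NOACCESS",
		[DAX_SUBMIT_ERR_TOOMANY] = "DAX_SUBMIT_ERR_TOOMANY",
		[DAX_SUBMIT_ERR_UNAVAIL] = "DAX_SUBMIT_ERR_UNAVAIL",
		[DAX_SUBMIT_ERR_INTERNAL] = "DAX_SUBMIT_ERR_INTERNAL",
	};

	if (status >= sizeof(names) / sizeof(names[0]))
		return "UNKNOWN";
	return names[status];
}

/*
 * Hypothetical pre-submit check. ASSUMPTION: a CCB buffer is a non-zero
 * multiple of 64 bytes and holds at most DAX_MAX_CCBS entries.
 */
static int dax_ccb_buf_len_ok(size_t len)
{
	return len != 0 && len % 64 == 0 && len <= DAX_CCB_BUF_MAXLEN;
}
```

A caller could log `dax_submit_status_name(result.exec.status)` after a failed submission, and reject a buffer early when `dax_ccb_buf_len_ok()` returns 0.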
@ -41,6 +41,7 @@ static struct api_info api_table[] = {
 	{ .group = HV_GRP_SDIO_ERR, },
 	{ .group = HV_GRP_REBOOT_DATA, },
 	{ .group = HV_GRP_ATU, .flags = FLAG_PRE_API },
+	{ .group = HV_GRP_DAX, },
 	{ .group = HV_GRP_NIAG_PERF, .flags = FLAG_PRE_API },
 	{ .group = HV_GRP_FIRE_PERF, },
 	{ .group = HV_GRP_N2_CPU, },
@ -871,3 +871,60 @@ ENTRY(sun4v_m7_set_perfreg)
 	retl
 	nop
 ENDPROC(sun4v_m7_set_perfreg)
+
+/* %o0: address of CCB array
+ * %o1: size (in bytes) of CCB array
+ * %o2: flags
+ * %o3: reserved
+ *
+ * returns:
+ * %o0: status
+ * %o1: size (in bytes) of the CCB array that was accepted
+ * %o2: status data
+ * %o3: reserved
+ */
+ENTRY(sun4v_ccb_submit)
+	mov %o5, %g1
+	mov HV_CCB_SUBMIT, %o5
+	ta HV_FAST_TRAP
+	stx %o1, [%o4]
+	retl
+	stx %o2, [%g1]
+ENDPROC(sun4v_ccb_submit)
+EXPORT_SYMBOL(sun4v_ccb_submit)
+
+/* %o0: completion area ra for the ccb to get info
+ *
+ * returns:
+ * %o0: status
+ * %o1: CCB state
+ * %o2: position
+ * %o3: dax unit
+ * %o4: queue
+ */
+ENTRY(sun4v_ccb_info)
+	mov %o1, %g1
+	mov HV_CCB_INFO, %o5
+	ta HV_FAST_TRAP
+	sth %o1, [%g1 + CCB_INFO_OFFSET_CCB_STATE]
+	sth %o2, [%g1 + CCB_INFO_OFFSET_QUEUE_POS]
+	sth %o3, [%g1 + CCB_INFO_OFFSET_DAX_UNIT]
+	retl
+	sth %o4, [%g1 + CCB_INFO_OFFSET_QUEUE_NUM]
+ENDPROC(sun4v_ccb_info)
+EXPORT_SYMBOL(sun4v_ccb_info)
+
+/* %o0: completion area ra for the ccb to kill
+ *
+ * returns:
+ * %o0: status
+ * %o1: result of the kill
+ */
+ENTRY(sun4v_ccb_kill)
+	mov %o1, %g1
+	mov HV_CCB_KILL, %o5
+	ta HV_FAST_TRAP
+	retl
+	sth %o1, [%g1]
+ENDPROC(sun4v_ccb_kill)
+EXPORT_SYMBOL(sun4v_ccb_kill)
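The four `sth` stores in `sun4v_ccb_info` write 16-bit results at `CCB_INFO_OFFSET_*` byte offsets from the pointer saved in `%g1`. The C sketch below models that, assuming the target is a `struct ccb_info_result`. The offset values are not shown in this diff; the numbers used here (0, 2, 4, 6) are assumptions chosen to line up with the field order of that struct, and `model_ccb_info_stores()` is a hypothetical stand-in for the stores, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirror of struct ccb_info_result from the header, with stdint types. */
struct ccb_info_result {
	uint16_t state;    /* state of enqueued ccb */
	uint16_t inst_num; /* dax instance number of enqueued ccb */
	uint16_t q_num;    /* queue number of enqueued ccb */
	uint16_t q_pos;    /* ccb position in queue */
};

/* ASSUMED values: the real definitions are not part of this diff. */
#define CCB_INFO_OFFSET_CCB_STATE 0
#define CCB_INFO_OFFSET_DAX_UNIT 2
#define CCB_INFO_OFFSET_QUEUE_NUM 4
#define CCB_INFO_OFFSET_QUEUE_POS 6

/* Compile-time check that the assumed offsets match the struct layout. */
_Static_assert(offsetof(struct ccb_info_result, state) ==
	       CCB_INFO_OFFSET_CCB_STATE, "state offset");
_Static_assert(offsetof(struct ccb_info_result, inst_num) ==
	       CCB_INFO_OFFSET_DAX_UNIT, "inst_num offset");
_Static_assert(offsetof(struct ccb_info_result, q_num) ==
	       CCB_INFO_OFFSET_QUEUE_NUM, "q_num offset");
_Static_assert(offsetof(struct ccb_info_result, q_pos) ==
	       CCB_INFO_OFFSET_QUEUE_POS, "q_pos offset");

/*
 * Hypothetical model of the four halfword stores: each 16-bit value is
 * written at its byte offset from the base pointer (the role of %g1).
 */
static void model_ccb_info_stores(void *info, uint16_t state, uint16_t pos,
				  uint16_t unit, uint16_t queue)
{
	unsigned char *base = info;

	memcpy(base + CCB_INFO_OFFSET_CCB_STATE, &state, sizeof(state));
	memcpy(base + CCB_INFO_OFFSET_QUEUE_POS, &pos, sizeof(pos));
	memcpy(base + CCB_INFO_OFFSET_DAX_UNIT, &unit, sizeof(unit));
	memcpy(base + CCB_INFO_OFFSET_QUEUE_NUM, &queue, sizeof(queue));
}
```

If the assumed offsets drifted from the struct layout, the `_Static_assert` lines would fail at build time, which is one way to keep assembly offsets and a C structure in step.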
@ -9,9 +9,6 @@
  * Copyright (C) 1997,1998 Jakub Jelinek (jj@sunsite.mff.cuni.cz)
  */
 
-#ifdef CONFIG_COMPAT
-#include <linux/compat.h> /* for compat_old_sigset_t */
-#endif
 #include <linux/sched.h>
 #include <linux/kernel.h>
 #include <linux/signal.h>
@ -251,7 +251,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm, int uses_interp)
 	else
 		return map_vdso(&vdso_image_32_builtin, &vdso_mapping32);
 #else
-		return map_vdso(&vdso_image_64_builtin, &vdso_mapping64);
+	return map_vdso(&vdso_image_64_builtin, &vdso_mapping64);
 #endif
 
 }
@ -70,5 +70,13 @@ config DISPLAY7SEG
 	  another UltraSPARC-IIi-cEngine boardset with a 7-segment display,
 	  you should say N to this option.
 
+config ORACLE_DAX
+	tristate "Oracle Data Analytics Accelerator"
+	default m if SPARC64
+	help
+	  Driver for Oracle Data Analytics Accelerator, which is
+	  a coprocessor that performs database operations in hardware.
+	  It is available on M7 and M8 based systems only.
+
 endmenu
 
@ -17,3 +17,4 @@ obj-$(CONFIG_SUN_OPENPROMIO) += openprom.o
 obj-$(CONFIG_TADPOLE_TS102_UCTRL) += uctrl.o
 obj-$(CONFIG_SUN_JSFLASH) += jsflash.o
 obj-$(CONFIG_BBC_I2C) += bbc.o
+obj-$(CONFIG_ORACLE_DAX) += oradax.o
File diff not shown because it is too large