Initial Commit
This commit is contained in:
Parent
c578e7a332
Commit
5575701178
@ -0,0 +1,8 @@
fileFormatVersion: 2
guid: 6d3b29b401b2a4ec893b605b781f8569
folderAsset: yes
DefaultImporter:
  externalObjects: {}
  userData:
  assetBundleName:
  assetBundleVariant:
@ -0,0 +1,166 @@
<!---TODO:
Advanced topics
* worker.AddInput(): to prewarm data
* how to trim networks at runtime (multi brain models)
* loading model from url: var modelFromDiskOrInternet = ModelLoader.Load(url, verbose); // will download and cache model from url
* recurrent state
--->

# Barracuda

**Barracuda** is a lightweight and **cross-platform** Neural Net **inference library for Unity**. Barracuda can execute both on the GPU and the CPU. Barracuda is currently in an early stage of development, so some adventures are to be expected.

## Using Barracuda
Typically, the following steps are needed to use Barracuda in an application:
1. load the model,
2. create an inference engine (the worker),
3. execute the model and
4. fetch the results.

First, however, you have to convert your TensorFlow (or ONNX) model to the Barracuda format with the provided Python scripts. Example usage:
```bash
python onnx_to_barracuda.py Models/mnist/model.onnx Destination/mnist.bytes
```
See the _Converting TensorFlow and ONNX models to Barracuda format_ section below for more information.

### Load Model into Barracuda
Once you have converted your TensorFlow (or ONNX) model, you can load the resulting Barracuda file via `ModelLoader`:
```C#
var model = ModelLoader.LoadFromStreamingAssets(modelName + ".bytes");
```

### Create inference engine (Worker)
The inference engine in Barracuda is called a Worker. The Worker is responsible for converting the model into executable tasks and scheduling them on the GPU or CPU.
```C#
var worker = BarracudaWorkerFactory.CreateWorker(BarracudaWorkerFactory.Type.ComputeFast, model);
```

### Execute the model
Inputs can be provided either as a single `Tensor` object (assuming the model has only one input) or as a dictionary of name and `Tensor` pairs.

```C#
var inputs = new Dictionary<string, Tensor>();
inputs[name1] = new Tensor(...);
inputs[name2] = new Tensor(...);
worker.Execute(inputs);
```
Execution is asynchronous for GPU backends. The current implementation is synchronous for CPU backends; however, it is best to assume that execution will be asynchronous for all backends in the future.
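
Since execution may be asynchronous, a common pattern is to schedule `Execute` early and defer fetching until the result is actually needed, so other CPU work can overlap with inference. A minimal sketch, assuming a single-input model and assuming that `Fetch` acts as the synchronization point:

```C#
worker.Execute(input);       // schedules inference; with a GPU backend this may return before the work is done
// ... do other CPU work while the network runs ...
var output = worker.Fetch(); // assumed to wait here until the result is ready
output.Dispose();
```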

### Fetch outputs
If the model has only a single output, then a simple `worker.Fetch()` can be used; otherwise, output names should be provided.
```C#
var O = worker.Fetch(outputName);
```

### Cleanup
As a Barracuda client you are responsible for calling `Dispose` on the _worker_, the _inputs_ and any _outputs_ you fetched. This is necessary to properly free GPU resources.
```C#
O.Dispose();
worker.Dispose();
```
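
Putting the steps above together, a minimal end-to-end sketch could look like the following. The model file name and the input shape are placeholders, and a single input and a single output are assumed:

```C#
// Hypothetical end-to-end usage; "mnist.bytes" and the 1x28x28x1 input shape are placeholders.
var model  = ModelLoader.LoadFromStreamingAssets("mnist.bytes");
var worker = BarracudaWorkerFactory.CreateWorker(BarracudaWorkerFactory.Type.ComputeFast, model);

var input = new Tensor(1, 28, 28, 1);   // batch, height, width, channels
worker.Execute(input);                   // single-input models can take a Tensor directly
var output = worker.Fetch();             // single-output models need no output name

// ... read results from output ...

input.Dispose();
output.Dispose();
worker.Dispose();
```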

## Working with data

### Tensor
Barracuda stores data in `batch`, `height`, `width`, `channels` order, also known as _NHWC_ or _channels-last_ format. You can interact with `Tensor` data via multi-dimensional array operators:
```C#
var tensor = new Tensor(batchCount, height, width, channelCount);
tensor[n, y, x, c] = 1.0f; // as N batches of 3 dimensional data: N x {Y, X, C}
tensor[n, c] = 2.0f;       // as N batches of 1 dimensional data: N x {C}
tensor[i] = 3.0f;          // as flat array
```
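
Because the layout is channels-last, the flat-array form addresses the same element as the multi-dimensional form. A sketch of the correspondence follows; the `TensorShape` type name is an assumption, while the `batch`/`height`/`width`/`channels` fields are the ones shown in the shape example later in this section:

```C#
// Flat index that corresponds to tensor[n, y, x, c] in NHWC (channels-last) layout.
int FlatIndex(TensorShape shape, int n, int y, int x, int c)
{
    return ((n * shape.height + y) * shape.width + x) * shape.channels + c;
}
```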

There are a number of `Tensor` constructors that cover a variety of scenarios. By default, tensors are initialized with `0` upon construction, unless an initialization `Array` is provided.
```C#
tensor = new Tensor(batchCount, height, width, channelCount); // batch of 3 dimensional data, 0 initialized: batchCount x {height, width, channelCount}
tensor = new Tensor(batchCount, elementCount);                // batch of 1 dimensional data, 0 initialized: batchCount x {elementCount}

var stridedArray = new float[batchCount * elementCount] { ... };
tensor = new Tensor(batchCount, elementCount, stridedArray);  // batch of 1 dimensional data, initialized from strided array

var jaggedArray = new float[batchCount][elementCount] { ... };
tensor = new Tensor(batchCount, elementCount, jaggedArray);   // batch of 1 dimensional data, initialized from jagged array

Texture2D texture = ...;
tensor = new Tensor(texture); // tensor initialized with texture data: 1 x { texture.width, texture.height, 3}
```

You can query the shape of a `Tensor` object, but you cannot change it: the shape of a `Tensor` is immutable. If you want a `Tensor` with a different shape, you have to construct a new `Tensor` instance.
```C#
var shape = tensor.shape;
Debug.Log(shape + " or " + shape.batch + shape.height + shape.width + shape.channels);
```

### Texture as input
You can pass `Texture2D`, `Texture2DArray`, `Texture3D` or `RenderTexture` objects directly to Barracuda without accessing individual pixels on the CPU:
```C#
var channelCount = 3; // you can treat input pixels as 1 (grayscale), 3 (color) or 4 (color with alpha) channels
var tensor = new Tensor(texture, channelCount);
```
You can batch multiple textures into a single `Tensor` object:
```C#
var textures = new [] { texture0, texture1, texture2, texture3 }; // these textures will form a batch
var tensor = new Tensor(textures, channelCount);
```
Note that to form a batch, all textures must have the same width and height.
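
A simple guard before batching can make this requirement explicit. A sketch using `Debug.Assert`, where `textures` is the array from the example above:

```C#
// All textures in a batch are required to share the same width and height.
foreach (var t in textures)
    Debug.Assert(t.width == textures[0].width && t.height == textures[0].height);
var tensor = new Tensor(textures, channelCount);
```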

### Texture as output
If you want to use Barracuda execution results further in the graphics pipeline, you can copy data from a `Tensor` into a `RenderTexture` without stalling the CPU or GPU:
```C#
var tensor = worker.Fetch();
var texture = BarracudaTextureUtils.TensorToRenderTexture(tensor);
```
If you wish, you can reuse the same `RenderTexture` multiple times:
```C#
var texture = new RenderTexture(width, height, 0);
// ...
var tensor = worker.Fetch();
BarracudaTextureUtils.TensorToRenderTexture(tensor, texture);
```
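
The resulting `RenderTexture` can then be used like any other Unity texture, for example by assigning it to a material (a sketch; `material` is assumed to already exist in the scene):

```C#
material.mainTexture = texture; // display the network output directly in the graphics pipeline
```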

## Introspecting Barracuda models
A Barracuda model has a very simple in-memory representation. Once a model is loaded, you can query it for its inputs and outputs:
```C#
string[] inputNames = model.inputs;   // query model inputs
string[] outputNames = model.outputs; // query model outputs
```
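
The queried input names can be used directly as dictionary keys when preparing inputs for `Execute`. A sketch, where the tensor shape is a placeholder:

```C#
var inputs = new Dictionary<string, Tensor>();
foreach (var name in model.inputs)
    inputs[name] = new Tensor(1, 28, 28, 1); // placeholder shape; use the shape each input actually expects
worker.Execute(inputs);
```
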
Alternatively, you can iterate directly through the layers and investigate what the model is going to do:
```C#
foreach (var layer in model.layers)
    Debug.Log(layer.name + " does " + layer.type);
```

## Verbose mode
You can turn on verbose mode for different parts of Barracuda:
```C#
bool verbose = true;
var model = ModelLoader.LoadFromStreamingAssets(modelName + ".bytes", verbose); // verbose loader
var worker = BarracudaWorkerFactory.CreateWorker(BarracudaWorkerFactory.Type.ComputeFast, model, verbose); // verbose execution
```

## Converting TensorFlow and ONNX models to Barracuda format
Barracuda comes with dedicated Python scripts to convert pre-trained TensorFlow and ONNX models to the Barracuda format.

Convert from TensorFlow:
```bash
python tensorflow_to_barracuda.py Models/3DBall-tf-model.pb Destination/3DBall-bc.bytes
```

Convert from ONNX:
```bash
python onnx_to_barracuda.py Models/mnist/model.onnx Destination/mnist-bc.bytes
```

If the network has multiple outputs, but you need only particular ones during inference, there is an optional `-trim` flag to remove unused outputs and calculations.
For example:
```bash
python tensorflow_to_barracuda.py Models/3DBall-tf-model.pb Destination/3DBall-bc.bytes -trim action$
```
Trimming will first remove from the graph any outputs that do not match the regular expression; in this case only outputs whose names end with `action` will be kept.
Next, it will strip all nodes that do not participate in the evaluation of the remaining outputs.


P.S. Python 3.5 or 3.6 is recommended.

P.P.S. We plan to migrate the TensorFlow and ONNX converters from Python to C# in the future.

@ -0,0 +1,7 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 3cf2bcd7dcfe144bebf6cf271e7dfbe0
|
||||||
|
TextScriptImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,8 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 4d59cec597ba94288831c0cade38b14e
|
||||||
|
folderAsset: yes
|
||||||
|
DefaultImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
Binary file not shown.
|
@ -0,0 +1,30 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: de59cc66e5e394f93b2a692e50bce97f
|
||||||
|
PluginImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
serializedVersion: 2
|
||||||
|
iconMap: {}
|
||||||
|
executionOrder: {}
|
||||||
|
isPreloaded: 0
|
||||||
|
isOverridable: 0
|
||||||
|
platformData:
|
||||||
|
- first:
|
||||||
|
Any:
|
||||||
|
second:
|
||||||
|
enabled: 1
|
||||||
|
settings: {}
|
||||||
|
- first:
|
||||||
|
Editor: Editor
|
||||||
|
second:
|
||||||
|
enabled: 0
|
||||||
|
settings:
|
||||||
|
DefaultValueInitialized: true
|
||||||
|
- first:
|
||||||
|
Windows Store Apps: WindowsStoreApps
|
||||||
|
second:
|
||||||
|
enabled: 0
|
||||||
|
settings:
|
||||||
|
CPU: AnyCPU
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,8 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: a7bba248e968b476a875260a8127a595
|
||||||
|
folderAsset: yes
|
||||||
|
DefaultImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,8 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 5087a463bec2b4b76808e7307a94887f
|
||||||
|
folderAsset: yes
|
||||||
|
DefaultImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,11 @@
{
    "name": "MacBLAS",
    "references": [],
    "optionalUnityReferences": [],
    "includePlatforms": [
        "Editor",
        "macOSStandalone"
    ],
    "excludePlatforms": [],
    "allowUnsafeCode": true
}
@ -0,0 +1,7 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 53fc9961397934ed38a573ce1392c80c
|
||||||
|
AssemblyDefinitionImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,29 @@
#if UNITY_STANDALONE_OSX || UNITY_EDITOR_OSX
using System.Runtime.InteropServices;
using Barracuda;
using UnityEngine;
using UnityEngine.Scripting;


[Preserve]
public class MacBLAS : BLASPlugin
{
    [DllImport("macblas")]
    static extern unsafe void macsgemm(float* Ap, int AN, int AM,
                                       float* Bp, int BN, int BM,
                                       float* Cp, int CN, int CM,
                                       int bs, bool transposeA, bool transposeB);

    public bool IsCurrentPlatformSupported()
    {
        return Application.platform == RuntimePlatform.OSXEditor ||
               Application.platform == RuntimePlatform.OSXPlayer;
    }

    public unsafe void SGEMM(float* Ap, int AN, int AM, float* Bp, int BN, int BM, float* Cp, int CN, int CM, int bs,
        bool transposeA = false, bool transposeB = false)
    {
        macsgemm(Ap, AN, AM, Bp, BN, BM, Cp, CN, CM, bs, transposeA, transposeB);
    }
}
#endif // UNITY_OSX
@ -0,0 +1,11 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 680f04373f71f48a89408105d3f58a08
|
||||||
|
MonoImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
serializedVersion: 2
|
||||||
|
defaultReferences: []
|
||||||
|
executionOrder: 0
|
||||||
|
icon: {instanceID: 0}
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,40 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 6633afded85ec4f00a4cc653053461bb
|
||||||
|
folderAsset: yes
|
||||||
|
PluginImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
serializedVersion: 2
|
||||||
|
iconMap: {}
|
||||||
|
executionOrder: {}
|
||||||
|
isPreloaded: 0
|
||||||
|
isOverridable: 0
|
||||||
|
platformData:
|
||||||
|
- first:
|
||||||
|
'': OSXIntel
|
||||||
|
second:
|
||||||
|
enabled: 1
|
||||||
|
settings: {}
|
||||||
|
- first:
|
||||||
|
'': OSXIntel64
|
||||||
|
second:
|
||||||
|
enabled: 1
|
||||||
|
settings: {}
|
||||||
|
- first:
|
||||||
|
Any:
|
||||||
|
second:
|
||||||
|
enabled: 0
|
||||||
|
settings: {}
|
||||||
|
- first:
|
||||||
|
Editor: Editor
|
||||||
|
second:
|
||||||
|
enabled: 1
|
||||||
|
settings:
|
||||||
|
DefaultValueInitialized: true
|
||||||
|
- first:
|
||||||
|
Standalone: OSXUniversal
|
||||||
|
second:
|
||||||
|
enabled: 1
|
||||||
|
settings: {}
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,8 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 5de42c62131964fc999e1dc3d292cc31
|
||||||
|
folderAsset: yes
|
||||||
|
DefaultImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,40 @@
|
||||||
|
<?xml version="1.0" encoding="UTF-8"?>
|
||||||
|
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
|
||||||
|
<plist version="1.0">
|
||||||
|
<dict>
|
||||||
|
<key>BuildMachineOSBuild</key>
|
||||||
|
<string>14F27</string>
|
||||||
|
<key>CFBundleDevelopmentRegion</key>
|
||||||
|
<string>en</string>
|
||||||
|
<key>CFBundleExecutable</key>
|
||||||
|
<string>macblas</string>
|
||||||
|
<key>CFBundleIdentifier</key>
|
||||||
|
<string>com.unity3d.macblas</string>
|
||||||
|
<key>CFBundleInfoDictionaryVersion</key>
|
||||||
|
<string>6.0</string>
|
||||||
|
<key>CFBundleName</key>
|
||||||
|
<string>macblas</string>
|
||||||
|
<key>CFBundlePackageType</key>
|
||||||
|
<string>BNDL</string>
|
||||||
|
<key>CFBundleShortVersionString</key>
|
||||||
|
<string>0.1.4</string>
|
||||||
|
<key>CFBundleVersion</key>
|
||||||
|
<string>1</string>
|
||||||
|
<key>DTCompiler</key>
|
||||||
|
<string>com.apple.compilers.llvm.clang.1_0</string>
|
||||||
|
<key>DTPlatformBuild</key>
|
||||||
|
<string>6A1052d</string>
|
||||||
|
<key>DTPlatformVersion</key>
|
||||||
|
<string>GM</string>
|
||||||
|
<key>DTSDKBuild</key>
|
||||||
|
<string>14A382</string>
|
||||||
|
<key>DTSDKName</key>
|
||||||
|
<string>macosx10.10</string>
|
||||||
|
<key>DTXcode</key>
|
||||||
|
<string>0610</string>
|
||||||
|
<key>DTXcodeBuild</key>
|
||||||
|
<string>6A1052d</string>
|
||||||
|
<key>NSHumanReadableCopyright</key>
|
||||||
|
<string>Copyright © 2018 Unity Technologies. All rights reserved.</string>
|
||||||
|
</dict>
|
||||||
|
</plist>
|
|
@ -0,0 +1,7 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 844f003f25d444aafad9fb1fcea17bbc
|
||||||
|
DefaultImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,8 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 0620b207d80004fe595413acf79f2f66
|
||||||
|
folderAsset: yes
|
||||||
|
DefaultImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
Binary data
Assets/Barracuda.Core/Barracuda/Plugins/OSX/macblas.bundle/Contents/MacOS/macblas
Executable file
Binary file not shown.
|
@ -0,0 +1,7 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: e9ef2c9e25cad478aa1220d6cf68a2ed
|
||||||
|
DefaultImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,8 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 93038b433855548879a151644d2354c1
|
||||||
|
folderAsset: yes
|
||||||
|
DefaultImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,105 @@
|
||||||
|
<?xml version="1.0" encoding="UTF-8"?>
|
||||||
|
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
|
||||||
|
<plist version="1.0">
|
||||||
|
<dict>
|
||||||
|
<key>files</key>
|
||||||
|
<dict/>
|
||||||
|
<key>files2</key>
|
||||||
|
<dict/>
|
||||||
|
<key>rules</key>
|
||||||
|
<dict>
|
||||||
|
<key>^Resources/</key>
|
||||||
|
<true/>
|
||||||
|
<key>^Resources/.*\.lproj/</key>
|
||||||
|
<dict>
|
||||||
|
<key>optional</key>
|
||||||
|
<true/>
|
||||||
|
<key>weight</key>
|
||||||
|
<real>1000</real>
|
||||||
|
</dict>
|
||||||
|
<key>^Resources/.*\.lproj/locversion.plist$</key>
|
||||||
|
<dict>
|
||||||
|
<key>omit</key>
|
||||||
|
<true/>
|
||||||
|
<key>weight</key>
|
||||||
|
<real>1100</real>
|
||||||
|
</dict>
|
||||||
|
<key>^version.plist$</key>
|
||||||
|
<true/>
|
||||||
|
</dict>
|
||||||
|
<key>rules2</key>
|
||||||
|
<dict>
|
||||||
|
<key>.*\.dSYM($|/)</key>
|
||||||
|
<dict>
|
||||||
|
<key>weight</key>
|
||||||
|
<real>11</real>
|
||||||
|
</dict>
|
||||||
|
<key>^(.*/)?\.DS_Store$</key>
|
||||||
|
<dict>
|
||||||
|
<key>omit</key>
|
||||||
|
<true/>
|
||||||
|
<key>weight</key>
|
||||||
|
<real>2000</real>
|
||||||
|
</dict>
|
||||||
|
<key>^(Frameworks|SharedFrameworks|PlugIns|Plug-ins|XPCServices|Helpers|MacOS|Library/(Automator|Spotlight|LoginItems))/</key>
|
||||||
|
<dict>
|
||||||
|
<key>nested</key>
|
||||||
|
<true/>
|
||||||
|
<key>weight</key>
|
||||||
|
<real>10</real>
|
||||||
|
</dict>
|
||||||
|
<key>^.*</key>
|
||||||
|
<true/>
|
||||||
|
<key>^Info\.plist$</key>
|
||||||
|
<dict>
|
||||||
|
<key>omit</key>
|
||||||
|
<true/>
|
||||||
|
<key>weight</key>
|
||||||
|
<real>20</real>
|
||||||
|
</dict>
|
||||||
|
<key>^PkgInfo$</key>
|
||||||
|
<dict>
|
||||||
|
<key>omit</key>
|
||||||
|
<true/>
|
||||||
|
<key>weight</key>
|
||||||
|
<real>20</real>
|
||||||
|
</dict>
|
||||||
|
<key>^Resources/</key>
|
||||||
|
<dict>
|
||||||
|
<key>weight</key>
|
||||||
|
<real>20</real>
|
||||||
|
</dict>
|
||||||
|
<key>^Resources/.*\.lproj/</key>
|
||||||
|
<dict>
|
||||||
|
<key>optional</key>
|
||||||
|
<true/>
|
||||||
|
<key>weight</key>
|
||||||
|
<real>1000</real>
|
||||||
|
</dict>
|
||||||
|
<key>^Resources/.*\.lproj/locversion.plist$</key>
|
||||||
|
<dict>
|
||||||
|
<key>omit</key>
|
||||||
|
<true/>
|
||||||
|
<key>weight</key>
|
||||||
|
<real>1100</real>
|
||||||
|
</dict>
|
||||||
|
<key>^[^/]+$</key>
|
||||||
|
<dict>
|
||||||
|
<key>nested</key>
|
||||||
|
<true/>
|
||||||
|
<key>weight</key>
|
||||||
|
<real>10</real>
|
||||||
|
</dict>
|
||||||
|
<key>^embedded\.provisionprofile$</key>
|
||||||
|
<dict>
|
||||||
|
<key>weight</key>
|
||||||
|
<real>20</real>
|
||||||
|
</dict>
|
||||||
|
<key>^version\.plist$</key>
|
||||||
|
<dict>
|
||||||
|
<key>weight</key>
|
||||||
|
<real>20</real>
|
||||||
|
</dict>
|
||||||
|
</dict>
|
||||||
|
</dict>
|
||||||
|
</plist>
|
|
@ -0,0 +1,7 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 523ab7e7760c743a9977ecfedabe1691
|
||||||
|
DefaultImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,8 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 256085e1b062345239f3d7d88741f96c
|
||||||
|
folderAsset: yes
|
||||||
|
DefaultImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,11 @@
{
    "name": "iOSBLAS",
    "references": [],
    "optionalUnityReferences": [],
    "includePlatforms": [
        "Editor",
        "iOS"
    ],
    "excludePlatforms": [],
    "allowUnsafeCode": true
}
@ -0,0 +1,7 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 005937e819cd540429ad05eabcfb642f
|
||||||
|
AssemblyDefinitionImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,27 @@
#if UNITY_IOS
using System.Runtime.InteropServices;
using Barracuda;
using UnityEngine;
using UnityEngine.Scripting;

[Preserve]
public class iOSBLAS : BLASPlugin
{
    [DllImport("__Internal")]
    static extern unsafe void iossgemm(float* Ap, int AN, int AM,
                                       float* Bp, int BN, int BM,
                                       float* Cp, int CN, int CM,
                                       int bs, bool transposeA, bool transposeB);

    public bool IsCurrentPlatformSupported()
    {
        return Application.platform == RuntimePlatform.IPhonePlayer;
    }

    public unsafe void SGEMM(float* Ap, int AN, int AM, float* Bp, int BN, int BM, float* Cp, int CN, int CM, int bs,
        bool transposeA = false, bool transposeB = false)
    {
        iossgemm(Ap, AN, AM, Bp, BN, BM, Cp, CN, CM, bs, transposeA, transposeB);
    }
}
#endif // UNITY_IOS
@ -0,0 +1,11 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 75424b0c6afc14ea7a1debef68240d9e
|
||||||
|
MonoImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
serializedVersion: 2
|
||||||
|
defaultReferences: []
|
||||||
|
executionOrder: 0
|
||||||
|
icon: {instanceID: 0}
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,15 @@
#import <Accelerate/Accelerate.h>

extern "C"
{
    void iossgemm(float* Ap, int AN, int AM,
                  float* Bp, int BN, int BM,
                  float* Cp, int CN, int CM,
                  int bs, bool transposeA, bool transposeB)
    {
        cblas_sgemm(CblasRowMajor, transposeA ? CblasTrans : CblasNoTrans,
                    transposeB ? CblasTrans : CblasNoTrans,
                    AN, BM, BN, 1.0f, Ap, AM, Bp, BM, 1.0f, Cp, CM);
    }

}
@ -0,0 +1,102 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 100b08f95d9f349118f287b0170140d4
|
||||||
|
PluginImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
serializedVersion: 2
|
||||||
|
iconMap: {}
|
||||||
|
executionOrder: {}
|
||||||
|
isPreloaded: 0
|
||||||
|
isOverridable: 0
|
||||||
|
platformData:
|
||||||
|
- first:
|
||||||
|
'': Any
|
||||||
|
second:
|
||||||
|
enabled: 0
|
||||||
|
settings:
|
||||||
|
Exclude Android: 1
|
||||||
|
Exclude Editor: 1
|
||||||
|
Exclude Linux: 1
|
||||||
|
Exclude Linux64: 1
|
||||||
|
Exclude LinuxUniversal: 1
|
||||||
|
Exclude OSXUniversal: 1
|
||||||
|
Exclude WebGL: 1
|
||||||
|
Exclude Win: 1
|
||||||
|
Exclude Win64: 1
|
||||||
|
Exclude iOS: 0
|
||||||
|
- first:
|
||||||
|
Android: Android
|
||||||
|
second:
|
||||||
|
enabled: 0
|
||||||
|
settings:
|
||||||
|
CPU: ARMv7
|
||||||
|
- first:
|
||||||
|
Any:
|
||||||
|
second:
|
||||||
|
enabled: 0
|
||||||
|
settings: {}
|
||||||
|
- first:
|
||||||
|
Editor: Editor
|
||||||
|
second:
|
||||||
|
enabled: 0
|
||||||
|
settings:
|
||||||
|
CPU: AnyCPU
|
||||||
|
DefaultValueInitialized: true
|
||||||
|
OS: AnyOS
|
||||||
|
- first:
|
||||||
|
Facebook: Win
|
||||||
|
second:
|
||||||
|
enabled: 0
|
||||||
|
settings:
|
||||||
|
CPU: AnyCPU
|
||||||
|
- first:
|
||||||
|
Facebook: Win64
|
||||||
|
second:
|
||||||
|
enabled: 0
|
||||||
|
settings:
|
||||||
|
CPU: AnyCPU
|
||||||
|
- first:
|
||||||
|
Standalone: Linux
|
||||||
|
second:
|
||||||
|
enabled: 0
|
||||||
|
settings:
|
||||||
|
CPU: x86
|
||||||
|
- first:
|
||||||
|
Standalone: Linux64
|
||||||
|
second:
|
||||||
|
enabled: 0
|
||||||
|
settings:
|
||||||
|
CPU: x86_64
|
||||||
|
- first:
|
||||||
|
Standalone: OSXUniversal
|
||||||
|
second:
|
||||||
|
enabled: 0
|
||||||
|
settings:
|
||||||
|
CPU: AnyCPU
|
||||||
|
- first:
|
||||||
|
Standalone: Win
|
||||||
|
second:
|
||||||
|
enabled: 0
|
||||||
|
settings:
|
||||||
|
CPU: AnyCPU
|
||||||
|
- first:
|
||||||
|
Standalone: Win64
|
||||||
|
second:
|
||||||
|
enabled: 0
|
||||||
|
settings:
|
||||||
|
CPU: AnyCPU
|
||||||
|
- first:
|
||||||
|
iPhone: iOS
|
||||||
|
second:
|
||||||
|
enabled: 1
|
||||||
|
settings:
|
||||||
|
AddToEmbeddedBinaries: false
|
||||||
|
CompileFlags:
|
||||||
|
FrameworkDependencies: Accelerate;
|
||||||
|
- first:
|
||||||
|
tvOS: tvOS
|
||||||
|
second:
|
||||||
|
enabled: 1
|
||||||
|
settings: {}
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,8 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 264a957219ea041c58af860601fe1881
|
||||||
|
folderAsset: yes
|
||||||
|
DefaultImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,679 @@
|
||||||
|
#pragma kernel Relu
|
||||||
|
#pragma kernel Relu_CNyx
|
||||||
|
#pragma kernel Relu_Nyxc
|
||||||
|
#pragma kernel Relu6
|
||||||
|
#pragma kernel Relu6_CNyx
|
||||||
|
#pragma kernel Relu6_Nyxc
|
||||||
|
#pragma kernel Tanh
|
||||||
|
#pragma kernel Tanh_CNyx
|
||||||
|
#pragma kernel Tanh_Nyxc
|
||||||
|
#pragma kernel Swish
|
||||||
|
#pragma kernel Swish_CNyx
|
||||||
|
#pragma kernel Swish_Nyxc
|
||||||
|
#pragma kernel Sigmoid
|
||||||
|
#pragma kernel Sigmoid_CNyx
|
||||||
|
#pragma kernel Sigmoid_Nyxc
|
||||||
|
#pragma kernel Elu
|
||||||
|
#pragma kernel Elu_CNyx
|
||||||
|
#pragma kernel Elu_Nyxc
|
||||||
|
#pragma kernel LeakyRelu
|
||||||
|
#pragma kernel LeakyRelu_CNyx
|
||||||
|
#pragma kernel LeakyRelu_Nyxc
|
||||||
|
#pragma kernel Exp
|
||||||
|
#pragma kernel Exp_CNyx
|
||||||
|
#pragma kernel Exp_Nyxc
|
||||||
|
#pragma kernel Pow
|
||||||
|
#pragma kernel Pow_CNyx
|
||||||
|
#pragma kernel Pow_Nyxc
|
||||||
|
#pragma kernel Softmax
|
||||||
|
|
||||||
|
#include "Tensor.cginc"
|
||||||
|
|
||||||
|
TENSOR_DECL(X)
|
||||||
|
TENSOR_DECL_RW(O)
|
||||||
|
|
||||||
|
float _Alpha;
|
||||||
|
|
||||||
|
float relu(float v)
|
||||||
|
{
|
||||||
|
return 0.5f * (v + abs(v));
|
||||||
|
}
|
||||||
|
|
||||||
|
float relu6(float v)
|
||||||
|
{
|
||||||
|
return min(max(0, v), 6);
|
||||||
|
}
|
||||||
|
|
||||||
|
float swish(float v)
|
||||||
|
{
|
||||||
|
return v / (1.f + exp(-v));
|
||||||
|
}
|
||||||
|
|
||||||
|
float sigmoid(float v)
|
||||||
|
{
|
||||||
|
return 1.f / (1.f + exp(-v));
|
||||||
|
}
|
||||||
|
|
||||||
|
float elu(float v)
|
||||||
|
{
|
||||||
|
if (v <= 0)
|
||||||
|
v = _Alpha * (exp(v) - 1);
|
||||||
|
return v;
|
||||||
|
}
|
||||||
|
|
||||||
|
float lrelu(float v)
|
||||||
|
{
|
||||||
|
return max(v, _Alpha * v);
|
||||||
|
}
|
||||||
|
|
||||||
|
float signed_pow(float f, float e)
|
||||||
|
{
|
||||||
|
// handle negative f
|
||||||
|
float v = pow(abs(f), e);
|
||||||
|
float s = (e % 2 == 1) ?
|
||||||
|
sign(f): // exponent is odd => sign(f) * pow(abs(f), e)
|
||||||
|
1; // exponent is even => pow(abs(f), e)
|
||||||
|
return v * s;
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
|
||||||
|
void Relu(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint x = dispatchThreadID.y;
|
||||||
|
uint y = dispatchThreadID.z;
|
||||||
|
|
||||||
|
if (c >= O.channels) return;
|
||||||
|
if (x >= O.width) return;
|
||||||
|
if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = relu(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
|
||||||
|
void Relu6(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint x = dispatchThreadID.y;
|
||||||
|
uint y = dispatchThreadID.z;
|
||||||
|
|
||||||
|
if (c >= O.channels) return;
|
||||||
|
if (x >= O.width) return;
|
||||||
|
if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = relu6(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
|
||||||
|
void Tanh(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x; uint x = dispatchThreadID.y; uint y = dispatchThreadID.z;
|
||||||
|
if (c >= O.channels) return; if (x >= O.width) return; if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = tanh(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
|
||||||
|
void Sigmoid(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint x = dispatchThreadID.y;
|
||||||
|
uint y = dispatchThreadID.z;
|
||||||
|
|
||||||
|
if (c >= O.channels) return;
|
||||||
|
if (x >= O.width) return;
|
||||||
|
if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = sigmoid(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
|
||||||
|
void Swish(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint x = dispatchThreadID.y;
|
||||||
|
uint y = dispatchThreadID.z;
|
||||||
|
|
||||||
|
if (c >= O.channels) return;
|
||||||
|
if (x >= O.width) return;
|
||||||
|
if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = swish(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
|
||||||
|
void Elu(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x; uint x = dispatchThreadID.y; uint y = dispatchThreadID.z;
|
||||||
|
if (c >= O.channels) return; if (x >= O.width) return; if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = elu(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
|
||||||
|
void LeakyRelu(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x; uint x = dispatchThreadID.y; uint y = dispatchThreadID.z;
|
||||||
|
if (c >= O.channels) return; if (x >= O.width) return; if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = lrelu(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
|
||||||
|
void Exp(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x; uint x = dispatchThreadID.y; uint y = dispatchThreadID.z;
|
||||||
|
if (c >= O.channels) return; if (x >= O.width) return; if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = exp(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
|
||||||
|
void Pow(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x; uint x = dispatchThreadID.y; uint y = dispatchThreadID.z;
|
||||||
|
if (c >= O.channels) return; if (x >= O.width) return; if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = signed_pow(v, _Alpha);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((16,16,1), (16,8,1), (16,4,1))
|
||||||
|
void Relu_CNyx(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.batch * O.height * O.width, 1);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint nyx = dispatchThreadID.y;
|
||||||
|
|
||||||
|
uint x = nyx % X.width;
|
||||||
|
uint ny = nyx / X.width;
|
||||||
|
uint y = ny % X.height;
|
||||||
|
uint n = ny / X.height;
|
||||||
|
|
||||||
|
if (c >= X.channels) return;
|
||||||
|
if (n >= X.batch) return;
|
||||||
|
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = relu(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((512,1,1), (128,1,1), (64,1,1))
|
||||||
|
void Relu_Nyxc(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.batch * O.height * O.width * O.channels, 1, 1)
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint nyxc = dispatchThreadID.x;
|
||||||
|
|
||||||
|
uint c = nyxc % X.channels;
|
||||||
|
uint nyx = nyxc / X.channels;
|
||||||
|
uint x = nyx % X.width;
|
||||||
|
uint ny = nyx / X.width;
|
||||||
|
uint y = ny % X.height;
|
||||||
|
uint n = ny / X.height;
|
||||||
|
|
||||||
|
if (n >= X.batch) return;
|
||||||
|
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = relu(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((16,16,1), (16,8,1), (16,4,1))
|
||||||
|
void Relu6_CNyx(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.batch * O.height * O.width, 1);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint nyx = dispatchThreadID.y;
|
||||||
|
|
||||||
|
uint x = nyx % X.width;
|
||||||
|
uint ny = nyx / X.width;
|
||||||
|
uint y = ny % X.height;
|
||||||
|
uint n = ny / X.height;
|
||||||
|
|
||||||
|
if (c >= X.channels) return;
|
||||||
|
if (n >= X.batch) return;
|
||||||
|
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = relu6(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((512,1,1), (128,1,1), (64,1,1))
|
||||||
|
void Relu6_Nyxc(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.batch * O.height * O.width * O.channels, 1, 1)
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint nyxc = dispatchThreadID.x;
|
||||||
|
|
||||||
|
uint c = nyxc % X.channels;
|
||||||
|
uint nyx = nyxc / X.channels;
|
||||||
|
uint x = nyx % X.width;
|
||||||
|
uint ny = nyx / X.width;
|
||||||
|
uint y = ny % X.height;
|
||||||
|
uint n = ny / X.height;
|
||||||
|
|
||||||
|
if (n >= X.batch) return;
|
||||||
|
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = relu6(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((16,16,1), (16,8,1), (16,4,1))
|
||||||
|
void Tanh_CNyx(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.batch * O.height * O.width, 1);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint nyx = dispatchThreadID.y;
|
||||||
|
|
||||||
|
uint x = nyx % X.width;
|
||||||
|
uint ny = nyx / X.width;
|
||||||
|
uint y = ny % X.height;
|
||||||
|
uint n = ny / X.height;
|
||||||
|
|
||||||
|
if (c >= X.channels) return;
|
||||||
|
if (n >= X.batch) return;
|
||||||
|
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = tanh(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((512,1,1), (128,1,1), (64,1,1))
|
||||||
|
void Tanh_Nyxc(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.batch * O.height * O.width * O.channels, 1, 1)
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint nyxc = dispatchThreadID.x;
|
||||||
|
|
||||||
|
uint c = nyxc % X.channels;
|
||||||
|
uint nyx = nyxc / X.channels;
|
||||||
|
uint x = nyx % X.width;
|
||||||
|
uint ny = nyx / X.width;
|
||||||
|
uint y = ny % X.height;
|
||||||
|
uint n = ny / X.height;
|
||||||
|
|
||||||
|
if (n >= X.batch) return;
|
||||||
|
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = tanh(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((16,16,1), (16,8,1), (16,4,1))
|
||||||
|
void Sigmoid_CNyx(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.batch * O.height * O.width, 1);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint nyx = dispatchThreadID.y;
|
||||||
|
|
||||||
|
uint x = nyx % X.width;
|
||||||
|
uint ny = nyx / X.width;
|
||||||
|
uint y = ny % X.height;
|
||||||
|
uint n = ny / X.height;
|
||||||
|
|
||||||
|
if (c >= X.channels) return;
|
||||||
|
if (n >= X.batch) return;
|
||||||
|
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = sigmoid(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((512,1,1), (128,1,1), (64,1,1))
|
||||||
|
void Sigmoid_Nyxc(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.batch * O.height * O.width * O.channels, 1, 1)
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint nyxc = dispatchThreadID.x;
|
||||||
|
|
||||||
|
uint c = nyxc % X.channels;
|
||||||
|
uint nyx = nyxc / X.channels;
|
||||||
|
uint x = nyx % X.width;
|
||||||
|
uint ny = nyx / X.width;
|
||||||
|
uint y = ny % X.height;
|
||||||
|
uint n = ny / X.height;
|
||||||
|
|
||||||
|
if (n >= X.batch) return;
|
||||||
|
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = sigmoid(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((16,16,1), (16,8,1), (16,4,1))
|
||||||
|
void Swish_CNyx(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.batch * O.height * O.width, 1);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint nyx = dispatchThreadID.y;
|
||||||
|
|
||||||
|
uint x = nyx % X.width;
|
||||||
|
uint ny = nyx / X.width;
|
||||||
|
uint y = ny % X.height;
|
||||||
|
uint n = ny / X.height;
|
||||||
|
|
||||||
|
if (c >= X.channels) return;
|
||||||
|
if (n >= X.batch) return;
|
||||||
|
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = swish(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((512,1,1), (128,1,1), (64,1,1))
|
||||||
|
void Swish_Nyxc(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.batch * O.height * O.width * O.channels, 1, 1)
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint nyxc = dispatchThreadID.x;
|
||||||
|
|
||||||
|
uint c = nyxc % X.channels;
|
||||||
|
uint nyx = nyxc / X.channels;
|
||||||
|
uint x = nyx % X.width;
|
||||||
|
uint ny = nyx / X.width;
|
||||||
|
uint y = ny % X.height;
|
||||||
|
uint n = ny / X.height;
|
||||||
|
|
||||||
|
if (n >= X.batch) return;
|
||||||
|
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = swish(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((16,16,1), (16,8,1), (16,4,1))
|
||||||
|
void Elu_CNyx(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.batch * O.height * O.width, 1);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint nyx = dispatchThreadID.y;
|
||||||
|
|
||||||
|
uint x = nyx % X.width;
|
||||||
|
uint ny = nyx / X.width;
|
||||||
|
uint y = ny % X.height;
|
||||||
|
uint n = ny / X.height;
|
||||||
|
|
||||||
|
if (c >= X.channels) return;
|
||||||
|
if (n >= X.batch) return;
|
||||||
|
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = elu(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((512,1,1), (128,1,1), (64,1,1))
|
||||||
|
void Elu_Nyxc(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.batch * O.height * O.width * O.channels, 1, 1)
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint nyxc = dispatchThreadID.x;
|
||||||
|
|
||||||
|
uint c = nyxc % X.channels;
|
||||||
|
uint nyx = nyxc / X.channels;
|
||||||
|
uint x = nyx % X.width;
|
||||||
|
uint ny = nyx / X.width;
|
||||||
|
uint y = ny % X.height;
|
||||||
|
uint n = ny / X.height;
|
||||||
|
|
||||||
|
if (n >= X.batch) return;
|
||||||
|
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = elu(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((16,16,1), (16,8,1), (16,4,1))
|
||||||
|
void LeakyRelu_CNyx(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.batch * O.height * O.width, 1);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint nyx = dispatchThreadID.y;
|
||||||
|
|
||||||
|
uint x = nyx % X.width;
|
||||||
|
uint ny = nyx / X.width;
|
||||||
|
uint y = ny % X.height;
|
||||||
|
uint n = ny / X.height;
|
||||||
|
|
||||||
|
if (c >= X.channels) return;
|
||||||
|
if (n >= X.batch) return;
|
||||||
|
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = lrelu(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((512,1,1), (128,1,1), (64,1,1))
|
||||||
|
void LeakyRelu_Nyxc(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.batch * O.height * O.width * O.channels, 1, 1)
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint nyxc = dispatchThreadID.x;
|
||||||
|
|
||||||
|
uint c = nyxc % X.channels;
|
||||||
|
uint nyx = nyxc / X.channels;
|
||||||
|
uint x = nyx % X.width;
|
||||||
|
uint ny = nyx / X.width;
|
||||||
|
uint y = ny % X.height;
|
||||||
|
uint n = ny / X.height;
|
||||||
|
|
||||||
|
if (n >= X.batch) return;
|
||||||
|
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = lrelu(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((16,16,1), (16,8,1), (16,4,1))
|
||||||
|
void Exp_CNyx(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.batch * O.height * O.width, 1);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint nyx = dispatchThreadID.y;
|
||||||
|
|
||||||
|
uint x = nyx % X.width;
|
||||||
|
uint ny = nyx / X.width;
|
||||||
|
uint y = ny % X.height;
|
||||||
|
uint n = ny / X.height;
|
||||||
|
|
||||||
|
if (c >= X.channels) return;
|
||||||
|
if (n >= X.batch) return;
|
||||||
|
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = exp(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((512,1,1), (128,1,1), (64,1,1))
|
||||||
|
void Exp_Nyxc(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.batch * O.height * O.width * O.channels, 1, 1)
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint nyxc = dispatchThreadID.x;
|
||||||
|
|
||||||
|
uint c = nyxc % X.channels;
|
||||||
|
uint nyx = nyxc / X.channels;
|
||||||
|
uint x = nyx % X.width;
|
||||||
|
uint ny = nyx / X.width;
|
||||||
|
uint y = ny % X.height;
|
||||||
|
uint n = ny / X.height;
|
||||||
|
|
||||||
|
if (n >= X.batch) return;
|
||||||
|
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = exp(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((16,16,1), (16,8,1), (16,4,1))
|
||||||
|
void Pow_CNyx(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.batch * O.height * O.width, 1);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint nyx = dispatchThreadID.y;
|
||||||
|
|
||||||
|
uint x = nyx % X.width;
|
||||||
|
uint ny = nyx / X.width;
|
||||||
|
uint y = ny % X.height;
|
||||||
|
uint n = ny / X.height;
|
||||||
|
|
||||||
|
if (c >= X.channels) return;
|
||||||
|
if (n >= X.batch) return;
|
||||||
|
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = signed_pow(v, _Alpha);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((512,1,1), (128,1,1), (64,1,1))
|
||||||
|
void Pow_Nyxc(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.batch * O.height * O.width * O.channels, 1, 1)
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint nyxc = dispatchThreadID.x;
|
||||||
|
|
||||||
|
uint c = nyxc % X.channels;
|
||||||
|
uint nyx = nyxc / X.channels;
|
||||||
|
uint x = nyx % X.width;
|
||||||
|
uint ny = nyx / X.width;
|
||||||
|
uint y = ny % X.height;
|
||||||
|
uint n = ny / X.height;
|
||||||
|
|
||||||
|
if (n >= X.batch) return;
|
||||||
|
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = signed_pow(v, _Alpha);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
NUMTHREADS((64,4,1), (64,2,1), (64,1,1))
|
||||||
|
void Softmax(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.flatWidth, O.flatHeight, 1);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint x = dispatchThreadID.x;
|
||||||
|
uint y = dispatchThreadID.y;
|
||||||
|
|
||||||
|
if (x >= O.GetFlatWidth()) return;
|
||||||
|
if (y >= O.GetFlatHeight()) return;
|
||||||
|
|
||||||
|
float maxV = -FLT_MAX;
|
||||||
|
for (uint i = 0; i < X.GetFlatWidth(); ++i)
|
||||||
|
{
|
||||||
|
float v = X.Get(y, i);
|
||||||
|
if (v > maxV)
|
||||||
|
maxV = v;
|
||||||
|
}
|
||||||
|
|
||||||
|
float acc = 0.0f;
|
||||||
|
for (i = 0; i < X.GetFlatWidth(); ++i)
|
||||||
|
{
|
||||||
|
float v = X.Get(y, i);
|
||||||
|
acc += exp(v - maxV);
|
||||||
|
}
|
||||||
|
|
||||||
|
float v = X.Get(y, x);
|
||||||
|
v = exp(v - maxV) / acc;
|
||||||
|
O.Set(y, x, v);
|
||||||
|
}
|
|
@ -0,0 +1,9 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: fdc94044b2f234c0fa80ada3771a2ae7
|
||||||
|
timeCreated: 1495527718
|
||||||
|
licenseType: Pro
|
||||||
|
ComputeShaderImporter:
|
||||||
|
currentAPIMask: 196608
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,885 @@
|
||||||
|
#pragma kernel Dense
|
||||||
|
#pragma kernel Conv2D
|
||||||
|
#pragma kernel DepthwiseConv2D
|
||||||
|
#pragma kernel Conv2DTrans
|
||||||
|
#pragma kernel Upsample2D
|
||||||
|
#pragma kernel Unstride2D
|
||||||
|
#pragma kernel MaxPool2D
|
||||||
|
#pragma kernel AvgPool2D
|
||||||
|
#pragma kernel GlobalMaxPool2D
|
||||||
|
#pragma kernel GlobalAvgPool2D
|
||||||
|
#pragma kernel ScaleBias
|
||||||
|
#pragma kernel InstanceNorm
|
||||||
|
#pragma kernel Dropout
|
||||||
|
#pragma kernel Relu
|
||||||
|
#pragma kernel Swish
|
||||||
|
#pragma kernel Softmax
|
||||||
|
#pragma kernel Tanh
|
||||||
|
#pragma kernel Sigmoid
|
||||||
|
#pragma kernel Relu6
|
||||||
|
#pragma kernel Elu
|
||||||
|
#pragma kernel LeakyRelu
|
||||||
|
#pragma kernel Exp
|
||||||
|
#pragma kernel Pow
|
||||||
|
#pragma kernel Copy
|
||||||
|
#pragma kernel BroadcastAdd
|
||||||
|
#pragma kernel BroadcastSub
|
||||||
|
#pragma kernel BroadcastMul
|
||||||
|
#pragma kernel BroadcastDiv
|
||||||
|
#pragma kernel BroadcastPow
|
||||||
|
#pragma kernel BroadcastMin
|
||||||
|
#pragma kernel BroadcastMax
|
||||||
|
#pragma kernel TextureToTensor
|
||||||
|
#pragma kernel TensorToTexture
|
||||||
|
|
||||||
|
#include "Tensor.cginc"
|
||||||
|
#include "Random.cginc"
|
||||||
|
|
||||||
|
TENSOR_DECL(X)
|
||||||
|
TENSOR_DECL(W)
|
||||||
|
TENSOR_DECL(K)
|
||||||
|
TENSOR_DECL(B)
|
||||||
|
TENSOR_DECL_RW(O)
|
||||||
|
|
||||||
|
uint4 _Pad;
|
||||||
|
uint4 _Pool;
|
||||||
|
uint4 _Stride;
|
||||||
|
float _Alpha;
|
||||||
|
float _Seed;
|
||||||
|
|
||||||
|
[numthreads(8,8,1)]
|
||||||
|
void Dense(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.flatWidth, O.flatHeight, 1);
|
||||||
|
TENSOR_ARGS4(X, W, B, O);
|
||||||
|
|
||||||
|
uint x = dispatchThreadID.x;
|
||||||
|
uint y = dispatchThreadID.y;
|
||||||
|
|
||||||
|
if (x >= O.GetFlatWidth()) return;
|
||||||
|
if (y >= O.GetFlatHeight()) return;
|
||||||
|
|
||||||
|
float acc = B.Get(x);
|
||||||
|
for (uint i = 0; i < X.GetFlatWidth(); ++i)
|
||||||
|
acc += X.Get(y, i) * W.Get(i, x);
|
||||||
|
|
||||||
|
O.Set(y, x, acc);
|
||||||
|
}
|
||||||
|
|
||||||
|
[numthreads(4,4,4)]
|
||||||
|
void Relu(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint x = dispatchThreadID.y;
|
||||||
|
uint y = dispatchThreadID.z;
|
||||||
|
|
||||||
|
if (c >= O.channels) return;
|
||||||
|
if (x >= O.width) return;
|
||||||
|
if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = 0.5f * (v + abs(v));
|
||||||
|
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
[numthreads(4,4,4)]
|
||||||
|
void Swish(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint x = dispatchThreadID.y;
|
||||||
|
uint y = dispatchThreadID.z;
|
||||||
|
|
||||||
|
if (c >= O.channels) return;
|
||||||
|
if (x >= O.width) return;
|
||||||
|
if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = v / (1 + exp(-v));
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
[numthreads(4,4,4)]
|
||||||
|
void Tanh(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x; uint x = dispatchThreadID.y; uint y = dispatchThreadID.z;
|
||||||
|
if (c >= O.channels) return; if (x >= O.width) return; if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = tanh(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
[numthreads(4,4,4)]
|
||||||
|
void Sigmoid(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x; uint x = dispatchThreadID.y; uint y = dispatchThreadID.z;
|
||||||
|
if (c >= O.channels) return; if (x >= O.width) return; if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = 1 / (1 + exp(-v));
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
[numthreads(4,4,4)]
|
||||||
|
void Relu6(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x; uint x = dispatchThreadID.y; uint y = dispatchThreadID.z;
|
||||||
|
if (c >= O.channels) return; if (x >= O.width) return; if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = min(max(0, v), 6);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
[numthreads(4,4,4)]
|
||||||
|
void Elu(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x; uint x = dispatchThreadID.y; uint y = dispatchThreadID.z;
|
||||||
|
if (c >= O.channels) return; if (x >= O.width) return; if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
if (v <= 0)
|
||||||
|
v = _Alpha * (exp(v) - 1);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
[numthreads(4,4,4)]
|
||||||
|
void LeakyRelu(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x; uint x = dispatchThreadID.y; uint y = dispatchThreadID.z;
|
||||||
|
if (c >= O.channels) return; if (x >= O.width) return; if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = max(v, _Alpha * v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
[numthreads(4,4,4)]
|
||||||
|
void Exp(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x; uint x = dispatchThreadID.y; uint y = dispatchThreadID.z;
|
||||||
|
if (c >= O.channels) return; if (x >= O.width) return; if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = exp(v);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
float signed_pow(float f, float e)
|
||||||
|
{
|
||||||
|
// handle negative f
|
||||||
|
float v = pow(abs(f), e);
|
||||||
|
float s = (e % 2 == 1) ?
|
||||||
|
sign(f): // exponent is odd => sign(f) * pow(abs(f), e)
|
||||||
|
1; // exponent is even => pow(abs(f), e)
|
||||||
|
return v * s;
|
||||||
|
}
|
||||||
|
|
||||||
|
[numthreads(4,4,4)]
|
||||||
|
void Pow(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x; uint x = dispatchThreadID.y; uint y = dispatchThreadID.z;
|
||||||
|
if (c >= O.channels) return; if (x >= O.width) return; if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = signed_pow(v, _Alpha);
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}

[numthreads(4,4,4)]
void BroadcastAdd(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, O.width, O.height);
    TENSOR_ARGS3(X, B, O);

    uint c = dispatchThreadID.x;    uint x = dispatchThreadID.y;    uint y = dispatchThreadID.z;
    if (c >= O.channels) return;    if (x >= O.width) return;    if (y >= O.height) return;

    for (uint n = 0; n < X.batch; ++n)
    {
        float v =
            X.BroadcastGet(n, y, x, c) +
            B.BroadcastGet(n, y, x, c);
        O.Set(n, y, x, c, v);
    }
}

[numthreads(4,4,4)]
void BroadcastSub(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, O.width, O.height);
    TENSOR_ARGS3(X, B, O);

    uint c = dispatchThreadID.x;    uint x = dispatchThreadID.y;    uint y = dispatchThreadID.z;
    if (c >= O.channels) return;    if (x >= O.width) return;    if (y >= O.height) return;

    for (uint n = 0; n < X.batch; ++n)
    {
        float v =
            X.BroadcastGet(n, y, x, c) -
            B.BroadcastGet(n, y, x, c);
        O.Set(n, y, x, c, v);
    }
}

[numthreads(4,4,4)]
void BroadcastMul(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, O.width, O.height);
    TENSOR_ARGS3(X, B, O);

    uint c = dispatchThreadID.x;    uint x = dispatchThreadID.y;    uint y = dispatchThreadID.z;
    if (c >= O.channels) return;    if (x >= O.width) return;    if (y >= O.height) return;

    for (uint n = 0; n < O.batch; ++n)
    {
        float v =
            X.BroadcastGet(n, y, x, c) *
            B.BroadcastGet(n, y, x, c);
        O.Set(n, y, x, c, v);
    }
}

[numthreads(4,4,4)]
void BroadcastDiv(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, O.width, O.height);
    TENSOR_ARGS3(X, B, O);

    uint c = dispatchThreadID.x;    uint x = dispatchThreadID.y;    uint y = dispatchThreadID.z;
    if (c >= O.channels) return;    if (x >= O.width) return;    if (y >= O.height) return;

    for (uint n = 0; n < X.batch; ++n)
    {
        float v =
            X.BroadcastGet(n, y, x, c) /
            B.BroadcastGet(n, y, x, c);
        O.Set(n, y, x, c, v);
    }
}

[numthreads(4,4,4)]
void BroadcastPow(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, O.width, O.height);
    TENSOR_ARGS3(X, B, O);

    uint c = dispatchThreadID.x;    uint x = dispatchThreadID.y;    uint y = dispatchThreadID.z;
    if (c >= O.channels) return;    if (x >= O.width) return;    if (y >= O.height) return;

    for (uint n = 0; n < X.batch; ++n)
    {
        float v = signed_pow(
            X.BroadcastGet(n, y, x, c),
            B.BroadcastGet(n, y, x, c));
        O.Set(n, y, x, c, v);
    }
}

[numthreads(4,4,4)]
void BroadcastMin(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, O.width, O.height);
    TENSOR_ARGS3(X, B, O);

    uint c = dispatchThreadID.x;    uint x = dispatchThreadID.y;    uint y = dispatchThreadID.z;
    if (c >= O.channels) return;    if (x >= O.width) return;    if (y >= O.height) return;

    for (uint n = 0; n < X.batch; ++n)
    {
        float v = min(
            X.BroadcastGet(n, y, x, c),
            B.BroadcastGet(n, y, x, c));
        O.Set(n, y, x, c, v);
    }
}

[numthreads(4,4,4)]
void BroadcastMax(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, O.width, O.height);
    TENSOR_ARGS3(X, B, O);

    uint c = dispatchThreadID.x;    uint x = dispatchThreadID.y;    uint y = dispatchThreadID.z;
    if (c >= O.channels) return;    if (x >= O.width) return;    if (y >= O.height) return;

    for (uint n = 0; n < X.batch; ++n)
    {
        float v = max(
            X.BroadcastGet(n, y, x, c),
            B.BroadcastGet(n, y, x, c));
        O.Set(n, y, x, c, v);
    }
}
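// The Broadcast* kernels above rely on X.BroadcastGet/B.BroadcastGet (defined in
// Tensor.cginc) to map an output NHWC index onto a possibly smaller input, presumably
// by reusing index 0 along any dimension of size 1 (numpy-style broadcasting).
// The exact index-clamping rules live in Tensor.cginc and are not restated here.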

[numthreads(4,4,4)]
void Copy(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    // NOTE: dispatched over X (not O)
    DISPATCH_ARGS(X.channels, X.width, X.height);
    TENSOR_ARGS2(X, O);

    uint c = dispatchThreadID.x;    uint x = dispatchThreadID.y;    uint y = dispatchThreadID.z;
    if (c >= X.channels) return;    if (x >= X.width) return;    if (y >= X.height) return;

    for (uint n = 0; n < X.batch; ++n)
    {
        float v = X.Get(n, y, x, c);
        O.Set(n + _Pad[0], y + _Pad[1], x + _Pad[2], c + _Pad[3], v);
    }
}

[numthreads(4,4,4)]
void Dropout(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, O.width, O.height);
    TENSOR_ARGS2(X, O);

    uint c = dispatchThreadID.x;    uint x = dispatchThreadID.y;    uint y = dispatchThreadID.z;
    if (c >= O.channels) return;    if (x >= O.width) return;    if (y >= O.height) return;

    for (uint n = 0; n < O.batch; ++n)
    {
        float4 seed = float4(n / O.batch, y / O.height, x / O.width, c / O.channels);
        seed = frac(seed + _Seed);

        float v = X.Get(n, y, x, c);
        v *= Bernoulli(seed, 1 - _Alpha) / (1 - _Alpha);
        O.Set(n, y, x, c, v);
    }
}
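// Dropout above uses "inverted dropout": each value is kept with probability
// (1 - _Alpha) via Bernoulli(seed, 1 - _Alpha) and then divided by (1 - _Alpha),
// so the expected value of the output matches the input. For example, with
// _Alpha = 0.5 a kept activation of 1.0 becomes 2.0 and dropped ones become 0.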

[numthreads(4,4,4)]
void ScaleBias(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, O.width, O.height);
    TENSOR_ARGS4(X, W, B, O);

    uint c = dispatchThreadID.x;
    uint x = dispatchThreadID.y;
    uint y = dispatchThreadID.z;

    if (c >= O.channels) return;
    if (x >= O.width) return;
    if (y >= O.height) return;

    float scale = W.Get(0, 0, 0, c);
    float bias = B.Get(0, 0, 0, c);

    for (uint n = 0; n < X.batch; ++n)
    {
        float v = X.Get(n, y, x, c);
        v = v * scale + bias;
        O.Set(n, y, x, c, v);
    }
}

[numthreads(16,4,1)]
void Softmax(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.flatWidth, O.flatHeight, 1);
    TENSOR_ARGS2(X, O);

    uint x = dispatchThreadID.x;
    uint y = dispatchThreadID.y;

    if (x >= O.GetFlatWidth()) return;
    if (y >= O.GetFlatHeight()) return;

    float maxV = -FLT_MAX;
    for (uint i = 0; i < X.GetFlatWidth(); ++i)
    {
        float v = X.Get(y, i);
        if (v > maxV)
            maxV = v;
    }

    float acc = 0.0f;
    for (i = 0; i < X.GetFlatWidth(); ++i)
    {
        float v = X.Get(y, i);
        acc += exp(v - maxV);
    }

    float v = X.Get(y, x);
    v = exp(v - maxV) / acc;
    O.Set(y, x, v);
}
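// Softmax above subtracts the row maximum before exponentiating:
//   softmax(x_i) = exp(x_i - max(x)) / sum_j exp(x_j - max(x))
// which is mathematically identical to the textbook form but avoids overflow of
// exp() for large logits. Note each thread rescans its whole row, so the cost is
// O(flatWidth) per output element.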

[numthreads(4,4,4)]
void Upsample2D(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    // NOTE: dispatched over X (not O)
    DISPATCH_ARGS(X.channels, X.width, X.height);
    TENSOR_ARGS2(X, O);

    uint c = dispatchThreadID.x;
    uint x = dispatchThreadID.y;
    uint y = dispatchThreadID.z;

    if (c >= X.channels) return;
    if (x >= X.width) return;
    if (y >= X.height) return;

    for (uint n = 0; n < O.batch; ++n)
    {
        float v = X.Get(n, y, x, c);

        for (uint dy = 0; dy < _Pool.y; ++dy)
            for (uint dx = 0; dx < _Pool.x; ++dx)
            {
                uint oy = y * _Pool.y + dy;
                uint ox = x * _Pool.x + dx;
                O.Set(n, oy, ox, c, v);
            }
    }
}

[numthreads(4,4,4)]
void MaxPool2D(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, O.width, O.height);
    TENSOR_ARGS2(X, O);

    uint c = dispatchThreadID.x;
    uint x = dispatchThreadID.y;
    uint y = dispatchThreadID.z;

    if (c >= O.channels) return;
    if (x >= O.width) return;
    if (y >= O.height) return;

    for (uint n = 0; n < X.batch; ++n)
    {
        float maxV = -FLT_MAX;
        for (uint dy = 0; dy < _Pool.y; ++dy)
            for (uint dx = 0; dx < _Pool.x; ++dx)
            {
                uint2 pos = uint2(x, y) * _Stride.xy + uint2(dx, dy);
                float v = X.SafeGet(n, pos, c, _Pad.xy);
                maxV = max(v, maxV);
            }

        O.Set(n, y, x, c, maxV);
    }
}

[numthreads(4,4,4)]
void AvgPool2D(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, O.width, O.height);
    TENSOR_ARGS2(X, O);

    uint c = dispatchThreadID.x;
    uint x = dispatchThreadID.y;
    uint y = dispatchThreadID.z;

    if (c >= O.channels) return;
    if (x >= O.width) return;
    if (y >= O.height) return;

    uint2 leftCorner = _Pad.xy;
    uint2 rightCorner = uint2(X.width, X.height) + _Pad.xy;
    for (uint n = 0; n < X.batch; ++n)
    {
        float acc = 0;
        float counter = 0;
        for (uint dy = 0; dy < _Pool.y; ++dy)
            for (uint dx = 0; dx < _Pool.x; ++dx)
            {
                uint2 pos = uint2(x, y) * _Stride.xy + uint2(dx, dy);

                bool mask = all(pos >= leftCorner) && all(pos < rightCorner);
                acc += (mask)? X.Get(n, pos.y - leftCorner.y, pos.x - leftCorner.x, c): 0;
                counter += (mask)? 1: 0;
            }

        acc /= counter;
        O.Set(n, y, x, c, acc);
    }
}

[numthreads(32,1,1)]
void GlobalMaxPool2D(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, 1, 1);
    TENSOR_ARGS2(X, O);

    uint c = dispatchThreadID.x;
    if (c >= O.channels) return;
    //ASSERT(X.batch == O.batch)

    for (uint n = 0; n < X.batch; ++n)
    {
        float maxV = -FLT_MAX;
        for (uint y = 0; y < X.height; ++y)
            for (uint x = 0; x < X.width; ++x)
            {
                float v = X.Get(n, y, x, c);
                maxV = max(v, maxV);
            }

        O.Set(n, 0, 0, c, maxV);
    }
}

[numthreads(32,1,1)]
void GlobalAvgPool2D(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, 1, 1);
    TENSOR_ARGS2(X, O);

    uint c = dispatchThreadID.x;
    if (c >= O.channels) return;
    //ASSERT(X.batch == O.batch)

    for (uint n = 0; n < X.batch; ++n)
    {
        float v = 0;
        for (uint y = 0; y < X.height; ++y)
            for (uint x = 0; x < X.width; ++x)
                v += X.Get(n, y, x, c);

        v /= (X.height * X.width);
        O.Set(n, 0, 0, c, v);
    }
}
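// AvgPool2D above averages only over positions that fall inside X (the 'mask' /
// 'counter' logic), i.e. zero-padded border samples are excluded from the divisor.
// This corresponds to "count_include_pad = false" style average pooling; a variant
// that includes padding in the count would divide by _Pool.x * _Pool.y instead.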

[numthreads(32,1,1)]
void InstanceNorm(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, 1, 1);
    TENSOR_ARGS4(X, W, B, O);

    uint c = dispatchThreadID.x;
    if (c >= O.channels) return;
    //ASSERT(X.shape == O.shape)

    float gamma = W.Get(0, 0, 0, c);
    float beta = B.Get(0, 0, 0, c);

    for (uint n = 0; n < O.batch; ++n)
    {
        uint x, y;
        // calc mean
        float acc = 0;
        for (y = 0; y < O.height; ++y)
            for (x = 0; x < O.width; ++x)
                acc += X.Get(n, y, x, c);
        float mean = acc / (O.width * O.height);

        // calc variance
        acc = 0;
        for (y = 0; y < O.height; ++y)
            for (x = 0; x < O.width; ++x)
            {
                float delta = X.Get(n, y, x, c) - mean;
                acc += delta * delta;
            }
        float var = acc / (O.width * O.height);

        // normalization factor
        float invNormFactor = 1 / sqrt(var + FLT_EPSILON);

        float scale = gamma * invNormFactor;
        float bias = beta - gamma * mean * invNormFactor;

        // apply normalization
        for (y = 0; y < O.height; ++y)
            for (x = 0; x < O.width; ++x)
            {
                float v = X.Get(n, y, x, c);
                v = v * scale + bias;
                O.Set(n, y, x, c, v);
            }
    }
}
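// InstanceNorm above folds normalization into a single multiply-add per element:
//   y = gamma * (x - mean) / sqrt(var + eps) + beta
//     = x * scale + bias,  with scale = gamma / sqrt(var + eps)
//                          and  bias  = beta - mean * scale
// Mean and variance are computed per (batch, channel) over the spatial dimensions,
// in two passes over the data.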

[numthreads(4,4,4)]
void Conv2D(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(K.kernelCount, O.width, O.height);
    TENSOR_ARGS4(X, K, B, O);

    uint k = dispatchThreadID.x;
    uint x = dispatchThreadID.y;
    uint y = dispatchThreadID.z;

    if (k >= K.channels) return;
    if (x >= O.width) return;
    if (y >= O.height) return;

    for (uint n = 0; n < O.batch; ++n)
    {
        float acc = B.Get(k);
        for (uint dy = 0; dy < K.GetKernelHeight(); ++dy)
        {
            for (uint dx = 0; dx < K.GetKernelWidth(); ++dx)
            {
                uint2 pos = uint2(x, y) * _Stride.xy + uint2(dx, dy);
                for (uint c = 0; c < X.channels; ++c)
                {
                    float v = X.SafeGet(n, pos, c, _Pad.xy);
                    acc += v * K.Get(dy, dx, c, k);
                }
            }
        }

        O.Set(n, y, x, k, acc);
    }
}

NUMTHREADS((16,4,4), (8,4,4), (4,4,4))
void DepthwiseConv2D(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(K.kernelCount, O.width, O.height);
    TENSOR_ARGS4(X, K, B, O);

    uint k = dispatchThreadID.x;
    uint x = dispatchThreadID.y;
    uint y = dispatchThreadID.z;

    if (k >= K.channels) return;
    if (x >= O.width) return;
    if (y >= O.height) return;

    for (uint n = 0; n < O.batch; ++n)
    {
        float acc = B.Get(k);
        for (uint dy = 0; dy < K.GetKernelHeight(); ++dy)
            for (uint dx = 0; dx < K.GetKernelWidth(); ++dx)
            {
                uint2 pos = uint2(x, y) * _Stride.xy + uint2(dx, dy);
                float v = X.SafeGet(n, pos, k, _Pad.xy);
                acc += v * K.Get(dy, dx, 0, k);
            }

        O.Set(n, y, x, k, acc);
    }
}

[numthreads(4,4,4)]
void Unstride2D(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, O.width, O.height);
    TENSOR_ARGS2(X, O);

    uint c = dispatchThreadID.x;
    uint x = dispatchThreadID.y;
    uint y = dispatchThreadID.z;

    if (c >= O.channels) return;
    if (x >= O.width) return;
    if (y >= O.height) return;

    for (uint n = 0; n < O.batch; ++n)
    {
        int xx = (int)x - (int)_Pad.x;
        int yy = (int)y - (int)_Pad.y;

        int my = yy % _Stride.y;
        int mx = xx % _Stride.x;

        int oy = yy / _Stride.y;
        int ox = xx / _Stride.x;

        bool mask = ox >= 0 && oy >= 0 && ox < (int)X.width && oy < (int)X.height &&
            my == 0 && mx == 0;

        float v = mask ? X.Get(n, (uint)oy, (uint)ox, c) : 0;
        O.Set(n, y, x, c, v);
    }
}

[numthreads(4,4,4)]
void Conv2DTrans(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(K.kernelCount, O.width, O.height);
    TENSOR_ARGS4(X, K, B, O);

    uint k = dispatchThreadID.x;
    uint x = dispatchThreadID.y;
    uint y = dispatchThreadID.z;

    if (k >= K.channels) return;
    if (x >= O.width) return;
    if (y >= O.height) return;

    uint2 strideMask = _Stride.xy - 1;

    for (uint n = 0; n < O.batch; ++n)
    {
        float acc = B.Get(k);
        for (uint dy = y & strideMask.y; dy < K.GetKernelHeight(); dy += _Stride.y)
        {
            for (uint dx = x & strideMask.x; dx < K.GetKernelWidth(); dx += _Stride.x)
            {
                for (uint c = 0; c < X.channels; ++c)
                {
                    uint xx = x + dx;
                    uint yy = y + dy;

                    uint oy = (yy - _Pad.y) / _Stride.y;
                    uint ox = (xx - _Pad.x) / _Stride.x;

                    bool mask = xx >= _Pad.x && yy >= _Pad.y && ox < X.width && oy < X.height;

                    float v = (mask)? X.Get(n, oy, ox, c): 0;
                    acc += v * K.Get(K.GetKernelHeight() - 1 - dy, K.GetKernelWidth() - 1 - dx, c, k);
                }
            }
        }

        O.Set(n, y, x, k, acc);
    }
}
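// Conv2DTrans above implements transposed convolution as a correlation with the
// spatially flipped kernel (K.GetKernelHeight() - 1 - dy, ...) over the input
// positions that contribute to each strided output location. Note that
// 'strideMask = _Stride - 1' together with 'y & strideMask' only equals
// 'y % _Stride.y' when the stride is a power of two; that restriction is implied
// by this kernel, not stated elsewhere in the file.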


Texture2D<float4> Xtex2D;
Texture3D<float4> Xtex3D;
Texture2DArray<float4> Xtex2DArray;
SamplerState samplerXtex2D { Filter = MIN_MAG_LINEAR_MIP_POINT; AddressU = Clamp; AddressV = Clamp; };
SamplerState samplerXtex3D { Filter = MIN_MAG_LINEAR_MIP_POINT; AddressU = Clamp; AddressV = Clamp; AddressW = Clamp; };
SamplerState samplerXtex2DArray { Filter = MIN_MAG_LINEAR_MIP_POINT; AddressU = Clamp; AddressV = Clamp; };

RWTexture2D<float4> Otex2D;
RWTexture3D<float4> Otex3D;
RWTexture2DArray<float4> Otex2DArray;

bool _FlipY;

// TODO: call TextureToTensor(v, dispatchThreadID) from Tex2DToTensor() { v = Xtex2D.SampleLevel }
[numthreads(8,8,1)]
void TextureToTensor(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    TENSOR_ARG_RW(O);

    uint b = _Pad.x;
    uint x = dispatchThreadID.x + _Pad.y;
    uint y = dispatchThreadID.y + _Pad.z;
    uint c = dispatchThreadID.z + _Pad.w;

    // calculate texture coordinates:
    //  offset by 0.5 to get texel centers
    //  divide by texture resolution (_Pool)
    float3 uvw = (float3)dispatchThreadID + float3(0.5f, 0.5f, 0);
    uvw /= (float3)_Pool.xyz;
    if (_FlipY)
        uvw.y = 1 - uvw.y;

    float4 v = Xtex2D.SampleLevel(samplerXtex2D, uvw.xy, 0);
    //texArray.SampleLevel(smpArray, loc, 0);

    if (_Stride.w == 1)
    {
        // TODO: interpret color as
        O.Set(b, y, x, c+0, (v.r + v.g + v.b) / 3.0f);
    }
    else if (_Stride.w == 3)
    {
        O.Set(b, y, x, c+0, v.r);
        O.Set(b, y, x, c+1, v.g);
        O.Set(b, y, x, c+2, v.b);
    }
    else if (_Stride.w == 4)
    {
        O.Set(b, y, x, c+0, v.r);
        O.Set(b, y, x, c+1, v.g);
        O.Set(b, y, x, c+2, v.b);
        O.Set(b, y, x, c+3, v.a);
    }
}

[numthreads(8,8,1)]
void TensorToTexture(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    TENSOR_ARG(X);

    uint b = _Pad.x;
    uint x = dispatchThreadID.x + _Pad.y;
    uint y = dispatchThreadID.y + _Pad.z;
    uint c = dispatchThreadID.z + _Pad.w;

    if (_FlipY)
        y = X.height - 1 - y;

    float4 v = 0;

    if (X.channels - c == 1)
    {
        // broadcast to all channels
        v = X.Get(b, y, x, c);
    }
    else if (X.channels - c == 3)
    {
        v.r = X.Get(b, y, x, c+0);
        v.g = X.Get(b, y, x, c+1);
        v.b = X.Get(b, y, x, c+2);
        v.a = 1;
    }
    else if (X.channels - c >= 4)
    {
        v.r = X.Get(b, y, x, c+0);
        v.g = X.Get(b, y, x, c+1);
        v.b = X.Get(b, y, x, c+2);
        v.a = X.Get(b, y, x, c+3);
    }

    Otex2D[dispatchThreadID.xy] = v;
}
@ -0,0 +1,9 @@
fileFormatVersion: 2
guid: b4b1b304aae6c404cb0cdab46b8fa084
timeCreated: 1495527718
licenseType: Pro
ComputeShaderImporter:
  currentAPIMask: 196608
  userData:
  assetBundleName:
  assetBundleVariant:
@ -0,0 +1,149 @@
#pragma kernel BroadcastAdd
#pragma kernel BroadcastSub
#pragma kernel BroadcastMul
#pragma kernel BroadcastDiv
#pragma kernel BroadcastPow
#pragma kernel BroadcastMin
#pragma kernel BroadcastMax

#include "Tensor.cginc"

TENSOR_DECL(X)
TENSOR_DECL(B)
TENSOR_DECL_RW(O)

NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
void BroadcastAdd(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, O.width, O.height);
    TENSOR_ARGS3(X, B, O);

    uint c = dispatchThreadID.x;    uint x = dispatchThreadID.y;    uint y = dispatchThreadID.z;
    if (c >= O.channels) return;    if (x >= O.width) return;    if (y >= O.height) return;

    for (uint n = 0; n < X.batch; ++n)
    {
        float v =
            X.BroadcastGet(n, y, x, c) +
            B.BroadcastGet(n, y, x, c);
        O.Set(n, y, x, c, v);
    }
}

NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
void BroadcastSub(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, O.width, O.height);
    TENSOR_ARGS3(X, B, O);

    uint c = dispatchThreadID.x;    uint x = dispatchThreadID.y;    uint y = dispatchThreadID.z;
    if (c >= O.channels) return;    if (x >= O.width) return;    if (y >= O.height) return;

    for (uint n = 0; n < X.batch; ++n)
    {
        float v =
            X.BroadcastGet(n, y, x, c) -
            B.BroadcastGet(n, y, x, c);
        O.Set(n, y, x, c, v);
    }
}

NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
void BroadcastMul(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, O.width, O.height);
    TENSOR_ARGS3(X, B, O);

    uint c = dispatchThreadID.x;    uint x = dispatchThreadID.y;    uint y = dispatchThreadID.z;
    if (c >= O.channels) return;    if (x >= O.width) return;    if (y >= O.height) return;

    for (uint n = 0; n < O.batch; ++n)
    {
        float v =
            X.BroadcastGet(n, y, x, c) *
            B.BroadcastGet(n, y, x, c);
        O.Set(n, y, x, c, v);
    }
}

NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
void BroadcastDiv(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, O.width, O.height);
    TENSOR_ARGS3(X, B, O);

    uint c = dispatchThreadID.x;    uint x = dispatchThreadID.y;    uint y = dispatchThreadID.z;
    if (c >= O.channels) return;    if (x >= O.width) return;    if (y >= O.height) return;

    for (uint n = 0; n < X.batch; ++n)
    {
        float v =
            X.BroadcastGet(n, y, x, c) /
            B.BroadcastGet(n, y, x, c);
        O.Set(n, y, x, c, v);
    }
}

float signed_pow(float f, float e)
{
    // handle negative f
    float v = pow(abs(f), e);
    float s = (e % 2 == 1) ?
        sign(f):    // exponent is odd  => sign(f) * pow(abs(f), e)
        1;          // exponent is even => pow(abs(f), e)
    return v * s;
}

NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
void BroadcastPow(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, O.width, O.height);
    TENSOR_ARGS3(X, B, O);

    uint c = dispatchThreadID.x;    uint x = dispatchThreadID.y;    uint y = dispatchThreadID.z;
    if (c >= O.channels) return;    if (x >= O.width) return;    if (y >= O.height) return;

    for (uint n = 0; n < X.batch; ++n)
    {
        float v = signed_pow(
            X.BroadcastGet(n, y, x, c),
            B.BroadcastGet(n, y, x, c));
        O.Set(n, y, x, c, v);
    }
}

NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
void BroadcastMin(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, O.width, O.height);
    TENSOR_ARGS3(X, B, O);

    uint c = dispatchThreadID.x;    uint x = dispatchThreadID.y;    uint y = dispatchThreadID.z;
    if (c >= O.channels) return;    if (x >= O.width) return;    if (y >= O.height) return;

    for (uint n = 0; n < X.batch; ++n)
    {
        float v = min(
            X.BroadcastGet(n, y, x, c),
            B.BroadcastGet(n, y, x, c));
        O.Set(n, y, x, c, v);
    }
}

NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
void BroadcastMax(uint3 dispatchThreadID : SV_DispatchThreadID)
{
    DISPATCH_ARGS(O.channels, O.width, O.height);
    TENSOR_ARGS3(X, B, O);

    uint c = dispatchThreadID.x;    uint x = dispatchThreadID.y;    uint y = dispatchThreadID.z;
    if (c >= O.channels) return;    if (x >= O.width) return;    if (y >= O.height) return;

    for (uint n = 0; n < X.batch; ++n)
    {
        float v = max(
            X.BroadcastGet(n, y, x, c),
            B.BroadcastGet(n, y, x, c));
        O.Set(n, y, x, c, v);
    }
}
@ -0,0 +1,8 @@
fileFormatVersion: 2
guid: 72dd00e416ab94bd79e7264a1fadef9d
ComputeShaderImporter:
  externalObjects: {}
  currentAPIMask: 65536
  userData:
  assetBundleName:
  assetBundleVariant:
|
|
@ -0,0 +1,396 @@
|
||||||
|
#pragma kernel Conv2D
|
||||||
|
#pragma kernel Conv2D_RegisterBlock4x2
|
||||||
|
//#pragma kernel Conv2D_L1Cached64_RegisterBlock4x4
|
||||||
|
|
||||||
|
#pragma kernel DepthwiseConv2D
|
||||||
|
|
||||||
|
#pragma kernel Conv2DTrans
|
||||||
|
#pragma kernel Conv2DTrans_L1Cached64_RegisterBlock2x2
|
||||||
|
|
||||||
|
#include "Tensor.cginc"
|
||||||
|
|
||||||
|
TENSOR_DECL(X)
|
||||||
|
TENSOR_DECL(K)
|
||||||
|
TENSOR_DECL(B)
|
||||||
|
TENSOR_DECL(WBK)
|
||||||
|
TENSOR_DECL_RW(O)
|
||||||
|
|
||||||
|
uint4 _Pad;
|
||||||
|
uint4 _Stride;
|
||||||
|
|
||||||
|
NUMTHREADS((16,4,4), (8,4,4), (4,4,4))
|
||||||
|
void Conv2D(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(K.kernelCount, O.width, O.height);
|
||||||
|
TENSOR_SHARED2_ARGS4(X, K, B, WBK, O);
|
||||||
|
|
||||||
|
uint k = dispatchThreadID.x;
|
||||||
|
uint x = dispatchThreadID.y;
|
||||||
|
uint y = dispatchThreadID.z;
|
||||||
|
|
||||||
|
if (k >= K.channels) return;
|
||||||
|
if (x >= O.width) return;
|
||||||
|
if (y >= O.height) return;
|
||||||
|
|
||||||
|
uint2 leftCorner = _Pad.xy;
|
||||||
|
uint2 rightCorner = uint2(X.width, X.height) + _Pad.xy;
|
||||||
|
for (uint n = 0; n < O.batch; ++n)
|
||||||
|
{
|
||||||
|
float acc = B.Get(k);
|
||||||
|
for (uint dy = 0; dy < K.GetKernelHeight(); ++dy)
|
||||||
|
{
|
||||||
|
for (uint dx = 0; dx < K.GetKernelWidth(); ++dx)
|
||||||
|
{
|
||||||
|
uint2 pos = uint2(x, y) * _Stride.xy + uint2(dx, dy);
|
||||||
|
// @TODO: investigate
|
||||||
|
// WARNING: had to move both y check into the loop (as opposed to checking y in parent loop) - due to potential bug in Metal compiler
|
||||||
|
if (any(pos < leftCorner)) continue;
|
||||||
|
if (any(pos >= rightCorner)) continue;
|
||||||
|
|
||||||
|
for (uint c = 0; c < X.channels; ++c)
|
||||||
|
acc = fastfma(X.Get(n, pos.y - leftCorner.y, pos.x - leftCorner.x, c), K.Get(dy, dx, c, k), acc);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
O.Set(n, y, x, k, acc);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
#define SIZE_W 4
|
||||||
|
#define SIZE_H 2
|
||||||
|
NUMTHREADS((64, 2, 2), (32, 2, 2), (16, 2, 2))
|
||||||
|
void Conv2D_RegisterBlock4x2(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(K.kernelCount, O.width, O.height);
|
||||||
|
TENSOR_SHARED2_ARGS4(X, K, B, WBK, O);
|
||||||
|
|
||||||
|
uint k = dispatchThreadID.x;
|
||||||
|
uint x = dispatchThreadID.y;
|
||||||
|
uint y = dispatchThreadID.z;
|
||||||
|
|
||||||
|
if (k >= K.channels) return;
|
||||||
|
if (x*SIZE_W >= O.width) return;
|
||||||
|
if (y*SIZE_H >= O.height) return;
|
||||||
|
|
||||||
|
uint2 leftCorner = _Pad.xy;
|
||||||
|
uint2 rightCorner = uint2(X.width, X.height) + _Pad.xy;
|
||||||
|
for (uint n = 0; n < O.batch; ++n)
|
||||||
|
{
|
||||||
|
float acc[SIZE_H*SIZE_W];
|
||||||
|
[unroll]
|
||||||
|
for (uint q = 0; q < SIZE_H*SIZE_W; ++q)
|
||||||
|
acc[q] = B.Get(k);
|
||||||
|
for (uint dy = 0; dy < K.GetKernelHeight(); ++dy)
|
||||||
|
{
|
||||||
|
for (uint dx = 0; dx < K.GetKernelWidth(); ++dx)
|
||||||
|
{
|
||||||
|
uint2 pos[SIZE_H*SIZE_W];
|
||||||
|
[unroll]
|
||||||
|
for (uint q = 0; q < SIZE_H*SIZE_W; ++q)
|
||||||
|
pos[q] = uint2(x*SIZE_W+(q%SIZE_W), y*SIZE_H+(q/SIZE_W)) * _Stride.xy + uint2(dx, dy);
|
||||||
|
|
||||||
|
for (uint c = 0; c < X.channels; ++c)
|
||||||
|
[unroll]
|
||||||
|
for (q = 0; q < SIZE_H*SIZE_W; ++q)
|
||||||
|
if (all(pos[q] >= leftCorner) && all(pos[q] < rightCorner))
|
||||||
|
acc[q] = fastfma(X.Get(n, pos[q] - leftCorner, c), K.Get(dy, dx, c, k), acc[q]);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
[unroll]
|
||||||
|
for (q = 0; q < SIZE_H*SIZE_W; ++q)
|
||||||
|
O.Set(n, y*SIZE_H+(q/SIZE_W), x*SIZE_W+(q%SIZE_W), k, acc[q]);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
#undef SIZE_W
|
||||||
|
#undef SIZE_H
|
||||||
|
|
||||||
|
#undef L1CACHESIZE
|
||||||
|
#define L1CACHESIZE 64
|
||||||
|
#undef SIZE
|
||||||
|
#define SIZE 4
|
||||||
|
groupshared float Conv2D_L1Cached64_Reg_Loop_safe_X[SIZE*SIZE][L1CACHESIZE];
|
||||||
|
[numthreads(L1CACHESIZE, 1, 1)]
|
||||||
|
void Conv2D_L1Cached64_RegisterBlock4x4(uint3 groupID : SV_GroupID, uint3 groupThreadID : SV_GroupThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(K.kernelCount, O.width, O.height);
|
||||||
|
TENSOR_SHARED2_ARGS4(X, K, B, WBK, O);
|
||||||
|
|
||||||
|
#define X_ Conv2D_L1Cached64_Reg_Loop_safe_X
|
||||||
|
|
||||||
|
uint k = L1CACHESIZE * groupID.x + groupThreadID.x;
|
||||||
|
uint x = groupID.y;
|
||||||
|
uint y = groupID.z;
|
||||||
|
|
||||||
|
// need all threads to load channels, thus will do late check against kernel count
|
||||||
|
if (x*SIZE >= O.width) return;
|
||||||
|
if (y*SIZE >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < O.batch; ++n)
|
||||||
|
{
|
||||||
|
float acc[SIZE*SIZE];
|
||||||
|
[unroll]
|
||||||
|
for (uint q = 0; q < SIZE*SIZE; ++q)
|
||||||
|
acc[q] = B.SafeGet(k);
|
||||||
|
|
||||||
|
for (uint dy = 0; dy < K.GetKernelHeight(); ++dy)
|
||||||
|
{
|
||||||
|
for (uint dx = 0; dx < K.GetKernelWidth(); ++dx)
|
||||||
|
{
|
||||||
|
uint2 pos[SIZE*SIZE];
|
||||||
|
[unroll]
|
||||||
|
for (uint q = 0; q < SIZE*SIZE; ++q)
|
||||||
|
pos[q] = uint2(x*SIZE+(q%SIZE), y*SIZE+(q/SIZE)) * _Stride.xy + uint2(dx, dy);
|
||||||
|
|
||||||
|
for (uint c = 0; c < X.channels; c += L1CACHESIZE)
|
||||||
|
{
|
||||||
|
// Cache X
|
||||||
|
uint dc = groupThreadID.x;
|
||||||
|
[unroll]
|
||||||
|
for (q = 0; q < SIZE*SIZE; ++q)
|
||||||
|
X_[q][dc] = X.SafeGet(n, pos[q], c + dc, _Pad.xy);
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
|
||||||
|
// X * K
|
||||||
|
if (k < K.channels) // need all threads to load channels, thus late check against kernel count
|
||||||
|
{
|
||||||
|
uint kIndex = K.Index(dy, dx, c, k);
|
||||||
|
for (dc = 0; dc < L1CACHESIZE; ++dc)
|
||||||
|
{
|
||||||
|
[unroll]
|
||||||
|
for (q = 0; q < SIZE*SIZE; ++q)
|
||||||
|
acc[q] = fastfma(X_[q][dc], K.data[kIndex], acc[q]);
|
||||||
|
kIndex += K.channels;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
uint remainderW = (O.width - x*SIZE);
|
||||||
|
uint remainderH = (O.height - y*SIZE);
|
||||||
|
|
||||||
|
if (k < K.channels) // need all threads to load channels, thus late check against kernel count
|
||||||
|
[unroll]
|
||||||
|
for (q = 0; q < SIZE*SIZE; ++q)
|
||||||
|
if (q/SIZE < remainderH && q%SIZE < remainderW)
|
||||||
|
O.Set(n, y*SIZE+(q/SIZE), x*SIZE+(q%SIZE), k, acc[q]);
|
||||||
|
}
|
||||||
|
|
||||||
|
#undef X_
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
NUMTHREADS((16,4,4), (8,4,4), (4,4,4))
|
||||||
|
void DepthwiseConv2D(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(K.kernelCount, O.width, O.height);
|
||||||
|
TENSOR_SHARED2_ARGS4(X, K, B, WBK, O);
|
||||||
|
|
||||||
|
uint k = dispatchThreadID.x;
|
||||||
|
uint x = dispatchThreadID.y;
|
||||||
|
uint y = dispatchThreadID.z;
|
||||||
|
|
||||||
|
if (k >= K.channels) return;
|
||||||
|
if (x >= O.width) return;
|
||||||
|
if (y >= O.height) return;
|
||||||
|
|
||||||
|
uint2 leftCorner = _Pad.xy;
|
||||||
|
uint2 rightCorner = uint2(X.width, X.height) + _Pad.xy;
|
||||||
|
|
||||||
|
uint2 leftKernelCorner = uint2(x, y) * _Stride.xy;
|
||||||
|
uint2 rightKernelCorner = leftKernelCorner + uint2(K.GetKernelWidth(), K.GetKernelHeight());
|
||||||
|
|
||||||
|
if (any(leftKernelCorner < leftCorner) || any(rightKernelCorner >= rightCorner))
|
||||||
|
{
|
||||||
|
// path with edge-cases checks
|
||||||
|
for (uint n = 0; n < O.batch; ++n)
|
||||||
|
{
|
||||||
|
float acc = B.Get(k);
|
||||||
|
for (uint dy = 0; dy < K.GetKernelHeight(); ++dy)
|
||||||
|
for (uint dx = 0; dx < K.GetKernelWidth(); ++dx)
|
||||||
|
{
|
||||||
|
uint2 pos = leftKernelCorner + uint2(dx, dy);
|
||||||
|
if (any(pos < leftCorner)) continue;
|
||||||
|
if (any(pos >= rightCorner)) continue;
|
||||||
|
|
||||||
|
acc = fastfma(
|
||||||
|
X.Get(n, pos.y - leftCorner.y, pos.x - leftCorner.x, k),
|
||||||
|
K.Get(dy, dx, 0, k),
|
||||||
|
acc);
|
||||||
|
}
|
||||||
|
|
||||||
|
O.Set(n, y, x, k, acc);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
// kernel is guaranteed to be within X,
|
||||||
|
// no need to check against edge-cases
|
||||||
|
leftKernelCorner -= leftCorner;
|
||||||
|
for (uint n = 0; n < O.batch; ++n)
|
||||||
|
{
|
||||||
|
float acc = B.Get(k);
|
||||||
|
for (uint dy = 0; dy < K.GetKernelHeight(); ++dy)
|
||||||
|
for (uint dx = 0; dx < K.GetKernelWidth(); ++dx)
|
||||||
|
{
|
||||||
|
uint2 pos = leftKernelCorner + uint2(dx, dy);
|
||||||
|
|
||||||
|
acc = fastfma(
|
||||||
|
X.Get(n, pos, k),
|
||||||
|
K.Get(dy, dx, 0, k),
|
||||||
|
acc);
|
||||||
|
}
|
||||||
|
|
||||||
|
O.Set(n, y, x, k, acc);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
// Significantly faster than Conv2DTrans
|
||||||
|
[numthreads(16,2,2)]
|
||||||
|
void Conv2DTrans(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
// NOTE: dispatched over X (not O)
|
||||||
|
DISPATCH_ARGS(K.kernelCount, X.width, X.height);
|
||||||
|
TENSOR_SHARED2_ARGS4(X, K, B, WBK, O);
|
||||||
|
|
||||||
|
uint k = dispatchThreadID.x;
|
||||||
|
uint x = dispatchThreadID.y;
|
||||||
|
uint y = dispatchThreadID.z;
|
||||||
|
|
||||||
|
if (k >= K.channels) return;
|
||||||
|
if (x >= X.width) return;
|
||||||
|
if (y >= X.height) return;
|
||||||
|
|
||||||
|
uint2 pad = _Pad.xy / _Stride.xy;
|
||||||
|
uint2 leftCorner = pad;
|
||||||
|
uint2 rightCorner = uint2(X.width, X.height) + pad;
|
||||||
|
|
||||||
|
for (uint n = 0; n < O.batch; ++n)
|
||||||
|
{
|
||||||
|
for (uint sy = 0; sy < _Stride.y; ++sy)
|
||||||
|
{
|
||||||
|
for (uint sx = 0; sx < _Stride.x; ++sx)
|
||||||
|
{
|
||||||
|
float acc = B.Get(k);
|
||||||
|
for (uint dy = sy; dy < K.GetKernelHeight(); dy += _Stride.y)
|
||||||
|
{
|
||||||
|
for (uint dx = sx; dx < K.GetKernelWidth(); dx += _Stride.x)
|
||||||
|
{
|
||||||
|
uint2 pos = uint2(x, y) + uint2(sx + dx, sy + dy) / _Stride.xy;
|
||||||
|
|
||||||
|
if (any(pos < leftCorner)) continue;
|
||||||
|
if (any(pos >= rightCorner)) continue;
|
||||||
|
|
||||||
|
for (uint c = 0; c < X.channels; ++c)
|
||||||
|
{
|
||||||
|
acc = fastfma( X.Get(n, pos - leftCorner, c),
|
||||||
|
K.Get( K.GetKernelHeight() - 1 - dy,
|
||||||
|
K.GetKernelWidth() - 1 - dx, c, k),
|
||||||
|
acc);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
uint oy = y * _Stride.y + sy;
|
||||||
|
uint ox = x * _Stride.x + sx;
|
||||||
|
if (oy < O.height && ox < O.width)
|
||||||
|
O.Set(n, oy, ox, k, acc);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#undef L1CACHESIZE
|
||||||
|
#define L1CACHESIZE 64
|
||||||
|
#undef SIZE
|
||||||
|
#define SIZE 2
|
||||||
|
groupshared float Conv2DTrans_L1Cached64_Reg_Loop_safe_X[SIZE*SIZE][L1CACHESIZE];
|
||||||
|
[numthreads(L1CACHESIZE, 1, 1)]
|
||||||
|
void Conv2DTrans_L1Cached64_RegisterBlock2x2(uint3 groupID : SV_GroupID, uint3 groupThreadID : SV_GroupThreadID)
|
||||||
|
{
|
||||||
|
// NOTE: dispatched over X (not O)
|
||||||
|
DISPATCH_ARGS(K.kernelCount, X.width / SIZE, X.height / SIZE);
|
||||||
|
TENSOR_SHARED2_ARGS4(X, K, B, WBK, O);
|
||||||
|
|
||||||
|
#define X_ Conv2DTrans_L1Cached64_Reg_Loop_safe_X
|
||||||
|
|
||||||
|
uint k = L1CACHESIZE * groupID.x + groupThreadID.x;
|
||||||
|
uint x = groupID.y;
|
||||||
|
uint y = groupID.z;
|
||||||
|
|
||||||
|
// need all threads to load channels, thus will do late check against kernel count
|
||||||
|
if (x*SIZE >= X.width) return;
|
||||||
|
if (y*SIZE >= X.height) return;
|
||||||
|
|
||||||
|
uint2 pad = _Pad.xy / _Stride.xy;
|
||||||
|
|
||||||
|
for (uint n = 0; n < O.batch; ++n)
|
||||||
|
{
|
||||||
|
for (uint sy = 0; sy < _Stride.y; ++sy)
|
||||||
|
{
|
||||||
|
for (uint sx = 0; sx < _Stride.x; ++sx)
|
||||||
|
{
|
||||||
|
float acc[SIZE*SIZE];
|
||||||
|
[unroll]
|
||||||
|
for (uint q = 0; q < SIZE*SIZE; ++q)
|
||||||
|
acc[q] = B.SafeGet(k);
|
||||||
|
|
||||||
|
for (uint dy = sy; dy < K.GetKernelHeight(); dy += _Stride.y)
|
||||||
|
{
|
||||||
|
for (uint dx = sx; dx < K.GetKernelWidth(); dx += _Stride.x)
|
||||||
|
{
|
||||||
|
uint2 pos[SIZE*SIZE];
|
||||||
|
[unroll]
|
||||||
|
for (uint q = 0; q < SIZE*SIZE; ++q)
|
||||||
|
pos[q] = uint2(x*SIZE+(q%SIZE), y*SIZE+(q/SIZE)) + uint2(dx+sx, dy+sy) / _Stride.xy;
|
||||||
|
|
||||||
|
for (uint c = 0; c < X.channels; c += L1CACHESIZE)
|
||||||
|
{
|
||||||
|
// Cache X
|
||||||
|
uint dc = groupThreadID.x;
|
||||||
|
[unroll]
|
||||||
|
for (q = 0; q < SIZE*SIZE; ++q)
|
||||||
|
X_[q][dc] = X.SafeGet(n, pos[q], c + dc, pad);
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
|
||||||
|
// X * K
|
||||||
|
if (k < K.channels) // need all threads to load channels, thus late check against kernel count
|
||||||
|
{
|
||||||
|
//uint kIndex = K.Index(dy, dx, c, k);
|
||||||
|
for (dc = 0; dc < L1CACHESIZE; ++dc)
|
||||||
|
{
|
||||||
|
[unroll]
|
||||||
|
for (q = 0; q < SIZE*SIZE; ++q)
|
||||||
|
acc[q] = fastfma( X_[q][dc],
|
||||||
|
K.Get( K.GetKernelHeight() - 1 - dy,
|
||||||
|
K.GetKernelWidth() - 1 - dx, c + dc, k),
|
||||||
|
acc[q]);
|
||||||
|
//kIndex += K.channels;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
if (k < K.channels) // need all threads to load channels, thus late check against kernel count
|
||||||
|
[unroll]
|
||||||
|
for (q = 0; q < SIZE*SIZE; ++q)
|
||||||
|
{
|
||||||
|
uint ox = (x*SIZE+(q%SIZE)) * _Stride.x + sx;
|
||||||
|
uint oy = (y*SIZE+(q/SIZE)) * _Stride.y + sy;
|
||||||
|
if (ox < O.width && oy < O.height)
|
||||||
|
O.Set(n, oy, ox, k, acc[q]);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#undef X_
|
||||||
|
}
|
|
@ -0,0 +1,9 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 7f508b82f984146e8bf0ad8520c316c7
|
||||||
|
timeCreated: 1507457340
|
||||||
|
licenseType: Pro
|
||||||
|
ComputeShaderImporter:
|
||||||
|
currentAPIMask: 196608
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,418 @@
|
||||||
|
//#pragma kernel Conv2D_Kmod16_Nmod8_KNY
|
||||||
|
//#pragma kernel Conv2D_Cache_KCmod32_KNyx
|
||||||
|
//#pragma kernel Conv2D_Cache_KCmod32_KNyxDiv2
|
||||||
|
// NOTE: DISABLED 64 version because as it is slower than 32 version on AMD GPU
|
||||||
|
//#pragma kernel Conv2D_Cache_KCmod64_KNyx
|
||||||
|
|
||||||
|
#include "Tensor.cginc"
|
||||||
|
|
||||||
|
TENSOR_DECL(X)
|
||||||
|
TENSOR_DECL(K)
|
||||||
|
TENSOR_DECL(B)
|
||||||
|
TENSOR_DECL(WBK)
|
||||||
|
TENSOR_DECL_RW(O)
|
||||||
|
|
||||||
|
uint4 _Pad;
|
||||||
|
uint4 _Stride;
|
||||||
|
|
||||||
|
NUMTHREADS((16,8,1), (16,8,1), (16,4,1))
|
||||||
|
void Conv2D_Kmod16_Nmod8_KNY(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(K.channels, O.batch, O.height);
|
||||||
|
TENSOR_SHARED2_ARGS4(X, K, B, WBK, O);
|
||||||
|
|
||||||
|
uint k = dispatchThreadID.x;
|
||||||
|
uint n = dispatchThreadID.y;
|
||||||
|
uint y = dispatchThreadID.z;
|
||||||
|
|
||||||
|
for (uint x = 0; x < O.width; ++x)
|
||||||
|
{
|
||||||
|
float v = B.Get(k);
|
||||||
|
for (uint dy = 0; dy < K.GetKernelHeight(); ++dy)
|
||||||
|
{
|
||||||
|
for (uint dx = 0; dx < K.GetKernelWidth(); ++dx)
|
||||||
|
{
|
||||||
|
uint oy = y * _Stride.y + dy;
|
||||||
|
uint ox = x * _Stride.x + dx;
|
||||||
|
// @TODO: investigate
|
||||||
|
// WARNING: had to move both y check into the loop (as opposed to checking y in parent loop) - due to potential bug in Metal compiler
|
||||||
|
if (oy < _Pad.y) continue;
|
||||||
|
if (oy - _Pad.w >= X.height) continue;
|
||||||
|
if (ox < _Pad.x) continue;
|
||||||
|
if (ox - _Pad.z >= X.width) continue;
|
||||||
|
|
||||||
|
for (uint c = 0; c < X.channels; ++c)
|
||||||
|
{
|
||||||
|
v += X.Get(n, oy-_Pad.y, ox-_Pad.x, c) * K.Get(dy, dx, c, k);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
O.Set(n, y, x, k, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
#undef CTILE
|
||||||
|
#define CTILE NUMTHREAD(16, 8, 8)
|
||||||
|
groupshared float Conv_Xcache[4][CTILE][CTILE];
|
||||||
|
groupshared float Conv_Kcache[4][CTILE][CTILE];
|
||||||
|
[numthreads(CTILE, CTILE, 1)]
|
||||||
|
void Conv2D_Cache_KCmod32_KNyx(uint3 groupID : SV_GroupID, uint3 groupThreadID : SV_GroupThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(K.kernelCount / 2, O.batch * O.height * O.width / 2, 1);
|
||||||
|
TENSOR_SHARED2_ARGS4(X, K, B, WBK, O);
|
||||||
|
|
||||||
|
#define X_ Conv_Xcache
|
||||||
|
#define K_ Conv_Kcache
|
||||||
|
|
||||||
|
uint gx = groupThreadID.x;
|
||||||
|
uint gy = groupThreadID.y;
|
||||||
|
|
||||||
|
uint k = CTILE * groupID.x + groupThreadID.x;
|
||||||
|
uint nyx = CTILE * groupID.y + groupThreadID.y;
|
||||||
|
|
||||||
|
uint width = O.width;
|
||||||
|
uint height = O.height;
|
||||||
|
|
||||||
|
uint x = nyx % width;
|
||||||
|
uint ny = nyx / width;
|
||||||
|
uint y = ny % height;
|
||||||
|
uint n = ny / height;
|
||||||
|
|
||||||
|
float b0 = B.Get(k*2+0);
|
||||||
|
float b1 = B.Get(k*2+1);
|
||||||
|
float4 v = float4(b0, b1,
|
||||||
|
b0, b1);
|
||||||
|
|
||||||
|
for (uint dy = 0; dy < K.GetKernelHeight(); ++dy)
|
||||||
|
{
|
||||||
|
for (uint dx = 0; dx < K.GetKernelWidth(); ++dx)
|
||||||
|
{
|
||||||
|
bool mask = true;
|
||||||
|
uint oy = y * _Stride.y + dy;
|
||||||
|
uint ox = x * _Stride.x + dx;
|
||||||
|
// @TODO: investigate
|
||||||
|
// WARNING: had to move both y check into the loop (as opposed to checking y in parent loop) - due to potential bug in Metal compiler
|
||||||
|
if (oy < _Pad.y) mask = false;
|
||||||
|
if (oy - _Pad.w >= X.height) mask = false;
|
||||||
|
if (ox < _Pad.x) mask = false;
|
||||||
|
if (ox - _Pad.z >= X.width) mask = false;
|
||||||
|
|
||||||
|
for (uint m = 0; m < X.channels/(CTILE*2); ++m)
|
||||||
|
{
|
||||||
|
float x0 = 0;
|
||||||
|
float x1 = 0;
|
||||||
|
float x2 = 0;
|
||||||
|
float x3 = 0;
|
||||||
|
|
||||||
|
if (mask)
|
||||||
|
{
|
||||||
|
x0 = X.Get(n*2+0, oy-_Pad.y, ox-_Pad.x, (m*CTILE + gx)*2+0);
|
||||||
|
x1 = X.Get(n*2+0, oy-_Pad.y, ox-_Pad.x, (m*CTILE + gx)*2+1);
|
||||||
|
x2 = X.Get(n*2+1, oy-_Pad.y, ox-_Pad.x, (m*CTILE + gx)*2+0);
|
||||||
|
x3 = X.Get(n*2+1, oy-_Pad.y, ox-_Pad.x, (m*CTILE + gx)*2+1);
|
||||||
|
}
|
||||||
|
|
||||||
|
float k0 = K.Get(dy, dx, (m*CTILE + gy)*2+0, k*2+0);
|
||||||
|
float k1 = K.Get(dy, dx, (m*CTILE + gy)*2+0, k*2+1);
|
||||||
|
float k2 = K.Get(dy, dx, (m*CTILE + gy)*2+1, k*2+0);
|
||||||
|
float k3 = K.Get(dy, dx, (m*CTILE + gy)*2+1, k*2+1);
|
||||||
|
|
||||||
|
//X_[gy][gx] = float4(x0, x1,
|
||||||
|
// x2, x3);
|
||||||
|
//K_[gy][gx] = float4(k0, k1,
|
||||||
|
// k2, k3);
|
||||||
|
X_[0][gy][gx] = x0;
|
||||||
|
X_[1][gy][gx] = x1;
|
||||||
|
X_[2][gy][gx] = x2;
|
||||||
|
X_[3][gy][gx] = x3;
|
||||||
|
|
||||||
|
K_[0][gy][gx] = k0;
|
||||||
|
K_[1][gy][gx] = k1;
|
||||||
|
K_[2][gy][gx] = k2;
|
||||||
|
K_[3][gy][gx] = k3;
|
||||||
|
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
|
||||||
|
[unroll]
|
||||||
|
for (uint i = 0; i < CTILE; ++i)
|
||||||
|
{
|
||||||
|
float4 x = //X_[gy][i];
|
||||||
|
float4( X_[0][gy][i],
|
||||||
|
X_[1][gy][i],
|
||||||
|
X_[2][gy][i],
|
||||||
|
X_[3][gy][i]);
|
||||||
|
float4 k = //K_[i][gx];
|
||||||
|
float4( K_[0][i][gx],
|
||||||
|
K_[1][i][gx],
|
||||||
|
K_[2][i][gx],
|
||||||
|
K_[3][i][gx]);
|
||||||
|
|
||||||
|
v.x = mad(k.x, x.x, v.x);
|
||||||
|
v.x = mad(k.z, x.y, v.x);
|
||||||
|
|
||||||
|
v.y = mad(k.y, x.x, v.y);
|
||||||
|
v.y = mad(k.w, x.y, v.y);
|
||||||
|
|
||||||
|
v.z = mad(k.x, x.z, v.z);
|
||||||
|
v.z = mad(k.z, x.w, v.z);
|
||||||
|
|
||||||
|
v.w = mad(k.y, x.z, v.w);
|
||||||
|
v.w = mad(k.w, x.w, v.w);
|
||||||
|
|
||||||
|
//v.x += k.x*x.x + k.z*x.y;
|
||||||
|
//v.y += k.y*x.x + k.w*x.y;
|
||||||
|
//v.z += k.x*x.z + k.z*x.w;
|
||||||
|
//v.w += k.y*x.z + k.w*x.w;
|
||||||
|
}
|
||||||
|
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
O.Set(n*2+0, y, x, k*2+0, v.x);
|
||||||
|
O.Set(n*2+0, y, x, k*2+1, v.y);
|
||||||
|
O.Set(n*2+1, y, x, k*2+0, v.z);
|
||||||
|
O.Set(n*2+1, y, x, k*2+1, v.w);
|
||||||
|
|
||||||
|
#undef X_
|
||||||
|
#undef K_
|
||||||
|
}
|
||||||
|
|
||||||
|
#undef CTILE
|
||||||
|
//#define CTILE NUMTHREAD(16, 8, 8)
|
||||||
|
#define CTILE 16
|
||||||
|
groupshared float Conv_Xcache2[4][CTILE][CTILE];
|
||||||
|
groupshared float Conv_Kcache2[4][CTILE][CTILE];
|
||||||
|
[numthreads(CTILE, CTILE, 1)]
|
||||||
|
void Conv2D_Cache_KCmod32_KNyxDiv2(uint3 groupID : SV_GroupID, uint3 groupThreadID : SV_GroupThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(K.kernelCount / 2, O.batch * O.height * O.width / 2, 1);
|
||||||
|
TENSOR_SHARED2_ARGS4(X, K, B, WBK, O);
|
||||||
|
|
||||||
|
#define X_ Conv_Xcache2
|
||||||
|
#define K_ Conv_Kcache2
|
||||||
|
|
||||||
|
uint gx = groupThreadID.x;
|
||||||
|
uint gy = groupThreadID.y;
|
||||||
|
|
||||||
|
uint k = CTILE * groupID.x + groupThreadID.x;
|
||||||
|
uint nyx = CTILE * groupID.y + groupThreadID.y;
|
||||||
|
|
||||||
|
uint width = O.width / 2;
|
||||||
|
uint height = O.height;
|
||||||
|
|
||||||
|
uint x = nyx % width;
|
||||||
|
uint ny = nyx / width;
|
||||||
|
uint y = ny % height;
|
||||||
|
uint n = ny / height;
|
||||||
|
|
||||||
|
float b0 = B.Get(k*2+0);
|
||||||
|
float b1 = B.Get(k*2+1);
|
||||||
|
float4 v = float4(b0, b1,
|
||||||
|
b0, b1);
|
||||||
|
|
||||||
|
bool mask = n < O.batch;
|
||||||
|
|
||||||
|
for (uint dy = 0; dy < K.GetKernelHeight(); ++dy)
|
||||||
|
{
|
||||||
|
for (uint dx = 0; dx < K.GetKernelWidth(); ++dx)
|
||||||
|
{
|
||||||
|
// @TODO: investigate
|
||||||
|
// WARNING: had to move both y check into the loop (as opposed to checking y in parent loop) - due to potential bug in Metal compiler
|
||||||
|
bool maskY = mask;
|
||||||
|
uint oy = y * _Stride.y + dy;
|
||||||
|
if (oy < _Pad.y) maskY = false;
|
||||||
|
if (oy - _Pad.w >= X.height) maskY = false;
|
||||||
|
|
||||||
|
bool maskL = maskY;
|
||||||
|
uint oxL = (x*2+0) * _Stride.x + dx;
|
||||||
|
if (oxL < _Pad.x) maskL = false;
|
||||||
|
if (oxL - _Pad.z >= X.width) maskL = false;
|
||||||
|
|
||||||
|
bool maskR = maskY;
|
||||||
|
uint oxR = (x*2+1) * _Stride.x + dx;
|
||||||
|
if (oxR < _Pad.x) maskR = false;
|
||||||
|
if (oxR - _Pad.z >= X.width) maskR = false;
|
||||||
|
|
||||||
|
for (uint m = 0; m < X.channels/(CTILE*2); ++m)
|
||||||
|
{
|
||||||
|
if (maskL)
|
||||||
|
{
|
||||||
|
X_[0][gy][gx] = X.Get(n, oy-_Pad.y, oxL-_Pad.x, (m*CTILE + gx)*2+0);
|
||||||
|
X_[1][gy][gx] = X.Get(n, oy-_Pad.y, oxL-_Pad.x, (m*CTILE + gx)*2+1);
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
X_[0][gy][gx] = X_[1][gy][gx] = 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (maskR)
|
||||||
|
{
|
||||||
|
X_[2][gy][gx] = X.Get(n, oy-_Pad.y, oxR-_Pad.x, (m*CTILE + gx)*2+0);
|
||||||
|
X_[3][gy][gx] = X.Get(n, oy-_Pad.y, oxR-_Pad.x, (m*CTILE + gx)*2+1);
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
X_[2][gy][gx] = X_[3][gy][gx] = 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
K_[0][gy][gx] = K.Get(dy, dx, (m*CTILE + gy)*2+0, k*2+0);
|
||||||
|
K_[1][gy][gx] = K.Get(dy, dx, (m*CTILE + gy)*2+0, k*2+1);
|
||||||
|
K_[2][gy][gx] = K.Get(dy, dx, (m*CTILE + gy)*2+1, k*2+0);
|
||||||
|
K_[3][gy][gx] = K.Get(dy, dx, (m*CTILE + gy)*2+1, k*2+1);
|
||||||
|
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
|
||||||
|
[unroll]
|
||||||
|
for (uint i = 0; i < CTILE; ++i)
|
||||||
|
{
|
||||||
|
float4 x =
|
||||||
|
float4( X_[0][gy][i],
|
||||||
|
X_[1][gy][i],
|
||||||
|
X_[2][gy][i],
|
||||||
|
X_[3][gy][i]);
|
||||||
|
float4 k =
|
||||||
|
float4( K_[0][i][gx],
|
||||||
|
K_[1][i][gx],
|
||||||
|
K_[2][i][gx],
|
||||||
|
K_[3][i][gx]);
|
||||||
|
|
||||||
|
v.x = mad(k.x, x.x, v.x);
|
||||||
|
v.x = mad(k.z, x.y, v.x);
|
||||||
|
|
||||||
|
v.y = mad(k.y, x.x, v.y);
|
||||||
|
v.y = mad(k.w, x.y, v.y);
|
||||||
|
|
||||||
|
v.z = mad(k.x, x.z, v.z);
|
||||||
|
v.z = mad(k.z, x.w, v.z);
|
||||||
|
|
||||||
|
v.w = mad(k.y, x.z, v.w);
|
||||||
|
v.w = mad(k.w, x.w, v.w);
|
||||||
|
}
|
||||||
|
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
O.Set(n, y, x*2+0, k*2+0, v.x);
|
||||||
|
O.Set(n, y, x*2+0, k*2+1, v.y);
|
||||||
|
if (mask && x*2+1 < O.width)
|
||||||
|
{
|
||||||
|
O.Set(n, y, x*2+1, k*2+0, v.z);
|
||||||
|
O.Set(n, y, x*2+1, k*2+1, v.w);
|
||||||
|
}
|
||||||
|
|
||||||
|
#undef X_
|
||||||
|
#undef K_
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
#undef CTILE
|
||||||
|
//#define CTILE NUMTHREAD(16, 8, 8)
|
||||||
|
#define CTILE 16
|
||||||
|
#define RTILE 4
|
||||||
|
groupshared float Conv_XcacheR[RTILE*RTILE][CTILE*CTILE];
|
||||||
|
groupshared float Conv_KcacheR[RTILE*RTILE][CTILE*CTILE];
|
||||||
|
[numthreads(CTILE, CTILE, 1)]
|
||||||
|
void Conv2D_Cache_KCmod64_KNyx(uint3 groupID : SV_GroupID, uint3 groupThreadID : SV_GroupThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(K.kernelCount / 4, O.batch * O.height * O.width / 4, 1);
|
||||||
|
TENSOR_SHARED2_ARGS4(X, K, B, WBK, O);
|
||||||
|
|
||||||
|
#define X_ Conv_XcacheR
|
||||||
|
#define K_ Conv_KcacheR
|
||||||
|
|
||||||
|
uint gx = groupThreadID.x;
|
||||||
|
uint gy = groupThreadID.y;
|
||||||
|
|
||||||
|
uint k = CTILE * groupID.x + groupThreadID.x;
|
||||||
|
uint nyx = CTILE * groupID.y + groupThreadID.y;
|
||||||
|
|
||||||
|
uint x = nyx % O.width;
|
||||||
|
uint ny = nyx / O.width;
|
||||||
|
uint y = ny % O.height;
|
||||||
|
uint n = ny / O.height;
|
||||||
|
|
||||||
|
float v[RTILE][RTILE];
|
||||||
|
for (uint xxxx = 0; xxxx < RTILE; ++xxxx)
|
||||||
|
{
|
||||||
|
float b = B.Get(k*RTILE+xxxx);
|
||||||
|
for (uint yyyy = 0; yyyy < RTILE; ++yyyy)
|
||||||
|
v[yyyy][xxxx] = b;
|
||||||
|
}
|
||||||
|
|
||||||
|
for (uint dy = 0; dy < K.GetKernelHeight(); ++dy)
|
||||||
|
{
|
||||||
|
for (uint dx = 0; dx < K.GetKernelWidth(); ++dx)
|
||||||
|
{
|
||||||
|
bool mask = true;
|
||||||
|
uint oy = y * _Stride.y + dy;
|
||||||
|
uint ox = x * _Stride.x + dx;
|
||||||
|
// @TODO: investigate
|
||||||
|
// WARNING: had to move both y check into the loop (as opposed to checking y in parent loop) - due to potential bug in Metal compiler
|
||||||
|
if (oy < _Pad.y) mask = false;
|
||||||
|
if (oy - _Pad.w >= X.height) mask = false;
|
||||||
|
if (ox < _Pad.x) mask = false;
|
||||||
|
if (ox - _Pad.z >= X.width) mask = false;
|
||||||
|
|
||||||
|
for (uint m = 0; m < X.channels/(CTILE*RTILE); ++m)
|
||||||
|
{
|
||||||
|
for (uint yy = 0; yy < RTILE; ++yy)
|
||||||
|
for (uint xx = 0; xx < RTILE; ++xx)
|
||||||
|
{
|
||||||
|
if (mask)
|
||||||
|
X_[yy*RTILE+xx][gy*CTILE+gx] = X.Get(n*RTILE+yy, oy - _Pad.y, ox - _Pad.x, (m*CTILE + gx)*RTILE+xx);
|
||||||
|
else
|
||||||
|
X_[yy*RTILE+xx][gy*CTILE+gx] = 0;
|
||||||
|
K_[yy*RTILE+xx][gy*CTILE+gx] = K.Get(dy, dx, (m*CTILE + gy)*RTILE+yy, k*RTILE+xx);
|
||||||
|
}
|
||||||
|
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
|
||||||
|
for (uint ii = 0; ii < CTILE; ++ii)
|
||||||
|
{
|
||||||
|
float x[RTILE][RTILE];
|
||||||
|
float k[RTILE][RTILE];
|
||||||
|
|
||||||
|
[unroll]
|
||||||
|
for (uint yy = 0; yy < RTILE; ++yy)
|
||||||
|
{
|
||||||
|
[unroll]
|
||||||
|
for (uint xx = 0; xx < RTILE; ++xx)
|
||||||
|
{
|
||||||
|
x[yy][xx] = X_[yy*RTILE+xx][gy*CTILE+ii];
|
||||||
|
k[yy][xx] = K_[yy*RTILE+xx][ii*CTILE+gx];
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
[unroll]
|
||||||
|
for (uint yyy = 0; yyy < RTILE; ++yyy)
|
||||||
|
{
|
||||||
|
[unroll]
|
||||||
|
for (uint xxx = 0; xxx < RTILE; ++xxx)
|
||||||
|
{
|
||||||
|
[unroll]
|
||||||
|
for (uint i = 0; i < RTILE; ++i)
|
||||||
|
{
|
||||||
|
v[yyy][xxx] = mad(x[yyy][i], k[i][xxx], v[yyy][xxx]);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
for (uint yy = 0; yy < RTILE; ++yy)
|
||||||
|
for (uint xx = 0; xx < RTILE; ++xx)
|
||||||
|
O.Set(n*RTILE+yy, y, x, k*RTILE+xx, v[yy][xx]);
|
||||||
|
|
||||||
|
#undef X_
|
||||||
|
#undef K_
|
||||||
|
}
|
|
@ -0,0 +1,8 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: a89bb2d7cde74429c8475f7cd8bcdb01
|
||||||
|
ComputeShaderImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
currentAPIMask: 0
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,305 @@
|
||||||
|
#pragma kernel Dense_L1Cached64
|
||||||
|
#pragma kernel DenseTiled16x16
|
||||||
|
//#pragma kernel DenseTiled32x32
|
||||||
|
//#pragma kernel DenseTiled64x64
|
||||||
|
|
||||||
|
#include "Tensor.cginc"
|
||||||
|
|
||||||
|
TENSOR_DECL(X)
|
||||||
|
TENSOR_DECL(W)
|
||||||
|
TENSOR_DECL(B)
|
||||||
|
TENSOR_DECL(WBK)
|
||||||
|
TENSOR_DECL_RW(O)
|
||||||
|
|
||||||
|
// NOTE: usually this path is used for <16 batches
|
||||||
|
#undef CACHESIZE
|
||||||
|
#define CACHESIZE 64
|
||||||
|
groupshared float Dense_L1Cached64_X[CACHESIZE];
|
||||||
|
[numthreads(CACHESIZE, 1, 1)]
|
||||||
|
void Dense_L1Cached64(uint3 groupID : SV_GroupID, uint3 groupThreadID : SV_GroupThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.flatWidth, O.flatHeight, 1);
|
||||||
|
TENSOR_SHARED2_ARGS4(X, W, B, WBK, O);
|
||||||
|
|
||||||
|
#define X_ Dense_L1Cached64_X
|
||||||
|
|
||||||
|
uint x = CACHESIZE * groupID.x + groupThreadID.x;
|
||||||
|
uint y = groupID.y;
|
||||||
|
|
||||||
|
uint wIndex = W.Index(0, x);
|
||||||
|
|
||||||
|
float acc = B.Get(x);
|
||||||
|
// loop over X columns (flatWidth) and W rows (height) in CACHESIZE steps
|
||||||
|
for (uint i = 0; i < X.GetFlatWidth(); i += CACHESIZE)
|
||||||
|
{
|
||||||
|
// Cache X
|
||||||
|
// coalescent reads
|
||||||
|
X_[groupThreadID.x] = X.SafeGet(y, i + groupThreadID.x);
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
|
||||||
|
// X * W
|
||||||
|
if (i + CACHESIZE <= X.GetFlatWidth())
|
||||||
|
{
|
||||||
|
[unroll]
|
||||||
|
for (uint di = 0; di < CACHESIZE; ++di)
|
||||||
|
{
|
||||||
|
acc = fastfma(X_[di], W.data[wIndex], acc);
|
||||||
|
wIndex += W.GetFlatWidth();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
// handle remainder of the line < CACHESIZE
|
||||||
|
for (uint di = 0; i + di < X.GetFlatWidth(); ++di)
|
||||||
|
{
|
||||||
|
acc = fastfma(X_[di], W.data[wIndex], acc);
|
||||||
|
wIndex += W.GetFlatWidth();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
}
|
||||||
|
|
||||||
|
// all threads are needed to load the matrix line, but x might be out of bounds for writing
|
||||||
|
if (x < O.GetFlatWidth())
|
||||||
|
O.Set(y, x, acc);
|
||||||
|
|
||||||
|
#undef X_
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
#undef TILE_WIDTH
|
||||||
|
#define TILE_WIDTH NUMTHREAD(16,8,8)
|
||||||
|
groupshared float DenseTiled_Xcache[TILE_WIDTH][TILE_WIDTH];
|
||||||
|
groupshared float DenseTiled_Wcache[TILE_WIDTH][TILE_WIDTH];
|
||||||
|
[numthreads(TILE_WIDTH,TILE_WIDTH,1)]
|
||||||
|
void DenseTiled16x16(uint3 groupID : SV_GroupID, uint3 groupThreadID : SV_GroupThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.flatWidth, O.flatHeight, 1);
|
||||||
|
TENSOR_SHARED2_ARGS4(X, W, B, WBK, O);
|
||||||
|
|
||||||
|
#define X_ DenseTiled_Xcache
|
||||||
|
#define W_ DenseTiled_Wcache
|
||||||
|
|
||||||
|
uint tx = groupThreadID.x;
|
||||||
|
uint ty = groupThreadID.y;
|
||||||
|
uint x = groupID.x*TILE_WIDTH + tx;
|
||||||
|
uint y = groupID.y*TILE_WIDTH + ty;
|
||||||
|
|
||||||
|
bool mask = (x < O.GetFlatWidth() && y < O.GetFlatHeight());
|
||||||
|
|
||||||
|
float v = B.Get(x);
|
||||||
|
for (uint m = 0; m < X.GetFlatWidth()/TILE_WIDTH; ++m)
|
||||||
|
{
|
||||||
|
if (mask)
|
||||||
|
{
|
||||||
|
X_[ty][tx] = X.Get(y, m*TILE_WIDTH + tx);
|
||||||
|
W_[ty][tx] = W.Get(m*TILE_WIDTH + ty, x);
|
||||||
|
}
|
||||||
|
else
|
||||||
|
{
|
||||||
|
X_[ty][tx] = 0;
|
||||||
|
W_[ty][tx] = 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
|
||||||
|
[unroll]
|
||||||
|
for (uint i = 0; i < TILE_WIDTH; ++i)
|
||||||
|
{
|
||||||
|
v = fastfma(X_[ty][i], W_[i][tx], v);
|
||||||
|
}
|
||||||
|
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
}
|
||||||
|
|
||||||
|
if (mask)
|
||||||
|
O.Set(y, x, v);
|
||||||
|
|
||||||
|
#undef X_
|
||||||
|
#undef W_
|
||||||
|
}
|
||||||
|
|
||||||
|
#undef TILE_WIDTH
|
||||||
|
#define TILE_WIDTH NUMTHREAD(16,8,8) // 32 crashes on MacBookPro/AMD
|
||||||
|
groupshared float DenseTiled_Xcache32[2*2][TILE_WIDTH][TILE_WIDTH];
|
||||||
|
groupshared float DenseTiled_Wcache32[2*2][TILE_WIDTH][TILE_WIDTH];
|
||||||
|
[numthreads(TILE_WIDTH,TILE_WIDTH,1)]
|
||||||
|
void DenseTiled32x32(uint3 groupID : SV_GroupID, uint3 groupThreadID : SV_GroupThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.flatWidth / 2, O.flatHeight / 2, 1);
|
||||||
|
TENSOR_SHARED2_ARGS4(X, W, B, WBK, O);
|
||||||
|
|
||||||
|
#define X_ DenseTiled_Xcache32
|
||||||
|
#define W_ DenseTiled_Wcache32
|
||||||
|
|
||||||
|
uint tx = groupThreadID.x;
|
||||||
|
uint ty = groupThreadID.y;
|
||||||
|
uint x = groupID.x*TILE_WIDTH + tx;
|
||||||
|
uint y = groupID.y*TILE_WIDTH + ty;
|
||||||
|
|
||||||
|
float b0 = B.Get(x*2+0);
|
||||||
|
float b1 = B.Get(x*2+1);
|
||||||
|
float4 v = float4(b0, b1,
|
||||||
|
b0, b1);
|
||||||
|
|
||||||
|
for (uint m = 0; m < X.GetFlatWidth()/(TILE_WIDTH*2);)
|
||||||
|
{
|
||||||
|
float x0 = X.Get(y*2+0, m*TILE_WIDTH*2 + tx*2+0);
|
||||||
|
float x1 = X.Get(y*2+0, m*TILE_WIDTH*2 + tx*2+1);
|
||||||
|
float x2 = X.Get(y*2+1, m*TILE_WIDTH*2 + tx*2+0);
|
||||||
|
float x3 = X.Get(y*2+1, m*TILE_WIDTH*2 + tx*2+1);
|
||||||
|
|
||||||
|
float w0 = W.Get(m*TILE_WIDTH*2 + ty*2+0, x*2+0);
|
||||||
|
float w1 = W.Get(m*TILE_WIDTH*2 + ty*2+0, x*2+1);
|
||||||
|
float w2 = W.Get(m*TILE_WIDTH*2 + ty*2+1, x*2+0);
|
||||||
|
float w3 = W.Get(m*TILE_WIDTH*2 + ty*2+1, x*2+1);
|
||||||
|
|
||||||
|
++m;
|
||||||
|
|
||||||
|
X_[0][ty][tx] = x0;
|
||||||
|
X_[1][ty][tx] = x1;
|
||||||
|
X_[2][ty][tx] = x2;
|
||||||
|
X_[3][ty][tx] = x3;
|
||||||
|
|
||||||
|
W_[0][ty][tx] = w0;
|
||||||
|
W_[1][ty][tx] = w1;
|
||||||
|
W_[2][ty][tx] = w2;
|
||||||
|
W_[3][ty][tx] = w3;
|
||||||
|
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
|
||||||
|
[unroll]
|
||||||
|
for (uint i = 0; i < TILE_WIDTH; ++i)
|
||||||
|
{
|
||||||
|
float4 x =
|
||||||
|
float4( X_[0][ty][i],
|
||||||
|
X_[1][ty][i],
|
||||||
|
X_[2][ty][i],
|
||||||
|
X_[3][ty][i]);
|
||||||
|
float4 w =
|
||||||
|
float4( W_[0][i][tx],
|
||||||
|
W_[1][i][tx],
|
||||||
|
W_[2][i][tx],
|
||||||
|
W_[3][i][tx]);
|
||||||
|
|
||||||
|
v.x = fastfma(w.x, x.x, v.x);
|
||||||
|
v.y = fastfma(w.y, x.x, v.y);
|
||||||
|
v.z = fastfma(w.x, x.z, v.z);
|
||||||
|
v.w = fastfma(w.y, x.z, v.w);
|
||||||
|
|
||||||
|
v.x = fastfma(w.z, x.y, v.x);
|
||||||
|
v.y = fastfma(w.w, x.y, v.y);
|
||||||
|
v.z = fastfma(w.z, x.w, v.z);
|
||||||
|
v.w = fastfma(w.w, x.w, v.w);
|
||||||
|
}
|
||||||
|
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
}
|
||||||
|
|
||||||
|
O.Set(y*2+0, x*2+0, v.x);
|
||||||
|
O.Set(y*2+0, x*2+1, v.y);
|
||||||
|
O.Set(y*2+1, x*2+0, v.z);
|
||||||
|
O.Set(y*2+1, x*2+1, v.w);
|
||||||
|
|
||||||
|
#undef X_
|
||||||
|
#undef W_
|
||||||
|
}
|
||||||
|
|
||||||
|
#undef TILE_WIDTH
|
||||||
|
#define TILE_WIDTH NUMTHREAD(16,8,8)
|
||||||
|
groupshared float DenseTiled_Xcache64[4*4][TILE_WIDTH*TILE_WIDTH];
|
||||||
|
groupshared float DenseTiled_Wcache64[4*4][TILE_WIDTH*TILE_WIDTH];
|
||||||
|
[numthreads(TILE_WIDTH,TILE_WIDTH,1)]
|
||||||
|
void DenseTiled64x64(uint3 groupID : SV_GroupID, uint3 groupThreadID : SV_GroupThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.flatWidth / 4, O.flatHeight / 4, 1);
|
||||||
|
TENSOR_SHARED2_ARGS4(X, W, B, WBK, O);
|
||||||
|
|
||||||
|
#define X_ DenseTiled_Xcache64
|
||||||
|
#define W_ DenseTiled_Wcache64
|
||||||
|
|
||||||
|
uint tx = groupThreadID.x;
|
||||||
|
uint ty = groupThreadID.y;
|
||||||
|
uint x = groupID.x*TILE_WIDTH + tx;
|
||||||
|
uint y = groupID.y*TILE_WIDTH + ty;
|
||||||
|
|
||||||
|
float b0 = B.Get(x*4+0);
|
||||||
|
float b1 = B.Get(x*4+1);
|
||||||
|
float b2 = B.Get(x*4+2);
|
||||||
|
float b3 = B.Get(x*4+3);
|
||||||
|
|
||||||
|
float4 v0, v1, v2, v3;
|
||||||
|
v0 = v1 = v2 = v3 = float4(b0, b1, b2, b3);
|
||||||
|
|
||||||
|
for (uint m = 0; m < X.GetFlatWidth()/(TILE_WIDTH*4); ++m)
|
||||||
|
{
|
||||||
|
for (uint yy = 0; yy < 4; ++yy)
|
||||||
|
for (uint xx = 0; xx < 4; ++xx)
|
||||||
|
{
|
||||||
|
X_[yy*4+xx][ty*TILE_WIDTH+tx] = X.Get(y*4+yy, (m*TILE_WIDTH + tx)*4+xx);
|
||||||
|
W_[yy*4+xx][ty*TILE_WIDTH+tx] = W.Get((m*TILE_WIDTH + ty)*4+yy, x*4+xx);
|
||||||
|
}
|
||||||
|
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
|
||||||
|
for (uint i = 0; i < TILE_WIDTH; ++i)
|
||||||
|
{
|
||||||
|
[unroll]
|
||||||
|
for (uint q = 0; q < 4; ++q)
|
||||||
|
{
|
||||||
|
float x0 = X_[0*4+q][ty*TILE_WIDTH+i];
|
||||||
|
float x1 = X_[1*4+q][ty*TILE_WIDTH+i];
|
||||||
|
float x2 = X_[2*4+q][ty*TILE_WIDTH+i];
|
||||||
|
float x3 = X_[3*4+q][ty*TILE_WIDTH+i];
|
||||||
|
|
||||||
|
float w0 = W_[q*4+0][i*TILE_WIDTH+tx];
|
||||||
|
float w1 = W_[q*4+1][i*TILE_WIDTH+tx];
|
||||||
|
float w2 = W_[q*4+2][i*TILE_WIDTH+tx];
|
||||||
|
float w3 = W_[q*4+3][i*TILE_WIDTH+tx];
|
||||||
|
|
||||||
|
v0.x = fastfma(x0, w0, v0.x); //--
|
||||||
|
v1.x = fastfma(x1, w0, v1.x);
|
||||||
|
v2.x = fastfma(x2, w0, v2.x);
|
||||||
|
v3.x = fastfma(x3, w0, v3.x);
|
||||||
|
v0.y = fastfma(x0, w1, v0.y); //--
|
||||||
|
v1.y = fastfma(x1, w1, v1.y);
|
||||||
|
v2.y = fastfma(x2, w1, v2.y);
|
||||||
|
v3.y = fastfma(x3, w1, v3.y);
|
||||||
|
v0.z = fastfma(x0, w2, v0.z); //--
|
||||||
|
v1.z = fastfma(x1, w2, v1.z);
|
||||||
|
v2.z = fastfma(x2, w2, v2.z);
|
||||||
|
v3.z = fastfma(x3, w2, v3.z);
|
||||||
|
v0.w = fastfma(x0, w3, v0.w); //--
|
||||||
|
v1.w = fastfma(x1, w3, v1.w);
|
||||||
|
v2.w = fastfma(x2, w3, v2.w);
|
||||||
|
v3.w = fastfma(x3, w3, v3.w);
|
||||||
|
}
|
||||||
|
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
O.Set(y*4+0, x*4+0, v0.x);
|
||||||
|
O.Set(y*4+0, x*4+1, v0.y);
|
||||||
|
O.Set(y*4+0, x*4+2, v0.z);
|
||||||
|
O.Set(y*4+0, x*4+3, v0.w);
|
||||||
|
|
||||||
|
O.Set(y*4+1, x*4+0, v1.x);
|
||||||
|
O.Set(y*4+1, x*4+1, v1.y);
|
||||||
|
O.Set(y*4+1, x*4+2, v1.z);
|
||||||
|
O.Set(y*4+1, x*4+3, v1.w);
|
||||||
|
|
||||||
|
O.Set(y*4+2, x*4+0, v2.x);
|
||||||
|
O.Set(y*4+2, x*4+1, v2.y);
|
||||||
|
O.Set(y*4+2, x*4+2, v2.z);
|
||||||
|
O.Set(y*4+2, x*4+3, v2.w);
|
||||||
|
|
||||||
|
O.Set(y*4+3, x*4+0, v3.x);
|
||||||
|
O.Set(y*4+3, x*4+1, v3.y);
|
||||||
|
O.Set(y*4+3, x*4+2, v3.z);
|
||||||
|
O.Set(y*4+3, x*4+3, v3.w);
|
||||||
|
|
||||||
|
#undef X_
|
||||||
|
#undef W_
|
||||||
|
}
|
|
@ -0,0 +1,9 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 6b08c0ac202ad41deb8881132b21894c
|
||||||
|
timeCreated: 1507457322
|
||||||
|
licenseType: Pro
|
||||||
|
ComputeShaderImporter:
|
||||||
|
currentAPIMask: 196608
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,72 @@
|
||||||
|
#pragma kernel DenseFP16Div2
|
||||||
|
|
||||||
|
#include "Tensor.cginc"
|
||||||
|
|
||||||
|
TENSOR_DECL(X)
|
||||||
|
TENSOR_DECL(W)
|
||||||
|
TENSOR_DECL(B)
|
||||||
|
TENSOR_DECL(WBK)
|
||||||
|
TENSOR_DECL_RW(O)
|
||||||
|
|
||||||
|
float f16tof32_(uint src)
|
||||||
|
{
|
||||||
|
// Based on Fabian Giesen's public domain half_to_float_fast3
|
||||||
|
const uint magic = 113 << 23;
|
||||||
|
const uint shiftedExp = 0x7c00 << 13; // exponent mask after shift
|
||||||
|
|
||||||
|
// Mask out sign bit
|
||||||
|
uint o = src & 0x7fff;
|
||||||
|
if (o)
|
||||||
|
{
|
||||||
|
// Move exponent + mantissa to correct bits
|
||||||
|
o <<= 13;
|
||||||
|
uint exponent = o & shiftedExp;
|
||||||
|
if (exponent == 0)
|
||||||
|
{
|
||||||
|
// Handle denormal
|
||||||
|
o = asuint(asfloat(o + magic) - asfloat(magic));
|
||||||
|
}
|
||||||
|
else if (exponent == shiftedExp) // Inf/NaN
|
||||||
|
o += (255 - 31) << 23;
|
||||||
|
else
|
||||||
|
o += (127 - 15) << 23;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Copy sign bit
|
||||||
|
o |= (src & 0x8000) << 16;
|
||||||
|
|
||||||
|
return asfloat(o);
|
||||||
|
}
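// Sanity check: f16tof32_(0x3C00) == 1.0f and f16tof32_(0xBC00) == -1.0f
// (half-precision 1.0 with either sign bit), matching the hardware f16tof32 for normal values.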
|
||||||
|
|
||||||
|
float2 Unpack(SharedTensor t, uint y, uint x)
|
||||||
|
{
|
||||||
|
uint v = asuint(t.data[t.Index(y, x) >> 1]);
|
||||||
|
// TEMPORARY: f16tof32 is broken in GLSL/Metal compiler
|
||||||
|
// using custom conversion function for now
|
||||||
|
//return float2(f16tof32(v), f16tof32(v>>16));
|
||||||
|
return float2(f16tof32_(v), f16tof32_(v>>16));
|
||||||
|
}
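// Note: each 32-bit element of the buffer holds two packed fp16 values, so Unpack(t, y, x)
// reads the element at flat index Index(y, x) / 2 and returns the low 16 bits in .x and the
// high 16 bits in .y; this is why the kernel below writes two output columns per thread.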
|
||||||
|
|
||||||
|
// NOTE: usually this path is used for <16 batches
|
||||||
|
NUMTHREADS((256,1,1), (128,1,1), (64,1,1))
|
||||||
|
void DenseFP16Div2(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.flatWidth/2, O.flatHeight, 1);
|
||||||
|
TENSOR_SHARED2_ARGS4(X, W, B, WBK, O);
|
||||||
|
|
||||||
|
uint x = dispatchThreadID.x;
|
||||||
|
uint y = dispatchThreadID.y;
|
||||||
|
|
||||||
|
if (x*2 >= O.GetFlatWidth()) return;
|
||||||
|
if (y >= O.GetFlatHeight()) return;
|
||||||
|
|
||||||
|
float2 acc = Unpack(B, 0, x*2);
|
||||||
|
for (uint i = 0; i < X.width; ++i)
|
||||||
|
{
|
||||||
|
float2 w = Unpack(W, i, x*2);
|
||||||
|
acc += X.Get(y, i) * w;
|
||||||
|
}
|
||||||
|
|
||||||
|
O.Set(y, x*2+0, acc[0]);
|
||||||
|
O.Set(y, x*2+1, acc[1]);
|
||||||
|
}
|
|
@ -0,0 +1,9 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: cff3cb66e54744fa4888ef91a11ec90c
|
||||||
|
timeCreated: 1508334838
|
||||||
|
licenseType: Pro
|
||||||
|
ComputeShaderImporter:
|
||||||
|
currentAPIMask: 196608
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
Diff for this file is not shown because of its large size.
|
@ -0,0 +1,9 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 299ca130202014274b506123e830c52d
|
||||||
|
timeCreated: 1506672486
|
||||||
|
licenseType: Pro
|
||||||
|
ComputeShaderImporter:
|
||||||
|
currentAPIMask: 196608
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,188 @@
|
||||||
|
//#pragma kernel Dense64
|
||||||
|
//#pragma kernel Conv2D_Kernel3x3_64
|
||||||
|
|
||||||
|
#include "Tensor.cginc"
|
||||||
|
|
||||||
|
TENSOR_DECL(X)
|
||||||
|
TENSOR_DECL(W)
|
||||||
|
TENSOR_DECL(K)
|
||||||
|
TENSOR_DECL(B)
|
||||||
|
TENSOR_DECL(WBK)
|
||||||
|
TENSOR_DECL_RW(O)
|
||||||
|
|
||||||
|
uint4 _Pad;
|
||||||
|
uint4 _Stride;
|
||||||
|
|
||||||
|
#undef THREAD_COUNT
|
||||||
|
#define THREAD_COUNT 64 // ATM support only 8x8
|
||||||
|
|
||||||
|
#undef BLOCK_WIDTH
|
||||||
|
#define BLOCK_WIDTH 8
|
||||||
|
|
||||||
|
#undef LOAD_WIDTH
|
||||||
|
#define LOAD_WIDTH THREAD_COUNT
|
||||||
|
|
||||||
|
#undef LOAD_DEPTH
|
||||||
|
#define LOAD_DEPTH BLOCK_WIDTH
|
||||||
|
|
||||||
|
groupshared float DenseTiled_XcacheR[LOAD_DEPTH][LOAD_WIDTH];
|
||||||
|
groupshared float DenseTiled_WcacheR[LOAD_DEPTH][LOAD_WIDTH];
|
||||||
|
|
||||||
|
[numthreads(THREAD_COUNT, 1, 1)]
|
||||||
|
void Dense64(uint3 groupID : SV_GroupID, uint3 groupThreadID : SV_GroupThreadID)
|
||||||
|
{
|
||||||
|
// @TODO: DISPATCH_ARGS(...)
|
||||||
|
TENSOR_SHARED2_ARGS4(X, W, B, WBK, O);
|
||||||
|
|
||||||
|
#define X_ DenseTiled_XcacheR
|
||||||
|
#define W_ DenseTiled_WcacheR
|
||||||
|
|
||||||
|
uint id = groupThreadID.x;
|
||||||
|
uint bx = groupID.x;
|
||||||
|
uint by = groupID.y;
|
||||||
|
|
||||||
|
uint bbx = id % BLOCK_WIDTH;
|
||||||
|
uint bby = id / BLOCK_WIDTH;
|
||||||
|
|
||||||
|
float v[BLOCK_WIDTH][BLOCK_WIDTH];
|
||||||
|
for (uint yy = 0; yy < BLOCK_WIDTH; ++yy)
|
||||||
|
for (uint xx = 0; xx < BLOCK_WIDTH; ++xx)
|
||||||
|
{
|
||||||
|
float bias = B.Get(bx*LOAD_WIDTH + bbx*BLOCK_WIDTH + xx);
|
||||||
|
v[yy][xx] = bias;
|
||||||
|
}
|
||||||
|
|
||||||
|
for (uint m = 0; m < X.GetFlatWidth()/LOAD_DEPTH; ++m)
|
||||||
|
{
|
||||||
|
for (uint q = 0; q < LOAD_DEPTH; ++q)
|
||||||
|
{
|
||||||
|
X_[q][id] = X.Get(by*LOAD_WIDTH + id, m*LOAD_DEPTH + q);
|
||||||
|
W_[q][id] = W.Get(m*LOAD_DEPTH + q, bx*LOAD_WIDTH + id);
|
||||||
|
}
|
||||||
|
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
|
||||||
|
for (uint yyy = 0; yyy < BLOCK_WIDTH; ++yyy)
|
||||||
|
[unroll] for (uint xxx = 0; xxx < BLOCK_WIDTH; ++xxx)
|
||||||
|
[unroll] for (uint i = 0; i < LOAD_DEPTH; ++i)
|
||||||
|
{
|
||||||
|
v[yyy][xxx] = mad(X_[i][bby*BLOCK_WIDTH + yyy], W_[i][bbx*BLOCK_WIDTH + xxx], v[yyy][xxx]);
|
||||||
|
}
|
||||||
|
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
}
|
||||||
|
|
||||||
|
for (uint yyy = 0; yyy < BLOCK_WIDTH; ++yyy)
|
||||||
|
for (uint xxx = 0; xxx < BLOCK_WIDTH; ++xxx)
|
||||||
|
O.Set(by*LOAD_WIDTH + bby*BLOCK_WIDTH + yyy, bx*LOAD_WIDTH + bbx*BLOCK_WIDTH + xxx, v[yyy][xxx]);
|
||||||
|
|
||||||
|
#undef X_
|
||||||
|
#undef W_
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
#undef THREAD_COUNT
|
||||||
|
#define THREAD_COUNT 64 // ATM support only 8x8
|
||||||
|
|
||||||
|
#undef BLOCK_WIDTH
|
||||||
|
#define BLOCK_WIDTH 8
|
||||||
|
|
||||||
|
#undef LOAD_WIDTH
|
||||||
|
#define LOAD_WIDTH THREAD_COUNT
|
||||||
|
|
||||||
|
#undef LOAD_DEPTH
|
||||||
|
#define LOAD_DEPTH BLOCK_WIDTH
|
||||||
|
|
||||||
|
groupshared float Conv_KcacheR[LOAD_DEPTH][LOAD_WIDTH];
|
||||||
|
groupshared float Conv_XcacheR[LOAD_DEPTH][LOAD_WIDTH];
|
||||||
|
[numthreads(THREAD_COUNT, 1, 1)]
|
||||||
|
void Conv2D_Kernel3x3_64(uint3 groupID : SV_GroupID, uint3 groupThreadID : SV_GroupThreadID)
|
||||||
|
{
|
||||||
|
// @TODO: DISPATCH_ARGS(...)
|
||||||
|
TENSOR_SHARED2_ARGS4(X, K, B, WBK, O);
|
||||||
|
|
||||||
|
#define X_ Conv_XcacheR
|
||||||
|
#define K_ Conv_KcacheR
|
||||||
|
|
||||||
|
uint id = groupThreadID.x;
|
||||||
|
uint bx = groupID.x;
|
||||||
|
uint by = groupID.y;
|
||||||
|
|
||||||
|
uint bbx = id % BLOCK_WIDTH;
|
||||||
|
uint bby = id / BLOCK_WIDTH;
|
||||||
|
|
||||||
|
uint width = O.width;
|
||||||
|
uint height = O.height;
|
||||||
|
|
||||||
|
// ASSERT(LOAD_WIDTH == THREAD_COUNT)
|
||||||
|
uint loadNYX = by*LOAD_WIDTH + id; // only works for 8x8
|
||||||
|
uint loadX = loadNYX % width;
|
||||||
|
uint loadNY = loadNYX / width;
|
||||||
|
uint loadY = loadNY % height;
|
||||||
|
uint loadN = loadNY / height;
|
||||||
|
|
||||||
|
// @TODO: validate that _Stride works, added the following 2 lines without testing
|
||||||
|
loadX *= _Stride.x;
|
||||||
|
loadY *= _Stride.y;
|
||||||
|
|
||||||
|
float v[BLOCK_WIDTH][BLOCK_WIDTH];
|
||||||
|
[unroll] for (uint yy = 0; yy < BLOCK_WIDTH; ++yy)
|
||||||
|
[unroll] for (uint xx = 0; xx < BLOCK_WIDTH; ++xx)
|
||||||
|
{
|
||||||
|
float bias = B.Get(bx*LOAD_WIDTH + bbx*BLOCK_WIDTH + xx);
|
||||||
|
v[yy][xx] = bias;
|
||||||
|
}
|
||||||
|
|
||||||
|
for (uint dy = 0; dy < 3; ++dy)
|
||||||
|
{
|
||||||
|
bool mask = true;
|
||||||
|
|
||||||
|
if (loadY+dy < _Pad.y) mask = false;
|
||||||
|
if (loadY+dy - _Pad.w >= X.height) mask = false;
|
||||||
|
|
||||||
|
for (uint dx = 0; dx < 3; ++dx)
|
||||||
|
{
|
||||||
|
if (loadX+dx < _Pad.x) mask = false;
|
||||||
|
if (loadX+dx - _Pad.z >= X.width) mask = false;
|
||||||
|
|
||||||
|
for (uint m = 0; m < X.channels/LOAD_DEPTH; ++m)
|
||||||
|
{
|
||||||
|
for (uint q = 0; q < LOAD_DEPTH; ++q)
|
||||||
|
{
|
||||||
|
if (mask)
|
||||||
|
X_[q][id] = X.Get(loadN, loadY+dy-_Pad.y, loadX+dx-_Pad.x, m*LOAD_DEPTH + q);
|
||||||
|
else
|
||||||
|
X_[q][id] = 0;
|
||||||
|
K_[q][id] = K.Get(dy, dx, m*LOAD_DEPTH + q, bx*LOAD_WIDTH + id);
|
||||||
|
}
|
||||||
|
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
|
||||||
|
for (uint yyy = 0; yyy < BLOCK_WIDTH; ++yyy)
|
||||||
|
[unroll] for (uint xxx = 0; xxx < BLOCK_WIDTH; ++xxx)
|
||||||
|
[unroll] for (uint i = 0; i < LOAD_DEPTH; ++i)
|
||||||
|
{
|
||||||
|
v[yyy][xxx] += X_[i][bby*BLOCK_WIDTH + yyy] * K_[i][bbx*BLOCK_WIDTH + xxx];
|
||||||
|
}
|
||||||
|
|
||||||
|
GroupMemoryBarrierWithGroupSync();
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
[unroll] for (uint yyy = 0; yyy < BLOCK_WIDTH; ++yyy)
|
||||||
|
[unroll] for (uint xxx = 0; xxx < BLOCK_WIDTH; ++xxx)
|
||||||
|
{
|
||||||
|
uint saveNYX = by*LOAD_WIDTH + bby*BLOCK_WIDTH + yyy;
|
||||||
|
uint saveX = saveNYX % width;
|
||||||
|
uint saveNY = saveNYX / width;
|
||||||
|
uint saveY = saveNY % height;
|
||||||
|
uint saveN = saveNY / height;
|
||||||
|
|
||||||
|
uint saveK = bx*LOAD_WIDTH + bbx*BLOCK_WIDTH + xxx;
|
||||||
|
O.Set(saveN, saveY, saveX, saveK, v[yyy][xxx]);
|
||||||
|
}
|
||||||
|
|
||||||
|
#undef X_
|
||||||
|
#undef K_
|
||||||
|
}
|
|
@ -0,0 +1,9 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: c7c673db45e6845d5abaed4ed5ef42e1
|
||||||
|
timeCreated: 1507294253
|
||||||
|
licenseType: Pro
|
||||||
|
ComputeShaderImporter:
|
||||||
|
currentAPIMask: 196608
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,339 @@
|
||||||
|
#pragma kernel ScaleBias
|
||||||
|
#pragma kernel ScaleBias_CNyx
|
||||||
|
#pragma kernel Upsample2D
|
||||||
|
#pragma kernel AvgPool2D
|
||||||
|
#pragma kernel MaxPool2D
|
||||||
|
#pragma kernel AvgPool2D_NoPads
|
||||||
|
#pragma kernel MaxPool2D_NoPads
|
||||||
|
//#pragma kernel MaxPool2D_Pool2x2_NoPads
|
||||||
|
#pragma kernel GlobalAvgPool2D
|
||||||
|
#pragma kernel InstanceNorm
|
||||||
|
#pragma kernel Copy
|
||||||
|
|
||||||
|
#include "Tensor.cginc"
|
||||||
|
|
||||||
|
TENSOR_DECL(X)
|
||||||
|
TENSOR_DECL(W)
|
||||||
|
TENSOR_DECL(B)
|
||||||
|
TENSOR_DECL(WBK)
|
||||||
|
TENSOR_DECL_RW(O)
|
||||||
|
|
||||||
|
uint4 _Pool;
|
||||||
|
uint4 _Stride;
|
||||||
|
uint4 _Pad;
|
||||||
|
float _Alpha;
|
||||||
|
|
||||||
|
NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
|
||||||
|
void ScaleBias(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_SHARED2_ARGS4(X, W, B, WBK, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint x = dispatchThreadID.y;
|
||||||
|
uint y = dispatchThreadID.z;
|
||||||
|
|
||||||
|
if (c >= O.channels) return;
|
||||||
|
if (x >= O.width) return;
|
||||||
|
if (y >= O.height) return;
|
||||||
|
|
||||||
|
float bias = B.Get(0, 0, 0, c);
|
||||||
|
float scale = W.Get(0, 0, 0, c);
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = v * scale + bias;
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((16,16,1), (16,8,1), (16,4,1))
|
||||||
|
void ScaleBias_CNyx(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.batch * O.height * O.width, 1);
|
||||||
|
TENSOR_SHARED2_ARGS4(X, W, B, WBK, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint nyx = dispatchThreadID.y;
|
||||||
|
|
||||||
|
uint x = nyx % X.width;
|
||||||
|
uint ny = nyx / X.width;
|
||||||
|
uint y = ny % X.height;
|
||||||
|
uint n = ny / X.height;
|
||||||
|
|
||||||
|
if (c >= X.channels) return;
|
||||||
|
if (n >= X.batch) return;
|
||||||
|
|
||||||
|
float bias = B.Get(0, 0, 0, c);
|
||||||
|
float scale = W.Get(0, 0, 0, c);
|
||||||
|
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = v * scale + bias;
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
|
||||||
|
void Upsample2D(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
// NOTE: dispatched over X (not O)
|
||||||
|
DISPATCH_ARGS(X.channels, X.width, X.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint x = dispatchThreadID.y;
|
||||||
|
uint y = dispatchThreadID.z;
|
||||||
|
|
||||||
|
if (c >= X.channels) return;
|
||||||
|
if (x >= X.width) return;
|
||||||
|
if (y >= X.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < O.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
|
||||||
|
for (uint dy = 0; dy < _Pool.y; ++dy)
|
||||||
|
for (uint dx = 0; dx < _Pool.x; ++dx)
|
||||||
|
{
|
||||||
|
uint oy = y * _Pool.y + dy;
|
||||||
|
uint ox = x * _Pool.x + dx;
|
||||||
|
O.Set(n, oy, ox, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
|
||||||
|
void MaxPool2D(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint x = dispatchThreadID.y;
|
||||||
|
uint y = dispatchThreadID.z;
|
||||||
|
|
||||||
|
if (c >= O.channels) return;
|
||||||
|
if (x >= O.width) return;
|
||||||
|
if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float maxV = -FLT_MAX;
|
||||||
|
for (uint dy = 0; dy < _Pool.y; ++dy)
|
||||||
|
for (uint dx = 0; dx < _Pool.x; ++dx)
|
||||||
|
{
|
||||||
|
uint oy = y * _Stride.y + dy;
|
||||||
|
uint ox = x * _Stride.x + dx;
|
||||||
|
|
||||||
|
bool mask = (oy >= _Pad.y) && (ox >= _Pad.x) && (oy - _Pad.w < X.height) && (ox - _Pad.z < X.width);
|
||||||
|
float v = (mask)? X.Get(n, oy - _Pad.y, ox - _Pad.x, c): 0;
|
||||||
|
maxV = max(v, maxV);
|
||||||
|
}
|
||||||
|
|
||||||
|
O.Set(n, y, x, c, maxV);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
|
||||||
|
void AvgPool2D(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint x = dispatchThreadID.y;
|
||||||
|
uint y = dispatchThreadID.z;
|
||||||
|
|
||||||
|
if (c >= O.channels) return;
|
||||||
|
if (x >= O.width) return;
|
||||||
|
if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float acc = 0;
|
||||||
|
float counter = 0;
|
||||||
|
for (uint dy = 0; dy < _Pool.y; ++dy)
|
||||||
|
for (uint dx = 0; dx < _Pool.x; ++dx)
|
||||||
|
{
|
||||||
|
uint oy = y * _Stride.y + dy;
|
||||||
|
uint ox = x * _Stride.x + dx;
|
||||||
|
|
||||||
|
bool mask = (oy >= _Pad.y) && (ox >= _Pad.x) && (oy - _Pad.w < X.height) && (ox - _Pad.z < X.width);
|
||||||
|
acc += (mask)? X.Get(n, oy - _Pad.y, ox - _Pad.x, c): 0;
|
||||||
|
counter += (mask)? 1: 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
acc /= counter;
|
||||||
|
O.Set(n, y, x, c, acc);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
|
||||||
|
void MaxPool2D_NoPads(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint x = dispatchThreadID.y;
|
||||||
|
uint y = dispatchThreadID.z;
|
||||||
|
|
||||||
|
if (c >= O.channels) return;
|
||||||
|
if (x >= O.width) return;
|
||||||
|
if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float maxV = -FLT_MAX;
|
||||||
|
for (uint dy = 0; dy < _Pool[1]; ++dy)
|
||||||
|
for (uint dx = 0; dx < _Pool[0]; ++dx)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y * _Stride[1] + dy, x * _Stride[0] + dx, c);
|
||||||
|
maxV = max(v, maxV);
|
||||||
|
}
|
||||||
|
|
||||||
|
O.Set(n, y, x, c, maxV);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
|
||||||
|
void AvgPool2D_NoPads(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint x = dispatchThreadID.y;
|
||||||
|
uint y = dispatchThreadID.z;
|
||||||
|
|
||||||
|
if (c >= O.channels) return;
|
||||||
|
if (x >= O.width) return;
|
||||||
|
if (y >= O.height) return;
|
||||||
|
|
||||||
|
float invPoolSize = 1.0f / (_Pool[0] * _Pool[1]);
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = 0;
|
||||||
|
for (uint dy = 0; dy < _Pool[1]; ++dy)
|
||||||
|
for (uint dx = 0; dx < _Pool[0]; ++dx)
|
||||||
|
v += X.Get(n, y * _Stride[1] + dy, x * _Stride[0] + dx, c) * invPoolSize;
|
||||||
|
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
|
||||||
|
//NUMTHREADS((16,4,4), (16,4,2), (16,2,2))
|
||||||
|
void MaxPool2D_Pool2x2_NoPads(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, O.width, O.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
uint x = dispatchThreadID.y;
|
||||||
|
uint y = dispatchThreadID.z;
|
||||||
|
|
||||||
|
if (c >= O.channels) return;
|
||||||
|
if (x >= O.width) return;
|
||||||
|
if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v0 = X.Get(n, y*2, x*2, c);
|
||||||
|
float v1 = X.Get(n, y*2+1, x*2, c);
|
||||||
|
float v2 = X.Get(n, y*2, x*2+1, c);
|
||||||
|
float v3 = X.Get(n, y*2+1, x*2+1, c);
|
||||||
|
float v = max(v0, max(v1, max(v2, v3)));
|
||||||
|
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
[numthreads(32,1,1)]
|
||||||
|
void GlobalAvgPool2D(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, 1, 1);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
if (c >= O.channels) return;
|
||||||
|
//ASSERT(X.batch == O.batch)
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = 0;
|
||||||
|
for (uint y = 0; y < X.height; ++y)
|
||||||
|
for (uint x = 0; x < X.width; ++x)
|
||||||
|
v += X.Get(n, y, x, c);
|
||||||
|
|
||||||
|
v /= (X.height * X.width);
|
||||||
|
O.Set(n, 0, 0, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
[numthreads(64,1,1)]
|
||||||
|
void InstanceNorm(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
DISPATCH_ARGS(O.channels, 1, 1);
|
||||||
|
TENSOR_SHARED2_ARGS4(X, W, B, WBK, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x;
|
||||||
|
if (c >= O.channels) return;
|
||||||
|
//ASSERT(X.shape == O.shape)
|
||||||
|
|
||||||
|
float gamma = W.Get(0, 0, 0, c);
|
||||||
|
float beta = B.Get(0, 0, 0, c);
|
||||||
|
|
||||||
|
for (uint n = 0; n < O.batch; ++n)
|
||||||
|
{
|
||||||
|
uint x, y;
|
||||||
|
// calc mean
|
||||||
|
float acc = 0;
|
||||||
|
for (y = 0; y < O.height; ++y)
|
||||||
|
for (x = 0; x < O.width; ++x)
|
||||||
|
acc += X.Get(n, y, x, c);
|
||||||
|
float mean = acc / (O.width * O.height);
|
||||||
|
|
||||||
|
// calc variance
|
||||||
|
acc = 0;
|
||||||
|
for (y = 0; y < O.height; ++y)
|
||||||
|
for (x = 0; x < O.width; ++x)
|
||||||
|
{
|
||||||
|
float delta = X.Get(n, y, x, c) - mean;
|
||||||
|
acc += delta * delta;
|
||||||
|
}
|
||||||
|
float var = acc / (O.width * O.height);
|
||||||
|
|
||||||
|
// normalization factor
|
||||||
|
float invNormFactor = 1 / sqrt(var + FLT_EPSILON);
|
||||||
|
|
||||||
|
float scale = gamma * invNormFactor;
|
||||||
|
float bias = beta - gamma * mean * invNormFactor;
|
||||||
|
|
||||||
|
// apply normalization
|
||||||
|
for (y = 0; y < O.height; ++y)
|
||||||
|
for (x = 0; x < O.width; ++x)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
v = v * scale + bias;
|
||||||
|
O.Set(n, y, x, c, v);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
NUMTHREADS((4,8,8), (4,8,4), (4,4,4))
|
||||||
|
void Copy(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
// NOTE: dispatched over X (not O)
|
||||||
|
DISPATCH_ARGS(X.channels, X.width, X.height);
|
||||||
|
TENSOR_ARGS2(X, O);
|
||||||
|
|
||||||
|
uint c = dispatchThreadID.x; uint x = dispatchThreadID.y; uint y = dispatchThreadID.z;
|
||||||
|
if (c >= X.channels) return; if (x >= X.width) return; if (y >= X.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < X.batch; ++n)
|
||||||
|
{
|
||||||
|
float v = X.Get(n, y, x, c);
|
||||||
|
O.Set(n + _Pad[0], y + _Pad[1], x + _Pad[2], c + _Pad[3], v);
|
||||||
|
}
|
||||||
|
}
|
|
@ -0,0 +1,9 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 62f5efacd43b24dd38ead3ce0d80cc34
|
||||||
|
timeCreated: 1495527718
|
||||||
|
licenseType: Pro
|
||||||
|
ComputeShaderImporter:
|
||||||
|
currentAPIMask: 196608
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,70 @@
|
||||||
|
|
||||||
|
// Based on: https://stackoverflow.com/questions/5149544/can-i-generate-a-random-number-inside-a-pixel-shader
|
||||||
|
// Output: Random number: [0,1), that is between 0.0 and 0.999999... inclusive.
|
||||||
|
// Author: Michael Pohoreski
|
||||||
|
// Copyright: Copyleft 2012 :-)
|
||||||
|
float RandomUsingCos(float4 seed)
|
||||||
|
{
|
||||||
|
float4 K1 = float4( // Transcendental numbers:
|
||||||
|
0.64341054629, // (Cahen's constant)
|
||||||
|
23.14069263277926, // e^pi (Gelfond's constant)
|
||||||
|
2.665144142690225, // 2^sqrt(2) (Gelfond-Schneider constant)
|
||||||
|
3.14159265359 // pi
|
||||||
|
);
|
||||||
|
return frac(cos(dot(seed, K1)) * 12345.6789);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Based on: https://stackoverflow.com/questions/4200224/random-noise-functions-for-glsl
|
||||||
|
// Author: Spatial
|
||||||
|
// 05 July 2013
|
||||||
|
|
||||||
|
// A single iteration of Bob Jenkins' One-At-A-Time hashing algorithm.
|
||||||
|
uint hash(uint x)
|
||||||
|
{
|
||||||
|
x += ( x << 10u );
|
||||||
|
x ^= ( x >> 6u );
|
||||||
|
x += ( x << 3u );
|
||||||
|
x ^= ( x >> 11u );
|
||||||
|
x += ( x << 15u );
|
||||||
|
return x;
|
||||||
|
}
|
||||||
|
uint hash( uint2 v ) { return hash( v.x ^ hash(v.y) ); }
|
||||||
|
uint hash( uint3 v ) { return hash( v.x ^ hash(v.y) ^ hash(v.z) ); }
|
||||||
|
uint hash( uint4 v ) { return hash( v.x ^ hash(v.y) ^ hash(v.z) ^ hash(v.w) ); }
|
||||||
|
|
||||||
|
// Construct a float with half-open range [0:1] using low 23 bits.
|
||||||
|
// All zeroes yields 0.0, all ones yields the next smallest representable value below 1.0.
|
||||||
|
float floatConstruct(uint m)
|
||||||
|
{
|
||||||
|
const uint ieeeMantissa = 0x007FFFFFu; // binary32 mantissa bitmask
|
||||||
|
const uint ieeeOne = 0x3F800000u; // 1.0 in IEEE binary32
|
||||||
|
|
||||||
|
m &= ieeeMantissa; // Keep only mantissa bits (fractional part)
|
||||||
|
m |= ieeeOne; // Add fractional part to 1.0
|
||||||
|
|
||||||
|
float f = asfloat(m); // Range [1:2]
|
||||||
|
return f - 1.0; // Range [0:1]
|
||||||
|
}
|
||||||
|
|
||||||
|
// Pseudo-random value in half-open range [0:1].
|
||||||
|
float RandomUsingHash(float4 seed)
|
||||||
|
{
|
||||||
|
return floatConstruct(hash(asuint(seed)));
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
// More alternatives:
|
||||||
|
// https://github.com/ashima/webgl-noise
|
||||||
|
// https://www.shadertoy.com/view/4djSRW
|
||||||
|
|
||||||
|
// ------------------------------------------------------------------------------------------
|
||||||
|
|
||||||
|
float Random(float4 seed)
|
||||||
|
{
|
||||||
|
return RandomUsingCos(seed);
|
||||||
|
}
|
||||||
|
|
||||||
|
float Bernoulli(float4 seed, float p)
|
||||||
|
{
|
||||||
|
return Random(seed) <= p ? 1: 0;
|
||||||
|
}
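// Usage sketch (hypothetical names, for illustration only): a per-element dropout-style mask
// could be drawn as
//   float keep = Bernoulli(float4(n, y, x, c), 1.0 - dropProbability);
// where dropProbability would be a shader constant supplied by the caller.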
|
|
@ -0,0 +1,10 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 5a17e0b3943a74564a02a8ed0a41228b
|
||||||
|
timeCreated: 1520855309
|
||||||
|
licenseType: Pro
|
||||||
|
ShaderImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
defaultTextures: []
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,311 @@
|
||||||
|
#define BARRACUDA_MAX_THREAD_COUNT 64
|
||||||
|
#if (BARRACUDA_MAX_THREAD_COUNT>=256)
|
||||||
|
#define NUMTHREADS(t256,t128,t64) [numthreads t256]
|
||||||
|
#define NUMTHREAD(t256, t128, t64) t256
|
||||||
|
#elif (BARRACUDA_MAX_THREAD_COUNT>=128)
|
||||||
|
#define NUMTHREADS(t256,t128,t64) [numthreads t128]
|
||||||
|
#define NUMTHREAD(t256,t128,t64) t128
|
||||||
|
#elif (BARRACUDA_MAX_THREAD_COUNT>=64)
|
||||||
|
#define NUMTHREADS(t256,t128,t64) [numthreads t64]
|
||||||
|
#define NUMTHREAD(t256,t128,t64) t64
|
||||||
|
#endif
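// Example: with BARRACUDA_MAX_THREAD_COUNT == 64 as defined above, the third variant is
// selected, so NUMTHREADS((4,8,8), (4,8,4), (4,4,4)) expands to [numthreads (4,4,4)] and
// NUMTHREAD(16,8,8) evaluates to 8.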
|
||||||
|
|
||||||
|
struct Tensor
|
||||||
|
{
|
||||||
|
// @TODO: uint no longer seems like a good idea, consider switching to int
|
||||||
|
uint batch, height, width, channels;
|
||||||
|
|
||||||
|
void Init(uint4 nhwc)
|
||||||
|
{
|
||||||
|
batch = nhwc.x;
|
||||||
|
height = nhwc.y;
|
||||||
|
width = nhwc.z;
|
||||||
|
channels = nhwc.w;
|
||||||
|
}
|
||||||
|
|
||||||
|
uint4 Dims()
|
||||||
|
{
|
||||||
|
return uint4(batch, height, width, channels);
|
||||||
|
}
|
||||||
|
uint GetFlatHeight()
|
||||||
|
{
|
||||||
|
return batch;
|
||||||
|
}
|
||||||
|
uint GetFlatWidth()
|
||||||
|
{
|
||||||
|
return height * width * channels;
|
||||||
|
}
|
||||||
|
uint GetKernelHeight()
|
||||||
|
{
|
||||||
|
// kernels storage: {kernel_width * kernel_height * kernel_channels * kernel_count}
|
||||||
|
uint kernelHeight = batch;
|
||||||
|
return kernelHeight;
|
||||||
|
}
|
||||||
|
uint GetKernelWidth()
|
||||||
|
{
|
||||||
|
// kernels storage: {kernel_width * kernel_height * kernel_channels * kernel_count}
|
||||||
|
uint kernelWidth = height;
|
||||||
|
return kernelWidth;
|
||||||
|
}
|
||||||
|
|
||||||
|
uint Index(uint b, uint h, uint w, uint ch)
|
||||||
|
{
|
||||||
|
uint index =
|
||||||
|
b * height * width * channels +
|
||||||
|
h * width * channels +
|
||||||
|
w * channels +
|
||||||
|
ch;
|
||||||
|
return index;
|
||||||
|
}
|
||||||
|
|
||||||
|
uint Index(uint b, uint i)
|
||||||
|
{
|
||||||
|
uint index =
|
||||||
|
b * height * width * channels +
|
||||||
|
i;
|
||||||
|
return index;
|
||||||
|
}
|
||||||
|
};
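// Example: Index(b,h,w,ch) flattens NHWC (channels-last) coordinates as
// ((b*height + h)*width + w)*channels + ch; for a 1x2x2x3 tensor, Index(0,1,0,2) == 1*2*3 + 2 == 8.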
|
||||||
|
|
||||||
|
struct ReadonlyTensor : Tensor
|
||||||
|
{
|
||||||
|
StructuredBuffer<float> data;
|
||||||
|
|
||||||
|
void Init(uint4 nhwc, StructuredBuffer<float> data_)
|
||||||
|
{
|
||||||
|
Tensor::Init(nhwc);
|
||||||
|
data = data_;
|
||||||
|
}
|
||||||
|
|
||||||
|
float Get(uint b, uint h, uint w, uint ch)
|
||||||
|
{
|
||||||
|
return data[Index(b,h,w,ch)];
|
||||||
|
}
|
||||||
|
float Get(uint b, uint2 pos, uint ch)
|
||||||
|
{
|
||||||
|
return data[Index(b, pos.y, pos.x, ch)];
|
||||||
|
}
|
||||||
|
float Get(uint b, uint i)
|
||||||
|
{
|
||||||
|
return data[Index(b,i)];
|
||||||
|
}
|
||||||
|
float Get(uint i)
|
||||||
|
{
|
||||||
|
return data[i];
|
||||||
|
}
|
||||||
|
|
||||||
|
float BroadcastGet(uint b, uint h, uint w, uint ch)
|
||||||
|
{
|
||||||
|
return Get(b % batch, h % height, w % width, ch % channels);
|
||||||
|
}
|
||||||
|
float BroadcastGet(uint b, uint2 pos, uint ch)
|
||||||
|
{
|
||||||
|
return BroadcastGet(b, pos.y, pos.x, ch);
|
||||||
|
}
|
||||||
|
float BroadcastGet(uint b, uint i)
|
||||||
|
{
|
||||||
|
return Get(b % GetFlatHeight(), i % GetFlatWidth());
|
||||||
|
}
|
||||||
|
|
||||||
|
float SafeGet(uint b, uint2 pos, uint ch, uint2 pad)
|
||||||
|
{
|
||||||
|
if (b >= batch || ch >= channels) return 0;
|
||||||
|
|
||||||
|
if (any(pos < pad)) return 0;
|
||||||
|
if (any(pos >= uint2(width, height) + pad)) return 0;
|
||||||
|
pos -= pad;
|
||||||
|
|
||||||
|
return data[Index(b, pos.y, pos.x, ch)];
|
||||||
|
}
|
||||||
|
float SafeGet(uint b, uint h, uint w, uint ch, uint2 pad)
|
||||||
|
{
|
||||||
|
return SafeGet(b, uint2(w, h), ch, pad);
|
||||||
|
}
|
||||||
|
float SafeGet(uint b, uint i)
|
||||||
|
{
|
||||||
|
if (b >= batch || i >= height * width * channels) return 0;
|
||||||
|
return Get(b,i);
|
||||||
|
}
|
||||||
|
float SafeGet(uint i)
|
||||||
|
{
|
||||||
|
if (i >= batch * height * width * channels) return 0;
|
||||||
|
return Get(i);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
struct ReadWriteTensor : Tensor
|
||||||
|
{
|
||||||
|
RWStructuredBuffer<float> data;
|
||||||
|
|
||||||
|
void Init(int4 nhwc, RWStructuredBuffer<float> data_)
|
||||||
|
{
|
||||||
|
Tensor::Init(nhwc);
|
||||||
|
data = data_;
|
||||||
|
}
|
||||||
|
|
||||||
|
float Get(uint b, uint h, uint w, uint ch)
|
||||||
|
{
|
||||||
|
return data[Index(b,h,w,ch)];
|
||||||
|
}
|
||||||
|
float Get(uint b, uint2 pos, uint ch)
|
||||||
|
{
|
||||||
|
return data[Index(b, pos.y, pos.x, ch)];
|
||||||
|
}
|
||||||
|
float Get(uint b, uint i)
|
||||||
|
{
|
||||||
|
return data[Index(b,i)];
|
||||||
|
}
|
||||||
|
float Get(uint i)
|
||||||
|
{
|
||||||
|
return data[i];
|
||||||
|
}
|
||||||
|
|
||||||
|
float BroadcastGet(uint b, uint h, uint w, uint ch)
|
||||||
|
{
|
||||||
|
return Get(b % batch, h % height, w % width, ch % channels);
|
||||||
|
}
|
||||||
|
float BroadcastGet(uint b, uint2 pos, uint ch)
|
||||||
|
{
|
||||||
|
return BroadcastGet(b, pos.y, pos.x, ch);
|
||||||
|
}
|
||||||
|
float BroadcastGet(uint b, uint i)
|
||||||
|
{
|
||||||
|
return Get(b % GetFlatHeight(), i % GetFlatWidth());
|
||||||
|
}
|
||||||
|
|
||||||
|
float SafeGet(uint b, uint2 pos, uint ch, uint2 pad)
|
||||||
|
{
|
||||||
|
if (b >= batch || ch >= channels) return 0;
|
||||||
|
|
||||||
|
if (any(pos < pad)) return 0;
|
||||||
|
if (any(pos >= uint2(width, height) + pad)) return 0;
|
||||||
|
pos -= pad;
|
||||||
|
|
||||||
|
return Get(b, pos.y, pos.x, ch);
|
||||||
|
}
|
||||||
|
float SafeGet(uint b, uint h, uint w, uint ch, uint2 pad)
|
||||||
|
{
|
||||||
|
return SafeGet(b, uint2(w, h), ch, pad);
|
||||||
|
}
|
||||||
|
float SafeGet(uint b, uint i)
|
||||||
|
{
|
||||||
|
if (b >= batch || i >= height * width * channels) return 0;
|
||||||
|
return Get(b,i);
|
||||||
|
}
|
||||||
|
float SafeGet(uint i)
|
||||||
|
{
|
||||||
|
if (i >= batch * height * width * channels) return 0;
|
||||||
|
return Get(i);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
void Set(uint b, uint h, uint w, uint ch, float v)
|
||||||
|
{
|
||||||
|
data[Index(b,h,w,ch)] = v;
|
||||||
|
}
|
||||||
|
void Set(uint y, uint x, float v)
|
||||||
|
{
|
||||||
|
data[Index(y,x)] = v;
|
||||||
|
}
|
||||||
|
void Set(uint i, float v)
|
||||||
|
{
|
||||||
|
data[i] = v;
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
struct SharedTensor : Tensor
|
||||||
|
{
|
||||||
|
StructuredBuffer<float> data;
|
||||||
|
uint offset;
|
||||||
|
|
||||||
|
void Init(uint4 nhwc, uint4 info, StructuredBuffer<float> data_)
|
||||||
|
{
|
||||||
|
Tensor::Init(nhwc);
|
||||||
|
data = data_;
|
||||||
|
offset = info.x;
|
||||||
|
}
|
||||||
|
|
||||||
|
float Get(uint b, uint h, uint w, uint ch)
|
||||||
|
{
|
||||||
|
return data[Index(b,h,w,ch) + offset];
|
||||||
|
}
|
||||||
|
float Get(uint b, uint2 pos, uint ch)
|
||||||
|
{
|
||||||
|
return Get(b, pos.y, pos.x, ch);
|
||||||
|
}
|
||||||
|
float Get(uint b, uint i)
|
||||||
|
{
|
||||||
|
return data[Index(b,i) + offset];
|
||||||
|
}
|
||||||
|
float Get(uint i)
|
||||||
|
{
|
||||||
|
return data[i + offset];
|
||||||
|
}
|
||||||
|
|
||||||
|
float BroadcastGet(uint b, uint h, uint w, uint ch)
|
||||||
|
{
|
||||||
|
return Get(b % batch, h % height, w % width, ch % channels);
|
||||||
|
}
|
||||||
|
float BroadcastGet(uint b, uint2 pos, uint ch)
|
||||||
|
{
|
||||||
|
return BroadcastGet(b, pos.y, pos.x, ch);
|
||||||
|
}
|
||||||
|
float BroadcastGet(uint b, uint i)
|
||||||
|
{
|
||||||
|
return Get(b % GetFlatHeight(), i % GetFlatWidth());
|
||||||
|
}
|
||||||
|
|
||||||
|
float SafeGet(uint b, uint2 pos, uint ch, uint2 pad)
|
||||||
|
{
|
||||||
|
if (b >= batch || ch >= channels) return 0;
|
||||||
|
|
||||||
|
if (any(pos < pad)) return 0;
|
||||||
|
if (any(pos >= uint2(width, height) + pad)) return 0;
|
||||||
|
pos -= pad;
|
||||||
|
|
||||||
|
return Get(b, pos, ch);
|
||||||
|
}
|
||||||
|
float SafeGet(uint b, uint h, uint w, uint ch, uint2 pad)
|
||||||
|
{
|
||||||
|
return SafeGet(b, uint2(w, h), ch, pad);
|
||||||
|
}
|
||||||
|
float SafeGet(uint b, uint i)
|
||||||
|
{
|
||||||
|
if (b >= batch || i >= height * width * channels) return 0;
|
||||||
|
return Get(b,i);
|
||||||
|
}
|
||||||
|
float SafeGet(uint i)
|
||||||
|
{
|
||||||
|
if (i >= batch * height * width * channels) return 0;
|
||||||
|
return Get(i);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
#define TENSOR_DECL(X) uint4 X##decl[2]; StructuredBuffer<float> X##data;
|
||||||
|
#define TENSOR_DECL_RW(X) uint4 X ## decl[2]; RWStructuredBuffer<float> X ## data;
|
||||||
|
|
||||||
|
#define TENSOR_ARG(X) ReadonlyTensor X; X##.Init(X##decl[0], X##data); // readonly
|
||||||
|
#define TENSOR_MODEL(X) SharedTensor X; X##.Init(X##decl[0], X##decl[1], X##data); // RO w offset
|
||||||
|
#define TENSOR_ARG_RW(X) ReadWriteTensor X; X##.Init(X##decl[0], X##data);
|
||||||
|
|
||||||
|
#define TENSOR_ARGS2(X, O) TENSOR_ARG(X); TENSOR_ARG_RW(O);
|
||||||
|
#define TENSOR_ARGS3(X, A, O) TENSOR_ARG(X); TENSOR_MODEL(A); TENSOR_ARG_RW(O);
|
||||||
|
#define TENSOR_ARGS4(X, A, B, O) TENSOR_ARG(X); TENSOR_MODEL(A); TENSOR_MODEL(B); TENSOR_ARG_RW(O);
|
||||||
|
|
||||||
|
// shared model tensors
|
||||||
|
#define TENSOR_SHARED_MODEL(X, S) SharedTensor X; X##.Init(X##decl[0], X##decl[1], S##data);
|
||||||
|
#define TENSOR_SHARED2_ARGS4(X, A, B, S, O) TENSOR_ARG(X); TENSOR_SHARED_MODEL(A, S); TENSOR_SHARED_MODEL(B, S); TENSOR_ARG_RW(O);
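// Typical usage, as in the compute kernels above: declare buffers at file scope with
// TENSOR_DECL(X) / TENSOR_DECL_RW(O), then call TENSOR_ARGS2(X, O) (or one of the wider
// variants) at the top of the kernel to build ReadonlyTensor/ReadWriteTensor views over them.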
|
||||||
|
|
||||||
|
|
||||||
|
// purely informational - declares contract between caller of Dispatch() and kernel
|
||||||
|
#define DISPATCH_ARGS(threadGroupsX, threadGroupsY, threadGroupsZ)
|
||||||
|
|
||||||
|
|
||||||
|
// @TODO: move into more appropriate file
|
||||||
|
#define FLT_MAX 3.402823466e+38F
|
||||||
|
#define FLT_EPSILON 1e-6
|
||||||
|
|
||||||
|
float fastfma(float a, float b, float c)
|
||||||
|
{
|
||||||
|
return dot(float2(a,c), float2(b, 1));
|
||||||
|
}
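// Note: dot(float2(a,c), float2(b,1)) evaluates to a*b + c, so fastfma(a, b, c) expresses a
// multiply-add as a single dot product; e.g. fastfma(2, 3, 1) == 7.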
|
|
@ -0,0 +1,9 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 5761abd87a16940b2a81aaa755787fc9
|
||||||
|
timeCreated: 1506540305
|
||||||
|
licenseType: Pro
|
||||||
|
ShaderImporter:
|
||||||
|
defaultTextures: []
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,99 @@
|
||||||
|
#pragma kernel TexConv2D
|
||||||
|
|
||||||
|
#include "Tensor.cginc"
|
||||||
|
|
||||||
|
TENSOR_DECL(X)
|
||||||
|
TENSOR_DECL(K)
|
||||||
|
TENSOR_DECL(B)
|
||||||
|
TENSOR_DECL(WBK)
|
||||||
|
TENSOR_DECL_RW(O)
|
||||||
|
|
||||||
|
uint4 _Pad;
|
||||||
|
uint4 _Stride;
|
||||||
|
|
||||||
|
struct TextureAsTensor : Tensor
|
||||||
|
{
|
||||||
|
Texture2D<float4> tex;
|
||||||
|
SamplerState smp;
|
||||||
|
|
||||||
|
Texture2DArray<float4> texArray;
|
||||||
|
SamplerState smpArray;
|
||||||
|
|
||||||
|
void Init(uint4 nhwc, Texture2D<float4> tex_, SamplerState sampler_, Texture2DArray<float4> texArray_, SamplerState samplerArray_)
|
||||||
|
{
|
||||||
|
Tensor::Init(nhwc);
|
||||||
|
tex = tex_;
|
||||||
|
smp = sampler_;
|
||||||
|
texArray = texArray_;
|
||||||
|
smpArray = samplerArray_;
|
||||||
|
}
|
||||||
|
|
||||||
|
float4 Get(uint b, uint y, uint x)
|
||||||
|
{
|
||||||
|
float3 loc = float3((float)x / (float)width, (float)y / (float)height, b);
|
||||||
|
if (batch > 1)
|
||||||
|
return texArray.SampleLevel(smpArray, loc, 0);
|
||||||
|
else
|
||||||
|
return tex.SampleLevel(smp, loc.xy, 0);
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
#define TENSOR_SHARED2_ARGS3(A, B, S, O) TENSOR_SHARED_ARG(A, S); TENSOR_SHARED_ARG(B, S); TENSOR_ARG_RW(O);
|
||||||
|
Texture2DArray<float4> Xtex2DArray;
|
||||||
|
Texture2D<float4> Xtex2D;
|
||||||
|
SamplerState samplerXtex2D { Filter = MIN_MAG_LINEAR_MIP_POINT; AddressU = Clamp; AddressV = Clamp; };
|
||||||
|
SamplerState samplerXtex2DArray { Filter = MIN_MAG_LINEAR_MIP_POINT; AddressU = Clamp; AddressV = Clamp; };
|
||||||
|
|
||||||
|
#define MAX_CHANNELS 4
|
||||||
|
|
||||||
|
NUMTHREADS((16,4,4), (16,4,2), (16,2,2))
|
||||||
|
void TexConv2D(uint3 dispatchThreadID : SV_DispatchThreadID)
|
||||||
|
{
|
||||||
|
// @TODO: currently it fails to compile, needs to be investigated
|
||||||
|
#if 0
|
||||||
|
DISPATCH_ARGS(K.kernelCount, O.width, O.height);
|
||||||
|
TextureAsTensor X; X.Init(Xdecl[0], Xtex2D, samplerXtex2D, Xtex2DArray, samplerXtex2DArray);
|
||||||
|
|
||||||
|
TENSOR_SHARED_ARG(K, WBK);
|
||||||
|
TENSOR_SHARED_ARG(B, WBK);
|
||||||
|
TENSOR_ARG_RW(O);
|
||||||
|
|
||||||
|
// ASSERT(X.channels <= MAX_CHANNELS)
|
||||||
|
|
||||||
|
uint k = dispatchThreadID.x;
|
||||||
|
uint x = dispatchThreadID.y;
|
||||||
|
uint y = dispatchThreadID.z;
|
||||||
|
|
||||||
|
if (k >= K.channels) return;
|
||||||
|
if (x >= O.width) return;
|
||||||
|
if (y >= O.height) return;
|
||||||
|
|
||||||
|
for (uint n = 0; n < O.batch; ++n)
|
||||||
|
{
|
||||||
|
float acc = B.Get(k);
|
||||||
|
for (uint dy = 0; dy < K.GetKernelHeight(); ++dy)
|
||||||
|
{
|
||||||
|
for (uint dx = 0; dx < K.GetKernelWidth(); ++dx)
|
||||||
|
{
|
||||||
|
uint oy = y * _Stride.y + dy;
|
||||||
|
uint ox = x * _Stride.x + dx;
|
||||||
|
|
||||||
|
// @TODO: investigate
|
||||||
|
// WARNING: had to move both y checks into the loop (as opposed to checking y in the parent loop) - due to a potential bug in the Metal compiler
|
||||||
|
if (oy < _Pad.y) continue;
|
||||||
|
if (oy - _Pad.w >= X.height) continue;
|
||||||
|
if (ox < _Pad.x) continue;
|
||||||
|
if (ox - _Pad.z >= X.width) continue;
|
||||||
|
|
||||||
|
float4 in4channels = X.Get(n, oy - _Pad.y, ox - _Pad.x);
|
||||||
|
for (uint c = 0; c < X.channels && c < MAX_CHANNELS; ++c)
|
||||||
|
{
|
||||||
|
acc += in4channels[c] * K.Get(dy, dx, c, k);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
O.Set(n, y, x, k, acc);
|
||||||
|
}
|
||||||
|
#endif
|
||||||
|
}
|
|
@ -0,0 +1,9 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: 85d38d76f835143f797bca1481285596
|
||||||
|
timeCreated: 1507637303
|
||||||
|
licenseType: Pro
|
||||||
|
ComputeShaderImporter:
|
||||||
|
currentAPIMask: 196608
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,6 @@
|
||||||
|
Barracuda cross-platform Neural Net engine copyright © 2018 Unity Technologies ApS
|
||||||
|
|
||||||
|
Licensed under the Unity Companion License for Unity-dependent projects--see [Unity Companion License](http://www.unity3d.com/legal/licenses/Unity_Companion_License).
|
||||||
|
|
||||||
|
Unless expressly provided otherwise, the Software under this license is made available strictly on an “AS IS” BASIS WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. Please review the license for details on these and other terms and conditions.
|
||||||
|
|
|
@ -0,0 +1,7 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: dcc5ce8caa7664f8090ef0103a208c6e
|
||||||
|
TextScriptImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,82 @@
|
||||||
|
# Release notes
|
||||||
|
|
||||||
|
## 0.1.6
|
||||||
|
- Added printing of the activation type in verbose mode
|
||||||
|
- Added fast and parallel CPU implementation for Swish, Relu, Add, Sub, Div, Min, Max, Tanh, Exp
|
||||||
|
- Removed duplicate profiler blocks for ops
|
||||||
|
- Improved scheduling on CPU for small batches of data
|
||||||
|
- Fixed compatibility with Unity 2019.2.x
|
||||||
|
|
||||||
|
## 0.1.5
|
||||||
|
- Added Transpose, MatMul and Identity layer support for models exported from ONNX.
|
||||||
|
- Added BasicLSTM layer support for models exported from TF. A limited set of LSTM networks should work now.
|
||||||
|
- Added DepthwiseConv2D layer support. Most networks based on MobileNet should work now.
|
||||||
|
- Added OneHot layer support for models exported from TF.
|
||||||
|
- Added optimized path for Conv2D, Dense and Transpose layers with single batch executions. Performance gain up to 100%.
|
||||||
|
- Fixed FMA performance issue on Metal GFX platforms.
|
||||||
|
- Added fast optimized path for Sigmoid and Mul layers on CPU.
|
||||||
|
- Fixed issue when worker is executed with different batch sizes.
|
||||||
|
- Added a ``pip`` requirements file for Python dependencies, check ``Tools/requirements.txt``.
|
||||||
|
- Added proof-of-concept Docker wrappers for running model conversion inside a Docker container. Check ``Tools/docker-tensorflow-to-barracuda.sh`` and ``Tools/docker-onnx-to-barracuda.sh``. Currently this has been tested only on a Mac host.
|
||||||
|
- Refactored model importers for easier integration with ML Agents.
|
||||||
|
- Fixed input shape determination for Keras sequential model.
|
||||||
|
- Added metadata about input shapes to model. Look for ``Model.GetShapeByName()``.
|
||||||
|
- Added API to query constant Tensors embedded into network, look for ``Model.GetTensorByName()``.
|
||||||
|
- Added reference implementations for Selu, Abs, Neg, Ceil, Floor, Clip, Rcp, Log layers.
|
||||||
|
- Added support for Mean, Square, StridedSlice and Border2D layers.
|
||||||
|
- Added support for Swish activation, now it is automatically detected in models.
|
||||||
|
- Fixed Tanh NaN issue when large argument is passed.
|
||||||
|
- RandomNormal and RandomUniform now support either an embedded shape constant OR the previous tensor's shape as input.
|
||||||
|
- Fixed Keras/TF/ONNX FusedBatchNorm/BatchNorm import and now it takes ``epsilon`` into account.
|
||||||
|
- Barracuda will now fall back to CSharpFast if compute shaders are not supported on the current platform.
|
||||||
|
- Improved compute kernel interop on Android.
|
||||||
|
- Implemented Pix2Pix model (.pict) importer.
|
||||||
|
|
||||||
|
## 0.1.4
|
||||||
|
- Implemented fast Conv2DTrans. Useful for GAN type networks.
|
||||||
|
- Fixed few ComputeBuffer handling issues.
|
||||||
|
- Simplified way to pass texture via ``Tensor`` constructor.
|
||||||
|
- Documentation improvements.
|
||||||
|
- Added Unity Companion License as part of distribution.
|
||||||
|
- Fixed boundary checks for Compute Copy/Concat operations.
|
||||||
|
- Improved profiling experience, now each layer will be reported separately in Unity Profiler.
|
||||||
|
- Fixed Broadcast layer support in ``ModelAnalyzer``.
|
||||||
|
- Exp, Pow and other layers are now also implemented in Compute. Improves RL model inference performance on GPU.
|
||||||
|
- Added platform specific BLAS plugin support. Out of the box Barracuda ships with Apple Accelerate framework support for iOS and macOS.
|
||||||
|
- Added Burst BLAS plugin, which greatly improves performance in the Unity Editor where native OS BLAS is not available. It is packaged as a separate package and requires Burst to be enabled.
|
||||||
|
- Improved memory handling, now less GC allocations should be made per inference execution.
|
||||||
|
|
||||||
|
## 0.1.3
|
||||||
|
- Improved Barracuda support for Unity Profiler.
|
||||||
|
- Cleaned up Barracuda APIs.
|
||||||
|
- Added direct ``Texture`` input support. Look for ``TextureAsTensorData``. The following types of texture supported as input: ``Texture2D``, ``Texture2DArray``, ``Texture3D``, ``RenderTexture``.
|
||||||
|
- Added ``Tensor`` to ``RenderTexture`` conversion. Look for ``TensorToRenderTexture``.
|
||||||
|
- Autoencoder type networks can run completely on GPU now. Data roundtrip via CPU is not necessary anymore.
|
||||||
|
- Vertical flip is applied when converting between ``Texture`` and ``Tensor`` to match conventions. To override this behavior, look for the ``TextureAsTensorData.Flip`` enum.
|
||||||
|
- Removed direct reference to WebCamTexture, now Barracuda compiles for Console targets.
|
||||||
|
- Fixed _Conv2DTranspose_ layer support. Now GANs using _Conv2DTranspose_ work properly.
|
||||||
|
- Added automated test for pix2pix GAN.
|
||||||
|
|
||||||
|
## 0.1.2
|
||||||
|
- Barracuda now is also available as preview package. Look for ``com.unity.barracuda`` in https://staging-packages.unity.com registry.
|
||||||
|
- Conv2D layers are now *up to 30x faster* with ``CSharpFast`` backend (``ComputeFast`` remains best backend for convolutional networks).
|
||||||
|
- Added profiler sample for ``Fetch()``.
|
||||||
|
- Fixed compilation issues on Xbox One.
|
||||||
|
- TexConv2D support was temporary disabled.
|
||||||
|
- Barracuda logging can now be configured via static fields of the ``Barracuda.D`` class; it allows disabling specific logging levels or disabling stack trace collection entirely (helps with performance when profiling).
|
||||||
|
- The Compute Concat implementation will now fall back to the C# implementation instead of throwing an exception when an unsupported configuration is encountered.
|
||||||
|
- Fixed several ``ComputeBuffer`` release issues.
|
||||||
|
- Added a constructor for ``Tensor`` that allows passing in a data array.
|
||||||
|
- Improved Flatten handling in TensorFlow models.
|
||||||
|
- Added helper func ``ModelLoader.LoadFromStreamingAssets``.
|
||||||
|
- Fixed .meta file packaging.
|
||||||
|
- Small docs improvements.
|
||||||
|
- Fixed unnecessary patching of Activation layers in ``ModelLoader``.
|
||||||
|
- Added output trimming at run-time. See the Worker factory for the extra parameters.
|
||||||
|
|
||||||
|
## 0.1.1
|
||||||
|
- First internal release as a drop-in package
|
||||||
|
- Compatibility with ML Agents models: 3DBall, PushBlock, GridWorld, Soccer.
|
||||||
|
|
||||||
|
## 0.1.0
|
||||||
|
- First internal build. Not published due to bugs encountered.
|
|
@ -0,0 +1,7 @@
|
||||||
|
fileFormatVersion: 2
|
||||||
|
guid: a129912fffc9d4ab3b5ae110be67a669
|
||||||
|
TextScriptImporter:
|
||||||
|
externalObjects: {}
|
||||||
|
userData:
|
||||||
|
assetBundleName:
|
||||||
|
assetBundleVariant:
|
|
@ -0,0 +1,8 @@
|
||||||
|
{
|
||||||
|
"name": "com.unity.barracuda",
|
||||||
|
"displayName": "Barracuda",
|
||||||
|
"version": "0.1.6-preview",
|
||||||
|
"unity": "2017.4",
|
||||||
|
"description": "Barracuda is lightweight and cross-platform Neural Net inference library. Barracuda supports inference both on GPU and CPU.",
|
||||||
|
"dependencies": {}
|
||||||
|
}
|
|
@ -0,0 +1,7 @@
fileFormatVersion: 2
guid: 73ae2d877fd444b04b5b6ef591d3fa0e
TextScriptImporter:
  externalObjects: {}
  userData:
  assetBundleName:
  assetBundleVariant:

@ -0,0 +1,8 @@
fileFormatVersion: 2
guid: a69633ced4cc74b0d9a9af7e6f27e92d
folderAsset: yes
DefaultImporter:
  externalObjects: {}
  userData:
  assetBundleName:
  assetBundleVariant:

@ -0,0 +1,3 @@
fileFormatVersion: 2
guid: 7621aa5732574c9689c6603d4f50331b
timeCreated: 1548470002
@ -0,0 +1,19 @@
using System;
using Unity.Entities;
using Unity.Mathematics;

namespace ECS_MLAgents_v0.Core
{
    /*
     * This is the Agent Component; it contains information specific to the Agent such as the
     * reward signal and the done flag.
     */
    [Serializable]
    public struct Agent : IComponentData
    {
        // TODO : Add the Agent IComponentData to the appropriate Entities before the first
        // decision pass
        public float3 Reward;
        // public bool1 Done; // TODO : bool is not blittable
    }
}
@ -0,0 +1,3 @@
fileFormatVersion: 2
guid: f701588218d34109a497b0deab92af6b
timeCreated: 1548470131
@ -0,0 +1,10 @@
using Unity.Entities;

namespace ECS_MLAgents_v0.Core
{
    /*
     * This is the ComponentDataWrapper for the Agent Component. It allows attaching an Agent
     * Component to a GameObject in the Unity Editor.
     */
    public class AgentComponent : ComponentDataWrapper<Agent> { }
}
@ -0,0 +1,3 @@
fileFormatVersion: 2
guid: 4f2a8abc5e5549439b29a0f9cbb7776b
timeCreated: 1548382152
@ -0,0 +1,228 @@
using System.Linq;
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Entities;
using Unity.Jobs;
using UnityEngine;

namespace ECS_MLAgents_v0.Core
{

    /*
     * AgentSystem<Sensor, Actuator> is a JobComponentSystem that updates the Actuator based on
     * the data present in the Sensor for all of the compatible Entities. The user can create a
     * new AgentSystem by defining a class this way:
     *
     *     public class MyAgentSystem : AgentSystem<MySensor, MyActuator> { }
     *
     * The user can modify properties of MyAgentSystem to modify which Entities will be
     * affected by MyAgentSystem.
     *
     * To access the instance of MyAgentSystem, use:
     *
     *     World.Active.GetExistingManager<MyAgentSystem>();
     *
     * It is the responsibility of the user to create and populate
     * the MySensor of each Entity as well as create and use the data in the MyActuator of each
     * Entity. MySensor and MyActuator must be IComponentData structs that only contain blittable
     * float fields.
     * Note that an Agent IComponentData must be attached to an Entity for it to be affected by
     * MyAgentSystem.
     *
     * At each call to OnUpdate, the data from the sensors of compatible entities will be
     * aggregated into a single NativeArray<float>. The AgentSystem will then process this
     * data in batch and generate a new NativeArray<float> that will be used to populate the
     * Actuator data of all compatible Entities.
     */
    public abstract class AgentSystem<TS, TA> : JobComponentSystem, IAgentSystem
        where TS : struct, IComponentData
        where TA : struct, IComponentData
    {
        private const int INITIAL_MEMORY_SIZE = 1024;
        private const int SIZE_OF_FLOAT_IN_MEMORY = 4;

        private int _sensorMemorySize = INITIAL_MEMORY_SIZE;
        private int _actuatorMemorySize = INITIAL_MEMORY_SIZE;

        public int DecisionInterval { get; set; }
        private int _phase;

        public IAgentDecision Decision { get; set; }

        private ComponentGroup _componentGroup;
        private int _sensorSize;
        private int _actuatorSize;
        // TODO : Make sure there is not extra cost for memory allocation here and when copying
        private NativeArray<float> _sensorTensor =
            new NativeArray<float>(INITIAL_MEMORY_SIZE, Allocator.Persistent);
        private NativeArray<float> _actuatorTensor =
            new NativeArray<float>(INITIAL_MEMORY_SIZE, Allocator.Persistent);

        // TODO : Decide if we want to keep at all
        private Logger _logger;

        protected override void OnCreateManager()
        {
            _logger = new Logger(GetType().Name);
            _logger.Log("OnCreateManager");
            SetNewComponentGroup();
            _sensorSize = UnsafeUtility.SizeOf<TS>();
            _actuatorSize = UnsafeUtility.SizeOf<TA>();
        }

        protected override void OnDestroyManager()
        {
            _logger.Log("OnDestroyManager");
            _sensorTensor.Dispose();
            _actuatorTensor.Dispose();
        }

        public void SetNewComponentGroup(params ComponentType[] t)
        {
            _logger.Log("UpdateComponentGroup");
            var componentTypes = t.ToList();
            componentTypes.Add(ComponentType.ReadOnly(typeof(TS)));
            componentTypes.Add(typeof(TA));
            componentTypes.Add(typeof(Agent));
            _componentGroup = GetComponentGroup(componentTypes.ToArray());
        }

        public void SetFilter<T>(T filter) where T : struct, ISharedComponentData
        {
            _componentGroup.SetFilter<T>(filter);
        }

        public void SetFilter<T0, T1>(T0 filterA, T1 filterB)
            where T0 : struct, ISharedComponentData
            where T1 : struct, ISharedComponentData
        {
            _componentGroup.SetFilter<T0, T1>(filterA, filterB);
        }

        public void ResetFilter()
        {
            _componentGroup.ResetFilter();
        }

        protected override JobHandle OnUpdate(JobHandle inputDeps)
        {
            _logger.Log("OnUpdate");

            if (_phase > 0)
            {
                _phase--;
                return inputDeps;
            }
            _phase = DecisionInterval;

            var nAgents = _componentGroup.CalculateLength();

            /*
             * If the AgentSystem is not active, if there is no Decision component on the
             * AgentSystem, or if no Entities match the ComponentGroup's requirements, the update
             * of the Actuators returns immediately.
             */
            if (Decision == null || nAgents == 0)
            {
                return inputDeps;
            }

            /*
             * If there are more agents than allowed by the memory allocation of the sensor or
             * actuator, then the size is updated to the required size.
             */
            if (nAgents * _sensorSize / SIZE_OF_FLOAT_IN_MEMORY > _sensorMemorySize)
            {
                _sensorMemorySize = nAgents * _sensorSize / SIZE_OF_FLOAT_IN_MEMORY;
                _sensorTensor.Dispose();
                _sensorTensor = new NativeArray<float>(_sensorMemorySize, Allocator.Persistent);
            }
            if (nAgents * _actuatorSize / SIZE_OF_FLOAT_IN_MEMORY > _actuatorMemorySize)
            {
                _actuatorMemorySize = nAgents * _actuatorSize / SIZE_OF_FLOAT_IN_MEMORY;
                _actuatorTensor.Dispose();
                _actuatorTensor = new NativeArray<float>(_actuatorMemorySize, Allocator.Persistent);
            }

            /*
             * Collecting the DataArray necessary for the computation
             */
            _logger.Log("On update with " + _componentGroup.CalculateLength() + " entities");
            var sensors = _componentGroup.GetComponentDataArray<TS>();
            var actuators = _componentGroup.GetComponentDataArray<TA>();
            var agents = _componentGroup.GetComponentDataArray<Agent>();
            var handle = inputDeps;

            /*
             * Copy the data from the sensors to the sensor NativeArray<float> for batch processing.
             */
            var copySensorsJob = new CopySensorsJob
            {
                Sensors = sensors,
                SensorTensor = _sensorTensor,
                SensorSize = _sensorSize
            };
            handle = copySensorsJob.Schedule(nAgents, 64, handle);

            handle.Complete();

            /*
             * The Decision is called here to populate the NativeArray<float> of Actuators.
             */
            handle = Decision.DecideBatch(ref _sensorTensor,
                ref _actuatorTensor,
                _sensorSize / SIZE_OF_FLOAT_IN_MEMORY,
                _actuatorSize / SIZE_OF_FLOAT_IN_MEMORY,
                nAgents,
                handle);

            /*
             * Copy the data from the actuator NativeArray<float> to the actuators of each entity.
             */
            var copyActuatorsJob = new CopyActuatorsJob
            {
                ActuatorTensor = _actuatorTensor,
                Actuators = actuators,
                ActuatorSize = _actuatorSize
            };

            return copyActuatorsJob.Schedule(nAgents, 64, handle);
        }

        /*
         * This IJobParallelFor copies the Sensor data into a NativeArray<float>.
         */
        // [BurstCompile]
        private struct CopySensorsJob : IJobParallelFor
        {
            [ReadOnly] public ComponentDataArray<TS> Sensors;
            public NativeArray<float> SensorTensor;
            [ReadOnly] public int SensorSize;

            public void Execute(int i)
            {
                TensorUtility.CopyToNativeArray(Sensors[i], SensorTensor, i * SensorSize);
            }
        }

        /*
         * This IJobParallelFor copies the Actuator data to the appropriate IComponentData.
         */
        // [BurstCompile]
        private struct CopyActuatorsJob : IJobParallelFor
        {

            public ComponentDataArray<TA> Actuators;
            public NativeArray<float> ActuatorTensor;
            [ReadOnly] public int ActuatorSize;

            public void Execute(int i)
            {
                var tmp = Actuators[i];
                // TODO : Make sure there is no extra cost here
                TensorUtility.CopyFromNativeArray(ActuatorTensor, out tmp, i * ActuatorSize);
                Actuators[i] = tmp;
            }
        }
    }
}
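
To make the generic pattern described in the class comment above concrete, here is a minimal sketch of what a user of this system might declare. `MySensor`, `MyActuator` and `MyAgentSystem` are hypothetical names used throughout these notes; only the "blittable float fields" rule comes from the code above.

```C#
using System;
using Unity.Entities;
using Unity.Mathematics;
using ECS_MLAgents_v0.Core;

namespace ECS_MLAgents_v0.Example
{
    // Hypothetical sensor: only blittable float-based fields, as AgentSystem requires.
    [Serializable]
    public struct MySensor : IComponentData
    {
        public float3 Position;
        public float Speed;
    }

    // Hypothetical actuator written back by the decision.
    [Serializable]
    public struct MyActuator : IComponentData
    {
        public float3 Acceleration;
    }

    // The user-defined system; all batching logic lives in AgentSystem<TS, TA>.
    public class MyAgentSystem : AgentSystem<MySensor, MyActuator> { }
}
```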
@ -0,0 +1,11 @@
fileFormatVersion: 2
guid: 0e421bb6f29cc4f90ad195a589c8782c
MonoImporter:
  externalObjects: {}
  serializedVersion: 2
  defaultReferences: []
  executionOrder: 0
  icon: {instanceID: 0}
  userData:
  assetBundleName:
  assetBundleVariant:
@ -0,0 +1,37 @@
//#define DEBUG_AGENT
#if DEBUG_AGENT
using UnityEngine;
#endif

namespace ECS_MLAgents_v0.Core
{
    /*
     * A class for debugging. The messages will only be printed when the define symbol DEBUG_AGENT
     * is on.
     */
    public class Logger
    {
        private string _prefix;

        /// <summary>
        /// Constructor for the Logger object.
        /// </summary>
        /// <param name="prefix">The prefix that will be printed at the beginning of each message
        /// logged by the Logger instance</param>
        public Logger(string prefix)
        {
            _prefix = prefix;
        }

        /// <summary>
        /// Logs the message provided as input using the UnityEngine Debug.Log call.
        /// </summary>
        /// <param name="message"></param>
        public void Log(object message)
        {
#if DEBUG_AGENT
            Debug.Log(_prefix + " : " + message);
#endif
        }
    }
}
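
A tiny usage sketch; nothing is printed unless the `//#define DEBUG_AGENT` at the top of the file is uncommented, and the prefix string here is arbitrary.

```C#
// Typically constructed with the owning system's type name, as AgentSystem does.
var logger = new ECS_MLAgents_v0.Core.Logger("MyAgentSystem");
logger.Log("Scheduling a decision pass"); // no-op unless DEBUG_AGENT is defined
```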
@ -0,0 +1,3 @@
fileFormatVersion: 2
guid: 2381447268bd4cc18e8f60043e9016b2
timeCreated: 1548538551
@ -0,0 +1,34 @@
using Unity.Collections;
using Unity.Jobs;

namespace ECS_MLAgents_v0.Core
{
    /*
     * The interface defining a Decision process by which a batch of agents updates its actuators
     * based on the information present in the sensors.
     */
    public interface IAgentDecision
    {
        /// <summary>
        /// DecideBatch updates the aggregated actuators of the agents present in the batch from
        /// the aggregated sensors.
        /// </summary>
        /// <param name="sensor">The aggregated data for the sensor information present in the
        /// batch. The sensor data is linearly arranged.</param>
        /// <param name="actuator">The aggregated data for the actuator information present in the
        /// batch. The actuator data is linearly arranged.</param>
        /// <param name="sensorSize">The number of float values present in a sensor for one agent
        /// </param>
        /// <param name="actuatorSize">The number of float values present in an actuator
        /// for one agent</param>
        /// <param name="nAgents">The number of agents present in the batch</param>
        /// <param name="handle">The JobHandle for the input dependencies.</param>
        /// <returns>The JobHandle for the output dependencies.</returns>
        JobHandle DecideBatch(ref NativeArray<float> sensor,
            ref NativeArray<float> actuator,
            int sensorSize,
            int actuatorSize,
            int nAgents,
            JobHandle handle);
    }
}
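
For reference, a minimal hand-written implementation of this interface; the `ZeroDecision` name and its do-nothing policy are purely illustrative and not part of the project.

```C#
using Unity.Collections;
using Unity.Jobs;
using ECS_MLAgents_v0.Core;

// Hypothetical heuristic decision: writes zeros into every actuator slot.
public class ZeroDecision : IAgentDecision
{
    public JobHandle DecideBatch(ref NativeArray<float> sensor,
        ref NativeArray<float> actuator,
        int sensorSize,
        int actuatorSize,
        int nAgents,
        JobHandle handle)
    {
        // Runs synchronously on the main thread; a real decision could schedule
        // jobs here and return their handle instead.
        for (var agent = 0; agent < nAgents; agent++)
        {
            for (var i = 0; i < actuatorSize; i++)
            {
                actuator[agent * actuatorSize + i] = 0f;
            }
        }
        return handle;
    }
}
```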
@ -0,0 +1,3 @@
fileFormatVersion: 2
guid: 5ae010d1ac834febb1f3e5c038cab36e
timeCreated: 1548539963
@ -0,0 +1,51 @@
using Unity.Entities;

namespace ECS_MLAgents_v0.Core
{
    public interface IAgentSystem
    {
        /// <summary>
        /// If true, the AgentSystem will update the agents.
        /// </summary>
        bool Enabled { get; set; }

        /// <summary>
        /// The IAgentDecision that will be used to update the Actuators of compatible Entities.
        /// </summary>
        IAgentDecision Decision { get; set; }

        /// <summary>
        /// This method defines which ComponentTypes are required on an Entity for it to be
        /// affected by the AgentSystem. Note: this will reset any filter previously set.
        /// </summary>
        /// <param name="t"> The ComponentTypes that are required on the Entities.</param>
        void SetNewComponentGroup(params ComponentType[] t);

        /// <summary>
        /// Allows the creation of a filter on the Entities affected by the AgentSystem.
        /// </summary>
        /// <param name="filter"> A ISharedComponentData instance used for filtering</param>
        /// <typeparam name="T"> The type of the ISharedComponentData filter</typeparam>
        void SetFilter<T>(T filter) where T : struct, ISharedComponentData;

        /// <summary>
        /// Allows the creation of a filter on the Entities affected by the AgentSystem.
        /// </summary>
        /// <param name="filterA">The first ISharedComponentData instance used for filtering
        /// </param>
        /// <param name="filterB">The second ISharedComponentData instance used for filtering
        /// </param>
        /// <typeparam name="T0">The type of the first ISharedComponentData filter</typeparam>
        /// <typeparam name="T1">The type of the second ISharedComponentData filter</typeparam>
        void SetFilter<T0, T1>(T0 filterA, T1 filterB)
            where T0 : struct, ISharedComponentData
            where T1 : struct, ISharedComponentData;

        /// <summary>
        /// Resets the filter previously set on this AgentSystem
        /// </summary>
        void ResetFilter();

        int DecisionInterval { get; set; }
    }
}
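
A sketch of driving these members at runtime, reusing the hypothetical `MyAgentSystem` and `ZeroDecision` types from the earlier sketches; only the member names come from this interface.

```C#
using Unity.Entities;
using ECS_MLAgents_v0.Core;

public static class AgentSystemSetup
{
    // Hypothetical wiring code; MyAgentSystem and ZeroDecision come from the sketches above.
    public static void Configure()
    {
        // Retrieve the system instance, as suggested in the AgentSystem documentation.
        var system = World.Active.GetExistingManager<MyAgentSystem>();

        system.Decision = new ZeroDecision();
        system.DecisionInterval = 5; // the system counts down between decision passes

        // SetFilter / ResetFilter can additionally restrict which Entities are affected
        // (see the SphereGroup example further down).
    }
}
```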
@ -0,0 +1,3 @@
fileFormatVersion: 2
guid: 1353060c81d44091b517eeb1d0ae4597
timeCreated: 1549238797

@ -0,0 +1,8 @@
fileFormatVersion: 2
guid: 1c8575b3918494070a6f53c91c03941e
folderAsset: yes
DefaultImporter:
  externalObjects: {}
  userData:
  assetBundleName:
  assetBundleVariant:

@ -0,0 +1,8 @@
fileFormatVersion: 2
guid: fb8a49bbf26a244d8bcbd4b90fcc007f
folderAsset: yes
DefaultImporter:
  externalObjects: {}
  userData:
  assetBundleName:
  assetBundleVariant:
@ -0,0 +1,28 @@
using System.IO;
using UnityEditor;
using UnityEngine;
using UnityEditor.Experimental.AssetImporters;

namespace ECS_MLAgents_v0.Core.Inference.Editor
{
    /// <summary>
    /// Asset importer for Barracuda models.
    /// </summary>
    [ScriptedImporter(1, new[] {"nn"})]
    public class NNModelImporter : ScriptedImporter {
        private const string IconPath = "Assets/ML-Agents/Resources/NNModelIcon.png";

        public override void OnImportAsset(AssetImportContext ctx)
        {
            var model = File.ReadAllBytes(ctx.assetPath);
            var asset = ScriptableObject.CreateInstance<NNModel>();
            asset.Value = model;

            Texture2D texture = (Texture2D)
                AssetDatabase.LoadAssetAtPath(IconPath, typeof(Texture2D));

            ctx.AddObjectToAsset(ctx.assetPath, asset, texture);
            ctx.SetMainObject(asset);
        }
    }
}
@ -0,0 +1,11 @@
fileFormatVersion: 2
guid: 87cd9c69c75e6491c9d014a8b05de59c
MonoImporter:
  externalObjects: {}
  serializedVersion: 2
  defaultReferences: []
  executionOrder: 0
  icon: {instanceID: 0}
  userData:
  assetBundleName:
  assetBundleVariant:
@ -0,0 +1,9 @@
namespace ECS_MLAgents_v0.Core.Inference
{
    public enum InferenceDevice
    {
        CPU = 0,
        GPU = 1
    }

}
@ -0,0 +1,3 @@
fileFormatVersion: 2
guid: 991a669f5ba0473ab89633672994543f
timeCreated: 1549237699
@ -0,0 +1,10 @@
using UnityEngine;

namespace ECS_MLAgents_v0.Core.Inference
{
    public class NNModel : ScriptableObject
    {
        [HideInInspector]
        public byte[] Value;
    }
}
@ -0,0 +1,11 @@
fileFormatVersion: 2
guid: 1d92b4646016e4d20a97c85418644c9a
MonoImporter:
  externalObjects: {}
  serializedVersion: 2
  defaultReferences: []
  executionOrder: 0
  icon: {instanceID: 0}
  userData:
  assetBundleName:
  assetBundleVariant:
@ -0,0 +1,75 @@
using Barracuda;
using ECS_MLAgents_v0.Core.Inference;
using Unity.Collections;
using Unity.Jobs;

namespace ECS_MLAgents_v0.Core
{
    /// <summary>
    /// This class uses a pretrained Neural Network model to take the decisions for a batch of
    /// agents. As such, it implements the IAgentDecision interface and requires a Barracuda Neural
    /// Network model as input during construction.
    /// </summary>
    public class NNDecision : IAgentDecision
    {
        private NNModel _model;
        public InferenceDevice inferenceDevice = InferenceDevice.CPU;
        private Model _barracudaModel;
        private IWorker _engine;
        private const bool _verbose = false;

        private float[] sensorData = new float[0];
        /// <summary>
        /// Generates a new NNDecision object that uses the model input to take a decision for
        /// the agents present in the batches.
        /// </summary>
        /// <param name="model"> The Barracuda NNModel that will be used for the decision</param>
        public NNDecision(NNModel model)
        {
            _model = model;
            D.logEnabled = _verbose;
            _engine?.Dispose();

            _barracudaModel = ModelLoader.Load(model.Value);
            var executionDevice = inferenceDevice == InferenceDevice.GPU
                ? BarracudaWorkerFactory.Type.ComputeFast
                : BarracudaWorkerFactory.Type.CSharpFast;

            _engine = BarracudaWorkerFactory.CreateWorker(
                executionDevice, _barracudaModel, _verbose);

        }

        public JobHandle DecideBatch(ref NativeArray<float> sensor,
            ref NativeArray<float> actuator,
            int sensorSize,
            int actuatorSize,
            int nAgents,
            JobHandle handle)
        {
            if (sensorData.Length < sensor.Length)
            {
                sensorData = new float[sensor.Length];
            }

            sensor.CopyTo(sensorData);
            // TODO : This is additional allocation here... need to go FASTER !
            var sensorT = new Tensor(
                new TensorShape(nAgents, sensorSize),
                sensorData,
                "sensor");

            _engine.Execute(sensorT);
            sensorT.Dispose();
            var actuatorT = _engine.Fetch("actuator");

            actuator.Slice(
                0, actuatorSize * nAgents).CopyFrom(actuatorT.data.Download(actuator.Length));
            actuatorT.Dispose();

            return handle;
        }

    }
}
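
Putting the inference pieces together, a hedged sketch of assigning a trained model to a system. The MonoBehaviour wrapper and its field name are assumptions; `NNDecision`, `NNModel` and the expectation that the network exposes a "sensor" input and an "actuator" output follow the class above.

```C#
using UnityEngine;
using Unity.Entities;
using ECS_MLAgents_v0.Core;
using ECS_MLAgents_v0.Core.Inference;

// Hypothetical bootstrap MonoBehaviour: assigns a trained .nn asset to MyAgentSystem.
public class NNDecisionBootstrap : MonoBehaviour
{
    // Drag an imported .nn asset (see NNModelImporter) onto this field in the Inspector.
    public NNModel model;

    void Start()
    {
        var system = World.Active.GetExistingManager<MyAgentSystem>();
        // The network is expected to read a "sensor" input and write an "actuator" output,
        // matching the tensor names used inside NNDecision.
        system.Decision = new NNDecision(model);
    }
}
```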
@ -0,0 +1,11 @@
fileFormatVersion: 2
guid: b10ba7c68b628465e9f5b3706682ba6e
MonoImporter:
  externalObjects: {}
  serializedVersion: 2
  defaultReferences: []
  executionOrder: 0
  icon: {instanceID: 0}
  userData:
  assetBundleName:
  assetBundleVariant:
@ -0,0 +1,110 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Mathematics;

namespace ECS_MLAgents_v0.Core
{
    /*
     * A library that uses unsafe code to copy data between structs and NativeArrays.
     */
    public static class TensorUtility
    {
        // Replace this with a set
        private static readonly List<Type> SeenTypes = new List<Type>();

        /// <summary>
        /// Copies a blittable struct of float data into a NativeArray of floats at a specific
        /// location.
        /// </summary>
        /// <param name="src"> The source struct that contains the data to be copied</param>
        /// <param name="dst"> The destination NativeArray of floats that will receive the data
        /// </param>
        /// <param name="index"> The index in the NativeArray destination at which to copy the data
        /// </param>
        /// <typeparam name="T"> The Type of the struct that will be copied.</typeparam>
        public static void CopyToNativeArray<T>(T src, NativeArray<float> dst, int index)
            where T : struct
        {
            if (!SeenTypes.Contains(typeof(T)))
            {
                DebugCheckStructure(typeof(T));
            }
            unsafe
            {
                UnsafeUtility.CopyStructureToPtr<T>(ref src, (byte*) (dst.GetUnsafePtr()) + index);
            }
        }

        /// <summary>
        /// Copies the content of a NativeArray of floats at a specific location into a blittable
        /// struct of floats.
        /// </summary>
        /// <param name="src"> The source NativeArray that contains the data to be copied.</param>
        /// <param name="dst"> The destination struct that will receive the data</param>
        /// <param name="index"> The index in the NativeArray at which the data is located.</param>
        /// <typeparam name="T"> The Type of the struct that will receive the data</typeparam>
        public static void CopyFromNativeArray<T>(NativeArray<float> src, out T dst, int index)
            where T : struct
        {
            if (!SeenTypes.Contains(typeof(T)))
            {
                DebugCheckStructure(typeof(T));
            }
            unsafe
            {
                UnsafeUtility.CopyPtrToStructure((byte*) (src.GetUnsafePtr()) + index, out dst);
            }
        }

        /// <summary>
        /// A helper method that checks if the type of a struct is supported by the library. The
        /// struct must be blittable and only contain fields of float with a valid type.
        /// </summary>
        /// <param name="t"> The Type that will be checked</param>
        /// <exception cref="NotSupportedException"> NotSupportedException will be raised if the
        /// Type t is not valid for use by the library.</exception>
        private static void DebugCheckStructure(Type t)
        {
            SeenTypes.Add(t);
            if (t.GetFields(BindingFlags.Public | BindingFlags.Instance)
                .Any(f => !IsCompatibleObservationFieldType(f.FieldType)))
            {
                throw new NotSupportedException(
                    "You are trying to add a struct as observation data which contains an " +
                    "incompatible member type. Only float and vectors are supported for " +
                    "struct observations");
            }
        }

        /// <summary>
        /// Helper method that checks if the type of a field is a compatible blittable float.
        /// </summary>
        /// <param name="t"> The Type of the field.</param>
        /// <returns> True if the Type is compatible and false otherwise.</returns>
        private static bool IsCompatibleObservationFieldType(Type t)
        {
            if (t == typeof(float))
                return true;
            if (t == typeof(float2))
                return true;
            if (t == typeof(float3))
                return true;
            if (t == typeof(float4))
                return true;
            if (t == typeof(quaternion))
                return true;
            if (t == typeof(float2x2))
                return true;
            if (t == typeof(float3x3))
                return true;
            if (t == typeof(float4x4))
                return true;
            return false;
        }
    }

}
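
A small usage sketch of the two copy helpers; the `Observation` struct, array size and offsets are illustrative. Note that `index` is a byte offset, which is why the callers in `AgentSystem` multiply by `UnsafeUtility.SizeOf`.

```C#
using Unity.Collections;
using Unity.Collections.LowLevel.Unsafe;
using Unity.Mathematics;
using ECS_MLAgents_v0.Core;

public static class TensorUtilityExample
{
    // Illustrative struct: a single float3, i.e. 12 bytes per element.
    private struct Observation
    {
        public float3 Velocity;
    }

    public static void RoundTrip()
    {
        var stride = UnsafeUtility.SizeOf<Observation>(); // byte offset per agent
        var buffer = new NativeArray<float>(16, Allocator.Temp);

        var src = new Observation { Velocity = new float3(1f, 2f, 3f) };
        TensorUtility.CopyToNativeArray(src, buffer, 1 * stride);        // write slot #1

        TensorUtility.CopyFromNativeArray(buffer, out Observation dst, 1 * stride);
        // dst.Velocity is now (1, 2, 3)

        buffer.Dispose();
    }
}
```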
@ -0,0 +1,3 @@
fileFormatVersion: 2
guid: 331744d812e64f74910ee4d5727312cd
timeCreated: 1548540468

@ -0,0 +1,3 @@
fileFormatVersion: 2
guid: 8fc552fa6c4441fa8d84ba78bfbc22d7
timeCreated: 1548439524

@ -0,0 +1,3 @@
fileFormatVersion: 2
guid: 06f334ad4721424cb426ab1ee3b705c8
timeCreated: 1548624746

@ -0,0 +1,8 @@
fileFormatVersion: 2
guid: fee4441a574c64877834ac1c7c5abfc2
folderAsset: yes
DefaultImporter:
  externalObjects: {}
  userData:
  assetBundleName:
  assetBundleVariant:
@ -0,0 +1,136 @@
%YAML 1.1
%TAG !u! tag:unity3d.com,2011:
--- !u!1 &8480657802093770681
GameObject:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  serializedVersion: 6
  m_Component:
  - component: {fileID: 7453438336369544782}
  - component: {fileID: 7329952648618696081}
  - component: {fileID: 7175794993082824140}
  - component: {fileID: 4931620525958682511}
  - component: {fileID: 6329899614523502008}
  - component: {fileID: 9208578237387885132}
  - component: {fileID: 3511010848622024065}
  m_Layer: 0
  m_Name: Sphere
  m_TagString: Untagged
  m_Icon: {fileID: 0}
  m_NavMeshLayer: 0
  m_StaticEditorFlags: 0
  m_IsActive: 1
--- !u!4 &7453438336369544782
Transform:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  m_GameObject: {fileID: 8480657802093770681}
  m_LocalRotation: {x: 0, y: 0, z: 0, w: 1}
  m_LocalPosition: {x: 0, y: 0, z: 0}
  m_LocalScale: {x: 1, y: 1, z: 1}
  m_Children: []
  m_Father: {fileID: 0}
  m_RootOrder: 0
  m_LocalEulerAnglesHint: {x: 0, y: 0, z: 0}
--- !u!114 &7329952648618696081
MonoBehaviour:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  m_GameObject: {fileID: 8480657802093770681}
  m_Enabled: 1
  m_EditorHideFlags: 0
  m_Script: {fileID: 11500000, guid: 5bf10cdea1344482e91a4f2b58506b77, type: 3}
  m_Name:
  m_EditorClassIdentifier:
--- !u!114 &7175794993082824140
MonoBehaviour:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  m_GameObject: {fileID: 8480657802093770681}
  m_Enabled: 1
  m_EditorHideFlags: 0
  m_Script: {fileID: 11500000, guid: 9b0fd4427893a4a16ba0c267dfd00217, type: 3}
  m_Name:
  m_EditorClassIdentifier:
  m_SerializedData:
    mesh: {fileID: 10207, guid: 0000000000000000e000000000000000, type: 0}
    material: {fileID: 10302, guid: 0000000000000000f000000000000000, type: 0}
    subMesh: 0
    castShadows: 0
    receiveShadows: 0
--- !u!114 &4931620525958682511
MonoBehaviour:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  m_GameObject: {fileID: 8480657802093770681}
  m_Enabled: 1
  m_EditorHideFlags: 0
  m_Script: {fileID: 11500000, guid: 0af0db853e732453799566a0e597993c, type: 3}
  m_Name:
  m_EditorClassIdentifier:
  m_SerializedData:
    Value:
      x: 0
      y: 0
      z: 0
--- !u!114 &6329899614523502008
MonoBehaviour:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  m_GameObject: {fileID: 8480657802093770681}
  m_Enabled: 1
  m_EditorHideFlags: 0
  m_Script: {fileID: 11500000, guid: 4f2a8abc5e5549439b29a0f9cbb7776b, type: 3}
  m_Name:
  m_EditorClassIdentifier:
  m_SerializedData:
    Reward:
      x: 0
      y: 0
      z: 0
--- !u!114 &9208578237387885132
MonoBehaviour:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  m_GameObject: {fileID: 8480657802093770681}
  m_Enabled: 1
  m_EditorHideFlags: 0
  m_Script: {fileID: 11500000, guid: 950225b98b4a438b843c2442cab09add, type: 3}
  m_Name:
  m_EditorClassIdentifier:
  m_SerializedData:
    Value:
      x: 0
      y: 0
      z: 0
--- !u!114 &3511010848622024065
MonoBehaviour:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  m_GameObject: {fileID: 8480657802093770681}
  m_Enabled: 1
  m_EditorHideFlags: 0
  m_Script: {fileID: 11500000, guid: d4fe5e52c44e4e2097302ddf66e78272, type: 3}
  m_Name:
  m_EditorClassIdentifier:
  m_SerializedData:
    Value:
      x: 0
      y: 0
      z: 0
@ -0,0 +1,7 @@
fileFormatVersion: 2
guid: 9002bbfd4ae214a8fb609aeacaa0de4d
PrefabImporter:
  externalObjects: {}
  userData:
  assetBundleName:
  assetBundleVariant:

@ -0,0 +1,8 @@
fileFormatVersion: 2
guid: c4e16e919f6024426bf06364075adc0b
folderAsset: yes
DefaultImporter:
  externalObjects: {}
  userData:
  assetBundleName:
  assetBundleVariant:
@ -0,0 +1,313 @@
%YAML 1.1
%TAG !u! tag:unity3d.com,2011:
--- !u!29 &1
OcclusionCullingSettings:
  m_ObjectHideFlags: 0
  serializedVersion: 2
  m_OcclusionBakeSettings:
    smallestOccluder: 5
    smallestHole: 0.25
    backfaceThreshold: 100
  m_SceneGUID: 00000000000000000000000000000000
  m_OcclusionCullingData: {fileID: 0}
--- !u!104 &2
RenderSettings:
  m_ObjectHideFlags: 0
  serializedVersion: 9
  m_Fog: 0
  m_FogColor: {r: 0.5, g: 0.5, b: 0.5, a: 1}
  m_FogMode: 3
  m_FogDensity: 0.01
  m_LinearFogStart: 0
  m_LinearFogEnd: 300
  m_AmbientSkyColor: {r: 0.212, g: 0.227, b: 0.259, a: 1}
  m_AmbientEquatorColor: {r: 0.114, g: 0.125, b: 0.133, a: 1}
  m_AmbientGroundColor: {r: 0.047, g: 0.043, b: 0.035, a: 1}
  m_AmbientIntensity: 1
  m_AmbientMode: 0
  m_SubtractiveShadowColor: {r: 0.42, g: 0.478, b: 0.627, a: 1}
  m_SkyboxMaterial: {fileID: 10304, guid: 0000000000000000f000000000000000, type: 0}
  m_HaloStrength: 0.5
  m_FlareStrength: 1
  m_FlareFadeSpeed: 3
  m_HaloTexture: {fileID: 0}
  m_SpotCookie: {fileID: 10001, guid: 0000000000000000e000000000000000, type: 0}
  m_DefaultReflectionMode: 0
  m_DefaultReflectionResolution: 128
  m_ReflectionBounces: 1
  m_ReflectionIntensity: 1
  m_CustomReflection: {fileID: 0}
  m_Sun: {fileID: 0}
  m_IndirectSpecularColor: {r: 0.44657838, g: 0.49641234, b: 0.57481676, a: 1}
  m_UseRadianceAmbientProbe: 0
--- !u!157 &3
LightmapSettings:
  m_ObjectHideFlags: 0
  serializedVersion: 11
  m_GIWorkflowMode: 0
  m_GISettings:
    serializedVersion: 2
    m_BounceScale: 1
    m_IndirectOutputScale: 1
    m_AlbedoBoost: 1
    m_EnvironmentLightingMode: 0
    m_EnableBakedLightmaps: 1
    m_EnableRealtimeLightmaps: 1
  m_LightmapEditorSettings:
    serializedVersion: 10
    m_Resolution: 2
    m_BakeResolution: 40
    m_AtlasSize: 1024
    m_AO: 0
    m_AOMaxDistance: 1
    m_CompAOExponent: 1
    m_CompAOExponentDirect: 0
    m_Padding: 2
    m_LightmapParameters: {fileID: 0}
    m_LightmapsBakeMode: 1
    m_TextureCompression: 1
    m_FinalGather: 0
    m_FinalGatherFiltering: 1
    m_FinalGatherRayCount: 256
    m_ReflectionCompression: 2
    m_MixedBakeMode: 2
    m_BakeBackend: 1
    m_PVRSampling: 1
    m_PVRDirectSampleCount: 32
    m_PVRSampleCount: 500
    m_PVRBounces: 2
    m_PVRFilterTypeDirect: 0
    m_PVRFilterTypeIndirect: 0
    m_PVRFilterTypeAO: 0
    m_PVRFilteringMode: 1
    m_PVRCulling: 1
    m_PVRFilteringGaussRadiusDirect: 1
    m_PVRFilteringGaussRadiusIndirect: 5
    m_PVRFilteringGaussRadiusAO: 2
    m_PVRFilteringAtrousPositionSigmaDirect: 0.5
    m_PVRFilteringAtrousPositionSigmaIndirect: 2
    m_PVRFilteringAtrousPositionSigmaAO: 1
    m_ShowResolutionOverlay: 1
  m_LightingDataAsset: {fileID: 0}
  m_UseShadowmask: 1
--- !u!196 &4
NavMeshSettings:
  serializedVersion: 2
  m_ObjectHideFlags: 0
  m_BuildSettings:
    serializedVersion: 2
    agentTypeID: 0
    agentRadius: 0.5
    agentHeight: 2
    agentSlope: 45
    agentClimb: 0.4
    ledgeDropHeight: 0
    maxJumpAcrossDistance: 0
    minRegionArea: 2
    manualCellSize: 0
    cellSize: 0.16666667
    manualTileSize: 0
    tileSize: 256
    accuratePlacement: 0
    debug:
      m_Flags: 0
  m_NavMeshData: {fileID: 0}
--- !u!1 &883511624
GameObject:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  serializedVersion: 6
  m_Component:
  - component: {fileID: 883511626}
  - component: {fileID: 883511625}
  m_Layer: 0
  m_Name: Manager
  m_TagString: Untagged
  m_Icon: {fileID: 0}
  m_NavMeshLayer: 0
  m_StaticEditorFlags: 0
  m_IsActive: 1
--- !u!114 &883511625
MonoBehaviour:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  m_GameObject: {fileID: 883511624}
  m_Enabled: 1
  m_EditorHideFlags: 0
  m_Script: {fileID: 11500000, guid: 8bc729803b874b188632e9d135d5ddec, type: 3}
  m_Name:
  m_EditorClassIdentifier:
  maxDistance: 2
  prefab: {fileID: 8480657802093770681, guid: 9002bbfd4ae214a8fb609aeacaa0de4d, type: 3}
  modelA: {fileID: 11400002, guid: c16aa6693b8834a58855a5592bb4f5f8, type: 3}
  modelB: {fileID: 11400002, guid: a3e419de75dc44356b527bf0b06e3b81, type: 3}
  modelC: {fileID: 11400002, guid: 8533bf952c61d430e8765c2c16cde480, type: 3}
--- !u!4 &883511626
Transform:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  m_GameObject: {fileID: 883511624}
  m_LocalRotation: {x: 0, y: 0, z: 0, w: 1}
  m_LocalPosition: {x: 0, y: 0, z: 0}
  m_LocalScale: {x: 1, y: 1, z: 1}
  m_Children: []
  m_Father: {fileID: 0}
  m_RootOrder: 2
  m_LocalEulerAnglesHint: {x: 0, y: 0, z: 0}
--- !u!1 &1006088310
GameObject:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  serializedVersion: 6
  m_Component:
  - component: {fileID: 1006088313}
  - component: {fileID: 1006088312}
  - component: {fileID: 1006088311}
  m_Layer: 0
  m_Name: Main Camera
  m_TagString: MainCamera
  m_Icon: {fileID: 0}
  m_NavMeshLayer: 0
  m_StaticEditorFlags: 0
  m_IsActive: 1
--- !u!81 &1006088311
AudioListener:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  m_GameObject: {fileID: 1006088310}
  m_Enabled: 1
--- !u!20 &1006088312
Camera:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  m_GameObject: {fileID: 1006088310}
  m_Enabled: 1
  serializedVersion: 2
  m_ClearFlags: 2
  m_BackGroundColor: {r: 0.114142045, g: 0.16881007, b: 0.254717, a: 0}
  m_projectionMatrixMode: 1
  m_SensorSize: {x: 36, y: 24}
  m_LensShift: {x: 0, y: 0}
  m_GateFitMode: 2
  m_FocalLength: 50
  m_NormalizedViewPortRect:
    serializedVersion: 2
    x: 0
    y: 0
    width: 1
    height: 1
  near clip plane: 0.3
  far clip plane: 1000
  field of view: 60
  orthographic: 0
  orthographic size: 5
  m_Depth: -1
  m_CullingMask:
    serializedVersion: 2
    m_Bits: 4294967295
  m_RenderingPath: -1
  m_TargetTexture: {fileID: 0}
  m_TargetDisplay: 0
  m_TargetEye: 3
  m_HDR: 1
  m_AllowMSAA: 1
  m_AllowDynamicResolution: 0
  m_ForceIntoRT: 0
  m_OcclusionCulling: 1
  m_StereoConvergence: 10
  m_StereoSeparation: 0.022
--- !u!4 &1006088313
Transform:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  m_GameObject: {fileID: 1006088310}
  m_LocalRotation: {x: 0.2588191, y: 0, z: 0, w: 0.9659258}
  m_LocalPosition: {x: 0, y: 200, z: -153}
  m_LocalScale: {x: 1, y: 1, z: 1}
  m_Children: []
  m_Father: {fileID: 0}
  m_RootOrder: 0
  m_LocalEulerAnglesHint: {x: 30, y: 0, z: 0}
--- !u!1 &1246396923
GameObject:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  serializedVersion: 6
  m_Component:
  - component: {fileID: 1246396925}
  - component: {fileID: 1246396924}
  m_Layer: 0
  m_Name: Directional Light
  m_TagString: Untagged
  m_Icon: {fileID: 0}
  m_NavMeshLayer: 0
  m_StaticEditorFlags: 0
  m_IsActive: 1
--- !u!108 &1246396924
Light:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  m_GameObject: {fileID: 1246396923}
  m_Enabled: 1
  serializedVersion: 8
  m_Type: 1
  m_Color: {r: 1, g: 0.95686275, b: 0.8392157, a: 1}
  m_Intensity: 1
  m_Range: 10
  m_SpotAngle: 30
  m_CookieSize: 10
  m_Shadows:
    m_Type: 2
    m_Resolution: -1
    m_CustomResolution: -1
    m_Strength: 1
    m_Bias: 0.05
    m_NormalBias: 0.4
    m_NearPlane: 0.2
  m_Cookie: {fileID: 0}
  m_DrawHalo: 0
  m_Flare: {fileID: 0}
  m_RenderMode: 0
  m_CullingMask:
    serializedVersion: 2
    m_Bits: 4294967295
  m_Lightmapping: 4
  m_LightShadowCasterMode: 0
  m_AreaSize: {x: 1, y: 1}
  m_BounceIntensity: 1
  m_ColorTemperature: 6570
  m_UseColorTemperature: 0
  m_ShadowRadius: 0
  m_ShadowAngle: 0
--- !u!4 &1246396925
Transform:
  m_ObjectHideFlags: 0
  m_CorrespondingSourceObject: {fileID: 0}
  m_PrefabInstance: {fileID: 0}
  m_PrefabAsset: {fileID: 0}
  m_GameObject: {fileID: 1246396923}
  m_LocalRotation: {x: 0.40821788, y: -0.23456968, z: 0.10938163, w: 0.8754261}
  m_LocalPosition: {x: 0, y: 3, z: 0}
  m_LocalScale: {x: 1, y: 1, z: 1}
  m_Children: []
  m_Father: {fileID: 0}
  m_RootOrder: 1
  m_LocalEulerAnglesHint: {x: 50, y: -30, z: 0}
@ -0,0 +1,7 @@
fileFormatVersion: 2
guid: 09d90853262f045e48c12ea3ba572f70
DefaultImporter:
  externalObjects: {}
  userData:
  assetBundleName:
  assetBundleVariant:

@ -0,0 +1,8 @@
fileFormatVersion: 2
guid: b6feaae123a73448f81a42f015ea41b7
folderAsset: yes
DefaultImporter:
  externalObjects: {}
  userData:
  assetBundleName:
  assetBundleVariant:
@ -0,0 +1,20 @@
using System;
using Unity.Entities;
using Unity.Mathematics;

namespace ECS_MLAgents_v0.Example.SpaceMagic.Scripts
{
    /// <summary>
    /// This component will represent the acceleration of the spheres
    /// </summary>
    [Serializable]
    public struct Acceleration : IComponentData
    {
        public float3 Value;
    }

    /// <summary>
    /// This wrapper only allows us to add this IComponentData as a Component to the sphere prefab
    /// </summary>
    public class AccelerationComponent : ComponentDataWrapper<Acceleration> { }
}
@ -0,0 +1,3 @@
fileFormatVersion: 2
guid: d4fe5e52c44e4e2097302ddf66e78272
timeCreated: 1548624826
@ -0,0 +1,16 @@
using System;
using Unity.Entities;

namespace ECS_MLAgents_v0.Example.SpaceMagic.Scripts
{
    /// <summary>
    /// This ISharedComponentData is used to assign each sphere to a different group that will
    /// use a different IAgentSystem for its decision making.
    /// </summary>
    [Serializable]
    public struct SphereGroup : ISharedComponentData
    {
        public int Group;
    }

}
Some files were not shown because too many files changed in this diff.