Merge pull request #29 from microsoft/dev/setup-refactor

Refactor: refine examples and doc
This commit is contained in:
Li Lyna Zhang 2021-10-11 13:03:43 +08:00 committed by GitHub
Parent e62f661bd3 b3e5b75a4f
Commit 2a01c6134d
No known key found for this signature
GPG key ID: 4AEE18F83AFDEB23
12 changed files with 737 additions and 83 deletions

View file

@ -4,8 +4,8 @@ Note: This is an alpha (preview) version which is still being refined.
The current supported hardware and inference frameworks:
| Device | Framework | Processor | +-10% Accuracy | Hardware name |
| :-----------------: | :------------: | :------------: | :-------------: | :----------------------: |
| Pixel4 | TFLite v2.1 | CortexA76 CPU | 99.0% | cortexA76cpu_tflite21 |
| Mi9 | TFLite v2.1 | Adreno 640 GPU | 99.1% | adreno640gpu_tflite21 |
| Pixel3XL | TFLite v2.1 | Adreno 630 GPU | 99.0% | adreno630gpu_tflite21 |
@ -20,6 +20,7 @@ The current supported hardware and inference frameworks:
- Those who want to get the DNN inference latency on mobile and edge devices with **no deployment efforts on real devices**.
- Those who want to run **hardware-aware NAS with [NNI](https://github.com/microsoft/nni)**.
- Those who want to **build latency predictors for their own devices**.
- Those who want to use the 26k latency [benchmark dataset](https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/datasets.zip).
# Installation
@ -30,58 +31,59 @@ pip install nn-meter
```
If you want to try the latest code, please install nn-Meter from source. First, git clone the nn-Meter package locally:
```Bash
git clone git@github.com:microsoft/nn-Meter.git
cd nn-Meter
```
Then simply run the following pip install in an environment that has `python >= 3.6`. The command will automatically install nn-Meter and all necessary dependencies.
```Bash
pip install .
```
nn-Meter is a latency predictor for models in Tensorflow, PyTorch, Onnx, nn-Meter IR graph and [NNI IR graph](https://github.com/microsoft/nni) formats. To use nn-Meter for a specific model type, you also need to install the corresponding packages. The well-tested versions are listed below:
| Testing Model Type | Requirements |
| :----------------: | :-----------------------------------------------------------------------------------------------------------------------: |
| Tensorflow | `tensorflow==1.15.0` |
| Torch | `torch==1.7.1`, `torchvision==0.8.2`, (alternative)[`onnx==1.9.0`, `onnx-simplifier==0.3.6`] or [`nni==2.4`][1] |
| Onnx | `onnx==1.9.0` |
| nn-Meter IR graph | --- |
| NNI IR graph | `nni==2.4` |
[1] Please refer to [nn-Meter Usage](#torch-model-converters) for more information.
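For example, to enable the Onnx-based route for Torch models, the corresponding installation would be (a sketch based on the version table above):
```Bash
pip install torch==1.7.1 torchvision==0.8.2 onnx==1.9.0 onnx-simplifier==0.3.6
```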
Please also check the versions of `numpy` and `scikit_learn`, as different versions may change the prediction accuracy of the kernel predictors.
A stable wheel binary package will be released soon.
# Usage
For hardware latency prediction, nn-Meter provides two types of interfaces:
- command line `nn-meter` after `nn-meter` [installation](QuickStart.md#Installation).
- Python binding provided by the module `nn_meter`
Here is a summary of supported inputs of the two methods.
| Testing Model Type | Command Support | Python Binding |
| :----------------: | :-----------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------: |
| Tensorflow | Checkpoint file dumped by `tf.saved_model()` and ending with `.pb` | Checkpoint file dumped by `tf.saved_model()` and ending with `.pb` |
| Torch | Models in `torchvision.models` | Object of `torch.nn.Module` |
| Onnx | Checkpoint file dumped by `torch.onnx.export()` or `onnx.save()` and ending with `.onnx` | Checkpoint file dumped by `onnx.save()` or model loaded by `onnx.load()` |
| nn-Meter IR graph | Json file in the format of [nn-Meter IR Graph](./docs/input_models.md#nnmeter-ir-graph) | `dict` object following the format of [nn-Meter IR Graph](./docs/input_models.md#nnmeter-ir-graph) |
| NNI IR graph | - | NNI IR graph object |
In both methods, users can specify the predictor name and version to target a specific hardware platform (device). Currently, nn-Meter supports prediction on the following four configs:
| Predictor (device_inferenceframework) | Processor Category | Version |
| :-----------------------------------: | :----------------: | :-----: |
| cortexA76cpu_tflite21 | CPU | 1.0 |
| adreno640gpu_tflite21 | GPU | 1.0 |
| adreno630gpu_tflite21 | GPU | 1.0 |
| myriadvpu_openvino2019r2 | VPU | 1.0 |
Users can get all predefined predictors and versions by running
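The listing command itself falls outside this hunk; in the released CLI it is likely the following (an assumption to verify against `nn-meter --help`):
```bash
nn-meter --list-predictors
```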
@ -147,7 +149,7 @@ By calling `load_latency_predictor`, user selects the target hardware and loads
In `predictor.predict()`, the allowed items of the parameter `model_type` include `["pb", "torch", "onnx", "nnmeter-ir", "nni-ir"]`, representing model types of tensorflow, torch, onnx, nn-meter IR graph and NNI IR graph, respectively.
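As a minimal sketch of the Python binding (the predictor name comes from the table above; the model path is a placeholder):
```python
from nn_meter import load_latency_predictor

# predictor name and version as listed in the supported-configs table
predictor = load_latency_predictor("cortexA76cpu_tflite21", 1.0)
lat = predictor.predict("path/to/model.pb", model_type="pb")  # hypothetical .pb checkpoint
print("predicted latency:", lat)
```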
<span id="torch-model-converters"> For Torch models, the shape of the feature maps cannot be inferred from the network structure alone, yet it is a significant parameter in latency prediction. Therefore, a Torch model requires the shape of its input tensor as an input to `predictor.predict()`; a random tensor with the given shape is then generated and used for inference. Another point about Torch model prediction is that users can install the `onnx` and `onnx-simplifier` packages for latency prediction (referred to as Onnx-based latency prediction for Torch models), or alternatively install the `nni` package (referred to as NNI-based latency prediction for Torch models). Note that the `nni` option does not support command-line calls. In addition, if users use `nni` for latency prediction, the PyTorch modules should be defined by the `nn` interface from NNI, `import nni.retiarii.nn.pytorch as nn` (see the [NNI doc](https://nni.readthedocs.io/en/stable/NAS/QuickStart.html#define-base-model) for more information), and the parameter `apply_nni` should be set to `True` in `predictor.predict()`. Here is an example of NNI-based latency prediction for a Torch model:
```python
import nni.retiarii.nn.pytorch as nn
@ -162,19 +164,26 @@ input_shape = (1, 3, 224, 224)
lat = predictor.predict(model, model_type='torch', input_shape=input_shape, apply_nni=True)
```
The Onnx-based latency prediction for Torch models is stable but slower, while the NNI-based prediction is much faster but less stable, as it may fail in some cases. The Onnx-based converter is the default for Torch model latency prediction in nn-Meter. Users can choose whichever they prefer according to their needs. </span>
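For comparison, a sketch of the default Onnx-based route (no `apply_nni`; the `onnx` and `onnx-simplifier` packages are assumed installed):
```python
import torchvision.models as models

model = models.resnet18()
input_shape = (1, 3, 224, 224)
lat = predictor.predict(model, model_type='torch', input_shape=input_shape)  # Onnx-based by default
```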
Users can view the information of all built-in predictors via `list_latency_predictors` or view the config file in `nn_meter/configs/predictors.yaml`.
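A sketch of listing the built-in predictors (the fields of each entry are assumed to mirror `predictors.yaml`, i.e. a name and a version):
```python
import nn_meter

for pred in nn_meter.list_latency_predictors():
    print(pred['name'], pred['version'])
```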
Users can get an nn-Meter IR graph by applying `model_file_to_graph` or `model_to_graph` to a model file name or model object and specifying the model type. The supported model types of `model_file_to_graph` include "onnx", "pb", "torch", "nnmeter-ir" and "nni-ir", while the supported model types of `model_to_graph` include "onnx", "torch" and "nni-ir".
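A minimal sketch of converting a model file (the path is a placeholder, and the exact signature is assumed from the description above):
```python
from nn_meter import model_file_to_graph

ir_graph = model_file_to_graph("path/to/model.onnx", model_type="onnx")  # dict in nn-Meter IR format
```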
## Benchmark Dataset
To evaluate the effectiveness of a prediction model on an arbitrary DNN model, we need a representative dataset that covers a large prediction scope. As no such latency dataset was available, nn-Meter collects and generates 26k CNN models containing various operators, configurations, and edge connections, covering different levels of FLOPs and latency. (Please refer to the paper for the dataset generation method and dataset statistics.)
We release the dataset, and provide an interface of `nn_meter.dataset` for users to get access to the dataset. Users can also download the data from the [Download Link](https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/datasets.zip) for testing nn-Meter or their own prediction models.
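Access goes through the dataset interface; a short sketch grounded in the `bench_dataset` helper shown later in this diff:
```python
from nn_meter.dataset import bench_dataset

dataset_files = bench_dataset()  # downloads the .jsonl files on first call and returns their paths
print(dataset_files)
```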
## Hardware-aware NAS by nn-Meter and NNI
To enable affordable DNNs on edge and mobile devices, hardware-aware NAS searches for models with both high accuracy and low latency. In particular, the search algorithm only considers models within the target latency constraint during the search process.
Currently we provide an example of end-to-end [multi-trial NAS](https://nni.readthedocs.io/en/stable/NAS/multi_trial_nas.html), which is a [random search algorithm](https://arxiv.org/abs/1902.07638) on the [SPOS NAS](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123610528.pdf) search space. More examples of hardware-aware NAS and model compression algorithms are coming soon.
To run the multi-trial SPOS demo, NNI should be installed from source code by following the [NNI Doc](https://nni.readthedocs.io/en/stable/Tutorial/InstallationLinux.html#installation):
```bash
python setup.py develop
```
@ -209,8 +218,15 @@ exp_config.dummy_input = [1, 3, 32, 32]
exp.run(exp_config, port)
```
In `exp_config`, `dummy_input` is required for tracing shape info.
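For context, a hedged sketch of how `exp_config` is typically assembled in the NNI multi-trial example (`base_model`, `trainer` and `strategy` come from that example script; the config fields follow NNI 2.4's Retiarii API):
```python
from nni.retiarii.experiment.pytorch import RetiariiExeConfig, RetiariiExperiment

exp = RetiariiExperiment(base_model, trainer, [], strategy)  # strategy wraps the example's latency filter
exp_config = RetiariiExeConfig('local')
exp_config.trial_concurrency = 2
exp_config.max_trial_number = 10
exp_config.dummy_input = [1, 3, 32, 32]  # required for tracing shape info
exp.run(exp_config, port)
```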
## Bench Dataset
To evaluate the effectiveness of a prediction model on an arbitrary DNN model, we need a representative dataset that covers a large prediction scope. nn-Meter collects and generates 26k CNN models. (Please refer to the paper for the dataset generation method.)
We release the dataset, and provide an interface of `nn_meter.dataset` for users to get access to the dataset. Users can also download the data from the [Download Link](https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/datasets.zip) on their own.
# Contributing
@ -234,7 +250,9 @@ The entire codebase is under [MIT license](https://github.com/microsoft/nn-Meter
The dataset is under [Open Use of Data Agreement](https://github.com/Community-Data-License-Agreements/Releases/blob/main/O-UDA-1.0.md)
# Citation
If you find that nn-Meter helps your research, please consider citing it:
```
@inproceedings{nnmeter,
author = {Zhang, Li Lyna and Han, Shihao and Wei, Jianyu and Zheng, Ningxin and Cao, Ting and Yang, Yuqing and Liu, Yunxin},
@ -254,4 +272,4 @@ If you find that nn-Meter helps your research, please consider citing it:
year = {2021},
url = {https://github.com/microsoft/nn-Meter},
}
```

View file

@ -4,29 +4,24 @@ In this folder, we provide several examples to show the usage of nn-Meter package
The first example [1. Use nn-Meter for models with different format](nn-meter_for_different_model_format.ipynb) shows the basic Python binding usage of nn-Meter for models in different formats: Tensorflow, PyTorch and ONNX.
For the nn-Meter paper, we constructed a latency dataset to test the performance of nn-Meter together with other methods for comparison.
#### Benchmark dataset
To evaluate the effectiveness of a prediction model on an arbitrary DNN model, we need a representative dataset that covers a large prediction scope. nn-Meter collects and generates 26k CNN models. (Please refer to the paper for the dataset generation method.)
We release the dataset and provide an interface of `nn_meter.dataset` for users to get access to it. Users can also download the data from the [Download Link](https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/datasets.zip) on their own.
Example [2. Use nn-Meter with the bench dataset](nn-meter_for_bench_dataset.ipynb) shows how to use nn-Meter to predict latency for the bench dataset.
Since the dataset is encoded in a graph format, we also provide an example [3. Use bench dataset for GNN training](gnn_for_bench_dataset.ipynb) of using a GNN to predict the model latency with the bench dataset.
#### Hardware-aware NAS
Finally, we provide more hardware-aware NAS examples in NNI.
## Examples list
1. [Use nn-Meter for models with different format](nn-meter_for_different_model_format.ipynb)
2. [Use nn-Meter with the bench dataset](nn-meter_for_bench_dataset.ipynb)
3. [Use bench dataset for GNN training](gnn_for_bench_dataset.ipynb)
4. Use nn-Meter to construct a latency constraint in SPOS NAS (TBD)
- [Use nn-Meter in search part](https://github.com/microsoft/nni/blob/master/examples/nas/oneshot/spos/multi_trial.py)
- [Use nn-Meter in sampling part](https://github.com/microsoft/nni/blob/master/examples/nas/oneshot/spos/supernet.py)
5. [Use nn-Meter to construct latency penalty in Proxyless NAS](https://github.com/microsoft/nni/tree/master/examples/nas/oneshot/proxylessnas)

View file

@ -0,0 +1,277 @@
{
"cells": [
{
"cell_type": "markdown",
"source": [
"# Latency Dataset - GNN Model\n",
"\n",
"Considering the dataset is encoded in a graph format, here is an example of using GNN to predict the model latency with the bench dataset. \n",
"\n",
"In the previous work of [BRP-NAS](https://arxiv.org/abs/2007.08668v2), the authors propose an end-to-end latency predictor which consists of a GCN. Their GCN predictor demonstrates significant improvement over the layer-wise predictor on [NAS-Bench-201](https://arxiv.org/abs/2001.00326). While on our bench dataset, the preformance of BRP-NAS is consistently poor. As discussed in our paper, the reason is the model graph difference between training and testing set. GNN learns the representation of model graphs. Although the models in our bench dataset have largely overlapped operator types, the operator configurations, edges, and model latency ranges are different.\n",
"\n",
"To better deal with the problems above, we give a GNN example with graph representation improved. We first build our GNN model, which is constructed based on GraphSAGE, and maxpooling is selected as out pooling method. Next, we will start training after the data is loaded. `GNNDataset` and `GNNDataloader` in `nn_meter/dataset/gnn_dataloader.py` build the model structure of the Dataset in `.jsonl` format into our required Dataset and Dataloader. \n",
"\n",
"Let's start our journey!"
],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"## Step 1: Build our GraphSAGE Model\n",
"\n",
"We built our model with the help of DGL library."
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 1,
"source": [
"import torch\n",
"import torch.nn as nn\n",
"from torch.nn.modules.module import Module\n",
"\n",
"from dgl.nn.pytorch.glob import MaxPooling\n",
"import dgl.nn as dglnn\n",
"from torch.optim.lr_scheduler import CosineAnnealingLR\n",
"\n",
"\n",
"class GNN(Module):\n",
" def __init__(self, \n",
" num_features=0, \n",
" num_layers=2,\n",
" num_hidden=32,\n",
" dropout_ratio=0):\n",
"\n",
" super(GNN, self).__init__()\n",
" self.nfeat = num_features\n",
" self.nlayer = num_layers\n",
" self.nhid = num_hidden\n",
" self.dropout_ratio = dropout_ratio\n",
" self.gc = nn.ModuleList([dglnn.SAGEConv(self.nfeat if i==0 else self.nhid, self.nhid, 'pool') for i in range(self.nlayer)])\n",
" self.bn = nn.ModuleList([nn.LayerNorm(self.nhid) for i in range(self.nlayer)])\n",
" self.relu = nn.ModuleList([nn.ReLU() for i in range(self.nlayer)])\n",
" self.pooling = MaxPooling()\n",
" self.fc = nn.Linear(self.nhid, 1)\n",
" self.fc1 = nn.Linear(self.nhid, self.nhid)\n",
" self.dropout = nn.ModuleList([nn.Dropout(self.dropout_ratio) for i in range(self.nlayer)])\n",
"\n",
" def forward_single_model(self, g, features):\n",
" x = self.relu[0](self.bn[0](self.gc[0](g, features)))\n",
" x = self.dropout[0](x)\n",
" for i in range(1,self.nlayer):\n",
" x = self.relu[i](self.bn[i](self.gc[i](g, x)))\n",
" x = self.dropout[i](x)\n",
" return x\n",
"\n",
" def forward(self, g, features):\n",
" x = self.forward_single_model(g, features)\n",
" with g.local_scope():\n",
" g.ndata['h'] = x\n",
" x = self.pooling(g, x)\n",
" x = self.fc1(x)\n",
" return self.fc(x)"
],
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"Using backend: pytorch\n"
]
}
],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"## Step 2: Loading Data.\n",
"\n",
"Next, we will finish loading the data and learn about the size of the Training and Testing datasets."
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 2,
"source": [
"import os\r\n",
"from nn_meter.dataset import gnn_dataloader\r\n",
"\r\n",
"target_device = \"cortexA76cpu_tflite21\"\r\n",
"\r\n",
"print(\"Processing Training Set.\")\r\n",
"train_set = gnn_dataloader.GNNDataset(train=True, device=target_device) \r\n",
"print(\"Processing Testing Set.\")\r\n",
"test_set = gnn_dataloader.GNNDataset(train=False, device=target_device)\r\n",
"\r\n",
"train_loader = gnn_dataloader.GNNDataloader(train_set, batchsize=1 , shuffle=True)\r\n",
"test_loader = gnn_dataloader.GNNDataloader(test_set, batchsize=1, shuffle=False)\r\n",
"print('Train Dataset Size:', len(train_set))\r\n",
"print('Testing Dataset Size:', len(test_set))\r\n",
"print('Attribute tensor shape:', next(train_loader)[1].ndata['h'].size(1))\r\n",
"ATTR_COUNT = next(train_loader)[1].ndata['h'].size(1)"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Processing Training Set.\n",
"Processing Testing Set.\n",
"Train Dataset Size: 20732\n",
"Testing Dataset Size: 5173\n",
"Attribute tensor shape: 26\n"
]
}
],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"## Step 3: Run and Test\n",
"\n",
"We can run the model and evaluate it now!"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 3,
"source": [
"if torch.cuda.is_available():\r\n",
" print(\"Using CUDA.\")\r\n",
"# device = \"cpu\"\r\n",
"device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\r\n",
"\r\n",
"# Start Training\r\n",
"load_model = False\r\n",
"if load_model:\r\n",
" model = GNN(ATTR_COUNT, 3, 400, 0.1).to(device)\r\n",
" opt = torch.optim.AdamW(model.parameters(), lr=4e-4)\r\n",
" checkpoint = torch.load('LatencyGNN.pt')\r\n",
" model.load_state_dict(checkpoint['model_state_dict'])\r\n",
" opt.load_state_dict(checkpoint['optimizer_state_dict'])\r\n",
" # EPOCHS = checkpoint['epoch']\r\n",
" EPOCHS = 0\r\n",
" loss_func = checkpoint['loss']\r\n",
"else:\r\n",
" model = GNN(ATTR_COUNT, 3, 400, 0.1).to(device)\r\n",
" opt = torch.optim.AdamW(model.parameters(), lr=4e-4)\r\n",
" EPOCHS=20\r\n",
" loss_func = nn.L1Loss()\r\n",
"\r\n",
"lr_scheduler = CosineAnnealingLR(opt, T_max=EPOCHS)\r\n",
"loss_sum = 0\r\n",
"for epoch in range(EPOCHS):\r\n",
" train_length = len(train_set)\r\n",
" tran_acc_ten = 0\r\n",
" loss_sum = 0 \r\n",
" # latency, graph, types, flops\r\n",
" for batched_l, batched_g in train_loader:\r\n",
" opt.zero_grad()\r\n",
" batched_l = batched_l.to(device).float()\r\n",
" batched_g = batched_g.to(device)\r\n",
" batched_f = batched_g.ndata['h'].float()\r\n",
" logits = model(batched_g, batched_f)\r\n",
" for i in range(len(batched_l)):\r\n",
" pred_latency = logits[i].item()\r\n",
" prec_latency = batched_l[i].item()\r\n",
" if (pred_latency >= 0.9 * prec_latency) and (pred_latency <= 1.1 * prec_latency):\r\n",
" tran_acc_ten += 1\r\n",
" # print(\"true latency: \", batched_l)\r\n",
" # print(\"Predict latency: \", logits)\r\n",
" batched_l = torch.reshape(batched_l, (-1 ,1))\r\n",
" loss = loss_func(logits, batched_l)\r\n",
" loss_sum += loss\r\n",
" loss.backward()\r\n",
" opt.step()\r\n",
" lr_scheduler.step()\r\n",
" print(\"[Epoch \", epoch, \"]: \", \"Training accuracy within 10%: \", tran_acc_ten / train_length * 100, \" %.\")\r\n",
" # print('Learning Rate:', lr_scheduler.get_last_lr())\r\n",
" # print('Loss:', loss_sum / train_length)\r\n",
"\r\n",
"# Save The Best Model\r\n",
"torch.save({\r\n",
" 'epoch': EPOCHS,\r\n",
" 'model_state_dict': model.state_dict(),\r\n",
" 'optimizer_state_dict': opt.state_dict(),\r\n",
" 'loss': loss_func,\r\n",
"}, 'LatencyGNN.pt')\r\n",
"\r\n",
"# Start Testing\r\n",
"count = 0\r\n",
"with torch.no_grad():\r\n",
" test_length = len(test_set)\r\n",
" test_acc_ten = 0\r\n",
" for batched_l, batched_g in test_loader:\r\n",
" batched_l = batched_l.to(device).float()\r\n",
" batched_g = batched_g.to(device)\r\n",
" batched_f = batched_g.ndata['h'].float()\r\n",
" result = model(batched_g, batched_f)\r\n",
" if (result.item() >= 0.9 * batched_l.item()) and (result.item() <= 1.1 * batched_l.item()):\r\n",
" test_acc_ten += 1\r\n",
" acc = (abs(result.item() - batched_l.item()) / batched_l.item()) * 100\r\n",
" count += 1\r\n",
" print(\"Testing accuracy within 10%: \", test_acc_ten / test_length * 100, \" %.\")"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"[Epoch 0 ]: Training accuracy within 10%: 21.999807061547365 %.\n",
"[Epoch 1 ]: Training accuracy within 10%: 27.725255643449742 %.\n",
"[Epoch 2 ]: Training accuracy within 10%: 30.228632066370825 %.\n",
"[Epoch 3 ]: Training accuracy within 10%: 31.357322014277443 %.\n",
"[Epoch 4 ]: Training accuracy within 10%: 33.06000385876906 %.\n",
"[Epoch 5 ]: Training accuracy within 10%: 34.917036465367545 %.\n",
"[Epoch 6 ]: Training accuracy within 10%: 36.48466139301563 %.\n",
"[Epoch 7 ]: Training accuracy within 10%: 39.070036658306 %.\n",
"[Epoch 8 ]: Training accuracy within 10%: 40.10708084121165 %.\n",
"[Epoch 9 ]: Training accuracy within 10%: 41.530001929384525 %.\n",
"[Epoch 10 ]: Training accuracy within 10%: 43.26162454177118 %.\n",
"[Epoch 11 ]: Training accuracy within 10%: 45.34053636889832 %.\n",
"[Epoch 12 ]: Training accuracy within 10%: 48.45166891761528 %.\n",
"[Epoch 13 ]: Training accuracy within 10%: 50.945398417904684 %.\n",
"[Epoch 14 ]: Training accuracy within 10%: 54.5774647887324 %.\n",
"[Epoch 15 ]: Training accuracy within 10%: 56.08238471927455 %.\n",
"[Epoch 16 ]: Training accuracy within 10%: 59.54562994404785 %.\n",
"[Epoch 17 ]: Training accuracy within 10%: 62.41076596565696 %.\n",
"[Epoch 18 ]: Training accuracy within 10%: 63.65521898514373 %.\n",
"[Epoch 19 ]: Training accuracy within 10%: 64.6826162454177 %.\n",
"Testing accuracy within 10%: 60.042528513435144 %.\n"
]
}
],
"metadata": {}
}
],
"metadata": {
"interpreter": {
"hash": "0238da245144306487e61782d9cba9bf2e5e19842e5054371ac0cfbea9be2b57"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

File diff suppressed because one or more lines are too long

View file

@ -12,7 +12,9 @@ from .nn_meter import (
    load_latency_predictor,
    list_latency_predictors,
    model_file_to_graph,
    model_to_graph,
    create_user_configs,
    change_user_data_folder
)
from .utils.utils import download_from_url
from .prediction import latency_metrics
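# Usage sketch for the two new exports (the folder below is a hypothetical example):
#   from nn_meter import create_user_configs, change_user_data_folder
#   create_user_configs()                       # (re)create the user config folder with a default settings.yaml
#   change_user_data_folder('/data/nn_meter')   # later downloads (predictors, datasets) land here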

View file

@ -1,3 +1,4 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from .bench_dataset import bench_dataset
from .gnn_dataloader import GNNDataset, GNNDataloader

View file

@ -4,21 +4,22 @@ import os, sys
from nn_meter.prediction import latency_metrics
from glob import glob
from nn_meter.nn_meter import list_latency_predictors, load_latency_predictor, get_user_data_folder
from nn_meter import download_from_url
import jsonlines
import logging

__user_dataset_folder__ = os.path.join(get_user_data_folder(), 'dataset')


def bench_dataset(url="https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/datasets.zip",
                  data_folder=__user_dataset_folder__):
    if not os.path.isdir(data_folder):
        os.makedirs(data_folder)

    logging.keyinfo(f'Download from {url} ...')
    download_from_url(url, data_folder)
    datasets = glob(os.path.join(data_folder, "**.jsonl"))
    return datasets

if __name__ == '__main__':

View file

@ -1,2 +1,275 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import torch
import jsonlines
import os
import random
from .bench_dataset import bench_dataset
from nn_meter.nn_meter import get_user_data_folder
from nn_meter.utils.utils import try_import_dgl

RAW_DATA_URL = "https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/datasets.zip"
__user_dataset_folder__ = os.path.join(get_user_data_folder(), 'dataset')

hws = [
    "cortexA76cpu_tflite21",
    "adreno640gpu_tflite21",
    "adreno630gpu_tflite21",
    "myriadvpu_openvino2019r2",
]


class GNNDataset(torch.utils.data.Dataset):
    def __init__(self, train=True, device="cortexA76cpu_tflite21", split_ratio=0.8):
        """
        Dataset of the latency benchmark.

        Parameters
        ----------
        train: bool
            Load the train split if True, otherwise the test split
        device: string
            The device type of the corresponding latency
        split_ratio: float
            The ratio to split the train dataset and the test dataset
        """
        err_str = "Not supported device type"
        assert device in hws, err_str
        self.device = device
        self.data_dir = __user_dataset_folder__
        self.train = train
        self.split_ratio = split_ratio
        self.adjs = {}
        self.attrs = {}
        self.nodename2id = {}
        self.id2nodename = {}
        self.op_types = set()
        self.opname2id = {}
        self.raw_data = {}
        self.name_list = []
        self.latencies = {}
        self.download_data()
        self.load_model_archs_and_latencies(self.data_dir)
        self.construct_attrs()
        self.name_list = list(
            filter(lambda x: x in self.latencies, self.name_list))

    def download_data(self):
        # download the benchmark .jsonl files into the user data folder on first use
        bench_dataset()

    def load_model_archs_and_latencies(self, data_dir):
        filelist = os.listdir(data_dir)
        for filename in filelist:
            if os.path.splitext(filename)[-1] != '.jsonl':
                continue
            self.load_model(os.path.join(data_dir, filename))

    def load_model(self, fpath):
        """
        Load a concrete model type.
        """
        assert os.path.exists(fpath), '{} does not exist'.format(fpath)
        with jsonlines.open(fpath) as reader:
            _names = []
            for obj in reader:
                if obj[self.device]:
                    _names.append(obj['id'])
                    self.latencies[obj['id']] = float(obj[self.device])
            _names = sorted(_names)
            split_ratio = self.split_ratio if self.train else 1 - self.split_ratio
            count = int(len(_names) * split_ratio)
            if self.train:
                _model_names = _names[:count]
            else:
                _model_names = _names[-1 * count:]
            self.name_list.extend(_model_names)
        with jsonlines.open(fpath) as reader:
            for obj in reader:
                if obj['id'] in _model_names:
                    model_name = obj['id']
                    model_data = obj['graph']
                    self.parse_model(model_name, model_data)
                    self.raw_data[model_name] = model_data

    def construct_attrs(self):
        """
        Construct the attribute matrix for each model.

        Attribute tensor:
            one-hot encoded op type + input_channel, output_channel,
            input_h, input_w + kernel_size + stride
        """
        op_types_list = list(sorted(self.op_types))
        for i, _op in enumerate(op_types_list):
            self.opname2id[_op] = i
        n_op_type = len(self.op_types)
        attr_len = n_op_type + 6
        for model_name in self.raw_data:
            n_node = len(self.raw_data[model_name])
            t_attr = torch.zeros(n_node, attr_len)
            for node in self.raw_data[model_name]:
                node_attr = self.raw_data[model_name][node]
                nid = self.nodename2id[model_name][node]
                op_type = node_attr['attr']['type']
                op_id = self.opname2id[op_type]
                t_attr[nid][op_id] = 1
                other_attrs = self.parse_node(model_name, node)
                t_attr[nid][-6:] = other_attrs
            self.attrs[model_name] = t_attr

    def parse_node(self, model_name, node_name):
        """
        Parse the attributes of the specified node.

        Get the input_c, output_c, input_h, input_w, kernel_size and stride
        of this node. Note: attributes the node does not have are filled
        with 0 by default.
        """
        node_data = self.raw_data[model_name][node_name]
        t_attr = torch.zeros(6)
        op_type = node_data['attr']['type']
        if op_type == 'Conv2D':
            weight_shape = node_data['attr']['attr']['weight_shape']
            kernel_size, _, in_c, out_c = weight_shape
            stride, _ = node_data['attr']['attr']['strides']
            _, h, w, _ = node_data['attr']['output_shape'][0]
            t_attr = torch.tensor([in_c, out_c, h, w, kernel_size, stride])
        elif op_type == 'DepthwiseConv2dNative':
            weight_shape = node_data['attr']['attr']['weight_shape']
            kernel_size, _, in_c, out_c = weight_shape
            stride, _ = node_data['attr']['attr']['strides']
            _, h, w, _ = node_data['attr']['output_shape'][0]
            t_attr = torch.tensor([in_c, out_c, h, w, kernel_size, stride])
        elif op_type == 'MatMul':
            in_node = node_data['inbounds'][0]
            in_shape = self.raw_data[model_name][in_node]['attr']['output_shape'][0]
            in_c = in_shape[-1]
            out_c = node_data['attr']['output_shape'][0][-1]
            t_attr[0] = in_c
            t_attr[1] = out_c
        elif len(node_data['inbounds']):
            in_node = node_data['inbounds'][0]
            h, w, in_c, out_c = 0, 0, 0, 0
            in_shape = self.raw_data[model_name][in_node]['attr']['output_shape'][0]
            in_c = in_shape[-1]
            if 'Concat' in op_type:
                # sum the input channels over all inbound nodes
                for i in range(1, len(node_data['inbounds'])):
                    in_shape = self.raw_data[model_name][node_data['inbounds'][i]]['attr']['output_shape'][0]
                    in_c += in_shape[-1]
            if len(node_data['attr']['output_shape']):
                out_shape = node_data['attr']['output_shape'][0]
                # N, H, W, C
                out_c = out_shape[-1]
                if len(out_shape) == 4:
                    h, w = out_shape[1], out_shape[2]
            t_attr[-6:-2] = torch.tensor([in_c, out_c, h, w])
        return t_attr

    def parse_model(self, model_name, model_data):
        """
        Parse the model data and build the adjacency matrices.
        """
        n_nodes = len(model_data)
        m_adj = torch.zeros(n_nodes, n_nodes, dtype=torch.int32)
        id2name = {}
        name2id = {}
        tmp_node_id = 0
        # build the mapping between the node name and node id
        for node_name in model_data.keys():
            id2name[tmp_node_id] = node_name
            name2id[node_name] = tmp_node_id
            op_type = model_data[node_name]['attr']['type']
            self.op_types.add(op_type)
            tmp_node_id += 1
        for node_name in model_data:
            cur_id = name2id[node_name]
            for node in model_data[node_name]['inbounds']:
                if node not in name2id:
                    # weight node
                    continue
                in_id = name2id[node]
                m_adj[in_id][cur_id] = 1
            for node in model_data[node_name]['outbounds']:
                if node not in name2id:
                    # weight node
                    continue
                out_id = name2id[node]
                m_adj[cur_id][out_id] = 1
        for idx in range(n_nodes):
            m_adj[idx][idx] = 1
        self.adjs[model_name] = m_adj
        self.nodename2id[model_name] = name2id
        self.id2nodename[model_name] = id2name

    def __getitem__(self, index):
        model_name = self.name_list[index]
        return (self.adjs[model_name], self.attrs[model_name]), self.latencies[model_name], self.op_types

    def __len__(self):
        return len(self.name_list)


class GNNDataloader(torch.utils.data.DataLoader):
    def __init__(self, dataset, shuffle=False, batchsize=1):
        self.dataset = dataset
        self.op_num = len(dataset.op_types)
        self.shuffle = shuffle
        self.batchsize = batchsize
        self.length = len(self.dataset)
        self.indexes = list(range(self.length))
        self.pos = 0
        self.graphs = {}
        self.latencies = {}
        self.construct_graphs()

    def construct_graphs(self):
        dgl = try_import_dgl()
        for gid in range(self.length):
            (adj, attrs), latency, op_types = self.dataset[gid]
            u, v = torch.nonzero(adj, as_tuple=True)
            graph = dgl.graph((u, v))
            # normalize attributes: the one-hot op-type bits are kept as-is (divided by 1);
            # the six numeric fields (in_c, out_c, h, w, kernel_size, stride) are scaled by their maxima
            MAX_NORM = torch.tensor([1] * len(op_types) + [6963, 6963, 224, 224, 11, 4])
            attrs = attrs / MAX_NORM
            graph.ndata['h'] = attrs
            self.graphs[gid] = graph
            self.latencies[gid] = latency

    def __iter__(self):
        if self.shuffle:
            random.shuffle(self.indexes)
        self.pos = 0
        return self

    def __len__(self):
        return self.length

    def __next__(self):
        dgl = try_import_dgl()
        start = self.pos
        end = min(start + self.batchsize, self.length)
        self.pos = end
        if end - start <= 0:
            raise StopIteration
        batch_indexes = self.indexes[start:end]
        batch_graphs = [self.graphs[i] for i in batch_indexes]
        batch_latencies = [self.latencies[i] for i in batch_indexes]
        return torch.tensor(batch_latencies), dgl.batch(batch_graphs)

View file

@ -18,8 +18,10 @@ class ShapeInference:
"AddN", # AddN does not support prodcast really
"AddV2",
"Subtract",
"Sub",
"MulNoNan",
"Multiply"
"Multiply",
"Mul",
"Div",
"DivNoNan",
"Equal",
@ -866,7 +868,7 @@ class ShapeInference:
        return [input_shape], [exp_output_shape]

    @staticmethod
    def Pack_get_shape(graph, node):
        """
        Get shape of a Pack operator.
        Patched for kernel detector.
@ -970,7 +972,7 @@ class ShapeInference:
        # This is a patch for the back-end, since the backend extracts shapes
        # from those two ops.
        for node_name in seq:
            if model_graph.get_node_type(node_name) in ["Pack", "StridedSlice"]:
                node_get_shape_name = model_graph.get_node_type(node_name) + "_get_shape"
                input_shape, output_shape = eval("self." + node_get_shape_name)(
                    graph, graph[node_name]

View file

@ -19,10 +19,13 @@ def _nchw_to_nhwc(shapes):
class NNIIRConverter:
    def __init__(self, ir_model):
        try:
            from nni.retiarii.converter.utils import flatten_model_graph
            self.ir_model = flatten_model_graph(ir_model)
        except ImportError:
            from nni.retiarii.converter.graph_gen import GraphConverterWithShape
            self.ir_model = ir_model.fork()
            GraphConverterWithShape().flatten(self.ir_model)

    def convert(self):
        graph = self._to_graph_layout()
@ -44,12 +47,12 @@ class NNIIRConverter:
                        k: v
                        for k, v in node.operation.parameters.items()
                    },
                    "input_shape": _nchw_to_nhwc(node.operation.parameters.get("input_shape")
                                                 if "input_shape" in node.operation.parameters
                                                 else node.operation.attr.get('input_shape')),
                    "output_shape": _nchw_to_nhwc(node.operation.parameters.get("output_shape")
                                                 if "output_shape" in node.operation.parameters
                                                 else node.operation.attr.get('output_shape')),
                    "type": node.operation.type,
                },
                "inbounds": [],

View file

@ -14,7 +14,7 @@ from packaging import version
import logging
__user_config_folder__ = os.path.expanduser('~/.nn_meter/config')
__default_user_data_folder__ = os.path.expanduser('~/.nn_meter/data')
__predictors_cfg_filename__ = 'predictors.yaml'
@ -26,6 +26,33 @@ def create_user_configs():
    # TODO/backlog: to handle config merging when upgrading
    for f in pkg_resources.resource_listdir(__name__, 'configs'):
        copyfile(pkg_resources.resource_filename(__name__, f'configs/{f}'), os.path.join(__user_config_folder__, f))
    # make the default settings yaml file
    with open(os.path.join(__user_config_folder__, 'settings.yaml'), 'w') as fp:
        yaml.dump({'data_folder': __default_user_data_folder__}, fp)


def get_user_data_folder():
    """get the user data folder recorded in settings.yaml
    """
    filepath = os.path.join(__user_config_folder__, 'settings.yaml')
    try:
        with open(filepath) as fp:
            return os.path.join(yaml.load(fp, yaml.FullLoader)['data_folder'])
    except FileNotFoundError:
        logging.info(f"setting file {filepath} not found, creating it")
        create_user_configs()
        return get_user_data_folder()


def change_user_data_folder(new_folder):
    """change the user data folder recorded in settings.yaml
    """
    os.makedirs(new_folder, exist_ok=True)
    with open(os.path.join(__user_config_folder__, 'settings.yaml')) as fp:
        setting = yaml.load(fp, yaml.FullLoader)
    with open(os.path.join(__user_config_folder__, 'settings.yaml'), 'w') as fp:
        setting['data_folder'] = new_folder
        yaml.dump(setting, fp)
def load_config_file(fname: str, loader=None):
@ -91,8 +118,9 @@ def load_latency_predictor(predictor_name: str, predictor_version: float = None)
    predictor_version: string to specify the version of the target latency predictor. If not specified (default None), the latest version of the
        predictor will be loaded.
    """
    user_data_folder = get_user_data_folder()
    pred_info = load_predictor_config(predictor_name, predictor_version)
    kernel_predictors, fusionrule = loading_to_local(pred_info, os.path.join(user_data_folder, 'predictor'))
    return nnMeter(kernel_predictors, fusionrule)

View file

@ -36,7 +36,6 @@ def download_from_url(urladdr, ppath):
    progress_bar.close()
    os.remove(file_name)

def try_import_onnx(require_version = "1.9.0"):
    try:
        import onnx
@ -47,7 +46,6 @@ def try_import_onnx(require_version = "1.9.0"):
        logging.error(f'You have not installed the onnx package; please install onnx=={require_version} and try again.')
        exit()

def try_import_torch(require_version = "1.7.1"):
    try:
        import torch
@ -58,7 +56,6 @@ def try_import_torch(require_version = "1.7.1"):
        logging.error(f'You have not installed the torch package; please install torch=={require_version} and try again.')
        exit()

def try_import_tensorflow(require_version = "1.15.0"):
    try:
        import tensorflow
@ -69,7 +66,6 @@ def try_import_tensorflow(require_version = "1.15.0"):
        logging.error(f'You have not installed the tensorflow package; please install tensorflow=={require_version} and try again.')
        exit()

def try_import_torchvision_models():
    try:
        import torchvision
@ -78,7 +74,6 @@ def try_import_torchvision_models():
        logging.error('You have not installed the torchvision package; please install torchvision and try again.')
        exit()

def try_import_onnxsim():
    try:
        from onnxsim import simplify
@ -86,4 +81,12 @@ def try_import_onnxsim():
    except ImportError:
        logging.error('You have not installed the onnx-simplifier package; please install onnx-simplifier and try again.')
        exit()

def try_import_dgl():
    try:
        import dgl
        return dgl
    except ImportError:
        logging.error('You have not installed the dgl package; please install dgl and try again.')
        exit()
exit()