Merge pull request #29 from microsoft/dev/setup-refactor

Refactor: refine examples and doc
This commit is contained in:
Li Lyna Zhang 2021-10-11 13:03:43 +08:00 committed by GitHub
Parent e62f661bd3 b3e5b75a4f
Commit 2a01c6134d
No known key found for this signature
GPG key ID: 4AEE18F83AFDEB23
12 changed files with 737 additions and 83 deletions

View file

@ -4,8 +4,8 @@ Note: This is an alpha (preview) version which is still being refined.
The current supported hardware and inference frameworks:
| Device | Framework | Processor | +-10% Accuracy | Hardware name |
| :-----------------: | :------------: | :------------: | :-------------: | :----------------------: |
| Pixel4 | TFLite v2.1 | CortexA76 CPU | 99.0% | cortexA76cpu_tflite21 |
| Mi9 | TFLite v2.1 | Adreno 640 GPU | 99.1% | adreno640gpu_tflite21 |
| Pixel3XL | TFLite v2.1 | Adreno 630 GPU | 99.0% | adreno630gpu_tflite21 |
@ -20,6 +20,7 @@ The current supported hardware and inference frameworks:
- Those who want to get the DNN inference latency on mobile and edge devices with **no deployment efforts on real devices**.
- Those who want to run **hardware-aware NAS with [NNI](https://github.com/microsoft/nni)**.
- Those who want to **build latency predictors for their own devices**.
- Those who want to use the 26k latency [benchmark dataset](https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/datasets.zip).
# Installation
@ -30,58 +31,59 @@ pip install nn-meter
```
If you want to try the latest code, please install nn-Meter from source. First, git clone the nn-Meter package locally:
```Bash
git clone git@github.com:microsoft/nn-Meter.git
cd nn-Meter
```
Then simply run the following pip install in an environment that has `python >= 3.6`. The command will automatically install nn-Meter and all necessary dependencies.
```Bash
pip install .
```
nn-Meter is a latency predictor for models in Tensorflow, PyTorch, Onnx, nn-Meter IR graph and [NNI IR graph](https://github.com/microsoft/nni) formats. To use nn-Meter for a specific model type, you also need to install the corresponding packages. The well-tested versions are listed below:
| Testing Model Type | Requirements |
| :----------------: | :-----------------------------------------------------------------------------------------------------------------------: |
| Tensorflow | `tensorflow==1.15.0` |
| Torch | `torch==1.7.1`, `torchvision==0.8.2`, (alternative)[`onnx==1.9.0`, `onnx-simplifier==0.3.6`] or [`nni==2.4`][1] |
| Onnx | `onnx==1.9.0` |
| nn-Meter IR graph | --- |
| NNI IR graph | `nni==2.4` |
[1] Please refer to [nn-Meter Usage](#torch-model-converters) for more information.
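For example, to enable the Onnx-based route for Torch models, the corresponding installation would be (a sketch based on the version table above):
```Bash
pip install torch==1.7.1 torchvision==0.8.2 onnx==1.9.0 onnx-simplifier==0.3.6
```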
Please also check the versions of `numpy` and `scikit_learn`, as different versions may change the prediction accuracy of the kernel predictors.
A stable wheel binary package will be released soon.
# Usage
For hardware latency prediction, nn-Meter provides two types of interfaces:
- command line `nn-meter` after `nn-meter` [installation](QuickStart.md#Installation).
- Python binding provided by the module `nn_meter`
Here is a summary of supported inputs of the two methods.
| Testing Model Type | Command Support | Python Binding |
| :----------------: | :-----------------------------------------------------------------------------------------: | :-----------------------------------------------------------------------------------------------: |
| Tensorflow | Checkpoint file dumped by `tf.saved_model()` and ending with `.pb` | Checkpoint file dumped by `tf.saved_model()` and ending with `.pb` |
| Torch | Models in `torchvision.models` | Object of `torch.nn.Module` |
| Onnx | Checkpoint file dumped by `torch.onnx.export()` or `onnx.save()` and ending with `.onnx` | Checkpoint file dumped by `onnx.save()` or model loaded by `onnx.load()` |
| nn-Meter IR graph | Json file in the format of [nn-Meter IR Graph](./docs/input_models.md#nnmeter-ir-graph) | `dict` object following the format of [nn-Meter IR Graph](./docs/input_models.md#nnmeter-ir-graph) |
| NNI IR graph | - | NNI IR graph object |
In both methods, users can specify the predictor name and version to target a specific hardware platform (device). Currently, nn-Meter supports prediction on the following four configs:
| Predictor (device_inferenceframework) | Processor Category | Version |
| :-----------------------------------: | :----------------: | :-----: |
| cortexA76cpu_tflite21 | CPU | 1.0 |
| adreno640gpu_tflite21 | GPU | 1.0 |
| adreno630gpu_tflite21 | GPU | 1.0 |
| myriadvpu_openvino2019r2 | VPU | 1.0 |
Users can get all predefined predictors and versions by running
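The listing command itself falls outside this hunk; in the released CLI it is likely the following (an assumption to verify against `nn-meter --help`):
```bash
nn-meter --list-predictors
```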
@ -147,7 +149,7 @@ By calling `load_latency_predictor`, user selects the target hardware and loads
In `predictor.predict()`, the allowed items of the parameter `model_type` include `["pb", "torch", "onnx", "nnmeter-ir", "nni-ir"]`, representing model types of tensorflow, torch, onnx, nn-meter IR graph and NNI IR graph, respectively.
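As a minimal sketch of the Python binding (the predictor name comes from the table above; the model path is a placeholder):
```python
from nn_meter import load_latency_predictor

# predictor name and version as listed in the supported-configs table
predictor = load_latency_predictor("cortexA76cpu_tflite21", 1.0)
lat = predictor.predict("path/to/model.pb", model_type="pb")  # hypothetical .pb checkpoint
print("predicted latency:", lat)
```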
<span id="torch-model-converters"> For Torch models, the shape of the feature maps cannot be inferred from the network structure alone, yet it is a significant parameter in latency prediction. Therefore, a Torch model requires the shape of its input tensor as an input to `predictor.predict()`; a random tensor with the given shape is then generated and used for inference. Another point about Torch model prediction is that users can install the `onnx` and `onnx-simplifier` packages for latency prediction (referred to as Onnx-based latency prediction for Torch models), or alternatively install the `nni` package (referred to as NNI-based latency prediction for Torch models). Note that the `nni` option does not support command-line calls. In addition, if users use `nni` for latency prediction, the PyTorch modules should be defined by the `nn` interface from NNI, `import nni.retiarii.nn.pytorch as nn` (see the [NNI doc](https://nni.readthedocs.io/en/stable/NAS/QuickStart.html#define-base-model) for more information), and the parameter `apply_nni` should be set to `True` in `predictor.predict()`. Here is an example of NNI-based latency prediction for a Torch model:
```python
import nni.retiarii.nn.pytorch as nn
@ -162,19 +164,26 @@ input_shape = (1, 3, 224, 224)
lat = predictor.predict(model, model_type='torch', input_shape=input_shape, apply_nni=True)
```
The Onnx-based latency prediction for Torch models is stable but slower, while the NNI-based prediction is much faster but less stable, as it may fail in some cases. The Onnx-based converter is the default for Torch model latency prediction in nn-Meter. Users can choose whichever they prefer according to their needs. </span>
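For comparison, a sketch of the default Onnx-based route (no `apply_nni`; the `onnx` and `onnx-simplifier` packages are assumed installed):
```python
import torchvision.models as models

model = models.resnet18()
input_shape = (1, 3, 224, 224)
lat = predictor.predict(model, model_type='torch', input_shape=input_shape)  # Onnx-based by default
```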
Users can view the information of all built-in predictors via `list_latency_predictors` or view the config file in `nn_meter/configs/predictors.yaml`.
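A sketch of listing the built-in predictors (the fields of each entry are assumed to mirror `predictors.yaml`, i.e. a name and a version):
```python
import nn_meter

for pred in nn_meter.list_latency_predictors():
    print(pred['name'], pred['version'])
```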
Users can get an nn-Meter IR graph by applying `model_file_to_graph` or `model_to_graph` to a model file name or model object and specifying the model type. The supported model types of `model_file_to_graph` include "onnx", "pb", "torch", "nnmeter-ir" and "nni-ir", while the supported model types of `model_to_graph` include "onnx", "torch" and "nni-ir".
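A minimal sketch of converting a model file (the path is a placeholder, and the exact signature is assumed from the description above):
```python
from nn_meter import model_file_to_graph

ir_graph = model_file_to_graph("path/to/model.onnx", model_type="onnx")  # dict in nn-Meter IR format
```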
## Benchmark Dataset
To evaluate the effectiveness of a prediction model on an arbitrary DNN model, we need a representative dataset that covers a large prediction scope. As no such latency dataset was available, nn-Meter collects and generates 26k CNN models containing various operators, configurations, and edge connections, covering different levels of FLOPs and latency. (Please refer to the paper for the dataset generation method and dataset statistics.)
We release the dataset, and provide an interface of `nn_meter.dataset` for users to get access to the dataset. Users can also download the data from the [Download Link](https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/datasets.zip) for testing nn-Meter or their own prediction models.
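Access goes through the dataset interface; a short sketch grounded in the `bench_dataset` helper shown later in this diff:
```python
from nn_meter.dataset import bench_dataset

dataset_files = bench_dataset()  # downloads the .jsonl files on first call and returns their paths
print(dataset_files)
```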
## Hardware-aware NAS by nn-Meter and NNI
To enable affordable DNNs on edge and mobile devices, hardware-aware NAS searches for models with both high accuracy and low latency. In particular, the search algorithm only considers models within the target latency constraint during the search process.
Currently we provide an example of end-to-end [multi-trial NAS](https://nni.readthedocs.io/en/stable/NAS/multi_trial_nas.html), which is a [random search algorithm](https://arxiv.org/abs/1902.07638) on the [SPOS NAS](https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123610528.pdf) search space. More examples of hardware-aware NAS and model compression algorithms are coming soon.
To run the multi-trial SPOS demo, NNI should be installed from source code by following the [NNI Doc](https://nni.readthedocs.io/en/stable/Tutorial/InstallationLinux.html#installation):
```bash
python setup.py develop
```
@ -209,8 +218,15 @@ exp_config.dummy_input = [1, 3, 32, 32]
exp.run(exp_config, port)
```
In `exp_config`, `dummy_input` is required for tracing shape info.
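For context, a hedged sketch of how `exp_config` is typically assembled in the NNI multi-trial example (`base_model`, `trainer` and `strategy` come from that example script; the config fields follow NNI 2.4's Retiarii API):
```python
from nni.retiarii.experiment.pytorch import RetiariiExeConfig, RetiariiExperiment

exp = RetiariiExperiment(base_model, trainer, [], strategy)  # strategy wraps the example's latency filter
exp_config = RetiariiExeConfig('local')
exp_config.trial_concurrency = 2
exp_config.max_trial_number = 10
exp_config.dummy_input = [1, 3, 32, 32]  # required for tracing shape info
exp.run(exp_config, port)
```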
## Bench Dataset
To evaluate the effectiveness of a prediction model on an arbitrary DNN model, we need a representative dataset that covers a large prediction scope. nn-Meter collects and generates 26k CNN models. (Please refer to the paper for the dataset generation method.)
We release the dataset, and provide an interface of `nn_meter.dataset` for users to get access to the dataset. Users can also download the data from the [Download Link](https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/datasets.zip) on their own.
# Contributing
@ -234,7 +250,9 @@ The entire codebase is under [MIT license](https://github.com/microsoft/nn-Meter
The dataset is under [Open Use of Data Agreement](https://github.com/Community-Data-License-Agreements/Releases/blob/main/O-UDA-1.0.md)
# Citation
If you find that nn-Meter helps your research, please consider citing it:
```
@inproceedings{nnmeter,
author = {Zhang, Li Lyna and Han, Shihao and Wei, Jianyu and Zheng, Ningxin and Cao, Ting and Yang, Yuqing and Liu, Yunxin},
@ -254,4 +272,4 @@ If you find that nn-Meter helps your research, please consider citing it:
year = {2021},
url = {https://github.com/microsoft/nn-Meter},
}
```

View file

@ -4,29 +4,24 @@ In this folder, we provide several examples to show the usage of nn-Meter package
The first example [1. Use nn-Meter for models with different format](nn-meter_for_different_model_format.ipynb) shows the basic Python binding usage of nn-Meter for models in different formats: Tensorflow, PyTorch and ONNX.
For the nn-Meter paper, we constructed a latency dataset to test the performance of nn-Meter together with other methods for comparison.
#### Benchmark dataset
To evaluate the effectiveness of a prediction model on an arbitrary DNN model, we need a representative dataset that covers a large prediction scope. nn-Meter collects and generates 26k CNN models. (Please refer to the paper for the dataset generation method.)
We release the dataset and provide an interface of `nn_meter.dataset` for users to get access to it. Users can also download the data from the [Download Link](https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/datasets.zip) on their own.
Example [2. Use nn-Meter with the bench dataset](nn-meter_for_bench_dataset.ipynb) shows how to use nn-Meter to predict latency for the bench dataset.
Since the dataset is encoded in a graph format, we also provide an example [3. Use bench dataset for GNN training](gnn_for_bench_dataset.ipynb) of using a GNN to predict the model latency with the bench dataset.
#### Hardware-aware NAS
Finally, we provide more hardware-aware NAS examples in NNI.
## Examples list
1. [Use nn-Meter for models with different format](nn-meter_for_different_model_format.ipynb)
2. [Use nn-Meter with the bench dataset](nn-meter_for_bench_dataset.ipynb)
3. [Use bench dataset for GNN training](gnn_for_bench_dataset.ipynb)
4. Use nn-Meter to construct a latency constraint in SPOS NAS (TBD)
- [Use nn-Meter in search part](https://github.com/microsoft/nni/blob/master/examples/nas/oneshot/spos/multi_trial.py)
- [Use nn-Meter in sampling part](https://github.com/microsoft/nni/blob/master/examples/nas/oneshot/spos/supernet.py)
5. [Use nn-Meter to construct latency penalty in Proxyless NAS](https://github.com/microsoft/nni/tree/master/examples/nas/oneshot/proxylessnas)

View file

@ -0,0 +1,277 @@
{
"cells": [
{
"cell_type": "markdown",
"source": [
"# Latency Dataset - GNN Model\n",
"\n",
"Considering the dataset is encoded in a graph format, here is an example of using GNN to predict the model latency with the bench dataset. \n",
"\n",
"In the previous work of [BRP-NAS](https://arxiv.org/abs/2007.08668v2), the authors propose an end-to-end latency predictor which consists of a GCN. Their GCN predictor demonstrates significant improvement over the layer-wise predictor on [NAS-Bench-201](https://arxiv.org/abs/2001.00326). While on our bench dataset, the preformance of BRP-NAS is consistently poor. As discussed in our paper, the reason is the model graph difference between training and testing set. GNN learns the representation of model graphs. Although the models in our bench dataset have largely overlapped operator types, the operator configurations, edges, and model latency ranges are different.\n",
"\n",
"To better deal with the problems above, we give a GNN example with graph representation improved. We first build our GNN model, which is constructed based on GraphSAGE, and maxpooling is selected as out pooling method. Next, we will start training after the data is loaded. `GNNDataset` and `GNNDataloader` in `nn_meter/dataset/gnn_dataloader.py` build the model structure of the Dataset in `.jsonl` format into our required Dataset and Dataloader. \n",
"\n",
"Let's start our journey!"
],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"## Step 1: Build our GraphSAGE Model\n",
"\n",
"We built our model with the help of DGL library."
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 1,
"source": [
"import torch\n",
"import torch.nn as nn\n",
"from torch.nn.modules.module import Module\n",
"\n",
"from dgl.nn.pytorch.glob import MaxPooling\n",
"import dgl.nn as dglnn\n",
"from torch.optim.lr_scheduler import CosineAnnealingLR\n",
"\n",
"\n",
"class GNN(Module):\n",
" def __init__(self, \n",
" num_features=0, \n",
" num_layers=2,\n",
" num_hidden=32,\n",
" dropout_ratio=0):\n",
"\n",
" super(GNN, self).__init__()\n",
" self.nfeat = num_features\n",
" self.nlayer = num_layers\n",
" self.nhid = num_hidden\n",
" self.dropout_ratio = dropout_ratio\n",
" self.gc = nn.ModuleList([dglnn.SAGEConv(self.nfeat if i==0 else self.nhid, self.nhid, 'pool') for i in range(self.nlayer)])\n",
" self.bn = nn.ModuleList([nn.LayerNorm(self.nhid) for i in range(self.nlayer)])\n",
" self.relu = nn.ModuleList([nn.ReLU() for i in range(self.nlayer)])\n",
" self.pooling = MaxPooling()\n",
" self.fc = nn.Linear(self.nhid, 1)\n",
" self.fc1 = nn.Linear(self.nhid, self.nhid)\n",
" self.dropout = nn.ModuleList([nn.Dropout(self.dropout_ratio) for i in range(self.nlayer)])\n",
"\n",
" def forward_single_model(self, g, features):\n",
" x = self.relu[0](self.bn[0](self.gc[0](g, features)))\n",
" x = self.dropout[0](x)\n",
" for i in range(1,self.nlayer):\n",
" x = self.relu[i](self.bn[i](self.gc[i](g, x)))\n",
" x = self.dropout[i](x)\n",
" return x\n",
"\n",
" def forward(self, g, features):\n",
" x = self.forward_single_model(g, features)\n",
" with g.local_scope():\n",
" g.ndata['h'] = x\n",
" x = self.pooling(g, x)\n",
" x = self.fc1(x)\n",
" return self.fc(x)"
],
"outputs": [
{
"output_type": "stream",
"name": "stderr",
"text": [
"Using backend: pytorch\n"
]
}
],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"## Step 2: Loading Data.\n",
"\n",
"Next, we will finish loading the data and learn about the size of the Training and Testing datasets."
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 2,
"source": [
"import os\r\n",
"from nn_meter.dataset import gnn_dataloader\r\n",
"\r\n",
"target_device = \"cortexA76cpu_tflite21\"\r\n",
"\r\n",
"print(\"Processing Training Set.\")\r\n",
"train_set = gnn_dataloader.GNNDataset(train=True, device=target_device) \r\n",
"print(\"Processing Testing Set.\")\r\n",
"test_set = gnn_dataloader.GNNDataset(train=False, device=target_device)\r\n",
"\r\n",
"train_loader = gnn_dataloader.GNNDataloader(train_set, batchsize=1 , shuffle=True)\r\n",
"test_loader = gnn_dataloader.GNNDataloader(test_set, batchsize=1, shuffle=False)\r\n",
"print('Train Dataset Size:', len(train_set))\r\n",
"print('Testing Dataset Size:', len(test_set))\r\n",
"print('Attribute tensor shape:', next(train_loader)[1].ndata['h'].size(1))\r\n",
"ATTR_COUNT = next(train_loader)[1].ndata['h'].size(1)"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Processing Training Set.\n",
"Processing Testing Set.\n",
"Train Dataset Size: 20732\n",
"Testing Dataset Size: 5173\n",
"Attribute tensor shape: 26\n"
]
}
],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"## Step 3: Run and Test\n",
"\n",
"We can run the model and evaluate it now!"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 3,
"source": [
"if torch.cuda.is_available():\r\n",
" print(\"Using CUDA.\")\r\n",
"# device = \"cpu\"\r\n",
"device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\r\n",
"\r\n",
"# Start Training\r\n",
"load_model = False\r\n",
"if load_model:\r\n",
" model = GNN(ATTR_COUNT, 3, 400, 0.1).to(device)\r\n",
" opt = torch.optim.AdamW(model.parameters(), lr=4e-4)\r\n",
" checkpoint = torch.load('LatencyGNN.pt')\r\n",
" model.load_state_dict(checkpoint['model_state_dict'])\r\n",
" opt.load_state_dict(checkpoint['optimizer_state_dict'])\r\n",
" # EPOCHS = checkpoint['epoch']\r\n",
" EPOCHS = 0\r\n",
" loss_func = checkpoint['loss']\r\n",
"else:\r\n",
" model = GNN(ATTR_COUNT, 3, 400, 0.1).to(device)\r\n",
" opt = torch.optim.AdamW(model.parameters(), lr=4e-4)\r\n",
" EPOCHS=20\r\n",
" loss_func = nn.L1Loss()\r\n",
"\r\n",
"lr_scheduler = CosineAnnealingLR(opt, T_max=EPOCHS)\r\n",
"loss_sum = 0\r\n",
"for epoch in range(EPOCHS):\r\n",
" train_length = len(train_set)\r\n",
" tran_acc_ten = 0\r\n",
" loss_sum = 0 \r\n",
" # latency, graph, types, flops\r\n",
" for batched_l, batched_g in train_loader:\r\n",
" opt.zero_grad()\r\n",
" batched_l = batched_l.to(device).float()\r\n",
" batched_g = batched_g.to(device)\r\n",
" batched_f = batched_g.ndata['h'].float()\r\n",
" logits = model(batched_g, batched_f)\r\n",
" for i in range(len(batched_l)):\r\n",
" pred_latency = logits[i].item()\r\n",
" prec_latency = batched_l[i].item()\r\n",
" if (pred_latency >= 0.9 * prec_latency) and (pred_latency <= 1.1 * prec_latency):\r\n",
" tran_acc_ten += 1\r\n",
" # print(\"true latency: \", batched_l)\r\n",
" # print(\"Predict latency: \", logits)\r\n",
" batched_l = torch.reshape(batched_l, (-1 ,1))\r\n",
" loss = loss_func(logits, batched_l)\r\n",
" loss_sum += loss\r\n",
" loss.backward()\r\n",
" opt.step()\r\n",
" lr_scheduler.step()\r\n",
" print(\"[Epoch \", epoch, \"]: \", \"Training accuracy within 10%: \", tran_acc_ten / train_length * 100, \" %.\")\r\n",
" # print('Learning Rate:', lr_scheduler.get_last_lr())\r\n",
" # print('Loss:', loss_sum / train_length)\r\n",
"\r\n",
"# Save The Best Model\r\n",
"torch.save({\r\n",
" 'epoch': EPOCHS,\r\n",
" 'model_state_dict': model.state_dict(),\r\n",
" 'optimizer_state_dict': opt.state_dict(),\r\n",
" 'loss': loss_func,\r\n",
"}, 'LatencyGNN.pt')\r\n",
"\r\n",
"# Start Testing\r\n",
"count = 0\r\n",
"with torch.no_grad():\r\n",
" test_length = len(test_set)\r\n",
" test_acc_ten = 0\r\n",
" for batched_l, batched_g in test_loader:\r\n",
" batched_l = batched_l.to(device).float()\r\n",
" batched_g = batched_g.to(device)\r\n",
" batched_f = batched_g.ndata['h'].float()\r\n",
" result = model(batched_g, batched_f)\r\n",
" if (result.item() >= 0.9 * batched_l.item()) and (result.item() <= 1.1 * batched_l.item()):\r\n",
" test_acc_ten += 1\r\n",
" acc = (abs(result.item() - batched_l.item()) / batched_l.item()) * 100\r\n",
" count += 1\r\n",
" print(\"Testing accuracy within 10%: \", test_acc_ten / test_length * 100, \" %.\")"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"[Epoch 0 ]: Training accuracy within 10%: 21.999807061547365 %.\n",
"[Epoch 1 ]: Training accuracy within 10%: 27.725255643449742 %.\n",
"[Epoch 2 ]: Training accuracy within 10%: 30.228632066370825 %.\n",
"[Epoch 3 ]: Training accuracy within 10%: 31.357322014277443 %.\n",
"[Epoch 4 ]: Training accuracy within 10%: 33.06000385876906 %.\n",
"[Epoch 5 ]: Training accuracy within 10%: 34.917036465367545 %.\n",
"[Epoch 6 ]: Training accuracy within 10%: 36.48466139301563 %.\n",
"[Epoch 7 ]: Training accuracy within 10%: 39.070036658306 %.\n",
"[Epoch 8 ]: Training accuracy within 10%: 40.10708084121165 %.\n",
"[Epoch 9 ]: Training accuracy within 10%: 41.530001929384525 %.\n",
"[Epoch 10 ]: Training accuracy within 10%: 43.26162454177118 %.\n",
"[Epoch 11 ]: Training accuracy within 10%: 45.34053636889832 %.\n",
"[Epoch 12 ]: Training accuracy within 10%: 48.45166891761528 %.\n",
"[Epoch 13 ]: Training accuracy within 10%: 50.945398417904684 %.\n",
"[Epoch 14 ]: Training accuracy within 10%: 54.5774647887324 %.\n",
"[Epoch 15 ]: Training accuracy within 10%: 56.08238471927455 %.\n",
"[Epoch 16 ]: Training accuracy within 10%: 59.54562994404785 %.\n",
"[Epoch 17 ]: Training accuracy within 10%: 62.41076596565696 %.\n",
"[Epoch 18 ]: Training accuracy within 10%: 63.65521898514373 %.\n",
"[Epoch 19 ]: Training accuracy within 10%: 64.6826162454177 %.\n",
"Testing accuracy within 10%: 60.042528513435144 %.\n"
]
}
],
"metadata": {}
}
],
"metadata": {
"interpreter": {
"hash": "0238da245144306487e61782d9cba9bf2e5e19842e5054371ac0cfbea9be2b57"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

File diff suppressed because one or more lines are too long

View file

@ -12,7 +12,9 @@ from .nn_meter import (
    load_latency_predictor,
    list_latency_predictors,
    model_file_to_graph,
    model_to_graph,
    create_user_configs,
    change_user_data_folder
)
from .utils.utils import download_from_url
from .prediction import latency_metrics
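# Usage sketch for the two new exports (the folder below is a hypothetical example):
#   from nn_meter import create_user_configs, change_user_data_folder
#   create_user_configs()                       # (re)create the user config folder with a default settings.yaml
#   change_user_data_folder('/data/nn_meter')   # later downloads (predictors, datasets) land here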

View file

@ -1,3 +1,4 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
from .bench_dataset import bench_dataset
from .gnn_dataloader import GNNDataset, GNNDataloader

View file

@ -4,21 +4,22 @@ import os, sys
from nn_meter.prediction import latency_metrics
from glob import glob
from nn_meter.nn_meter import list_latency_predictors, load_latency_predictor, get_user_data_folder
from nn_meter import download_from_url
import jsonlines
import logging

__user_dataset_folder__ = os.path.join(get_user_data_folder(), 'dataset')


def bench_dataset(url="https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/datasets.zip",
                  data_folder=__user_dataset_folder__):
    if not os.path.isdir(data_folder):
        os.makedirs(data_folder)

    logging.keyinfo(f'Download from {url} ...')
    download_from_url(url, data_folder)
    datasets = glob(os.path.join(data_folder, "**.jsonl"))
    return datasets

if __name__ == '__main__':

View file

@ -1,2 +1,275 @@
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT license.
import torch
import jsonlines
import os
import random
from .bench_dataset import bench_dataset
from nn_meter.nn_meter import get_user_data_folder
from nn_meter.utils.utils import try_import_dgl

RAW_DATA_URL = "https://github.com/microsoft/nn-Meter/releases/download/v1.0-data/datasets.zip"
__user_dataset_folder__ = os.path.join(get_user_data_folder(), 'dataset')

hws = [
    "cortexA76cpu_tflite21",
    "adreno640gpu_tflite21",
    "adreno630gpu_tflite21",
    "myriadvpu_openvino2019r2",
]


class GNNDataset(torch.utils.data.Dataset):
    def __init__(self, train=True, device="cortexA76cpu_tflite21", split_ratio=0.8):
        """
        Dataset of the latency benchmark.

        Parameters
        ----------
        train: bool
            Load the train split if True, otherwise the test split
        device: string
            The device type of the corresponding latency
        split_ratio: float
            The ratio to split the train dataset and the test dataset
        """
        err_str = "Not supported device type"
        assert device in hws, err_str
        self.device = device
        self.data_dir = __user_dataset_folder__
        self.train = train
        self.split_ratio = split_ratio
        self.adjs = {}
        self.attrs = {}
        self.nodename2id = {}
        self.id2nodename = {}
        self.op_types = set()
        self.opname2id = {}
        self.raw_data = {}
        self.name_list = []
        self.latencies = {}
        self.download_data()
        self.load_model_archs_and_latencies(self.data_dir)
        self.construct_attrs()
        self.name_list = list(
            filter(lambda x: x in self.latencies, self.name_list))

    def download_data(self):
        # download the benchmark .jsonl files into the user data folder on first use
        bench_dataset()

    def load_model_archs_and_latencies(self, data_dir):
        filelist = os.listdir(data_dir)
        for filename in filelist:
            if os.path.splitext(filename)[-1] != '.jsonl':
                continue
            self.load_model(os.path.join(data_dir, filename))

    def load_model(self, fpath):
        """
        Load a concrete model type.
        """
        assert os.path.exists(fpath), '{} does not exist'.format(fpath)
        with jsonlines.open(fpath) as reader:
            _names = []
            for obj in reader:
                if obj[self.device]:
                    _names.append(obj['id'])
                    self.latencies[obj['id']] = float(obj[self.device])
            _names = sorted(_names)
            split_ratio = self.split_ratio if self.train else 1 - self.split_ratio
            count = int(len(_names) * split_ratio)
            if self.train:
                _model_names = _names[:count]
            else:
                _model_names = _names[-1 * count:]
            self.name_list.extend(_model_names)
        with jsonlines.open(fpath) as reader:
            for obj in reader:
                if obj['id'] in _model_names:
                    model_name = obj['id']
                    model_data = obj['graph']
                    self.parse_model(model_name, model_data)
                    self.raw_data[model_name] = model_data

    def construct_attrs(self):
        """
        Construct the attribute matrix for each model.

        Attribute tensor:
            one-hot encoded op type + input_channel, output_channel,
            input_h, input_w + kernel_size + stride
        """
        op_types_list = list(sorted(self.op_types))
        for i, _op in enumerate(op_types_list):
            self.opname2id[_op] = i
        n_op_type = len(self.op_types)
        attr_len = n_op_type + 6
        for model_name in self.raw_data:
            n_node = len(self.raw_data[model_name])
            t_attr = torch.zeros(n_node, attr_len)
            for node in self.raw_data[model_name]:
                node_attr = self.raw_data[model_name][node]
                nid = self.nodename2id[model_name][node]
                op_type = node_attr['attr']['type']
                op_id = self.opname2id[op_type]
                t_attr[nid][op_id] = 1
                other_attrs = self.parse_node(model_name, node)
                t_attr[nid][-6:] = other_attrs
            self.attrs[model_name] = t_attr

    def parse_node(self, model_name, node_name):
        """
        Parse the attributes of the specified node.

        Get the input_c, output_c, input_h, input_w, kernel_size and stride
        of this node. Note: attributes the node does not have are filled
        with 0 by default.
        """
        node_data = self.raw_data[model_name][node_name]
        t_attr = torch.zeros(6)
        op_type = node_data['attr']['type']
        if op_type == 'Conv2D':
            weight_shape = node_data['attr']['attr']['weight_shape']
            kernel_size, _, in_c, out_c = weight_shape
            stride, _ = node_data['attr']['attr']['strides']
            _, h, w, _ = node_data['attr']['output_shape'][0]
            t_attr = torch.tensor([in_c, out_c, h, w, kernel_size, stride])
        elif op_type == 'DepthwiseConv2dNative':
            weight_shape = node_data['attr']['attr']['weight_shape']
            kernel_size, _, in_c, out_c = weight_shape
            stride, _ = node_data['attr']['attr']['strides']
            _, h, w, _ = node_data['attr']['output_shape'][0]
            t_attr = torch.tensor([in_c, out_c, h, w, kernel_size, stride])
        elif op_type == 'MatMul':
            in_node = node_data['inbounds'][0]
            in_shape = self.raw_data[model_name][in_node]['attr']['output_shape'][0]
            in_c = in_shape[-1]
            out_c = node_data['attr']['output_shape'][0][-1]
            t_attr[0] = in_c
            t_attr[1] = out_c
        elif len(node_data['inbounds']):
            in_node = node_data['inbounds'][0]
            h, w, in_c, out_c = 0, 0, 0, 0
            in_shape = self.raw_data[model_name][in_node]['attr']['output_shape'][0]
            in_c = in_shape[-1]
            if 'Concat' in op_type:
                # sum the input channels over all inbound nodes
                for i in range(1, len(node_data['inbounds'])):
                    in_shape = self.raw_data[model_name][node_data['inbounds'][i]]['attr']['output_shape'][0]
                    in_c += in_shape[-1]
            if len(node_data['attr']['output_shape']):
                out_shape = node_data['attr']['output_shape'][0]
                # N, H, W, C
                out_c = out_shape[-1]
                if len(out_shape) == 4:
                    h, w = out_shape[1], out_shape[2]
            t_attr[-6:-2] = torch.tensor([in_c, out_c, h, w])
        return t_attr

    def parse_model(self, model_name, model_data):
        """
        Parse the model data and build the adjacency matrices.
        """
        n_nodes = len(model_data)
        m_adj = torch.zeros(n_nodes, n_nodes, dtype=torch.int32)
        id2name = {}
        name2id = {}
        tmp_node_id = 0
        # build the mapping between the node name and node id
        for node_name in model_data.keys():
            id2name[tmp_node_id] = node_name
            name2id[node_name] = tmp_node_id
            op_type = model_data[node_name]['attr']['type']
            self.op_types.add(op_type)
            tmp_node_id += 1
        for node_name in model_data:
            cur_id = name2id[node_name]
            for node in model_data[node_name]['inbounds']:
                if node not in name2id:
                    # weight node
                    continue
                in_id = name2id[node]
                m_adj[in_id][cur_id] = 1
            for node in model_data[node_name]['outbounds']:
                if node not in name2id:
                    # weight node
                    continue
                out_id = name2id[node]
                m_adj[cur_id][out_id] = 1
        for idx in range(n_nodes):
            m_adj[idx][idx] = 1
        self.adjs[model_name] = m_adj
        self.nodename2id[model_name] = name2id
        self.id2nodename[model_name] = id2name

    def __getitem__(self, index):
        model_name = self.name_list[index]
        return (self.adjs[model_name], self.attrs[model_name]), self.latencies[model_name], self.op_types

    def __len__(self):
        return len(self.name_list)


class GNNDataloader(torch.utils.data.DataLoader):
    def __init__(self, dataset, shuffle=False, batchsize=1):
        self.dataset = dataset
        self.op_num = len(dataset.op_types)
        self.shuffle = shuffle
        self.batchsize = batchsize
        self.length = len(self.dataset)
        self.indexes = list(range(self.length))
        self.pos = 0
        self.graphs = {}
        self.latencies = {}
        self.construct_graphs()

    def construct_graphs(self):
        dgl = try_import_dgl()
        for gid in range(self.length):
            (adj, attrs), latency, op_types = self.dataset[gid]
            u, v = torch.nonzero(adj, as_tuple=True)
            graph = dgl.graph((u, v))
            # normalize attributes: the one-hot op-type bits are kept as-is (divided by 1);
            # the six numeric fields (in_c, out_c, h, w, kernel_size, stride) are scaled by their maxima
            MAX_NORM = torch.tensor([1] * len(op_types) + [6963, 6963, 224, 224, 11, 4])
            attrs = attrs / MAX_NORM
            graph.ndata['h'] = attrs
            self.graphs[gid] = graph
            self.latencies[gid] = latency

    def __iter__(self):
        if self.shuffle:
            random.shuffle(self.indexes)
        self.pos = 0
        return self

    def __len__(self):
        return self.length

    def __next__(self):
        dgl = try_import_dgl()
        start = self.pos
        end = min(start + self.batchsize, self.length)
        self.pos = end
        if end - start <= 0:
            raise StopIteration
        batch_indexes = self.indexes[start:end]
        batch_graphs = [self.graphs[i] for i in batch_indexes]
        batch_latencies = [self.latencies[i] for i in batch_indexes]
        return torch.tensor(batch_latencies), dgl.batch(batch_graphs)

View file

@ -18,8 +18,10 @@ class ShapeInference:
"AddN", # AddN does not support prodcast really
"AddV2",
"Subtract",
"Sub",
"MulNoNan",
"Multiply"
"Multiply",
"Mul",
"Div",
"DivNoNan",
"Equal",
@ -866,7 +868,7 @@ class ShapeInference:
        return [input_shape], [exp_output_shape]

    @staticmethod
    def Pack_get_shape(graph, node):
        """
        Get shape of a Pack operator.
        Patched for kernel detector.
@ -970,7 +972,7 @@ class ShapeInference:
        # This is a patch for the back-end, since the backend extracts shapes
        # from those two ops.
        for node_name in seq:
            if model_graph.get_node_type(node_name) in ["Pack", "StridedSlice"]:
                node_get_shape_name = model_graph.get_node_type(node_name) + "_get_shape"
                input_shape, output_shape = eval("self." + node_get_shape_name)(
                    graph, graph[node_name]

View file

@ -19,10 +19,13 @@ def _nchw_to_nhwc(shapes):
class NNIIRConverter:
    def __init__(self, ir_model):
        try:
            from nni.retiarii.converter.utils import flatten_model_graph
            self.ir_model = flatten_model_graph(ir_model)
        except ImportError:
            from nni.retiarii.converter.graph_gen import GraphConverterWithShape
            self.ir_model = ir_model.fork()
            GraphConverterWithShape().flatten(self.ir_model)

    def convert(self):
        graph = self._to_graph_layout()
@ -44,12 +47,12 @@ class NNIIRConverter:
                        k: v
                        for k, v in node.operation.parameters.items()
                    },
                    "input_shape": _nchw_to_nhwc(node.operation.parameters.get("input_shape")
                                                 if "input_shape" in node.operation.parameters
                                                 else node.operation.attr.get('input_shape')),
                    "output_shape": _nchw_to_nhwc(node.operation.parameters.get("output_shape")
                                                 if "output_shape" in node.operation.parameters
                                                 else node.operation.attr.get('output_shape')),
                    "type": node.operation.type,
                },
                "inbounds": [],

View file

@ -14,7 +14,7 @@ from packaging import version
import logging
__user_config_folder__ = os.path.expanduser('~/.nn_meter/config')
__default_user_data_folder__ = os.path.expanduser('~/.nn_meter/data')
__predictors_cfg_filename__ = 'predictors.yaml'
@ -26,6 +26,33 @@ def create_user_configs():
    # TODO/backlog: to handle config merging when upgrading
    for f in pkg_resources.resource_listdir(__name__, 'configs'):
        copyfile(pkg_resources.resource_filename(__name__, f'configs/{f}'), os.path.join(__user_config_folder__, f))
    # make the default settings yaml file
    with open(os.path.join(__user_config_folder__, 'settings.yaml'), 'w') as fp:
        yaml.dump({'data_folder': __default_user_data_folder__}, fp)


def get_user_data_folder():
    """get the user data folder recorded in settings.yaml
    """
    filepath = os.path.join(__user_config_folder__, 'settings.yaml')
    try:
        with open(filepath) as fp:
            return os.path.join(yaml.load(fp, yaml.FullLoader)['data_folder'])
    except FileNotFoundError:
        logging.info(f"setting file {filepath} not found, creating it")
        create_user_configs()
        return get_user_data_folder()


def change_user_data_folder(new_folder):
    """change the user data folder recorded in settings.yaml
    """
    os.makedirs(new_folder, exist_ok=True)
    with open(os.path.join(__user_config_folder__, 'settings.yaml')) as fp:
        setting = yaml.load(fp, yaml.FullLoader)
    with open(os.path.join(__user_config_folder__, 'settings.yaml'), 'w') as fp:
        setting['data_folder'] = new_folder
        yaml.dump(setting, fp)
def load_config_file(fname: str, loader=None):
@ -91,8 +118,9 @@ def load_latency_predictor(predictor_name: str, predictor_version: float = None)
    predictor_version: string to specify the version of the target latency predictor. If not specified (default None), the latest version of the
        predictor will be loaded.
    """
    user_data_folder = get_user_data_folder()
    pred_info = load_predictor_config(predictor_name, predictor_version)
    kernel_predictors, fusionrule = loading_to_local(pred_info, os.path.join(user_data_folder, 'predictor'))
    return nnMeter(kernel_predictors, fusionrule)

View file

@ -36,7 +36,6 @@ def download_from_url(urladdr, ppath):
    progress_bar.close()
    os.remove(file_name)

def try_import_onnx(require_version = "1.9.0"):
    try:
        import onnx
@ -47,7 +46,6 @@ def try_import_onnx(require_version = "1.9.0"):
        logging.error(f'You have not installed the onnx package; please install onnx=={require_version} and try again.')
        exit()

def try_import_torch(require_version = "1.7.1"):
    try:
        import torch
@ -58,7 +56,6 @@ def try_import_torch(require_version = "1.7.1"):
        logging.error(f'You have not installed the torch package; please install torch=={require_version} and try again.')
        exit()

def try_import_tensorflow(require_version = "1.15.0"):
    try:
        import tensorflow
@ -69,7 +66,6 @@ def try_import_tensorflow(require_version = "1.15.0"):
        logging.error(f'You have not installed the tensorflow package; please install tensorflow=={require_version} and try again.')
        exit()

def try_import_torchvision_models():
    try:
        import torchvision
@ -78,7 +74,6 @@ def try_import_torchvision_models():
        logging.error('You have not installed the torchvision package; please install torchvision and try again.')
        exit()

def try_import_onnxsim():
    try:
        from onnxsim import simplify
@ -86,4 +81,12 @@ def try_import_onnxsim():
    except ImportError:
        logging.error('You have not installed the onnx-simplifier package; please install onnx-simplifier and try again.')
        exit()

def try_import_dgl():
    try:
        import dgl
        return dgl
    except ImportError:
        logging.error('You have not installed the dgl package; please install dgl and try again.')
        exit()
exit()