Merge pull request #58 from microsoft/nn-Meter-pre

add quick start tutorial
2022-03-08 09:50:43 +08:00 · 2022-03-08 09:50:43 +08:00 · 7486317af6
--- a/examples/nn-meter_quick_start/1.quick_start.ipynb
+++ b/examples/nn-meter_quick_start/1.quick_start.ipynb
@ -0,0 +1,65 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Quick Start of nn-Meter\n",
+    "nn-Meter is a novel and efficient toolkit for accurately predicting the inference latency of DNN models on diverse edge devices. nn-Meter received the **MobiSys 21 Best Paper Award**; the paper is available here: [nn-Meter: towards accurate latency prediction of deep-learning model inference on diverse edge devices](https://dl.acm.org/doi/10.1145/3458864.3467882). nn-Meter is open source on [GitHub](https://github.com/microsoft/nn-Meter) and is released as a Python package. In this notebook, we take the first steps with the nn-Meter predictor, the benchmark dataset, and the nn-Meter building tools. Let's start our journey!\n",
+    "\n",
+    "nn-Meter supports and is tested on Ubuntu >= 16.04, macOS >= 10.14.1, and Windows 10/11. Simply run the following `pip install` in an environment that has `python 64-bit >= 3.6`.\n",
+    "\n",
+    "```bash\n",
+    "pip install nn-meter\n",
+    "```\n",
+    "\n",
+    "After installation, a command named `nn-meter` is available. Users can run `nn-meter --help` to verify their installation. The expected output is:\n",
+    "\n",
+    "```text\n",
+    "usage: nn-meter [-h] [-v] [--list-predictors] {predict,lat_pred,get_ir} ...\n",
+    "\n",
+    "please run \"nn-meter {positional argument} --help\" to see nn-meter guidance\n",
+    "\n",
+    "positional arguments:\n",
+    "  {predict,lat_pred,get_ir}\n",
+    "    predict (lat_pred)  apply latency predictor for testing model\n",
+    "    get_ir              specify a model type to convert to nn-meter ir graph\n",
+    "\n",
+    "optional arguments:\n",
+    "  -h, --help            show this help message and exit\n",
+    "  -v, --verbose         increase output verbosity\n",
+    "  --list-predictors     list all supported predictors\n",
+    "```\n",
+    "\n",
+    "We provide two main usages for nn-Meter here.\n",
+    "\n",
+    "- Use nn-Meter for latency prediction\n",
+    "\n",
+    "- Use nn-Meter benchmark dataset\n",
+    "\n",
+    "To run the Jupyter notebooks in this folder, users should download and unzip the test model data from [this link](https://github.com/microsoft/nn-Meter/tree/main/material/testmodels), and copy it to their project folder."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "nn-meter1.1-test",
+   "language": "python",
+   "name": "nn-meter1.1-test"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
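
The installation check described in the quick-start notebook above can also be scripted. The following is a minimal sketch and not part of nn-Meter itself: the helper names `cli_available` and `show_help` are hypothetical, and the only thing assumed about nn-Meter is that `pip install nn-meter` puts an `nn-meter` executable on `PATH`, as the notebook states.

```python
import shutil
import subprocess
import sys


def cli_available(command="nn-meter"):
    """Return True if `command` resolves to an executable on PATH."""
    return shutil.which(command) is not None


def show_help(command="nn-meter"):
    """Run `<command> --help` and return its standard output.

    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(
        [command, "--help"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode bytes to str (works on Python 3.6)
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    if cli_available():
        print(show_help())
    else:
        print("nn-meter not found on PATH; try `pip install nn-meter`",
              file=sys.stderr)
```

If the command is found, the printed text should resemble the usage message quoted in the notebook; otherwise the script only reports that the executable is missing.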
--- a/examples/nn-meter_quick_start/2.nn-Meter_for_latency_prediction.ipynb
+++ b/examples/nn-meter_quick_start/2.nn-Meter_for_latency_prediction.ipynb
@ -0,0 +1,729 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Use nn-Meter for latency prediction\n",
+    "\n",
+    "## Use nn_meter as a python package\n",
+    "After installing nn-Meter, we can import the `nn_meter` package in Python:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "nn_meter version: 1.1\n"
+     ]
+    }
+   ],
+   "source": [
+    "import nn_meter\n",
+    "print(f\"nn_meter version: {nn_meter.__version__}\")\n",
+    "\n",
+    "project_path = \"/home/jiahang/nnmeter-demo/\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "When nn-Meter is used, the predictor models are automatically downloaded to the user's local device. We currently provide four predictors corresponding to four popular platforms: mobile CPU (`\"cortexA76cpu_tflite21\"`), mobile Adreno 640 GPU (`\"adreno640gpu_tflite21\"`), mobile Adreno 630 GPU (`\"adreno630gpu_tflite21\"`), and Intel VPU (`\"myriadvpu_openvino2019r2\"`). Together, the four existing predictors take up about 6.33 GB. They are stored in `~/.nn_meter/data/` by default. To change the target directory, users can run:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "nn_meter.change_user_data_folder(new_folder=project_path) # path to the new folder"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Users can list all supported latency predictors by running:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[Predictor] cortexA76cpu_tflite21: version=1.0\n",
+      "[Predictor] adreno640gpu_tflite21: version=1.0\n",
+      "[Predictor] adreno630gpu_tflite21: version=1.0\n",
+      "[Predictor] myriadvpu_openvino2019r2: version=1.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "# list all supported latency predictors\n",
+    "predictors = nn_meter.list_latency_predictors()\n",
+    "for p in predictors:\n",
+    "    print(f\"[Predictor] {p['name']}: version={p['version']}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "nn-Meter can predict latency for TensorFlow models (`.pb` files), ONNX models (`.onnx` files), and PyTorch models (`nn.Module` objects). We provide some example files so that users can quickly run nn-Meter. The data can be downloaded from [this link](). \n",
+    "\n",
+    "The first step is to load a predictor by specifying its name."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/home/jiahang/anaconda3/envs/py36/lib/python3.6/site-packages/sklearn/base.py:315: UserWarning: Trying to unpickle estimator DecisionTreeRegressor from version 0.23.1 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.\n",
+      "  UserWarning)\n",
+      "/home/jiahang/anaconda3/envs/py36/lib/python3.6/site-packages/sklearn/base.py:315: UserWarning: Trying to unpickle estimator RandomForestRegressor from version 0.23.1 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.\n",
+      "  UserWarning)\n"
+     ]
+    }
+   ],
+   "source": [
+    "predictor_name = \"adreno640gpu_tflite21\" # user can change text here to test other predictors\n",
+    "\n",
+    "# load predictor\n",
+    "predictor = nn_meter.load_latency_predictor(predictor_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The first time nn-Meter is used, it takes a while to download and unzip the required predictor model. \n",
+    "\n",
+    "After the predictor is loaded, users can predict latency by simply calling `predictor.predict()`. To use nn-Meter with a specific model type, you also need to install the corresponding packages. The well-tested versions are listed below:\n",
+    "\n",
+    "| Testing Model Type |                                                       Requirements                                                       |\n",
+    "| :----------------: | :-----------------------------------------------------------------------------------------------------------------------: |\n",
+    "|     Tensorflow     |                                                  `tensorflow==2.6.0`                                                  |\n",
+    "|       Torch       | `torch==1.9.0`, `torchvision==0.10.0`, (alternative)[`onnx==1.9.0`, `onnx-simplifier==0.3.6`] or [`nni>=2.4`][1] |\n",
+    "|        Onnx        |                                                      `onnx==1.9.0`                                                      |\n",
+    "\n",
+    "For a TensorFlow `.pb` file:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[RESULT] predict latency for /home/jiahang/nnmeter-demo/testmodel/mobilenetv3small_0.pb: 4.489849402954042 ms\n"
+     ]
+    }
+   ],
+   "source": [
+    "test_model = project_path + \"testmodel/mobilenetv3small_0.pb\"\n",
+    "\n",
+    "# predict latency\n",
+    "latency = predictor.predict(model=test_model, model_type=\"pb\") # result is in unit of ms\n",
+    "print(f'[RESULT] predict latency for {test_model}: {latency} ms')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For an ONNX `.onnx` file:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[RESULT] predict latency for /home/jiahang/nnmeter-demo/testmodel/mobilenetv3small_0.onnx: 6.705541180860482 ms\n"
+     ]
+    }
+   ],
+   "source": [
+    "test_model = project_path + \"testmodel/mobilenetv3small_0.onnx\"\n",
+    "\n",
+    "# predict latency\n",
+    "latency = predictor.predict(model=test_model, model_type=\"onnx\") # result is in unit of ms\n",
+    "print(f'[RESULT] predict latency for {test_model}: {latency} ms')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "PyTorch models are handled slightly differently in nn-Meter. For PyTorch model prediction, a torch model in `nn.Module` format is needed, and the input shape has to be specified. Here we build a simple torch model for the demo. Users can choose one of two groups of dependencies: [`onnx==1.9.0`, `onnx-simplifier==0.3.6`], which we call the \"onnx_based way\", or [`nni>=2.4`], which we call the \"nni_based way\". The \"onnx_based way\" is applied by default."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch.nn as nn\n",
+    "\n",
+    "class VGG(nn.Module):\n",
+    "\n",
+    "    def __init__(self, features, num_classes=1000):\n",
+    "        super(VGG, self).__init__()\n",
+    "        self.features = features\n",
+    "        self.classifier = nn.Sequential(\n",
+    "            nn.Linear(512 * 7 * 7, 4096),\n",
+    "            nn.ReLU(True),\n",
+    "            nn.Dropout(),\n",
+    "            nn.Linear(4096, 4096),\n",
+    "            nn.ReLU(True),\n",
+    "            nn.Dropout(),\n",
+    "            nn.Linear(4096, num_classes),\n",
+    "        )\n",
+    "\n",
+    "    def forward(self, x):\n",
+    "        x = self.features(x)\n",
+    "        x = x.view(x.size(0), -1)\n",
+    "        x = self.classifier(x)\n",
+    "        return x\n",
+    "\n",
+    "def make_layers(cfg, batch_norm=False):\n",
+    "    layers = []\n",
+    "    in_channels = 3\n",
+    "    for v in cfg:\n",
+    "        if v == 'M':\n",
+    "            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]\n",
+    "        else:\n",
+    "            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)\n",
+    "            if batch_norm:\n",
+    "                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]\n",
+    "            else:\n",
+    "                layers += [conv2d, nn.ReLU(inplace=True)]\n",
+    "            in_channels = v\n",
+    "    return nn.Sequential(*layers)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "An input shape must also be specified, because the input shape cannot be inferred from an `nn.Module` alone. The prediction code is:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[RESULT] predict latency for vgg11: 109.77864175998361 ms\n"
+     ]
+    }
+   ],
+   "source": [
+    "vgg11 = VGG(make_layers([64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'])) # VGG 11-layer model\n",
+    "\n",
+    "# predict latency\n",
+    "latency = predictor.predict(vgg11, model_type=\"torch\", input_shape=(1, 3, 224, 224)) \n",
+    "print(f'[RESULT] predict latency for vgg11: {latency} ms')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For the \"nni_based way\", the PyTorch modules should be defined with the `nn` interface from NNI, `import nni.retiarii.nn.pytorch as nn` (see the [NNI doc](https://nni.readthedocs.io/en/stable/NAS/QuickStart.html#define-base-model) for more information), and the parameter `apply_nni` should be set to `True` in the function `predictor.predict()`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:Start latency prediction ...\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:00] INFO (root/MainThread) Start latency prediction ...\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:NNI-based Torch Converter is applied for model conversion\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:00] INFO (root/MainThread) NNI-based Torch Converter is applied for model conversion\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "WARNING:root:nni==999.dev0 is not well tested now, well tested version: nni==2.5, 2.4\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:00] WARNING (root/MainThread) nni==999.dev0 is not well tested now, well tested version: nni==2.5, 2.4\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "WARNING:root:nni==999.dev0 is not well tested now, well tested version: nni==2.5, 2.4\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:02] WARNING (root/MainThread) nni==999.dev0 is not well tested now, well tested version: nni==2.5, 2.4\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:{'op': 'fc', 'name': 'fc#0', 'input_tensors': [[1, 25088]], 'cin': 25088, 'cout': 4096, 'inbounds': [], 'outbounds': ['relu#1']}\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'fc', 'name': 'fc#0', 'input_tensors': [[1, 25088]], 'cin': 25088, 'cout': 4096, 'inbounds': [], 'outbounds': ['relu#1']}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:{'op': 'relu', 'name': 'relu#1', 'input_tensors': [[1, 4096]], 'cin': 4096, 'cout': 4096, 'inbounds': ['fc#0'], 'outbounds': ['__torch__.torch.nn.modules.dropout.Dropout#2']}\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'relu', 'name': 'relu#1', 'input_tensors': [[1, 4096]], 'cin': 4096, 'cout': 4096, 'inbounds': ['fc#0'], 'outbounds': ['__torch__.torch.nn.modules.dropout.Dropout#2']}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:{'op': '__torch__.torch.nn.modules.dropout.Dropout', 'name': '__torch__.torch.nn.modules.dropout.Dropout#2', 'input_tensors': [[1, 4096]], 'cin': 4096, 'cout': 4096, 'inbounds': ['relu#1'], 'outbounds': ['fc#3']}\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:02] INFO (root/MainThread) {'op': '__torch__.torch.nn.modules.dropout.Dropout', 'name': '__torch__.torch.nn.modules.dropout.Dropout#2', 'input_tensors': [[1, 4096]], 'cin': 4096, 'cout': 4096, 'inbounds': ['relu#1'], 'outbounds': ['fc#3']}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:{'op': 'fc', 'name': 'fc#3', 'input_tensors': [[1, 4096]], 'cin': 4096, 'cout': 4096, 'inbounds': ['__torch__.torch.nn.modules.dropout.Dropout#2'], 'outbounds': ['relu#4']}\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'fc', 'name': 'fc#3', 'input_tensors': [[1, 4096]], 'cin': 4096, 'cout': 4096, 'inbounds': ['__torch__.torch.nn.modules.dropout.Dropout#2'], 'outbounds': ['relu#4']}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:{'op': 'relu', 'name': 'relu#4', 'input_tensors': [[1, 4096]], 'cin': 4096, 'cout': 4096, 'inbounds': ['fc#3'], 'outbounds': ['__torch__.torch.nn.modules.dropout.Dropout#5']}\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'relu', 'name': 'relu#4', 'input_tensors': [[1, 4096]], 'cin': 4096, 'cout': 4096, 'inbounds': ['fc#3'], 'outbounds': ['__torch__.torch.nn.modules.dropout.Dropout#5']}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:{'op': '__torch__.torch.nn.modules.dropout.Dropout', 'name': '__torch__.torch.nn.modules.dropout.Dropout#5', 'input_tensors': [[1, 4096]], 'cin': 4096, 'cout': 4096, 'inbounds': ['relu#4'], 'outbounds': ['fc#6']}\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:02] INFO (root/MainThread) {'op': '__torch__.torch.nn.modules.dropout.Dropout', 'name': '__torch__.torch.nn.modules.dropout.Dropout#5', 'input_tensors': [[1, 4096]], 'cin': 4096, 'cout': 4096, 'inbounds': ['relu#4'], 'outbounds': ['fc#6']}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:{'op': 'fc', 'name': 'fc#6', 'input_tensors': [[1, 4096]], 'cin': 4096, 'cout': 1000, 'inbounds': ['__torch__.torch.nn.modules.dropout.Dropout#5'], 'outbounds': []}\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'fc', 'name': 'fc#6', 'input_tensors': [[1, 4096]], 'cin': 4096, 'cout': 1000, 'inbounds': ['__torch__.torch.nn.modules.dropout.Dropout#5'], 'outbounds': []}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:{'op': 'conv-relu', 'name': 'conv-relu#7', 'input_tensors': [[1, 224, 224, 3]], 'ks': [3, 3], 'inputh': 224, 'inputw': 224, 'cin': 3, 'cout': 64, 'inbounds': [], 'outbounds': ['maxpool#8']}\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'conv-relu', 'name': 'conv-relu#7', 'input_tensors': [[1, 224, 224, 3]], 'ks': [3, 3], 'inputh': 224, 'inputw': 224, 'cin': 3, 'cout': 64, 'inbounds': [], 'outbounds': ['maxpool#8']}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:{'op': 'maxpool', 'name': 'maxpool#8', 'input_tensors': [[1, 224, 224, 64]], 'ks': [2, 2], 'strides': [2, 2], 'inputh': 224, 'inputw': 224, 'cin': 64, 'cout': 64, 'inbounds': ['conv-relu#7'], 'outbounds': ['conv-relu#9']}\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'maxpool', 'name': 'maxpool#8', 'input_tensors': [[1, 224, 224, 64]], 'ks': [2, 2], 'strides': [2, 2], 'inputh': 224, 'inputw': 224, 'cin': 64, 'cout': 64, 'inbounds': ['conv-relu#7'], 'outbounds': ['conv-relu#9']}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:{'op': 'conv-relu', 'name': 'conv-relu#9', 'input_tensors': [[1, 112, 112, 64]], 'ks': [3, 3], 'inputh': 112, 'inputw': 112, 'cin': 64, 'cout': 128, 'inbounds': ['maxpool#8'], 'outbounds': ['maxpool#10']}\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'conv-relu', 'name': 'conv-relu#9', 'input_tensors': [[1, 112, 112, 64]], 'ks': [3, 3], 'inputh': 112, 'inputw': 112, 'cin': 64, 'cout': 128, 'inbounds': ['maxpool#8'], 'outbounds': ['maxpool#10']}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:{'op': 'maxpool', 'name': 'maxpool#10', 'input_tensors': [[1, 112, 112, 128]], 'ks': [2, 2], 'strides': [2, 2], 'inputh': 112, 'inputw': 112, 'cin': 128, 'cout': 128, 'inbounds': ['conv-relu#9'], 'outbounds': ['conv-relu#11']}\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'maxpool', 'name': 'maxpool#10', 'input_tensors': [[1, 112, 112, 128]], 'ks': [2, 2], 'strides': [2, 2], 'inputh': 112, 'inputw': 112, 'cin': 128, 'cout': 128, 'inbounds': ['conv-relu#9'], 'outbounds': ['conv-relu#11']}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:{'op': 'conv-relu', 'name': 'conv-relu#11', 'input_tensors': [[1, 56, 56, 128]], 'ks': [3, 3], 'inputh': 56, 'inputw': 56, 'cin': 128, 'cout': 256, 'inbounds': ['maxpool#10'], 'outbounds': ['conv-relu#12']}\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'conv-relu', 'name': 'conv-relu#11', 'input_tensors': [[1, 56, 56, 128]], 'ks': [3, 3], 'inputh': 56, 'inputw': 56, 'cin': 128, 'cout': 256, 'inbounds': ['maxpool#10'], 'outbounds': ['conv-relu#12']}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:{'op': 'conv-relu', 'name': 'conv-relu#12', 'input_tensors': [[1, 56, 56, 256]], 'ks': [3, 3], 'inputh': 56, 'inputw': 56, 'cin': 256, 'cout': 256, 'inbounds': ['conv-relu#11'], 'outbounds': ['maxpool#13']}\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'conv-relu', 'name': 'conv-relu#12', 'input_tensors': [[1, 56, 56, 256]], 'ks': [3, 3], 'inputh': 56, 'inputw': 56, 'cin': 256, 'cout': 256, 'inbounds': ['conv-relu#11'], 'outbounds': ['maxpool#13']}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:{'op': 'maxpool', 'name': 'maxpool#13', 'input_tensors': [[1, 56, 56, 256]], 'ks': [2, 2], 'strides': [2, 2], 'inputh': 56, 'inputw': 56, 'cin': 256, 'cout': 256, 'inbounds': ['conv-relu#12'], 'outbounds': ['conv-relu#14']}\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'maxpool', 'name': 'maxpool#13', 'input_tensors': [[1, 56, 56, 256]], 'ks': [2, 2], 'strides': [2, 2], 'inputh': 56, 'inputw': 56, 'cin': 256, 'cout': 256, 'inbounds': ['conv-relu#12'], 'outbounds': ['conv-relu#14']}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:{'op': 'conv-relu', 'name': 'conv-relu#14', 'input_tensors': [[1, 28, 28, 256]], 'ks': [3, 3], 'inputh': 28, 'inputw': 28, 'cin': 256, 'cout': 512, 'inbounds': ['maxpool#13'], 'outbounds': ['conv-relu#15']}\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'conv-relu', 'name': 'conv-relu#14', 'input_tensors': [[1, 28, 28, 256]], 'ks': [3, 3], 'inputh': 28, 'inputw': 28, 'cin': 256, 'cout': 512, 'inbounds': ['maxpool#13'], 'outbounds': ['conv-relu#15']}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:{'op': 'conv-relu', 'name': 'conv-relu#15', 'input_tensors': [[1, 28, 28, 512]], 'ks': [3, 3], 'inputh': 28, 'inputw': 28, 'cin': 512, 'cout': 512, 'inbounds': ['conv-relu#14'], 'outbounds': ['maxpool#16']}\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:02] INFO (root/MainThread) {'op': 'conv-relu', 'name': 'conv-relu#15', 'input_tensors': [[1, 28, 28, 512]], 'ks': [3, 3], 'inputh': 28, 'inputw': 28, 'cin': 512, 'cout': 512, 'inbounds': ['conv-relu#14'], 'outbounds': ['maxpool#16']}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:{'op': 'maxpool', 'name': 'maxpool#16', 'input_tensors': [[1, 28, 28, 512]], 'ks': [2, 2], 'strides': [2, 2], 'inputh': 28, 'inputw': 28, 'cin': 512, 'cout': 512, 'inbounds': ['conv-relu#15'], 'outbounds': ['conv-relu#17']}\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:03] INFO (root/MainThread) {'op': 'maxpool', 'name': 'maxpool#16', 'input_tensors': [[1, 28, 28, 512]], 'ks': [2, 2], 'strides': [2, 2], 'inputh': 28, 'inputw': 28, 'cin': 512, 'cout': 512, 'inbounds': ['conv-relu#15'], 'outbounds': ['conv-relu#17']}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:{'op': 'conv-relu', 'name': 'conv-relu#17', 'input_tensors': [[1, 14, 14, 512]], 'ks': [3, 3], 'inputh': 14, 'inputw': 14, 'cin': 512, 'cout': 512, 'inbounds': ['maxpool#16'], 'outbounds': ['conv-relu#18']}\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:03] INFO (root/MainThread) {'op': 'conv-relu', 'name': 'conv-relu#17', 'input_tensors': [[1, 14, 14, 512]], 'ks': [3, 3], 'inputh': 14, 'inputw': 14, 'cin': 512, 'cout': 512, 'inbounds': ['maxpool#16'], 'outbounds': ['conv-relu#18']}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:{'op': 'conv-relu', 'name': 'conv-relu#18', 'input_tensors': [[1, 14, 14, 512]], 'ks': [3, 3], 'inputh': 14, 'inputw': 14, 'cin': 512, 'cout': 512, 'inbounds': ['conv-relu#17'], 'outbounds': ['maxpool#19']}\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:03] INFO (root/MainThread) {'op': 'conv-relu', 'name': 'conv-relu#18', 'input_tensors': [[1, 14, 14, 512]], 'ks': [3, 3], 'inputh': 14, 'inputw': 14, 'cin': 512, 'cout': 512, 'inbounds': ['conv-relu#17'], 'outbounds': ['maxpool#19']}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:{'op': 'maxpool', 'name': 'maxpool#19', 'input_tensors': [[1, 14, 14, 512]], 'ks': [2, 2], 'strides': [2, 2], 'inputh': 14, 'inputw': 14, 'cin': 512, 'cout': 512, 'inbounds': ['conv-relu#18'], 'outbounds': []}\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:03] INFO (root/MainThread) {'op': 'maxpool', 'name': 'maxpool#19', 'input_tensors': [[1, 14, 14, 512]], 'ks': [2, 2], 'strides': [2, 2], 'inputh': 14, 'inputw': 14, 'cin': 512, 'cout': 512, 'inbounds': ['conv-relu#18'], 'outbounds': []}\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:Predict latency: 109.77864175998363 ms\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[2021-11-16 20:35:04] INFO (root/MainThread) Predict latency: 109.77864175998363 ms\n",
+      "[RESULT] predict latency for vgg11: 109.77864175998363 ms\n"
+     ]
+    }
+   ],
+   "source": [
+    "import nni.retiarii.nn.pytorch as nn  # different from \"onnx_based way\"\n",
+    "\n",
+    "class VGG(nn.Module):\n",
+    "\n",
+    "    def __init__(self, features, num_classes=1000):\n",
+    "        super(VGG, self).__init__()\n",
+    "        self.features = features\n",
+    "        self.classifier = nn.Sequential(\n",
+    "            nn.Linear(512 * 7 * 7, 4096),\n",
+    "            nn.ReLU(True),\n",
+    "            nn.Dropout(),\n",
+    "            nn.Linear(4096, 4096),\n",
+    "            nn.ReLU(True),\n",
+    "            nn.Dropout(),\n",
+    "            nn.Linear(4096, num_classes),\n",
+    "        )\n",
+    "\n",
+    "    def forward(self, x):\n",
+    "        x = self.features(x)\n",
+    "        x = x.view(x.size(0), -1)\n",
+    "        x = self.classifier(x)\n",
+    "        return x\n",
+    "\n",
+    "def make_layers(cfg, batch_norm=False):\n",
+    "    layers = []\n",
+    "    in_channels = 3\n",
+    "    for v in cfg:\n",
+    "        if v == 'M':\n",
+    "            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]\n",
+    "        else:\n",
+    "            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)\n",
+    "            if batch_norm:\n",
+    "                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]\n",
+    "            else:\n",
+    "                layers += [conv2d, nn.ReLU(inplace=True)]\n",
+    "            in_channels = v\n",
+    "    return nn.Sequential(*layers)\n",
+    "\n",
+    "\n",
+    "vgg11 = VGG(make_layers([64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'])) # VGG 11-layer model\n",
+    "\n",
+    "# predict latency\n",
+    "latency = predictor.predict(\n",
+    "    vgg11, model_type=\"torch\", input_shape=(1, 3, 224, 224), \n",
+    "    apply_nni=True # different from \"onnx_based way\"\n",
+    "    ) \n",
+    "print(f'[RESULT] predict latency for vgg11: {latency} ms')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Use nn-Meter by command line\n",
+    "\n",
+    "Another way to run nn-Meter is from the command line. \n",
+    "\n",
+    "After nn-Meter is installed, the `nn-meter` command is available. You can predict latency with:\n",
+    "```Bash\n",
+    "# for Tensorflow (.pb) file\n",
+    "nn-meter predict --predictor <hardware> [--predictor-version <version>] --tensorflow <pb-file_or_folder> \n",
+    "\n",
+    "# for ONNX (*.onnx) file\n",
+    "nn-meter predict --predictor <hardware> [--predictor-version <version>] --onnx <onnx-file_or_folder>\n",
+    "\n",
+    "# for torch model from torchvision model zoo (str)\n",
+    "nn-meter predict --predictor <hardware> [--predictor-version <version>] --torchvision <model-name> <model-name>... \n",
+    "```\n",
+    "\n",
+    "Here are some concrete examples:\n",
+    "```Bash\n",
+    "project_path=\"/home/jiahang/nnmeter-demo/testmodel\"\n",
+    "\n",
+    "nn-meter predict --predictor adreno640gpu_tflite21 --tensorflow $project_path\n",
+    "\n",
+    "nn-meter predict --predictor adreno640gpu_tflite21 --onnx $project_path\n",
+    "\n",
+    "nn-meter predict --predictor adreno640gpu_tflite21 --torchvision mobilenet_v2\n",
+    "```\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "interpreter": {
+   "hash": "2602612169f43f91d25fe52816b7763616055f24dc48b1edca6c7b81a282af45"
+  },
+  "kernelspec": {
+   "display_name": "Python 3.6.10 64-bit ('py36': conda)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.10"
+  },
+  "orig_nbformat": 4
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
--- a/examples/nn-meter_quick_start/3.nn-Meter_benchmark_dataset.ipynb
+++ b/examples/nn-meter_quick_start/3.nn-Meter_benchmark_dataset.ipynb
@ -0,0 +1,397 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Use nn-Meter Benchmark Dataset\n",
+    "nn-Meter collects and generates 26k CNN models. The dataset has been released, and the `nn_meter.dataset` interface gives users access to it. In this notebook, we show how to use the nn-Meter benchmark dataset for nn-Meter latency prediction and, as an extension, for GNN latency prediction.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Model group: alexnets.jsonl\n",
+      "Model group: densenets.jsonl\n",
+      "Model group: googlenets.jsonl\n",
+      "Model group: mnasnets.jsonl\n",
+      "Model group: mobilenetv1s.jsonl\n",
+      "Model group: mobilenetv2s.jsonl\n",
+      "Model group: mobilenetv3s.jsonl\n",
+      "Model group: nasbench201s.jsonl\n",
+      "Model group: proxylessnass.jsonl\n",
+      "Model group: resnets.jsonl\n",
+      "Model group: shufflenetv2s.jsonl\n",
+      "Model group: squeezenets.jsonl\n",
+      "Model group: vggs.jsonl\n"
+     ]
+    }
+   ],
+   "source": [
+    "import os\n",
+    "from nn_meter.dataset import bench_dataset\n",
+    "\n",
+    "datasets = bench_dataset()\n",
+    "for data in datasets:\n",
+    "    print(f\"Model group: {os.path.basename(data)}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "There are 13 groups of models in the benchmark dataset. In each group, about 2000 models with different parameters were sampled.\n",
+    "\n",
+    "Dataset schema: for each model, the dataset stores:\n",
+    "- model id\n",
+    "- graph in nn-meter IR graph format\n",
+    "- latency numbers on four devices\n",
+    "\n",
+    "Here we print some fields of one model to show the schema of the dataset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "dict keys: ['id', 'cortexA76cpu_tflite21', 'adreno640gpu_tflite21', 'adreno630gpu_tflite21', 'myriadvpu_openvino2019r2', 'graph']\n",
+      "model id alexnet_1356\n",
+      "cpu latency:  148.164\n",
+      "adreno640gpu latency:  24.4851\n",
+      "adreno630gpu latency:  31.932404999999996\n",
+      "intelvpu latency:  15.486\n",
+      "model graph is stored in nn-meter IR (shows only one node here): {'inbounds': ['input_im_0'], 'attr': {'name': 'conv1.conv/Conv2D', 'type': 'Conv2D', 'output_shape': [[1, 56, 56, 63]], 'attr': {'dilations': [1, 1], 'strides': [4, 4], 'data_format': 'NHWC', 'padding': 'VALID', 'kernel_shape': [7, 7], 'weight_shape': [7, 7, 3, 63], 'pads': [0, 0, 0, 0]}, 'input_shape': [[1, 224, 224, 3]]}, 'outbounds': ['conv1.relu.relu/Relu']}\n"
+     ]
+    }
+   ],
+   "source": [
+    "import jsonlines\n",
+    "test_data = datasets[0]\n",
+    "with jsonlines.open(test_data) as data_reader:\n",
+    "    True_lat = []\n",
+    "    Pred_lat = []\n",
+    "    for i, item in enumerate(data_reader):\n",
+    "        print('dict keys:',list(item.keys()))\n",
+    "        print('model id',item['id'])\n",
+    "        print('cpu latency: ',item['cortexA76cpu_tflite21'])\n",
+    "        print('adreno640gpu latency: ',item['adreno640gpu_tflite21'])\n",
+    "        print('adreno630gpu latency: ',item['adreno630gpu_tflite21'])\n",
+    "        print('intelvpu latency: ',item['myriadvpu_openvino2019r2'])\n",
+    "        print('model graph is stored in nn-meter IR (shows only one node here):',\\\n",
+    "            item['graph']['conv1.conv/Conv2D'])\n",
+    "        break"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Use nn-Meter predictor with benchmark dataset "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/home/jiahang/anaconda3/envs/py36/lib/python3.6/site-packages/sklearn/base.py:315: UserWarning: Trying to unpickle estimator DecisionTreeRegressor from version 0.23.1 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.\n",
+      "  UserWarning)\n",
+      "/home/jiahang/anaconda3/envs/py36/lib/python3.6/site-packages/sklearn/base.py:315: UserWarning: Trying to unpickle estimator RandomForestRegressor from version 0.23.1 when using version 0.24.2. This might lead to breaking code or invalid results. Use at your own risk.\n",
+      "  UserWarning)\n"
+     ]
+    }
+   ],
+   "source": [
+    "import nn_meter\n",
+    "\n",
+    "predictor_name = 'adreno640gpu_tflite21' # user can change text here to test other predictors\n",
+    "\n",
+    "# load predictor\n",
+    "predictor = nn_meter.load_latency_predictor(predictor_name)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[RESULT] alexnets.jsonl[0]: predict: 23.447085575244767, real: 24.4851\n",
+      "[RESULT] alexnets.jsonl[1]: predict: 23.885675776357132, real: 23.9185\n",
+      "[RESULT] alexnets.jsonl[2]: predict: 29.586297830632216, real: 30.3052\n",
+      "[RESULT] alexnets.jsonl[3]: predict: 51.12333226388625, real: 52.089\n",
+      "[RESULT] alexnets.jsonl[4]: predict: 4.937166470494071, real: 5.26442\n",
+      "[RESULT] alexnets.jsonl[5]: predict: 14.996201148770355, real: 15.2265\n",
+      "[RESULT] alexnets.jsonl[6]: predict: 9.262593840400983, real: 9.12046\n",
+      "[RESULT] alexnets.jsonl[7]: predict: 13.912859618198581, real: 14.2242\n",
+      "[RESULT] alexnets.jsonl[8]: predict: 15.02293612116675, real: 15.2457\n",
+      "[RESULT] alexnets.jsonl[9]: predict: 12.443609556620192, real: 12.5989\n",
+      "[RESULT] alexnets.jsonl[10]: predict: 15.971239887611217, real: 15.185\n",
+      "[RESULT] alexnets.jsonl[11]: predict: 19.469347190777857, real: 20.1434\n",
+      "[RESULT] alexnets.jsonl[12]: predict: 12.580476335563757, real: 14.4818\n",
+      "[RESULT] alexnets.jsonl[13]: predict: 18.514081238237033, real: 19.0136\n",
+      "[RESULT] alexnets.jsonl[14]: predict: 7.330729281187614, real: 7.7855\n",
+      "[RESULT] alexnets.jsonl[15]: predict: 14.860185617106685, real: 15.7775\n",
+      "[RESULT] alexnets.jsonl[16]: predict: 15.788781165175774, real: 16.0765\n",
+      "[RESULT] alexnets.jsonl[17]: predict: 35.33131516111195, real: 35.7741\n",
+      "[RESULT] alexnets.jsonl[18]: predict: 12.409197810645443, real: 12.4725\n",
+      "[RESULT] alexnets.jsonl[19]: predict: 37.08473259556314, real: 36.4975\n",
+      "[SUMMARY] The first 20 cases from alexnets.jsonl on adreno640gpu_tflite21: rmse: 0.6889098264185193, 5%accuracy: 0.75, 10%accuracy: 0.95\n"
+     ]
+    }
+   ],
+   "source": [
+    "# view latency prediction demo in one model group of the dataset \n",
+    "test_data = datasets[0]\n",
+    "with jsonlines.open(test_data) as data_reader:\n",
+    "    True_lat = []\n",
+    "    Pred_lat = []\n",
+    "    for i, item in enumerate(data_reader):\n",
+    "        if i >= 20: # only show the first 20 results to save space\n",
+    "            break\n",
+    "        graph = item[\"graph\"]\n",
+    "        pred_lat = predictor.predict(graph, model_type=\"nnmeter-ir\")\n",
+    "        real_lat = item[predictor_name]\n",
+    "        print(f'[RESULT] {os.path.basename(test_data)}[{i}]: predict: {pred_lat}, real: {real_lat}')\n",
+    "\n",
+    "        if real_lat is not None:\n",
+    "            True_lat.append(real_lat)\n",
+    "            Pred_lat.append(pred_lat)\n",
+    "\n",
+    "if len(True_lat) > 0:\n",
+    "    rmse, rmspe, error, acc5, acc10, _ = nn_meter.latency_metrics(Pred_lat, True_lat)\n",
+    "    print(\n",
+    "        f'[SUMMARY] The first 20 cases from {os.path.basename(test_data)} on {predictor_name}: rmse: {rmse}, 5%accuracy: {acc5}, 10%accuracy: {acc10}'\n",
+    "    )"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Use benchmark dataset for GNN\n",
+    "\n",
+    "Because the dataset is encoded in a graph format, we also provide two interfaces, `GNNDataset` and `GNNDataloader`, for training a GNN to predict model latency with the benchmark dataset.\n",
+    "\n",
+    "`GNNDataset` and `GNNDataloader` in `nn_meter/dataset/gnn_dataloader.py` convert the model structures stored in `.jsonl` format into the dataset and dataloader that GNN training requires. The output of `GNNDataset` includes the adjacency matrix and attributes of each graph, together with its latency value. The script depends on the `torch` and `dgl` packages.\n",
+    "\n",
+    "Here we build the dataset for GNN training:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Processing Training Set.\n",
+      "Processing Testing Set.\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Using backend: pytorch\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Train Dataset Size: 20732\n",
+      "Testing Dataset Size: 5173\n",
+      "Attribute tensor shape: 26\n"
+     ]
+    }
+   ],
+   "source": [
+    "import os\n",
+    "from nn_meter.dataset import gnn_dataloader\n",
+    "\n",
+    "target_device = \"cortexA76cpu_tflite21\"\n",
+    "\n",
+    "print(\"Processing Training Set.\")\n",
+    "train_set = gnn_dataloader.GNNDataset(train=True, device=target_device) \n",
+    "print(\"Processing Testing Set.\")\n",
+    "test_set = gnn_dataloader.GNNDataset(train=False, device=target_device)\n",
+    "\n",
+    "train_loader = gnn_dataloader.GNNDataloader(train_set, batchsize=1 , shuffle=True)\n",
+    "test_loader = gnn_dataloader.GNNDataloader(test_set, batchsize=1, shuffle=False)\n",
+    "print('Train Dataset Size:', len(train_set))\n",
+    "print('Testing Dataset Size:', len(test_set))\n",
+    "ATTR_COUNT = next(train_loader)[1].ndata['h'].size(1) # number of attributes per graph node\n",
+    "print('Attribute tensor shape:', ATTR_COUNT)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Then we build a GNN model based on GraphSAGE, with max pooling selected as the pooling method."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "import torch.nn as nn\n",
+    "from torch.nn.modules.module import Module\n",
+    "import dgl.nn as dglnn\n",
+    "from dgl.nn.pytorch.glob import MaxPooling\n",
+    "\n",
+    "class GNN(Module):\n",
+    "    def __init__(self, \n",
+    "                num_features=0, \n",
+    "                num_layers=2,\n",
+    "                num_hidden=32,\n",
+    "                dropout_ratio=0):\n",
+    "\n",
+    "        super(GNN, self).__init__()\n",
+    "        self.nfeat = num_features\n",
+    "        self.nlayer = num_layers\n",
+    "        self.nhid = num_hidden\n",
+    "        self.dropout_ratio = dropout_ratio\n",
+    "        self.gc = nn.ModuleList([dglnn.SAGEConv(self.nfeat if i==0 else self.nhid, self.nhid, 'pool') for i in range(self.nlayer)])\n",
+    "        self.bn = nn.ModuleList([nn.LayerNorm(self.nhid) for i in range(self.nlayer)])\n",
+    "        self.relu = nn.ModuleList([nn.ReLU() for i in range(self.nlayer)])\n",
+    "        self.pooling = MaxPooling()\n",
+    "        self.fc = nn.Linear(self.nhid, 1)\n",
+    "        self.fc1 = nn.Linear(self.nhid, self.nhid)\n",
+    "        self.dropout = nn.ModuleList([nn.Dropout(self.dropout_ratio) for i in range(self.nlayer)])\n",
+    "\n",
+    "    def forward_single_model(self, g, features):\n",
+    "        x = self.relu[0](self.bn[0](self.gc[0](g, features)))\n",
+    "        x = self.dropout[0](x)\n",
+    "        for i in range(1,self.nlayer):\n",
+    "            x = self.relu[i](self.bn[i](self.gc[i](g, x)))\n",
+    "            x = self.dropout[i](x)\n",
+    "        return x\n",
+    "\n",
+    "    def forward(self, g, features):\n",
+    "        x = self.forward_single_model(g, features)\n",
+    "        with g.local_scope():\n",
+    "            g.ndata['h'] = x\n",
+    "            x = self.pooling(g, x)\n",
+    "            x = self.fc1(x)\n",
+    "            return self.fc(x)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Start GNN training:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[Epoch  0 ]:  Training accuracy within 10%:  22.486976654447233  %.\n",
+      "[Epoch  1 ]:  Training accuracy within 10%:  29.471348639783905  %.\n",
+      "[Epoch  2 ]:  Training accuracy within 10%:  32.60659849508007  %.\n",
+      "[Epoch  3 ]:  Training accuracy within 10%:  37.830407100135055  %.\n",
+      "[Epoch  4 ]:  Training accuracy within 10%:  43.32915300019294  %.\n"
+     ]
+    }
+   ],
+   "source": [
+    "from torch.optim.lr_scheduler import CosineAnnealingLR\n",
+    "\n",
+    "if torch.cuda.is_available():\n",
+    "    print(\"Using CUDA.\")\n",
+    "# device = \"cpu\"\n",
+    "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
+    "\n",
+    "# Start Training\n",
+    "model = GNN(ATTR_COUNT, 3, 400, 0.1).to(device)\n",
+    "opt = torch.optim.AdamW(model.parameters(), lr=4e-4)\n",
+    "EPOCHS=5\n",
+    "loss_func = nn.L1Loss()\n",
+    "\n",
+    "lr_scheduler = CosineAnnealingLR(opt, T_max=EPOCHS)\n",
+    "loss_sum = 0\n",
+    "for epoch in range(EPOCHS):\n",
+    "    train_length = len(train_set)\n",
+    "    tran_acc_ten = 0\n",
+    "    loss_sum = 0 \n",
+    "    # latency, graph, types, flops\n",
+    "    for batched_l, batched_g in train_loader:\n",
+    "        opt.zero_grad()\n",
+    "        batched_l = batched_l.to(device).float()\n",
+    "        batched_g = batched_g.to(device)\n",
+    "        batched_f = batched_g.ndata['h'].float()\n",
+    "        logits = model(batched_g, batched_f)\n",
+    "        for i in range(len(batched_l)):\n",
+    "            pred_latency = logits[i].item()\n",
+    "            prec_latency = batched_l[i].item()\n",
+    "            if (pred_latency >= 0.9 * prec_latency) and (pred_latency <= 1.1 * prec_latency):\n",
+    "                tran_acc_ten += 1\n",
+    "        # print(\"true latency: \", batched_l)\n",
+    "        # print(\"Predict latency: \", logits)\n",
+    "        batched_l = torch.reshape(batched_l, (-1 ,1))\n",
+    "        loss = loss_func(logits, batched_l)\n",
+    "        loss_sum += loss.item() # accumulate a Python float instead of a tensor attached to the autograd graph\n",
+    "        loss.backward()\n",
+    "        opt.step()\n",
+    "    lr_scheduler.step()\n",
+    "    print(\"[Epoch \", epoch, \"]: \", \"Training accuracy within 10%: \", tran_acc_ten / train_length * 100, \" %.\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "interpreter": {
+   "hash": "2602612169f43f91d25fe52816b7763616055f24dc48b1edca6c7b81a282af45"
+  },
+  "kernelspec": {
+   "display_name": "Python 3.6.10 64-bit ('py36': conda)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.10"
+  },
+  "orig_nbformat": 4
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}