A toolkit for accurately modeling and predicting DNN inference latency on diverse edge devices.

Introduction

nn-Meter is a novel and efficient system to accurately predict the inference latency of DNN models on diverse edge devices. The key idea is to divide a whole model inference into kernels, i.e., the execution units of fused operators on a device, and to conduct kernel-level prediction. nn-Meter contains two key techniques: (i) kernel detection, which automatically detects the execution units of model inference via a set of well-designed test cases; and (ii) adaptive sampling, which efficiently samples the most beneficial configurations from a large space to build accurate kernel-level latency predictors. nn-Meter has been evaluated on four popular platforms with a large dataset of 26k models, achieving 99.0% (mobile CPU), 99.1% (mobile Adreno 640 GPU), 99.0% (mobile Adreno 630 GPU), and 83.4% (Intel VPU) prediction accuracy.
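
At a high level, the model-level latency is obtained by summing the predicted latencies of the detected kernels. The sketch below is only a conceptual illustration of that idea; the function, argument names, and predictor interface are hypothetical and not nn-Meter's actual API.

```python
# Conceptual sketch of kernel-level latency prediction (hypothetical names,
# not nn-Meter's actual API): each detected kernel is looked up in a table of
# per-kernel latency regressors, and the per-kernel predictions are summed.
def predict_model_latency(kernels, kernel_predictors):
    """kernels: list of (kernel_type, feature_vector) pairs from kernel detection.
    kernel_predictors: dict mapping kernel_type to a fitted regressor
    with a scikit-learn style .predict() method."""
    total_latency_ms = 0.0
    for kernel_type, features in kernels:
        predictor = kernel_predictors[kernel_type]
        total_latency_ms += float(predictor.predict([features])[0])
    return total_latency_ms
```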

The currently supported hardware and inference frameworks:

| Name | Device | Framework | Processor | ±10% Accuracy |
|------|--------|-----------|-----------|---------------|
| CPU  | Pixel4 | TFLite v2.1 | CortexA76 CPU | 99.0% |
| GPU  | Mi9 | TFLite v2.1 | Adreno 640 GPU | 99.1% |
| GPU1 | Pixel3XL | TFLite v2.1 | Adreno 630 GPU | 99.0% |
| VPU  | Intel Movidius NCS2 | OpenVINO2019R2 | Myriad VPU | 83.4% |

Installation

To install nn-Meter, please first install Python 3. The test environment uses Anaconda Python 3.6.10. Install the dependencies via:

pip3 install -r requirements.txt

Please also check the versions of numpy and scikit_learn, since different versions may change the prediction accuracy of the kernel predictors.
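
Because predictor accuracy can depend on these package versions, a quick way to see what is installed is a short version check (a minimal sketch; requirements.txt remains the source of truth):

```python
# Print the installed versions of the packages that influence kernel-predictor accuracy.
import numpy
import sklearn

print("numpy:", numpy.__version__)
print("scikit-learn:", sklearn.__version__)
```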

Usage

The latency predictor supports two input formats. Popular CNN models are included in data/testmodels for testing.

1. Input model: xx.onnx or xx.pb:

python demo_with_converter.py --input_models data/testmodels/alexnet.onnx --mf alexnet

python demo_with_converter.py --input_models data/testmodels/alexnet.pb --mf alexnet

The script first converts the ONNX or pb model into our defined IR JSON. Kernel detection is then conducted on the IR graph, and the kernel latencies are predicted for the four measured edge devices.
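
For reference, the operators of an ONNX input model can be listed with the onnx Python package before conversion; the snippet below is illustrative only and is not how the converter itself is implemented:

```python
# List the operators of an ONNX model (illustrative; the actual IR conversion
# is performed by the demo script above).
import onnx

model = onnx.load("data/testmodels/alexnet.onnx")
for node in model.graph.node:
    print(node.op_type, node.name)
```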

2. Input model: the converted IR JSON:

python demo.py --input_models data/testmodels/alexnet_0.json --mf alexnet
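
The converted IR is a plain JSON file, so it can be inspected directly. The snippet below only loads it and prints its top-level structure, without assuming a particular schema:

```python
# Peek at the converted IR JSON without assuming its schema.
import json

with open("data/testmodels/alexnet_0.json") as f:
    ir_graph = json.load(f)

if isinstance(ir_graph, dict):
    print("top-level keys:", list(ir_graph.keys()))
else:
    print("top-level type:", type(ir_graph).__name__)
```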

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.