README.md
@@ -1,73 +1,90 @@
## The Edge Machine Learning library

This repository provides code for machine learning algorithms for edge devices
developed at [Microsoft Research
India](https://www.microsoft.com/en-us/research/project/resource-efficient-ml-for-the-edge-and-endpoint-iot-devices/).

Machine learning models for edge devices need to have a small footprint in
terms of storage, prediction latency, and energy. One instance where such
models are desirable is resource-scarce devices and sensors in the Internet
of Things (IoT) setting. Making real-time predictions locally on IoT devices
without connecting to the cloud requires models that fit in a few kilobytes.

### Contents

Algorithms that shine in this setting in terms of both model size and compute:

- **Bonsai**: Strong and shallow non-linear tree based classifier.
- **ProtoNN**: **Proto**type based k-nearest neighbors (k**NN**) classifier.
- **EMI-RNN**: Training routine to recover the critical signature from time series data for faster and more accurate RNN predictions.
- **S-RNN**: A meta-architecture for training RNNs that can be applied to streaming data.
- **FastRNN & FastGRNN - FastCells**: **F**ast, **A**ccurate, **S**table and **T**iny (**G**ated) RNN cells.

These algorithms can train models for classical supervised learning problems
with memory requirements that are orders of magnitude lower than other modern
ML algorithms. The trained models can be loaded onto edge devices such as IoT
devices/sensors, and used to make fast and accurate predictions completely
offline.

A tool that adapts models trained by the above algorithms to be inferred using fixed-point arithmetic:

- **SeeDot**: Floating-point to fixed-point quantization tool.

Applications demonstrating use cases of these algorithms:

- **GesturePod**: Gesture recognition pipeline for microcontrollers.
- **MSC-RNN**: Multi-scale cascaded RNN for analyzing radar data.

### Organization

- The `tf` directory contains the `edgeml_tf` package which specifies these architectures in TensorFlow,
  and `examples/tf` contains sample training routines for these algorithms.
- The `pytorch` directory contains the `edgeml_pytorch` package which specifies these architectures in PyTorch,
  and `examples/pytorch` contains sample training routines for these algorithms.
- The `cpp` directory has training and inference code for the Bonsai and ProtoNN algorithms in C++.
- The `applications` directory has code/demonstrations of applications of the EdgeML algorithms.
- The `tools/SeeDot` directory has the quantization tool to generate fixed-point inference code.

Please see install/run instructions in the README pages within these directories.

### Details and project pages

For details, please see our
[project page](https://microsoft.github.io/EdgeML/),
[Microsoft Research page](https://www.microsoft.com/en-us/research/project/resource-efficient-ml-for-the-edge-and-endpoint-iot-devices/),
the ICML '17 publications on [Bonsai](/docs/publications/Bonsai.pdf) and
[ProtoNN](/docs/publications/ProtoNN.pdf) algorithms,
the NeurIPS '18 publications on [EMI-RNN](/docs/publications/emi-rnn-nips18.pdf) and
[FastGRNN](/docs/publications/FastGRNN.pdf),
the PLDI '19 publication on the [SeeDot compiler](/docs/publications/SeeDot.pdf),
the UIST '19 publication on [GesturePod](/docs/publications/ICane-UIST19.pdf),
the BuildSys '19 publication on [MSC-RNN](/docs/publications/MSCRNN.pdf),
and the NeurIPS '19 publication on [S-RNN](/docs/publications/SRNN.pdf).

Also check out the [ELL](https://github.com/Microsoft/ELL) project which can
provide optimized binaries for some of the ONNX models trained by this library.

### Contributors

Code for algorithms, applications and tools contributed by:

- [Don Dennis](https://dkdennis.xyz)
- [Yash Gaurkar](https://github.com/mr-yamraj/)
- [Sridhar Gopinath](http://www.sridhargopinath.in/)
- [Chirag Gupta](https://aigen.github.io/)
- [Moksh Jain](https://github.com/MJ10)
- [Ashish Kumar](https://ashishkumar1993.github.io/)
- [Aditya Kusupati](https://adityakusupati.github.io/)
- [Chris Lovett](https://github.com/lovettchris)
- [Shishir Patil](https://shishirpatil.github.io/)
- [Harsha Vardhan Simhadri](http://harsha-simhadri.org)

See [Contributors](https://microsoft.github.io/EdgeML/People) for everyone who has contributed to this project. New contributors are welcome.

Please [email us](mailto:edgeml@microsoft.com) your comments, criticism, and questions.

If you use software from this library in your work, please use the BibTeX entry below for citation.

```
@software{edgeml03,
   author = {{Dennis, Don Kurian and Gaurkar, Yash and Gopinath, Sridhar and Gupta, Chirag and
             Jain, Moksh and Kumar, Ashish and Kusupati, Aditya and Lovett, Chris
             and Patil, Shishir G and Simhadri, Harsha Vardhan}},
   title = {{EdgeML: Machine Learning for resource-constrained edge devices}},
   url = {https://github.com/Microsoft/EdgeML},
   version = {0.3},
}
```
@@ -0,0 +1,22 @@
MSC-RNN - Multi-Scale, Cascaded RNN
==========

MSC-RNN is a new RNN architecture proposed in the paper
[One Size Does Not Fit All: Multi-Scale, Cascaded RNNs for Radar Classification](https://arxiv.org/abs/1909.03082),
which won the **Best Paper Runner-Up** Award at *BuildSys 2019*.

MSC-RNN is created using EMI-RNN and FastGRNN from the EdgeML repository.
It comprises an EMI-FastGRNN for clutter discrimination at the lower tier and a more complex FastGRNN
classifier for source classification at the upper tier, and is trained using a novel joint-training routine; the sketch below summarizes the cascade.
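The two-tier idea is easy to state in code. The following sketch is illustrative only (names such as `emi_fastgrnn`, `fastgrnn`, `is_clutter` and `classify` are placeholders, not the released API): the cheap lower tier screens every radar window, and the heavier upper tier runs only on windows that survive the clutter gate.

```
def msc_rnn_predict(window, emi_fastgrnn, fastgrnn):
    # Tier 1: cheap clutter discriminator, run on every radar window.
    if emi_fastgrnn.is_clutter(window):
        return "clutter"
    # Tier 2: heavier source classifier, run only on the few
    # windows that survive the clutter gate.
    return fastgrnn.classify(window)
```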

MSC-RNN holistically improves the accuracy and per-class recalls over ML models suitable for radar inferencing.
Notably, MSC-RNN outperforms cross-domain handcrafted feature engineering with time-domain deep feature learning,
while also being up to ∼3× more efficient than the competitive SVM-based solutions.

# Resources

**Paper** - [pdf](/docs/publications/MSCRNN.pdf) | [arXiv](https://arxiv.org/pdf/1909.03082.pdf) | [ACM DL](https://dl.acm.org/citation.cfm?id=3360860)

**Code** - https://github.com/dhruboroy29/MSCRNN

**Dataset** - https://doi.org/10.5281/zenodo.3451408

@@ -6,7 +6,7 @@ Tensorflow library. This document **does not** seek to be a comprehensive
documentation of the EMI-RNN code base.

For a quick and dirty 'getting started' example, please refer to the
[EMI_RNN](../examples/tf/EMI-RNN) directory.

![MIML Formulation of Bags and Instances](img/MIML_illustration.png)

@@ -49,21 +49,21 @@ sub-instance is initialized to equal the label of the entire bag.
Thus, the EMI-RNN implementation expects the train data to be of shape,

    [Num. of examples, Num. of instances, Num. of timesteps, Num. features]

Further, the label information is expected to be one-hot encoded and of the
shape,

    [Num. of examples, Num. of instances, Num. classes]

As a concrete end-to-end example, please refer to
[EMI_RNN/fetch_har.py](../examples/tf/EMI-RNN/fetch_har.py).
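As a quick sanity check on these shapes, here is a minimal NumPy sketch (the bag, sub-instance, timestep and feature counts are made-up numbers, not defaults):

```
import numpy as np

# Hypothetical sizes: 1000 bags, 4 sub-instances per bag,
# 48 timesteps per sub-instance, 9 features, 6 classes.
NUM_EXAMPLES, NUM_SUBINSTANCE, NUM_TIMESTEPS, NUM_FEATS, NUM_CLASSES = 1000, 4, 48, 9, 6

x_train = np.random.randn(NUM_EXAMPLES, NUM_SUBINSTANCE, NUM_TIMESTEPS, NUM_FEATS)

# One-hot labels, replicated for every sub-instance of a bag.
bag_labels = np.random.randint(0, NUM_CLASSES, size=NUM_EXAMPLES)
y_train = np.zeros((NUM_EXAMPLES, NUM_SUBINSTANCE, NUM_CLASSES))
y_train[np.arange(NUM_EXAMPLES), :, bag_labels] = 1

print(x_train.shape)  # (1000, 4, 48, 9)
print(y_train.shape)  # (1000, 4, 6)
```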

## Training

![An illustration of the parts of the computation graph](img/3PartsGraph.png)

The EMI-RNN algorithm consists of a graph construction phase and a training
phase.

### Graph Construction

@@ -83,7 +83,7 @@ forward computation graphs. All implementations of `EMI_RNN` are expected and
assumed to provide an `EMI_RNN.output` attribute - the Tensor/Operation with
the forward computation outputs. The following implementations of `EMI_RNN` are
provided:
- `EMI_LSTM`
- `EMI_GRU`
- `EMI_FastRNN`
- `EMI_FastGRNN`
@@ -116,11 +116,11 @@ Train_EMI_RNN:
    Y: Train labels
    EMI_Graph: A complete training graph
    updatePolicy: An update policy that will update the instance labels
        after each training round.
    NUM_ROUNDS: Number of rounds of training

    curr_Y = Y
    for round in range(NUM_ROUNDS):
        minimize_loss(EMI_graph, X, curr_Y)
        curr_Y = updatePolicy(EMI_RNN(X))
```
@@ -163,7 +163,7 @@ invalid upon the graph being reset internally.

It is possible to restore a trained model into a session from its checkpoint.
`EMI_Driver` exposes an easy-to-use way of achieving this through the
`loadSavedGraphToNewSession` method.

To use this method, first construct a new computation graph as you would
normally do and set up `EMI_Driver` with this computation graph. Then you can
@@ -181,8 +181,8 @@ matrices is also supported. This is achieved by attaching `tf.assign`
operations to all the model tensors. Please have a look at the `addAssignOps`
method of `DataPipeline`, `EMI_RNN` and `EMI_Trainer` for more information.

Please refer to [EMI_RNN/02_emi_lstm_initialization_and_restoring.ipynb](../examples/tf/EMI-RNN/02_emi_lstm_initialization_and_restoring.ipynb)
for example usages.

## Evaluating the trained model

@@ -210,4 +210,4 @@ Early prediction is accomplished by defining an early prediction policy method.
This method receives the prediction at each step of the learned RNN for a
sub-instance as input and is expected to return a predicted class and the
0-indexed step at which it made this prediction. Please refer to the
[EMI-RNN](../examples/tf/EMI-RNN) examples for concrete instances of the same.
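To make that contract concrete, below is a minimal sketch of such a policy; the function name and the confidence threshold are illustrative, not part of the EMI-RNN API. It scans the per-step class probabilities of one sub-instance and commits as soon as the top class clears the threshold.

```
import numpy as np

def early_policy(step_probabilities, min_prob=0.99):
    """step_probabilities: [num_steps, num_classes] softmax outputs for
    one sub-instance. Returns (predicted_class, 0-indexed step)."""
    for step, probs in enumerate(step_probabilities):
        best = int(np.argmax(probs))
        if probs[best] >= min_prob:
            return best, step  # confident enough: predict early
    # never confident: fall back to the final step's prediction
    return int(np.argmax(step_probabilities[-1])), len(step_probabilities) - 1
```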
@@ -1,11 +1,11 @@
# FastRNN and FastGRNN - FastCells

This document elaborates on the details of FastCells
present in [tf/edgeml_tf/graph/rnn.py](/tf/edgeml_tf/graph/rnn.py). The
endpoint use-case scripts with 3-phase training, along with an example notebook,
are present in [examples/tf/FastCells](/examples/tf/FastCells). One can use the endpoint script to test
out the RNN architectures on any dataset while specifying budget constraints as
part of the hyper-parameters, in terms of sparsity and rank of the weight matrices.

# FastRNN
![FastRNN](img/FastRNN.png)
|
||||||
|
|
||||||
# Plug and Play Cells
|
# Plug and Play Cells
|
||||||
|
|
||||||
`FastRNNCell` and `FastGRNNCell` present in `edgeml.graph.rnn` are very similar to
|
`FastRNNCell` and `FastGRNNCell` present in `edgeml.graph.rnn` are very similar to
|
||||||
Tensorflow's inbuilt `RNNCell`, `GRUCell`, `BasicLSTMCell`, and `UGRNNCell` allowing us to
|
Tensorflow's inbuilt `RNNCell`, `GRUCell`, `BasicLSTMCell`, and `UGRNNCell` allowing us to
|
||||||
replace any of the standard RNN Cell in our architecture with FastCells.
|
replace any of the standard RNN Cell in our architecture with FastCells.
|
||||||
One can see the plug and play nature at the endpoint script for FastCells, where the graph
|
One can see the plug and play nature at the endpoint script for FastCells, where the graph
|
||||||
building is very similar to LSTM/GRU in Tensorflow.
|
building is very similar to LSTM/GRU in Tensorflow.
|
||||||
|
|
||||||
Script: [Endpoint Script](../examples/FastCells/fastcell_example.py)
|
Script: [Endpoint Script](/examples/tf/FastCells/fastcell_example.py)
|
||||||
|
|
||||||
Example Notebook: [iPython Notebook](../examples/FastCells/fastcell_example.ipynb)
|
Example Notebook: [iPython Notebook](/examples/tf/FastCells/fastcell_example.ipynb)
|
||||||
|
|
||||||
Cells: [FastRNNCell](../edgeml/graph/rnn.py#L206) and [FastGRNNCell](../edgeml/graph/rnn.py#L31).
|
Cells: [FastRNNCell](/tf/edgeml/graph/rnn.py#L206) and [FastGRNNCell](/tf/edgeml/graph/rnn.py#L31).
|
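As a rough illustration of this plug and play nature, a FastGRNN cell can be dropped in wherever a `GRUCell` would go. This is a minimal TF 1.x sketch; the constructor arguments shown are assumptions, so consult the cell docstrings for the exact signature.

```
import tensorflow as tf
from edgeml_tf.graph.rnn import FastGRNNCell

# [batch, timesteps, features] input, exactly as for GRUCell/LSTMCell.
x = tf.placeholder(tf.float32, [None, 99, 32])

# Swap tf.nn.rnn_cell.GRUCell(64) for a FastGRNN cell; wRank/uRank are
# the (assumed) low-rank knobs discussed under Compression below.
cell = FastGRNNCell(64, wRank=16, uRank=16)
outputs, state = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)
```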

# 3 phase Fast Training

`FastCells`, similar to `Bonsai`, use a 3-phase training routine to induce the right
support and sparsity for the weight matrices. With the low-rank parameterization of weights
followed by the 3-phase training, we obtain FastRNN and FastGRNN models which are compact
and can be further compressed by using byte quantization without significant loss in accuracy.

# Compression
@@ -42,16 +42,16 @@ and they can be further compressed by using byte quantization without significant
2) Sparsity (S)
3) Quantization (Q)

Low-rank is directly induced into the FastCells during initialization, and the training happens with
the targeted low-rank versions of the weight matrices. One can use the `wRank` and `uRank` parameters
of FastCells to achieve this.
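The parameter saving from the low-rank factorization is easy to count: a full matrix is replaced by two thin factors. A back-of-the-envelope sketch (hidden size, input size and ranks below are made-up numbers, not defaults):

```
hidden, inp, r = 64, 32, 8  # illustrative sizes

# Full FastGRNN weights: W (hidden x inp) and U (hidden x hidden).
full_params = hidden * inp + hidden * hidden

# Low-rank versions: W ~ W1 @ W2 with wRank=r, U ~ U1 @ U2 with uRank=r.
low_rank_params = (hidden * r + r * inp) + (hidden * r + r * hidden)

print(full_params, low_rank_params)  # 6144 vs 1792, ~3.4x fewer parameters
```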

Sparsity is taken in as a hyper-parameter for the 3-phase training in `fastTrainer.py`, which at the
end produces a sparse, low-rank model.

Further compression is achieved by byte quantization and can be performed using the `quantizeFastModels.py`
script, which is part of [examples/tf/FastCells](/examples/tf/FastCells). This gives a model size reduction of up to 4x if 8-bit
integers are used. Lastly, to facilitate all-integer arithmetic, including the non-linearities, one could
use `quantTanh` instead of `tanh` and `quantSigm` instead of `sigmoid` as the non-linearities in the RNN
cells, followed by byte quantization. These non-linearities can be set using the appropriate parameters in
the `FastRNNCell` and `FastGRNNCell`.
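For reference, the piecewise-linear surrogates described in the FastGRNN paper are simple clipping functions; a minimal sketch of their usual definitions (check the cells' docstrings for the library's exact variants):

```
def quantTanh(x):
    # piecewise-linear stand-in for tanh: clip x to [-1, 1]
    return max(-1.0, min(1.0, x))

def quantSigm(x):
    # piecewise-linear stand-in for sigmoid: clip (x + 1) / 2 to [0, 1]
    return max(0.0, min(1.0, (x + 1.0) / 2.0))
```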

@@ -4,10 +4,11 @@ This directory includes an example notebook and a general execution script for
Bonsai developed as part of EdgeML. Also, we include a sample cleanup and
use-case on the USPS10 public dataset.

`edgeml_pytorch.graph.bonsai` implements the Bonsai prediction graph in PyTorch.
The three-phase training routine for Bonsai is decoupled from the forward graph
to facilitate a plug and play behaviour wherein Bonsai can be combined with or
used as a final-layer classifier for other architectures (RNNs, CNNs); a sketch
of this usage follows below.
See `edgeml_pytorch.trainer.bonsaiTrainer` for 3-phase training.
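A minimal sketch of that decoupled forward graph is below. The constructor and forward arguments mirror the usual Bonsai hyper-parameters (projection dimension, tree depth, sigma), but treat the exact signatures as assumptions and check the class docstrings before relying on them.

```
import torch
from edgeml_pytorch.graph.bonsai import Bonsai

# Hypothetical sizes: 10 classes, 784 input features projected to 28
# dimensions, a depth-3 tree and sigma = 1.0 for the soft thresholding.
model = Bonsai(numClasses=10, dataDimension=784,
               projectionDimension=28, treeDepth=3, sigma=1.0)

x = torch.randn(32, 784)  # a mini-batch of flattened inputs
scores = model(x, 1.0)    # second argument: the annealed sigmaI multiplier
                          # that the 3-phase trainer updates during training
```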

Note that `bonsai_example.py` assumes that data is in a specific format. It is
assumed that train and test data is contained in two files, `train.npy` and
@@ -4,8 +4,8 @@
import helpermethods
import numpy as np
import sys
from edgeml_pytorch.trainer.bonsaiTrainer import BonsaiTrainer
from edgeml_pytorch.graph.bonsai import Bonsai
import torch

@@ -0,0 +1,99 @@
## Training a Keyword-Spotting model

This example demonstrates how to train a FastGRNN-based keyword spotting model on the Google speech commands dataset,
compile it using the ELL compiler and deploy the keyword spotting model on [STM BlueCoin](https://www.st.com/en/evaluation-tools/steval-bcnkt01v1.html).
Follow the steps below to featurize data using ELL, train and export an ONNX model using the EdgeML library,
and prepare a binary that provides prediction capability using the ELL library.

### Install ELL
[link](https://github.com/microsoft/ELL)

### Download Google speech commands dataset
Download the [dataset](https://storage.cloud.google.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz) and extract the data
```
mkdir data_speech_commands_v2
tar xvzf data_speech_commands_v0.02.tar.gz -C data_speech_commands_v2
```

### Export paths to dataset and ELL
```
export ELL_ROOT=<path to directory where ELL is installed>
### export ELL_ROOT=/home/user/ELL
export DATASET_PATH=<path to directory where the speech commands dataset is extracted>
### export DATASET_PATH=/mnt/../../data_speech_data_v2
```

### Make training list
Use `-max n` to override the default limit on the maximum number of samples from each category, including `background`. For a low false-positive rate, train with a large number of negative `background` examples, say 50000 or 250000.
```
python3 $ELL_ROOT/tools/utilities/pythonlibs/audio/training/make_training_list.py -max 50000 --wav_files $DATASET_PATH
```

### Create an ELL featurizer
```
python $ELL_ROOT/tools/utilities/pythonlibs/audio/training/make_featurizer.py -ws 400 --nfft 512 --iir --log --log_delta 2.220446049250313e-16 --power_spec -fs 32
```

### Compile the ELL featurizer
```
python $ELL_ROOT/tools/wrap/wrap.py --model_file featurizer.ell --outdir compiled_featurizer --module_name mfcc
cd compiled_featurizer && mkdir build && cd build && cmake .. && make && cd ../..
```

### Pre-process dataset
```
python $ELL_ROOT/tools/utilities/pythonlibs/audio/training/make_dataset.py --list_file $DATASET_PATH/training_list.txt --featurizer compiled_featurizer/mfcc --window_size 98 --shift 98 --multicore
python $ELL_ROOT/tools/utilities/pythonlibs/audio/training/make_dataset.py --list_file $DATASET_PATH/validation_list.txt --featurizer compiled_featurizer/mfcc --window_size 98 --shift 98 --multicore
python $ELL_ROOT/tools/utilities/pythonlibs/audio/training/make_dataset.py --list_file $DATASET_PATH/testing_list.txt --featurizer compiled_featurizer/mfcc --window_size 98 --shift 98 --multicore
```

If you have background noise clips not containing keywords that you want to fuse with your dataset,
place them in a folder `$DATASET_PATH/backgroundNoise` and follow these instructions instead of the ones above.
```
python $ELL_ROOT/tools/utilities/pythonlibs/audio/training/make_dataset.py --list_file $DATASET_PATH/training_list.txt --featurizer compiled_featurizer/mfcc --window_size 98 --shift 98 --multicore --noise_path $DATASET_PATH/backgroundNoise --max_noise_ratio 0.1 --noise_selection 1
python $ELL_ROOT/tools/utilities/pythonlibs/audio/training/make_dataset.py --list_file $DATASET_PATH/validation_list.txt --featurizer compiled_featurizer/mfcc --window_size 98 --shift 98 --multicore --noise_path $DATASET_PATH/backgroundNoise --max_noise_ratio 0.1 --noise_selection 1
python $ELL_ROOT/tools/utilities/pythonlibs/audio/training/make_dataset.py --list_file $DATASET_PATH/testing_list.txt --featurizer compiled_featurizer/mfcc --window_size 98 --shift 98 --multicore --noise_path $DATASET_PATH/backgroundNoise --max_noise_ratio 0.1 --noise_selection 1
```

### Run model training
```
python examples/pytorch/FastCells/train_classifier.py \
    --use_gpu --normalize --rolling --max_rolling_length 235 \
    -a $DATASET_PATH -c $DATASET_PATH/categories.txt --outdir $MODEL_DIR \
    --architecture FastGRNNCUDA --num_layers 2 \
    --epochs 250 --learning_rate 0.005 -bs 128 -hu 128 \
    --lr_min 0.0005 --lr_scheduler CosineAnnealingLR --lr_peaks 0
```
Drop the `--rolling` and `--max_rolling_length` options if you are going to run inference on 1-second clips
and do not plan to stream data through the model without resetting.

### Convert .onnx model to .ell IR
```
pip install onnx  # if you haven't already
python $ELL_ROOT/tools/importers/onnx/onnx_import.py output_model/model.onnx
```

### Compile model and featurizer into header and assembly files for ARM Cortex-M4 class devices

These commands use the ELL compiler to generate several files, of which four are required: `featurizer.h`, `featurizer.S`, `model.h` and `model.S`.

#### For devices with hard FPU, e.g., STM BlueCoin
```
$ELL_ROOT/build/bin/compile -imap model.ell -cfn Predict -cmn completemodel --bitcode -od . --fuseLinearOps True --header --blas false --optimize true --target custom --numBits 32 --cpu cortex-m4 --triple armv6m-gnueabi --features +vfp4,+d16
/usr/lib/llvm-8/bin/opt model.bc -o model.opt.bc -O3
/usr/lib/llvm-8/bin/llc model.opt.bc -o model.S -O3 -filetype=asm -mtriple=armv6m-gnueabi -mcpu=cortex-m4 -relocation-model=pic -float-abi=hard -mattr=+vfp4,+d16
$ELL_ROOT/build/bin/compile -imap featurizer.ell -cfn Filter -cmn mfcc --bitcode -od . --fuseLinearOps True --header --blas false --optimize true --target custom --numBits 32 --cpu cortex-m4 --triple armv6m-gnueabi --features +vfp4,+d16
/usr/lib/llvm-8/bin/opt featurizer.bc -o featurizer.opt.bc -O3
/usr/lib/llvm-8/bin/llc featurizer.opt.bc -o featurizer.S -O3 -filetype=asm -mtriple=armv6m-gnueabi -mcpu=cortex-m4 -relocation-model=pic -float-abi=hard -mattr=+vfp4,+d16
```

#### For M4 class devices without hard FPU, e.g., MXChip
```
$ELL_ROOT/build/bin/compile -imap model.ell -cfn Predict -cmn completemodel --bitcode -od . --fuseLinearOps True --header --blas false --optimize true --target custom --numBits 32 --cpu cortex-m4 --triple armv6m-gnueabi --features +vfp4,+d16,+soft-float
/usr/lib/llvm-8/bin/opt model.bc -o model.opt.bc -O3
/usr/lib/llvm-8/bin/llc model.opt.bc -o model.S -O3 -filetype=asm -mtriple=armv6m-gnueabi -mcpu=cortex-m4 -relocation-model=pic -float-abi=soft -mattr=+vfp4,+d16
$ELL_ROOT/build/bin/compile -imap featurizer.ell -cfn Filter -cmn mfcc --bitcode -od . --fuseLinearOps True --header --blas false --optimize true --target custom --numBits 32 --cpu cortex-m4 --triple armv6m-gnueabi --features +vfp4,+d16,+soft-float
/usr/lib/llvm-8/bin/opt featurizer.bc -o featurizer.opt.bc -O3
/usr/lib/llvm-8/bin/llc featurizer.opt.bc -o featurizer.S -O3 -filetype=asm -mtriple=armv6m-gnueabi -mcpu=cortex-m4 -relocation-model=pic -float-abi=soft -mattr=+vfp4,+d16
```

@@ -0,0 +1,747 @@
#!/usr/bin/env python3
###################################################################################################
#
# Project: Embedded Learning Library (ELL)
# File: train_classifier.py
# Authors: Chris Lovett
#
# Requires: Python 3.x
#
###################################################################################################

import argparse
import json
import math
import os
import sys
import time

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.onnx
import random

from torch.autograd import Variable, Function
from torch.utils.data import Dataset, DataLoader

from training_config import TrainingConfig
from edgeml_pytorch.trainer.fastmodel import *


class KeywordSpotter(nn.Module):
    """ This baseclass provides the PyTorch Module pattern for defining and training keyword spotters """

    def __init__(self):
        """
        Initialize the KeywordSpotter with the following parameters:
        input_dim - the size of the input audio frame in # samples
        num_keywords - the number of predictions to come out of the model.
        """
        super(KeywordSpotter, self).__init__()

        self.training = False
        self.tracking = False

        self.init_hidden()

    def name(self):
        return "KeywordSpotter"

    def init_hidden(self):
        """ Clear any hidden state """
        pass

    def forward(self, input):
        """ Perform the forward processing of the given input and return the prediction """
        raise Exception("need to implement the forward method")

    def export(self, name, device):
        """ Export the model to the ONNX file format """
        self.init_hidden()
        self.tracking = True
        dummy_input = Variable(torch.randn(1, 1, self.input_dim))
        if device:
            dummy_input = dummy_input.to(device)
        torch.onnx.export(self, dummy_input, name, verbose=True)
        self.tracking = False

    def batch_accuracy(self, scores, labels):
        """ Compute the training accuracy of the results of a single mini-batch """
        batch_size = scores.shape[0]
        passed = 0
        results = []
        for i in range(batch_size):
            expected = labels[i]
            actual = scores[i].argmax()
            results += [int(actual)]
            if expected == actual:
                passed += 1
        return (float(passed) * 100.0 / float(batch_size), passed, results)

    def configure_optimizer(self, options):
        initial_rate = options.learning_rate
        oo = options.optimizer_options

        if options.optimizer == "Adadelta":
            optimizer = optim.Adadelta(self.parameters(), lr=initial_rate, weight_decay=oo.weight_decay,
                                       rho=oo.rho, eps=oo.eps)
        elif options.optimizer == "Adagrad":
            optimizer = optim.Adagrad(self.parameters(), lr=initial_rate, weight_decay=oo.weight_decay,
                                      lr_decay=oo.lr_decay)
        elif options.optimizer == "Adam":
            optimizer = optim.Adam(self.parameters(), lr=initial_rate, weight_decay=oo.weight_decay,
                                   betas=oo.betas, eps=oo.eps)
        elif options.optimizer == "Adamax":
            optimizer = optim.Adamax(self.parameters(), lr=initial_rate, weight_decay=oo.weight_decay,
                                     betas=oo.betas, eps=oo.eps)
        elif options.optimizer == "ASGD":
            optimizer = optim.ASGD(self.parameters(), lr=initial_rate, weight_decay=oo.weight_decay,
                                   lambd=oo.lambd, alpha=oo.alpha, t0=oo.t0)
        elif options.optimizer == "RMSprop":
            optimizer = optim.RMSprop(self.parameters(), lr=initial_rate, weight_decay=oo.weight_decay,
                                      eps=oo.eps, alpha=oo.alpha, momentum=oo.momentum, centered=oo.centered)
        elif options.optimizer == "Rprop":
            optimizer = optim.Rprop(self.parameters(), lr=initial_rate, etas=oo.etas,
                                    step_sizes=oo.step_sizes)
        elif options.optimizer == "SGD":
            optimizer = optim.SGD(self.parameters(), lr=initial_rate, weight_decay=oo.weight_decay,
                                  momentum=oo.momentum, dampening=oo.dampening, nesterov=oo.nesterov)
        else:
            # guard against silently returning an unbound name
            raise ValueError("Unsupported optimizer: {}".format(options.optimizer))
        return optimizer

    def configure_lr(self, options, optimizer, ticks, total_iterations):
        num_epochs = options.max_epochs
        learning_rate = options.learning_rate
        lr_scheduler = options.lr_scheduler
        lr_min = options.lr_min
        lr_peaks = options.lr_peaks
        gamma = options.lr_gamma
        if not lr_min:
            lr_min = learning_rate
        scheduler = None
        if lr_scheduler == "TriangleLR":
            steps = lr_peaks * 2 + 1
            stepsize = num_epochs / steps
            scheduler = TriangularLR(optimizer, stepsize * ticks, lr_min, learning_rate, gamma)
        elif lr_scheduler == "CosineAnnealingLR":
            # divide by an odd number to finish on the minimum learning rate
            cycles = lr_peaks * 2 + 1
            scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_iterations / cycles,
                                                             eta_min=lr_min)
        elif lr_scheduler == "ExponentialLR":
            scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma)
        elif lr_scheduler == "StepLR":
            scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=options.lr_step_size, gamma=gamma)
        elif lr_scheduler == "ExponentialResettingLR":
            reset = (num_epochs * ticks) / 3  # reset at the 1/3 mark
            scheduler = ExponentialResettingLR(optimizer, gamma, reset)
        return scheduler

    def fit(self, training_data, validation_data, options, sparsify=False, device=None, detail=False, run=None):
        """
        Perform the training. This is not called "train" because
        the base class already defines that method with a different meaning.
        The base class "train" method puts the Module into "training mode".
        """
        print("Training {} using {} rows of featurized training input...".format(self.name(), training_data.num_rows))

        if training_data.mean is not None:
            mean = torch.from_numpy(np.array([[training_data.mean]])).to(device)
            std = torch.from_numpy(np.array([[training_data.std]])).to(device)
        else:
            mean = None
            std = None

        self.normalize(mean, std)

        self.training = True
        start = time.time()
        loss_function = nn.NLLLoss()
        optimizer = self.configure_optimizer(options)
        print(optimizer)

        num_epochs = options.max_epochs
        batch_size = options.batch_size
        trim_level = options.trim_level

        ticks = training_data.num_rows / batch_size  # iterations per epoch

        # Calculation of total iterations in non-rolling vs rolling training
        # ticks = num_rows/batch_size (total number of iterations per epoch)
        # Non-Rolling Training:
        #     Total Iterations = num_epochs * ticks
        # Rolling Training:
        #     irl = Initial_rolling_length (We are using 2)
        #     If num_epochs <= max_rolling_length:
        #         Total Iterations = sum(range(irl, irl + num_epochs))
        #     If num_epochs > max_rolling_length:
        #         Total Iterations = sum(range(irl, irl + max_rolling_length)) + (num_epochs - max_rolling_length)*ticks
        if options.rolling:
            rolling_length = 2
            max_rolling_length = int(ticks)
            if max_rolling_length > options.max_rolling_length + rolling_length:
                max_rolling_length = options.max_rolling_length + rolling_length
            bag_count = 100
            hidden_bag_size = batch_size * bag_count
            if num_epochs + rolling_length < max_rolling_length:
                max_rolling_length = num_epochs + rolling_length
            total_iterations = sum(range(rolling_length, max_rolling_length))
            if num_epochs + rolling_length > max_rolling_length:
                epochs_remaining = num_epochs + rolling_length - max_rolling_length
                total_iterations += epochs_remaining * training_data.num_rows / batch_size
            ticks = total_iterations / num_epochs
        else:
            total_iterations = ticks * num_epochs

        scheduler = self.configure_lr(options, optimizer, ticks, total_iterations)

        # optimizer = optim.Adam(model.parameters(), lr=0.0001)
        log = []
        iteration = 0  # cumulative batch counter for the detailed log

        for epoch in range(num_epochs):
            self.train()
            if options.rolling:
                rolling_length += 1
                if rolling_length <= max_rolling_length:
                    self.init_hidden_bag(hidden_bag_size, device)
            for i_batch, (audio, labels) in enumerate(training_data.get_data_loader(batch_size)):
                if not self.batch_first:
                    audio = audio.transpose(1, 0)  # GRU wants seq,batch,feature

                if device:
                    self.move_to(device)
                    audio = audio.to(device)
                    labels = labels.to(device)

                # Also, we need to clear out the hidden state,
                # detaching it from its history on the last instance.
                if options.rolling:
                    if rolling_length <= max_rolling_length:
                        if (i_batch + 1) % rolling_length == 0:
                            self.init_hidden()
                            break

                    self.rolling_step()
                else:
                    self.init_hidden()

                self.to(device)  # sparsify routines might move param matrices to cpu

                # Before the backward pass, use the optimizer object to zero all of the
                # gradients for the variables it will update (which are the learnable
                # weights of the model). This is because by default, gradients are
                # accumulated in buffers (i.e., not overwritten) whenever .backward()
                # is called. Check out docs of torch.autograd.backward for more details.
                optimizer.zero_grad()

                # Run our forward pass.
                keyword_scores = self(audio)

                # Compute the loss, gradients
                loss = loss_function(keyword_scores, labels)

                # Backward pass: compute gradient of the loss with respect to all the learnable
                # parameters of the model. Internally, the parameters of each Module are stored
                # in Tensors with requires_grad=True, so this call will compute gradients for
                # all learnable parameters in the model.
                loss.backward()
                # move to next learning rate
                if scheduler:
                    scheduler.step()

                # Calling the step function on an Optimizer makes an update to its parameters
                # applying the gradients we computed during back propagation
                optimizer.step()

                if sparsify:
                    if epoch >= num_epochs / 3:
                        if epoch < (2 * num_epochs) / 3:
                            if i_batch % trim_level == 0:
                                self.sparsify()
                            else:
                                self.sparsifyWithSupport()
                        else:
                            self.sparsifyWithSupport()
                    self.to(device)  # sparsify routines might move param matrices to cpu

                iteration += 1
                if detail:
                    learning_rate = optimizer.param_groups[0]['lr']
                    log += [{'iteration': iteration, 'loss': loss.item(), 'learning_rate': learning_rate}]
            # Find the best prediction in each sequence and return its accuracy
            passed, total, rate = self.evaluate(validation_data, batch_size, device)
            learning_rate = optimizer.param_groups[0]['lr']
            current_loss = float(loss.item())
            print("Epoch {}, Loss {:.3f}, Validation Accuracy {:.3f}, Learning Rate {}".format(
                epoch, current_loss, rate * 100, learning_rate))
            log += [{'epoch': epoch, 'loss': current_loss, 'accuracy': rate, 'learning_rate': learning_rate}]
            if run is not None:
                run.log('progress', epoch / num_epochs)
                run.log('epoch', epoch)
                run.log('accuracy', rate)
                run.log('loss', current_loss)
                run.log('learning_rate', learning_rate)

        end = time.time()
        self.training = False
        print("Trained in {:.2f} seconds".format(end - start))
        print("Model size {}".format(self.get_model_size()))
        return log

    def evaluate(self, test_data, batch_size, device=None, outfile=None):
        """
        Evaluate the given test data and print the pass rate
        """
        self.eval()
        passed = 0
        total = 0

        self.zero_grad()
        results = []
        with torch.no_grad():
            for i_batch, (audio, labels) in enumerate(test_data.get_data_loader(batch_size)):
                batch_size = audio.shape[0]
                audio = audio.transpose(1, 0)  # GRU wants seq,batch,feature
                if device:
                    audio = audio.to(device)
                    labels = labels.to(device)
                total += batch_size
                self.init_hidden()
                keyword_scores = self(audio)
                last_accuracy, ok, actual = self.batch_accuracy(keyword_scores, labels)
                results += actual
                passed += ok

        if outfile:
            print("Saving evaluation results in '{}'".format(outfile))
            with open(outfile, "w") as f:
                json.dump(results, f)

        return (passed, total, passed / total)


class AudioDataset(Dataset):
    """
    Featurized Audio in PyTorch Dataset so we can get a DataLoader that is needed for
    mini-batch training.
    """

    def __init__(self, filename, config, keywords, training=False):
        """ Initialize the AudioDataset from the given *.npz file """
        self.dataset = np.load(filename)

        # get parameters saved by make_dataset.py
        parameters = self.dataset["parameters"]
        self.sample_rate = int(parameters[0])
        self.audio_size = int(parameters[1])
        self.input_size = int(parameters[2])
        self.window_size = int(parameters[3])
        self.shift = int(parameters[4])
        self.features = self.dataset["features"].astype(np.float32)
        self.num_rows = len(self.features)
        self.features = self.features.reshape((self.num_rows, self.window_size, self.input_size))

        if config.normalize:
            mean = self.features.mean(axis=0)
            std = self.features.std(axis=0)
            self.mean = mean.mean(axis=0).astype(np.float32)
            std = std.mean(axis=0)
            # self.std is a divisor, so make sure it contains no zeros
            self.std = np.array(np.where(std == 0, 1, std)).astype(np.float32)
        else:
            self.mean = None
            self.std = None

        self.label_names = self.dataset["labels"]
        self.keywords = keywords
        self.num_keywords = len(self.keywords)
        self.labels = self.to_long_vector()

        self.keywords_idx = None
        self.non_keywords_idx = None
        if training and config.sample_non_kw is not None:
            self.keywords_idx, self.non_keywords_idx = self.get_keyword_idx(config.sample_non_kw)
            self.sample_non_kw_probability = config.sample_non_kw_probability

        msg = "Loaded dataset {} and found sample rate {}, audio_size {}, input_size {}, window_size {} and shift {}"
        print(msg.format(os.path.basename(filename), self.sample_rate, self.audio_size, self.input_size,
                         self.window_size, self.shift))

    def get_data_loader(self, batch_size):
        """ Get a DataLoader that can enumerate shuffled batches of data in this dataset """
        return DataLoader(self, batch_size=batch_size, shuffle=True, drop_last=True)

    def to_long_vector(self):
        """ convert the expected labels to a list of integer indexes into the array of keywords """
        indexer = [(0 if x == "<null>" else self.keywords.index(x)) for x in self.label_names]
        return np.array(indexer, dtype=np.longlong)

    def get_keyword_idx(self, non_kw_label):
        """ find the keywords and store their indexes """
        indexer = [ids for ids, label in enumerate(self.label_names) if label != non_kw_label]
        non_indexer = [ids for ids, label in enumerate(self.label_names) if label == non_kw_label]
        return (np.array(indexer, dtype=np.longlong), np.array(non_indexer, dtype=np.longlong))

    def __len__(self):
        """ Return the number of rows in this Dataset """
        if self.non_keywords_idx is None:
            return self.num_rows
        else:
            return int(len(self.keywords_idx) / (1 - self.sample_non_kw_probability))

    def __getitem__(self, idx):
        """ Return a single labelled sample here as a tuple """
        if self.non_keywords_idx is None:
            updated_idx = idx
        else:
            if idx < len(self.keywords_idx):
                updated_idx = self.keywords_idx[idx]
            else:
                updated_idx = np.random.choice(self.non_keywords_idx)
        audio = self.features[updated_idx]  # batch index is second dimension
        label = self.labels[updated_idx]
        sample = (audio, label)
        return sample


def create_model(model_config, input_size, num_keywords):
    ModelClass = get_model_class(KeywordSpotter)
    hidden_units_list = [model_config.hidden_units1, model_config.hidden_units2, model_config.hidden_units3]
    wRank_list = [model_config.wRank1, model_config.wRank2, model_config.wRank3]
    uRank_list = [model_config.uRank1, model_config.uRank2, model_config.uRank3]
    wSparsity_list = [model_config.wSparsity, model_config.wSparsity, model_config.wSparsity]
    uSparsity_list = [model_config.uSparsity, model_config.uSparsity, model_config.uSparsity]
    print(model_config.gate_nonlinearity, model_config.update_nonlinearity)
    return ModelClass(model_config.architecture, input_size, model_config.num_layers,
                      hidden_units_list, wRank_list, uRank_list, wSparsity_list,
                      uSparsity_list, model_config.gate_nonlinearity,
                      model_config.update_nonlinearity, num_keywords)


def save_json(obj, filename):
    with open(filename, "w") as f:
        json.dump(obj, f, indent=2)


def train(config, evaluate_only=False, outdir=".", detail=False, azureml=False):

    filename = config.model.filename
    categories_file = config.dataset.categories
    wav_directory = config.dataset.path
    batch_size = config.training.batch_size
    hidden_units = config.model.hidden_units
    architecture = config.model.architecture
    num_layers = config.model.num_layers
    use_gpu = config.training.use_gpu

    run = None

    if azureml:
        from azureml.core.run import Run
        run = Run.get_context()
        if run is None:
            print("### Run.get_context() returned None")
        else:
            print("### Running in Azure Context")

    valid_layers = [1, 2, 3]
    if num_layers not in valid_layers:
        raise Exception("--num_layers can only be one of these values {}".format(valid_layers))

    if not os.path.isdir(outdir):
        os.makedirs(outdir)

    if not filename:
        filename = "{}{}KeywordSpotter.pt".format(architecture, hidden_units)
        config.model.filename = filename

    # load the featurized data
    if not os.path.isdir(wav_directory):
        print("### Error: please specify valid --dataset folder location: {}".format(wav_directory))
        sys.exit(1)

    if not categories_file:
        categories_file = os.path.join(wav_directory, "categories.txt")

    with open(categories_file, "r") as f:
        keywords = [x.strip() for x in f.readlines()]

    training_file = os.path.join(wav_directory, "training_list.npz")
    testing_file = os.path.join(wav_directory, "testing_list.npz")
    validation_file = os.path.join(wav_directory, "validation_list.npz")

    if not os.path.isfile(training_file):
        print("Missing file {}".format(training_file))
        print("Please run make_datasets.py")
        sys.exit(1)
    if not os.path.isfile(validation_file):
        print("Missing file {}".format(validation_file))
        print("Please run make_datasets.py")
        sys.exit(1)
    if not os.path.isfile(testing_file):
        print("Missing file {}".format(testing_file))
        print("Please run make_datasets.py")
        sys.exit(1)

    model = None

    device = torch.device("cpu")
    if use_gpu:
        if torch.cuda.is_available():
            device = torch.device("cuda")
        else:
            print("### CUDA not available!!")

    print("Loading {}...".format(testing_file))
    test_data = AudioDataset(testing_file, config.dataset, keywords)

    log = None
    if not evaluate_only:
        print("Loading {}...".format(training_file))
        training_data = AudioDataset(training_file, config.dataset, keywords, training=True)

        print("Loading {}...".format(validation_file))
        validation_data = AudioDataset(validation_file, config.dataset, keywords)

        if training_data.mean is not None:
            fname = os.path.join(outdir, "mean.npy")
            print("Saving {}".format(fname))
            np.save(fname, training_data.mean)
            fname = os.path.join(outdir, "std.npy")
            print("Saving {}".format(fname))
            np.save(fname, training_data.std)

            # use the training_data mean and std variation
            test_data.mean = training_data.mean
            test_data.std = training_data.std
            validation_data.mean = training_data.mean
            validation_data.std = training_data.std

        print("Training model {}".format(filename))
        model = create_model(config.model, training_data.input_size, training_data.num_keywords)
        if device.type == 'cuda':
            model.cuda()  # move the processing to GPU

        start = time.time()
        log = model.fit(training_data, validation_data, config.training,
                        config.model.sparsify, device, detail, run)
        end = time.time()

        passed, total, rate = model.evaluate(training_data, batch_size, device)
        print("Training accuracy = {:.3f} %".format(rate * 100))

        torch.save(model.state_dict(), os.path.join(outdir, filename))

    print("Evaluating {} keyword spotter using {} rows of featurized test audio...".format(
        architecture, test_data.num_rows))
    if model is None:
        msg = "Loading trained model with input size {}, hidden units {} and num keywords {}"
        print(msg.format(test_data.input_size, hidden_units, test_data.num_keywords))
        model = create_model(config.model, test_data.input_size, test_data.num_keywords)
        model.load_state_dict(torch.load(filename))
    if model and device.type == 'cuda':
        model.cuda()  # move the processing to GPU

    results_file = os.path.join(outdir, "results.txt")
    passed, total, rate = model.evaluate(test_data, batch_size, device, results_file)
    print("Testing accuracy = {:.3f} %".format(rate * 100))

    if not evaluate_only:
        name = os.path.splitext(filename)[0] + ".onnx"
        print("saving onnx file: {}".format(name))
        model.export(os.path.join(outdir, name), device)

        config.dataset.sample_rate = test_data.sample_rate
        config.dataset.input_size = test_data.audio_size
        config.dataset.num_filters = test_data.input_size
        config.dataset.window_size = test_data.window_size
        config.dataset.shift = test_data.shift

        logdata = {
            "accuracy_val": rate,
            "training_time": end - start,
            "log": log
        }
        d = TrainingConfig.to_dict(config)
        logdata.update(d)

        logname = os.path.join(outdir, "train_results.json")
        save_json(logdata, logname)

    return rate, log


def str2bool(v):
|
||||||
|
if v is None:
|
||||||
|
return False
|
||||||
|
lower = v.lower()
|
||||||
|
return lower in ["t", "1", "true", "yes"]
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == '__main__':
|
||||||
|
parser = argparse.ArgumentParser("train a RNN based neural network for keyword spotting")
|
||||||
|
|
||||||
|
# all the training parameters
|
||||||
|
parser.add_argument("--epochs", help="Number of epochs to train", type=int)
|
||||||
|
parser.add_argument("--trim_level", help="Number of batches before sparse support is updated in IHT", type=int)
|
||||||
|
parser.add_argument("--lr_scheduler", help="Type of learning rate scheduler (None, TriangleLR, CosineAnnealingLR,"
|
||||||
|
" ExponentialLR, ExponentialResettingLR)")
|
||||||
|
parser.add_argument("--learning_rate", help="Default learning rate, and maximum for schedulers", type=float)
|
||||||
|
parser.add_argument("--lr_min", help="Minimum learning rate for the schedulers", type=float)
|
||||||
|
parser.add_argument("--lr_peaks", help="Number of peaks for triangle and cosine schedules", type=float)
|
||||||
|
parser.add_argument("--batch_size", "-bs", help="Batch size of training", type=int)
|
||||||
|
parser.add_argument("--architecture", help="Specify model architecture (FastGRNN)")
|
||||||
|
parser.add_argument("--num_layers", type=int, help="Number of RNN layers (1, 2 or 3)")
|
||||||
|
parser.add_argument("--hidden_units", "-hu", type=int, help="Number of hidden units in the FastGRNN layers")
|
||||||
|
parser.add_argument("--hidden_units1", "-hu1", type=int, help="Number of hidden units in the FastGRNN 1st layer")
|
||||||
|
parser.add_argument("--hidden_units2", "-hu2", type=int, help="Number of hidden units in the FastGRNN 2nd layer")
|
||||||
|
parser.add_argument("--hidden_units3", "-hu3", type=int, help="Number of hidden units in the FastGRNN 3rd layer")
|
||||||
|
parser.add_argument("--use_gpu", help="Whether to use fastGRNN for training", action="store_true")
|
||||||
|
parser.add_argument("--normalize", help="Whether to normalize audio dataset", action="store_true")
|
||||||
|
parser.add_argument("--rolling", help="Whether to train model in rolling fashion or not", action="store_true")
|
||||||
|
parser.add_argument("--max_rolling_length", help="Max number of epochs you want to roll the rolling training"
|
||||||
|
" default is 100", type=int)
|
||||||
|
parser.add_argument("--sample_non_kw", "-sl", type=str, help="Sample data for this label with probability sample_prob")
|
||||||
|
parser.add_argument("--sample_non_kw_probability", "-spr", type=float, help="Sample from scl with this probability")
|
||||||
|
|
||||||
|
# arguments for fastgrnn
|
||||||
|
parser.add_argument("--wRank", "-wr", help="Rank of W in 1st layer of FastGRNN default is None", type=int)
|
||||||
|
parser.add_argument("--uRank", "-ur", help="Rank of U in 1st layer of FastGRNN default is None", type=int)
|
||||||
|
parser.add_argument("--wRank1", "-wr1", help="Rank of W in 1st layer of FastGRNN default is None", type=int)
|
||||||
|
parser.add_argument("--uRank1", "-ur1", help="Rank of U in 1st layer of FastGRNN default is None", type=int)
|
||||||
|
parser.add_argument("--wRank2", "-wr2", help="Rank of W in 2nd layer of FastGRNN default is None", type=int)
|
||||||
|
parser.add_argument("--uRank2", "-ur2", help="Rank of U in 2nd layer of FastGRNN default is None", type=int)
|
||||||
|
parser.add_argument("--wRank3", "-wr3", help="Rank of W in 3rd layer of FastGRNN default is None", type=int)
|
||||||
|
parser.add_argument("--uRank3", "-ur3", help="Rank of U in 3rd layer of FastGRNN default is None", type=int)
|
||||||
|
parser.add_argument("--wSparsity", "-wsp", help="Sparsity of W matrices", type=float)
|
||||||
|
parser.add_argument("--uSparsity", "-usp", help="Sparsity of U matrices", type=float)
|
||||||
|
parser.add_argument("--gate_nonlinearity", "-gnl", help="Gate Non-Linearity in FastGRNN default is sigmoid"
|
||||||
|
" use between [sigmoid, quantSigmoid, tanh, quantTanh]")
|
||||||
|
parser.add_argument("--update_nonlinearity", "-unl", help="Update Non-Linearity in FastGRNN default is Tanh"
|
||||||
|
" use between [sigmoid, quantSigmoid, tanh, quantTanh]")
|
||||||
|
|
||||||
|
# or you can just specify an options file.
|
||||||
|
parser.add_argument("--config", help="Use json file containing all these options (as per 'training_config.py')")
|
||||||
|
|
||||||
|
# and some additional stuff ...
|
||||||
|
parser.add_argument("--azureml", help="Tells script we are running in Azure ML context")
|
||||||
|
parser.add_argument("--eval", "-e", help="No training, just evaluate existing model", action='store_true')
|
||||||
|
parser.add_argument("--filename", "-o", help="Name of model file to generate")
|
||||||
|
parser.add_argument("--categories", "-c", help="Name of file containing keywords")
|
||||||
|
parser.add_argument("--dataset", "-a", help="Path to the audio folder containing 'training.npz' file")
|
||||||
|
parser.add_argument("--outdir", help="Folder in which to store output file and log files")
|
||||||
|
parser.add_argument("--detail", "-d", help="Save loss info for every iteration not just every epoch",
|
||||||
|
action="store_true")
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
config = TrainingConfig()
|
||||||
|
if args.config:
|
||||||
|
config.load(args.config)
|
||||||
|
|
||||||
|
azureml = str2bool(args.azureml)
|
||||||
|
|
||||||
|
# then any user defined options overrides these defaults
|
||||||
|
if args.epochs:
|
||||||
|
config.training.max_epochs = args.epochs
|
||||||
|
if args.trim_level:
|
||||||
|
config.training.trim_level = args.trim_level
|
||||||
|
else:
|
||||||
|
config.training.trim_level = 15
|
||||||
|
if args.learning_rate:
|
||||||
|
config.training.learning_rate = args.learning_rate
|
||||||
|
if args.lr_min:
|
||||||
|
config.training.lr_min = args.lr_min
|
||||||
|
if args.lr_peaks:
|
||||||
|
config.training.lr_peaks = args.lr_peaks
|
||||||
|
if args.lr_scheduler:
|
||||||
|
config.training.lr_scheduler = args.lr_scheduler
|
||||||
|
if args.batch_size:
|
||||||
|
config.training.batch_size = args.batch_size
|
||||||
|
if args.rolling:
|
||||||
|
config.training.rolling = args.rolling
|
||||||
|
if args.max_rolling_length:
|
||||||
|
config.training.max_rolling_length = args.max_rolling_length
|
||||||
|
if args.architecture:
|
||||||
|
config.model.architecture = args.architecture
|
||||||
|
if args.num_layers:
|
||||||
|
config.model.num_layers = args.num_layers
|
||||||
|
if args.hidden_units:
|
||||||
|
config.model.hidden_units = args.hidden_units
|
||||||
|
if args.hidden_units1:
|
||||||
|
config.model.hidden_units = args.hidden_units
|
||||||
|
if args.hidden_units2:
|
||||||
|
config.model.hidden_units = args.hidden_units
|
||||||
|
if args.hidden_units3:
|
||||||
|
config.model.hidden_units = args.hidden_units
|
||||||
|
if config.model.num_layers >= 1:
|
||||||
|
if config.model.hidden_units1 is None:
|
||||||
|
config.model.hidden_units1 = config.model.hidden_units
|
||||||
|
if config.model.num_layers >= 2:
|
||||||
|
if config.model.hidden_units2 is None:
|
||||||
|
config.model.hidden_units2 = config.model.hidden_units1
|
||||||
|
if config.model.num_layers == 3:
|
||||||
|
if config.model.hidden_units3 is None:
|
||||||
|
config.model.hidden_units3 = config.model.hidden_units2
|
||||||
|
if args.filename:
|
||||||
|
config.model.filename = args.filename
|
||||||
|
if args.use_gpu:
|
||||||
|
config.training.use_gpu = args.use_gpu
|
||||||
|
if args.normalize:
|
||||||
|
config.dataset.normalize = args.normalize
|
||||||
|
if args.categories:
|
||||||
|
config.dataset.categories = args.categories
|
||||||
|
if args.dataset:
|
||||||
|
config.dataset.path = args.dataset
|
||||||
|
if args.sample_non_kw:
|
||||||
|
config.dataset.sample_non_kw = args.sample_non_kw
|
||||||
|
if args.sample_non_kw_probability is None:
|
||||||
|
config.dataset.sample_non_kw_probability = 0.5
|
||||||
|
else:
|
||||||
|
config.dataset.sample_non_kw_probability = args.sample_non_kw_probability
|
||||||
|
else:
|
||||||
|
config.dataset.sample_non_kw = None
|
||||||
|
|
||||||
|
if args.wRank:
|
||||||
|
config.model.wRank = args.wRank
|
||||||
|
if args.uRank:
|
||||||
|
config.model.uRank = args.wRank
|
||||||
|
if args.wRank1:
|
||||||
|
config.model.wRank1 = args.wRank1
|
||||||
|
if args.uRank1:
|
||||||
|
config.model.uRank1 = args.wRank1
|
||||||
|
if config.model.wRank1 is None:
|
||||||
|
if config.model.wRank is not None:
|
||||||
|
config.model.wRank1 = config.model.wRank
|
||||||
|
if config.model.uRank1 is None:
|
||||||
|
if config.model.uRank is not None:
|
||||||
|
config.model.uRank1 = config.model.uRank
|
||||||
|
if args.wRank2:
|
||||||
|
config.model.wRank2 = args.wRank2
|
||||||
|
if args.uRank2:
|
||||||
|
config.model.uRank2 = args.wRank2
|
||||||
|
if args.wRank3:
|
||||||
|
config.model.wRank3 = args.wRank3
|
||||||
|
if args.uRank3:
|
||||||
|
config.model.uRank3 = args.wRank3
|
||||||
|
if args.wSparsity:
|
||||||
|
config.model.wSparsity = args.wSparsity
|
||||||
|
else:
|
||||||
|
config.model.wSparsity = 1.0
|
||||||
|
if args.uSparsity:
|
||||||
|
config.model.uSparsity = args.uSparsity
|
||||||
|
else:
|
||||||
|
config.model.uSparsity = 1.0
|
||||||
|
if config.model.uSparsity < 1.0 or config.model.wSparsity < 1.0:
|
||||||
|
config.model.sparsify = True
|
||||||
|
else:
|
||||||
|
config.model.sparsify = False
|
||||||
|
if args.gate_nonlinearity:
|
||||||
|
config.model.gate_nonlinearity = args.gate_nonlinearity
|
||||||
|
if args.update_nonlinearity:
|
||||||
|
config.model.update_nonlinearity = args.update_nonlinearity
|
||||||
|
|
||||||
|
if not os.path.isfile("config.json"):
|
||||||
|
config.save("config.json")
|
||||||
|
|
||||||
|
train(config, args.eval, args.outdir, args.detail, azureml)
|
|
@ -0,0 +1,146 @@
|
||||||
|
# Copyright (c) Microsoft Corporation. All rights reserved.
|
||||||
|
# Licensed under the MIT license.
|
||||||
|
|
||||||
|
# config file for train_classifier.py
|
||||||
|
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
|
||||||
|
class ModelOptions:
|
||||||
|
def __init__(self):
|
||||||
|
self.architecture = "FastGRNN"
|
||||||
|
self.num_layers = 1
|
||||||
|
self.hidden_units = None
|
||||||
|
self.hidden_units1 = None
|
||||||
|
self.hidden_units2 = None
|
||||||
|
self.hidden_units3 = None
|
||||||
|
self.filename = ""
|
||||||
|
self.wRank = None
|
||||||
|
self.uRank = None
|
||||||
|
self.wRank1 = None
|
||||||
|
self.uRank1 = None
|
||||||
|
self.wRank2 = None
|
||||||
|
self.uRank2 = None
|
||||||
|
self.wRank3 = None
|
||||||
|
self.uRank3 = None
|
||||||
|
self.gate_nonlinearity = "sigmoid"
|
||||||
|
self.update_nonlinearity = "tanh"
|
||||||
|
|
||||||
|
|
||||||
|
class DatasetOptions:
|
||||||
|
def __init__(self):
|
||||||
|
self.name = "speechcommandsv01"
|
||||||
|
self.featurizer = "featurizer_mel_16000_512_512_80_40_log"
|
||||||
|
self.categories = "categories.txt"
|
||||||
|
self.path = ""
|
||||||
|
self.auto_scale = False
|
||||||
|
self.normalize = False
|
||||||
|
|
||||||
|
|
||||||
|
class OptimizerOptions:
|
||||||
|
def __init__(self):
|
||||||
|
self.weight_decay = 1e-5
|
||||||
|
self.momentum = 0.9 # RMSprop
|
||||||
|
self.centered = False # RMSprop
|
||||||
|
self.alpha = 0 # ASGD, RMSprop
|
||||||
|
self.eps = 1e-8
|
||||||
|
self.rho = 0 # Adadelta
|
||||||
|
self.lr_decay = 0 # Adagrad
|
||||||
|
self.betas = (0.9, 0.999) # Adam, SparseAdam, Adamax
|
||||||
|
self.lambd = 0.0001 # ASGD
|
||||||
|
self.t0 = 1000000.0 # ASGD
|
||||||
|
self.etas = (0.5, 1.2) # Rprop
|
||||||
|
self.dampening = 0 # SGD
|
||||||
|
self.step_sizes = (1e-06, 50) # Rprop
|
||||||
|
self.nesterov = True # SGD
|
||||||
|
|
||||||
|
|
||||||
|
class TrainingOptions:
|
||||||
|
def __init__(self):
|
||||||
|
self.max_epochs = 30
|
||||||
|
self.learning_rate = 1e-2
|
||||||
|
self.lr_scheduler = None
|
||||||
|
self.lr_peaks = 1
|
||||||
|
self.lr_min = 1e-5
|
||||||
|
self.lr_gamma = 1
|
||||||
|
self.lr_step_size = 1
|
||||||
|
self.batch_size = 128
|
||||||
|
self.optimizer = "SGD"
|
||||||
|
self.optimizer_options = OptimizerOptions()
|
||||||
|
self.use_gpu = False
|
||||||
|
self.rolling = False
|
||||||
|
self.max_rolling_length = 100
|
||||||
|
self.decay_step = 200
|
||||||
|
self.decay_rate = 0.1
|
||||||
|
|
||||||
|
|
||||||
|
class TrainingConfig:
|
||||||
|
def __init__(self):
|
||||||
|
self.name = ""
|
||||||
|
self.description = ""
|
||||||
|
self.folder = None
|
||||||
|
|
||||||
|
self.model = ModelOptions()
|
||||||
|
self.dataset = DatasetOptions()
|
||||||
|
self.training = TrainingOptions()
|
||||||
|
|
||||||
|
self.job_id = None
|
||||||
|
self.status = None
|
||||||
|
self.downloaded = False
|
||||||
|
self.last_modified = 0
|
||||||
|
self.retries = 0
|
||||||
|
self.filename = None
|
||||||
|
self.sweep = None
|
||||||
|
|
||||||
|
def set(self, name, value):
|
||||||
|
if name not in self.__dict__:
|
||||||
|
self.__dict__[name] = value
|
||||||
|
else:
|
||||||
|
t = self.__dict__[name]
|
||||||
|
if type(t) == int:
|
||||||
|
self.__dict__[name] = int(value)
|
||||||
|
if type(t) == float:
|
||||||
|
self.__dict__[name] = float(value)
|
||||||
|
if type(t) == str:
|
||||||
|
self.__dict__[name] = str(value)
|
||||||
|
else:
|
||||||
|
self.__dict__[name] = value
|
||||||
|
|
||||||
|
def load(self, filename):
|
||||||
|
with open(filename, "r") as f:
|
||||||
|
data = json.load(f)
|
||||||
|
TrainingConfig.from_dict(self, data)
|
||||||
|
self.filename = filename
|
||||||
|
self.last_modified = os.path.getmtime(self.filename)
|
||||||
|
|
||||||
|
def save(self, filename):
|
||||||
|
""" save an options.json file in the self.filename location """
|
||||||
|
self.filename = filename
|
||||||
|
data = TrainingConfig.to_dict(self)
|
||||||
|
data["model"] = self.model.__dict__
|
||||||
|
with open(filename, "w") as f:
|
||||||
|
json.dump(data, f, indent=2, sort_keys=True)
|
||||||
|
self.last_modified = os.path.getmtime(self.filename)
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def to_dict(obj):
|
||||||
|
data = dict(obj.__dict__)
|
||||||
|
for k in data:
|
||||||
|
o = data[k]
|
||||||
|
if hasattr(o, "__dict__"):
|
||||||
|
data[k] = TrainingConfig.to_dict(o)
|
||||||
|
return data
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def from_dict(obj, data):
|
||||||
|
for k in data:
|
||||||
|
v = data[k]
|
||||||
|
if not hasattr(obj, k):
|
||||||
|
setattr(obj, k, v)
|
||||||
|
else:
|
||||||
|
if isinstance(v, dict):
|
||||||
|
TrainingConfig.from_dict(getattr(obj, k), v)
|
||||||
|
elif isinstance(getattr(obj, k), tuple):
|
||||||
|
setattr(obj, k, tuple(v))
|
||||||
|
else:
|
||||||
|
setattr(obj, k, v)
|
|
@ -0,0 +1,93 @@
|
||||||
|
# EdgeML FastCells on a sample public dataset
|
||||||
|
|
||||||
|
This directory includes example notebooks and scripts for training
|
||||||
|
FastCells (FastRNN & FastGRNN) along with modified
|
||||||
|
UGRNN, GRU and LSTM to support the LSQ training routine.
|
||||||
|
There is also a sample cleanup and train/test script for the USPS10 public dataset.
|
||||||
|
The subfolder [`KWS-training`](KWS-training) contains code
|
||||||
|
for training a keyword spotting model using a single- or multi-layer RNN.
|
||||||
|
|
||||||
|
|
||||||
|
[`edgeml_pytorch.graph.rnn`](../../../pytorch/pytorch_edgeml/graph/rnn.py)
|
||||||
|
provides two RNN cells **FastRNNCell** and **FastGRNNCell** with additional
|
||||||
|
features like low-rank parameterisation and custom non-linearities. Akin to
|
||||||
|
Bonsai and ProtoNN, the three-phase training routine for FastRNN and FastGRNN
|
||||||
|
is decoupled from the custom cells to facilitate a plug and play behaviour of
|
||||||
|
the custom RNN cells in other architectures (NMT, Encoder-Decoder etc.).
|
||||||
|
Additionally, numerically equivalent CUDA-based implementations FastRNNCuda
|
||||||
|
and FastGRNNCuda are provided for faster training.
|
||||||
|
`edgeml_pytorch.graph.rnn` also contains modified RNN cells of **UGRNNCell**,
|
||||||
|
**GRUCell**, and **LSTMCell**, which can be substituted for Fast(G)RNN,
|
||||||
|
as well as untrolled RNNs which are equivalent to `nn.LSTM` and `nn.GRU`.
|
||||||
|
|
||||||
|
Note that all the cells and wrappers, when used independently from `fastcell_example.py`
|
||||||
|
or `edgeml_pytorch.trainer.fastTrainer`, take in data in a batch first format, i.e.,
|
||||||
|
[batchSize, timeSteps, inputDims] by default, but can also support [timeSteps,
|
||||||
|
batchSize, inputDims] format if `batch_first` argument is set to False.
|
||||||
|
`fast_example.py` automatically adjusts to the correct format across tf, c++ and pytorch.
|
||||||
|
|
||||||
|
For training FastCells, `edgeml_pytorch.trainer.fastTrainer` implements the three-phase
|
||||||
|
FastCell training routine in PyTorch. A simple example `fastcell_example.py` is provided
|
||||||
|
to illustrate its usage. Note that `fastcell_example.py` assumes that data is in a specific format.
|
||||||
|
It is assumed that train and test data is contained in two files, `train.npy` and
|
||||||
|
`test.npy`, each containing a 2D numpy array of dimension `[numberOfExamples,
|
||||||
|
numberOfFeatures]`. numberOfFeatures is `timesteps x inputDims`, flattened
|
||||||
|
across timestep dimension with the input of the first time step followed by the second
|
||||||
|
and so on. For an N-Class problem, we assume the labels are integers from 0
|
||||||
|
through N-1. Lastly, the training data, `train.npy`, is assumed to well shuffled
|
||||||
|
as the training routine doesn't shuffle internally.
|
||||||
|
|
||||||
|
**Tested With:** PyTorch = 1.1 with Python 3.6
|
||||||
|
|
||||||
|
## Download and clean up sample dataset
|
||||||
|
|
||||||
|
To validate the code with USPS dataset, first download and format the dataset to match
|
||||||
|
the required format using the script [fetch_usps.py](fetch_usps.py) and
|
||||||
|
[process_usps.py](process_usps.py)
|
||||||
|
|
||||||
|
```
|
||||||
|
python fetch_usps.py
|
||||||
|
python process_usps.py
|
||||||
|
```
|
||||||
|
|
||||||
|
Note: Even though usps10 is not a time-series dataset, it can be regarding as a time-series
|
||||||
|
dataset where time step sees a new row. So the number of timesteps = 16 and inputDims = 16.
|
||||||
|
|
||||||
|
## Sample command for FastCells on USPS10
|
||||||
|
The following is a sample run on usps10 :
|
||||||
|
|
||||||
|
```bash
|
||||||
|
python fastcell_example.py -dir usps10/ -id 16 -hd 32
|
||||||
|
```
|
||||||
|
This command should give you a final output that reads roughly similar to
|
||||||
|
(might not be exact numbers due to various version mismatches):
|
||||||
|
|
||||||
|
```
|
||||||
|
Maximum Test accuracy at compressed model size(including early stopping): 0.9407075 at Epoch: 262
|
||||||
|
Final Test Accuracy: 0.93721974
|
||||||
|
|
||||||
|
Non-Zeros: 1932 Model Size: 7.546875 KB hasSparse: False
|
||||||
|
```
|
||||||
|
`usps10/` directory will now have a consolidated results file called `FastRNNResults.txt` or
|
||||||
|
`FastGRNNResults.txt` depending on the choice of the RNN cell. A directory `FastRNNResults` or
|
||||||
|
`FastGRNNResults` with the corresponding models with each run of the code on the `usps10` dataset.
|
||||||
|
|
||||||
|
Note that the scalars like `alpha`, `beta`, `zeta` and `nu` correspond to the values before
|
||||||
|
the application of the sigmoid function.
|
||||||
|
|
||||||
|
## Byte Quantization(Q) for model compression
|
||||||
|
If you wish to quantize the generated model, use `quantizeFastModels.py`. Usage Instructions:
|
||||||
|
|
||||||
|
```
|
||||||
|
python quantizeFastModels.py -h
|
||||||
|
```
|
||||||
|
|
||||||
|
This will generate quantized models with a suffix of `q` before every param stored in a
|
||||||
|
new directory `QuantizedFastModel` inside the model directory.
|
||||||
|
|
||||||
|
Note that the scalars like `qalpha`, `qbeta`, `qzeta` and `qnu` correspond to values
|
||||||
|
after the application of the sigmoid function over them post quantization;
|
||||||
|
they can be directly plugged into the inference pipleines.
|
||||||
|
|
||||||
|
Copyright (c) Microsoft Corporation. All rights reserved.
|
||||||
|
Licensed under the MIT license.
|
|
@ -5,8 +5,8 @@ import helpermethods
|
||||||
import torch
|
import torch
|
||||||
import numpy as np
|
import numpy as np
|
||||||
import sys
|
import sys
|
||||||
from pytorch_edgeml.graph.rnn import *
|
from edgeml_pytorch.graph.rnn import *
|
||||||
from pytorch_edgeml.trainer.fastTrainer import FastTrainer
|
from edgeml_pytorch.trainer.fastTrainer import FastTrainer
|
||||||
|
|
||||||
|
|
||||||
def main():
|
def main():
|
||||||
|
@ -53,24 +53,24 @@ def main():
|
||||||
|
|
||||||
if cell == "FastGRNN":
|
if cell == "FastGRNN":
|
||||||
FastCell = FastGRNNCell(inputDims, hiddenDims,
|
FastCell = FastGRNNCell(inputDims, hiddenDims,
|
||||||
gate_non_linearity=gate_non_linearity,
|
gate_nonlinearity=gate_non_linearity,
|
||||||
update_non_linearity=update_non_linearity,
|
update_nonlinearity=update_non_linearity,
|
||||||
wRank=wRank, uRank=uRank)
|
wRank=wRank, uRank=uRank)
|
||||||
elif cell == "FastRNN":
|
elif cell == "FastRNN":
|
||||||
FastCell = FastRNNCell(inputDims, hiddenDims,
|
FastCell = FastRNNCell(inputDims, hiddenDims,
|
||||||
update_non_linearity=update_non_linearity,
|
update_nonlinearity=update_non_linearity,
|
||||||
wRank=wRank, uRank=uRank)
|
wRank=wRank, uRank=uRank)
|
||||||
elif cell == "UGRNN":
|
elif cell == "UGRNN":
|
||||||
FastCell = UGRNNLRCell(inputDims, hiddenDims,
|
FastCell = UGRNNLRCell(inputDims, hiddenDims,
|
||||||
update_non_linearity=update_non_linearity,
|
update_nonlinearity=update_non_linearity,
|
||||||
wRank=wRank, uRank=uRank)
|
wRank=wRank, uRank=uRank)
|
||||||
elif cell == "GRU":
|
elif cell == "GRU":
|
||||||
FastCell = GRULRCell(inputDims, hiddenDims,
|
FastCell = GRULRCell(inputDims, hiddenDims,
|
||||||
update_non_linearity=update_non_linearity,
|
update_nonlinearity=update_non_linearity,
|
||||||
wRank=wRank, uRank=uRank)
|
wRank=wRank, uRank=uRank)
|
||||||
elif cell == "LSTM":
|
elif cell == "LSTM":
|
||||||
FastCell = LSTMLRCell(inputDims, hiddenDims,
|
FastCell = LSTMLRCell(inputDims, hiddenDims,
|
||||||
update_non_linearity=update_non_linearity,
|
update_nonlinearity=update_non_linearity,
|
||||||
wRank=wRank, uRank=uRank)
|
wRank=wRank, uRank=uRank)
|
||||||
else:
|
else:
|
||||||
sys.exit('Exiting: No Such Cell as ' + cell)
|
sys.exit('Exiting: No Such Cell as ' + cell)
|
|
@ -4,11 +4,11 @@ This directory includes an example [notebook](protoNN_example.ipynb) and a
|
||||||
command line execution script of ProtoNN developed as part of EdgeML. The
|
command line execution script of ProtoNN developed as part of EdgeML. The
|
||||||
example is based on the USPS dataset.
|
example is based on the USPS dataset.
|
||||||
|
|
||||||
`pytorch_edgeml.graph.protoNN` implements the ProtoNN prediction functions.
|
`edgeml_pytorch.graph.protoNN` implements the ProtoNN prediction functions.
|
||||||
The training routine for ProtoNN is decoupled from the forward graph to
|
The training routine for ProtoNN is decoupled from the forward graph to
|
||||||
facilitate a plug and play behaviour wherein ProtoNN can be combined with or
|
facilitate a plug and play behaviour wherein ProtoNN can be combined with or
|
||||||
used as a final layer classifier for other architectures (RNNs, CNNs). The
|
used as a final layer classifier for other architectures (RNNs, CNNs). The
|
||||||
training routine is implemented in `pytorch_edgeml.trainer.protoNNTrainer`.
|
training routine is implemented in `edgeml_pytorch.trainer.protoNNTrainer`.
|
||||||
(This is also an artifact of consistency requirements with Tensorflow
|
(This is also an artifact of consistency requirements with Tensorflow
|
||||||
implementation).
|
implementation).
|
||||||
|
|
|
@ -5,7 +5,7 @@ from __future__ import print_function
|
||||||
import sys
|
import sys
|
||||||
import os
|
import os
|
||||||
import numpy as np
|
import numpy as np
|
||||||
import pytorch_edgeml.utils as utils
|
import edgeml_pytorch.utils as utils
|
||||||
import argparse
|
import argparse
|
||||||
|
|
||||||
|
|
||||||
|
@ -22,7 +22,7 @@ def getModelSize(matrixList, sparcityList, expected=True, bytesPerVar=4):
|
||||||
assert A.ndim == 2
|
assert A.ndim == 2
|
||||||
assert s >= 0
|
assert s >= 0
|
||||||
assert s <= 1
|
assert s <= 1
|
||||||
nnz, size, sparse = utils.countnnZ(A, s, bytesPerVar=bytesPerVar)
|
nnz, size, sparse = utils.estimateNNZ(A, s, bytesPerVar=bytesPerVar)
|
||||||
nnzList.append(nnz)
|
nnzList.append(nnz)
|
||||||
sizeList.append(size)
|
sizeList.append(size)
|
||||||
hasSparse = (hasSparse or sparse)
|
hasSparse = (hasSparse or sparse)
|
|
@ -17,9 +17,9 @@
|
||||||
"import numpy as np\n",
|
"import numpy as np\n",
|
||||||
"import torch\n",
|
"import torch\n",
|
||||||
"\n",
|
"\n",
|
||||||
"from pytorch_edgeml.graph.protoNN import ProtoNN\n",
|
"from edgeml_pytorch.graph.protoNN import ProtoNN\n",
|
||||||
"from pytorch_edgeml.trainer.protoNNTrainer import ProtoNNTrainer\n",
|
"from edgeml_pytorch.trainer.protoNNTrainer import ProtoNNTrainer\n",
|
||||||
"import pytorch_edgeml.utils as utils\n",
|
"import edgeml_pytorch.utils as utils\n",
|
||||||
"import helpermethods as helper"
|
"import helpermethods as helper"
|
||||||
]
|
]
|
||||||
},
|
},
|
|
@ -5,9 +5,9 @@ from __future__ import print_function
|
||||||
import sys
|
import sys
|
||||||
import os
|
import os
|
||||||
import numpy as np
|
import numpy as np
|
||||||
from pytorch_edgeml.trainer.protoNNTrainer import ProtoNNTrainer
|
from edgeml_pytorch.trainer.protoNNTrainer import ProtoNNTrainer
|
||||||
from pytorch_edgeml.graph.protoNN import ProtoNN
|
from edgeml_pytorch.graph.protoNN import ProtoNN
|
||||||
import pytorch_edgeml.utils as utils
|
import edgeml_pytorch.utils as utils
|
||||||
import helpermethods as helper
|
import helpermethods as helper
|
||||||
import torch
|
import torch
|
||||||
|
|
|
@ -5,9 +5,9 @@ This directory includes an example [notebook](SRNN_Example.ipynb) and a
|
||||||
training a simple model on the [Google Speech Commands
|
training a simple model on the [Google Speech Commands
|
||||||
Dataset](https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html).
|
Dataset](https://ai.googleblog.com/2017/08/launching-speech-commands-dataset.html).
|
||||||
|
|
||||||
`pytorch_edgeml.graph.rnn.SRNN2` implements a 2 layer SRNN network. We will use
|
`edgeml_pytorch.graph.rnn.SRNN2` implements a 2 layer SRNN network. We will use
|
||||||
this with an LSTM cell on this dataset. The training routine for SRNN is
|
this with an LSTM cell on this dataset. The training routine for SRNN is
|
||||||
implemented in `pytorch_edgeml.trainer.srnnTrainer` and will be used as part of
|
implemented in `edgeml_pytorch.trainer.srnnTrainer` and will be used as part of
|
||||||
this example.
|
this example.
|
||||||
|
|
||||||
**Tested With:** pytorch > 1.1.0 with Python 2 and Python 3
|
**Tested With:** pytorch > 1.1.0 with Python 2 and Python 3
|
|
@ -39,9 +39,9 @@
|
||||||
},
|
},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"from pytorch_edgeml.graph.rnn import SRNN2\n",
|
"from edgeml_pytorch.graph.rnn import SRNN2\n",
|
||||||
"from pytorch_edgeml.trainer.srnnTrainer import SRNNTrainer\n",
|
"from edgeml_pytorch.trainer.srnnTrainer import SRNNTrainer\n",
|
||||||
"import pytorch_edgeml.utils as utils"
|
"import edgeml_pytorch.utils as utils"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
|
@ -7,9 +7,9 @@ import os
|
||||||
import numpy as np
|
import numpy as np
|
||||||
import torch
|
import torch
|
||||||
|
|
||||||
from pytorch_edgeml.graph.rnn import SRNN2
|
from edgeml_pytorch.graph.rnn import SRNN2
|
||||||
from pytorch_edgeml.trainer.srnnTrainer import SRNNTrainer
|
from edgeml_pytorch.trainer.srnnTrainer import SRNNTrainer
|
||||||
import pytorch_edgeml.utils as utils
|
import edgeml_pytorch.utils as utils
|
||||||
import helpermethods as helper
|
import helpermethods as helper
|
||||||
|
|
||||||
config = helper.getSRNN2Args()
|
config = helper.getSRNN2Args()
|
||||||
|
@ -30,6 +30,8 @@ std[std[:] < 0.000001] = 1
|
||||||
x_train_ = (x_train_ - mean) / std
|
x_train_ = (x_train_ - mean) / std
|
||||||
x_val_ = (x_val_ - mean) / std
|
x_val_ = (x_val_ - mean) / std
|
||||||
x_test_ = (x_test_ - mean) / std
|
x_test_ = (x_test_ - mean) / std
|
||||||
|
np.save('mean.npy', mean)
|
||||||
|
np.save('std.npy', std)
|
||||||
|
|
||||||
x_train = np.swapaxes(x_train_, 0, 1)
|
x_train = np.swapaxes(x_train_, 0, 1)
|
||||||
x_val = np.swapaxes(x_val_, 0, 1)
|
x_val = np.swapaxes(x_val_, 0, 1)
|
||||||
|
@ -73,3 +75,6 @@ trainer = SRNNTrainer(srnn2, learningRate, lossType='xentropy', device=device)
|
||||||
|
|
||||||
trainer.train(brickSize, batchSize, epochs, x_train, x_val, y_train, y_val,
|
trainer.train(brickSize, batchSize, epochs, x_train, x_val, y_train, y_val,
|
||||||
printStep=printStep, valStep=valStep)
|
printStep=printStep, valStep=valStep)
|
||||||
|
|
||||||
|
print('Saving trained model:')
|
||||||
|
torch.save(srnn2.state_dict(), 'model_srnn.pt')
|
|
@ -4,7 +4,7 @@ This directory includes, example notebook and general execution script of
|
||||||
Bonsai developed as part of EdgeML. Also, we include a sample cleanup and
|
Bonsai developed as part of EdgeML. Also, we include a sample cleanup and
|
||||||
use-case on the USPS10 public dataset.
|
use-case on the USPS10 public dataset.
|
||||||
|
|
||||||
`edgeml.graph.bonsai` implements the Bonsai prediction graph in tensorflow.
|
`edgeml_tf.graph.bonsai` implements the Bonsai prediction graph in tensorflow.
|
||||||
The three-phase training routine for Bonsai is decoupled from the forward graph
|
The three-phase training routine for Bonsai is decoupled from the forward graph
|
||||||
to facilitate a plug and play behaviour wherein Bonsai can be combined with or
|
to facilitate a plug and play behaviour wherein Bonsai can be combined with or
|
||||||
used as a final layer classifier for other architectures (RNNs, CNNs).
|
used as a final layer classifier for other architectures (RNNs, CNNs).
|
|
@ -33,8 +33,8 @@
|
||||||
"os.environ['CUDA_VISIBLE_DEVICES'] =''\n",
|
"os.environ['CUDA_VISIBLE_DEVICES'] =''\n",
|
||||||
"\n",
|
"\n",
|
||||||
"#Bonsai imports\n",
|
"#Bonsai imports\n",
|
||||||
"from edgeml.trainer.bonsaiTrainer import BonsaiTrainer\n",
|
"from edgeml_tf.trainer.bonsaiTrainer import BonsaiTrainer\n",
|
||||||
"from edgeml.graph.bonsai import Bonsai\n",
|
"from edgeml_tf.graph.bonsai import Bonsai\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Fixing seeds for reproducibility\n",
|
"# Fixing seeds for reproducibility\n",
|
||||||
"tf.set_random_seed(42)\n",
|
"tf.set_random_seed(42)\n",
|
|
@ -5,8 +5,8 @@ import helpermethods
|
||||||
import tensorflow as tf
|
import tensorflow as tf
|
||||||
import numpy as np
|
import numpy as np
|
||||||
import sys
|
import sys
|
||||||
from edgeml.trainer.bonsaiTrainer import BonsaiTrainer
|
from edgeml_tf.trainer.bonsaiTrainer import BonsaiTrainer
|
||||||
from edgeml.graph.bonsai import Bonsai
|
from edgeml_tf.graph.bonsai import Bonsai
|
||||||
|
|
||||||
|
|
||||||
def main():
|
def main():
|
|
@ -33,15 +33,14 @@
|
||||||
"import numpy as np\n",
|
"import numpy as np\n",
|
||||||
"# Making sure edgeml is part of python path\n",
|
"# Making sure edgeml is part of python path\n",
|
||||||
"os.environ['CUDA_VISIBLE_DEVICES'] ='0'\n",
|
"os.environ['CUDA_VISIBLE_DEVICES'] ='0'\n",
|
||||||
"\n",
|
|
||||||
"np.random.seed(42)\n",
|
"np.random.seed(42)\n",
|
||||||
"tf.set_random_seed(42)\n",
|
"tf.set_random_seed(42)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# MI-RNN and EMI-RNN imports\n",
|
"# MI-RNN and EMI-RNN imports\n",
|
||||||
"from edgeml.graph.rnn import EMI_DataPipeline\n",
|
"from edgeml_tf.graph.rnn import EMI_DataPipeline\n",
|
||||||
"from edgeml.graph.rnn import EMI_BasicLSTM\n",
|
"from edgeml_tf.graph.rnn import EMI_BasicLSTM\n",
|
||||||
"from edgeml.trainer.emirnnTrainer import EMI_Trainer, EMI_Driver\n",
|
"from edgeml_tf.trainer.emirnnTrainer import EMI_Trainer, EMI_Driver\n",
|
||||||
"import edgeml.utils"
|
"import edgeml_tf.utils"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@ -120,10 +119,10 @@
|
||||||
"name": "stdout",
|
"name": "stdout",
|
||||||
"output_type": "stream",
|
"output_type": "stream",
|
||||||
"text": [
|
"text": [
|
||||||
"x_train shape is: (6294, 6, 48, 9)\n",
|
"x_train shape is: (6409, 6, 48, 9)\n",
|
||||||
"y_train shape is: (6294, 6, 6)\n",
|
"y_train shape is: (6409, 6, 6)\n",
|
||||||
"x_test shape is: (1058, 6, 48, 9)\n",
|
"x_test shape is: (943, 6, 48, 9)\n",
|
||||||
"y_test shape is: (1058, 6, 6)\n"
|
"y_test shape is: (943, 6, 6)\n"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
|
@ -272,6 +271,11 @@
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"outputs": [
|
"outputs": [
|
||||||
|
{
|
||||||
|
"name": "stderr",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": []
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"name": "stdout",
|
"name": "stdout",
|
||||||
"output_type": "stream",
|
"output_type": "stream",
|
||||||
|
@ -279,36 +283,54 @@
|
||||||
"Update policy: top-k\n",
|
"Update policy: top-k\n",
|
||||||
"Training with MI-RNN loss for 3 rounds\n",
|
"Training with MI-RNN loss for 3 rounds\n",
|
||||||
"Round: 0\n",
|
"Round: 0\n",
|
||||||
"Epoch 1 Batch 193 ( 390) Loss 0.00467 Acc 0.90104 | Val acc 0.90454 | Model saved to /tmp/model-lstm, global_step 1000\n",
|
"Epoch 1 Batch 189 ( 390) Loss 0.00375 Acc 0.95312 | Val acc 0.91304 | Model saved to /tmp/model-lstm, global_step 1000\n",
|
||||||
"Epoch 1 Batch 193 ( 390) Loss 0.00262 Acc 0.93750 | Val acc 0.91777 | Model saved to /tmp/model-lstm, global_step 1001\n",
|
"Epoch 1 Batch 189 ( 390) Loss 0.00279 Acc 0.96875 | Val acc 0.91835 | Model saved to /tmp/model-lstm, global_step 1001\n",
|
||||||
"Epoch 1 Batch 193 ( 390) Loss 0.00285 Acc 0.90625 | Val acc 0.91871 | Model saved to /tmp/model-lstm, global_step 1002\n",
|
"Epoch 1 Batch 189 ( 390) Loss 0.00262 Acc 0.96875 | Val acc 0.89077 | Model saved to /tmp/model-lstm, global_step 1002\n",
|
||||||
"Epoch 1 Batch 193 ( 390) Loss 0.00263 Acc 0.91146 | Val acc 0.92344 | Model saved to /tmp/model-lstm, global_step 1003\n",
|
"Epoch 1 Batch 189 ( 390) Loss 0.00255 Acc 0.96875 | Val acc 0.90668 | "
|
||||||
"INFO:tensorflow:Restoring parameters from /tmp/model-lstm-1003\n",
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "stderr",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": []
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
|
"Model saved to /tmp/model-lstm, global_step 1003\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "stderr",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": []
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"name": "stdout",
|
||||||
|
"output_type": "stream",
|
||||||
|
"text": [
|
||||||
"Round: 1\n",
|
"Round: 1\n",
|
||||||
"Epoch 1 Batch 193 ( 390) Loss 0.00260 Acc 0.91146 | Val acc 0.92155 | Model saved to /tmp/model-lstm, global_step 1004\n",
|
"Epoch 1 Batch 189 ( 390) Loss 0.00262 Acc 0.96875 | Val acc 0.89077 | Model saved to /tmp/model-lstm, global_step 1004\n",
|
||||||
"Epoch 1 Batch 193 ( 390) Loss 0.00268 Acc 0.91146 | Val acc 0.92628 | Model saved to /tmp/model-lstm, global_step 1005\n",
|
"Epoch 1 Batch 189 ( 390) Loss 0.00255 Acc 0.96875 | Val acc 0.90668 | Model saved to /tmp/model-lstm, global_step 1005\n",
|
||||||
"Epoch 1 Batch 193 ( 390) Loss 0.00241 Acc 0.92188 | Val acc 0.92911 | Model saved to /tmp/model-lstm, global_step 1006\n",
|
"Epoch 1 Batch 189 ( 390) Loss 0.00247 Acc 0.96875 | Val acc 0.89183 | Model saved to /tmp/model-lstm, global_step 1006\n",
|
||||||
"Epoch 1 Batch 193 ( 390) Loss 0.00245 Acc 0.91667 | Val acc 0.91493 | Model saved to /tmp/model-lstm, global_step 1007\n",
|
"Epoch 1 Batch 189 ( 390) Loss 0.00253 Acc 0.96875 | Val acc 0.91410 | Model saved to /tmp/model-lstm, global_step 1007\n",
|
||||||
"INFO:tensorflow:Restoring parameters from /tmp/model-lstm-1006\n",
|
|
||||||
"Round: 2\n",
|
"Round: 2\n",
|
||||||
"Epoch 1 Batch 193 ( 390) Loss 0.00245 Acc 0.91667 | Val acc 0.91493 | Model saved to /tmp/model-lstm, global_step 1008\n",
|
"Epoch 1 Batch 189 ( 390) Loss 0.00249 Acc 0.96875 | Val acc 0.90456 | Model saved to /tmp/model-lstm, global_step 1008\n",
|
||||||
"Epoch 1 Batch 193 ( 390) Loss 0.00247 Acc 0.93750 | Val acc 0.91210 | Model saved to /tmp/model-lstm, global_step 1009\n",
|
"Epoch 1 Batch 189 ( 390) Loss 0.00239 Acc 0.96875 | Val acc 0.89714 | Model saved to /tmp/model-lstm, global_step 1009\n",
|
||||||
"Epoch 1 Batch 193 ( 390) Loss 0.00238 Acc 0.93750 | Val acc 0.91115 | Model saved to /tmp/model-lstm, global_step 1010\n",
|
"Epoch 1 Batch 189 ( 390) Loss 0.00255 Acc 0.96354 | Val acc 0.91516 | Model saved to /tmp/model-lstm, global_step 1010\n",
|
||||||
"Epoch 1 Batch 193 ( 390) Loss 0.00247 Acc 0.91667 | Val acc 0.90737 | Model saved to /tmp/model-lstm, global_step 1011\n",
|
"Epoch 1 Batch 189 ( 390) Loss 0.00232 Acc 0.96875 | Val acc 0.91092 | Model saved to /tmp/model-lstm, global_step 1011\n",
|
||||||
"INFO:tensorflow:Restoring parameters from /tmp/model-lstm-1008\n",
|
|
||||||
"Round: 3\n",
|
"Round: 3\n",
|
||||||
"Switching to EMI-Loss function\n",
|
"Switching to EMI-Loss function\n",
|
||||||
"Epoch 1 Batch 193 ( 390) Loss 0.19644 Acc 0.92188 | Val acc 0.91304 | Model saved to /tmp/model-lstm, global_step 1012\n",
|
"Epoch 1 Batch 189 ( 390) Loss 0.23041 Acc 0.96875 | Val acc 0.89608 | Model saved to /tmp/model-lstm, global_step 1012\n",
|
||||||
"Epoch 1 Batch 193 ( 390) Loss 0.19590 Acc 0.92188 | Val acc 0.91304 | Model saved to /tmp/model-lstm, global_step 1013\n",
|
"Epoch 1 Batch 189 ( 390) Loss 0.20689 Acc 0.96875 | Val acc 0.89396 | Model saved to /tmp/model-lstm, global_step 1013\n",
|
||||||
"Epoch 1 Batch 193 ( 390) Loss 0.18886 Acc 0.91667 | Val acc 0.92250 | Model saved to /tmp/model-lstm, global_step 1014\n",
|
"Epoch 1 Batch 189 ( 390) Loss 0.19695 Acc 0.96875 | Val acc 0.90562 | Model saved to /tmp/model-lstm, global_step 1014\n",
|
||||||
"Epoch 1 Batch 193 ( 390) Loss 0.17789 Acc 0.92708 | Val acc 0.91210 | Model saved to /tmp/model-lstm, global_step 1015\n",
|
"Epoch 1 Batch 189 ( 390) Loss 0.18891 Acc 0.96875 | Val acc 0.89608 | Model saved to /tmp/model-lstm, global_step 1015\n",
|
||||||
"INFO:tensorflow:Restoring parameters from /tmp/model-lstm-1014\n",
|
|
||||||
"Round: 4\n",
|
"Round: 4\n",
|
||||||
"Epoch 1 Batch 193 ( 390) Loss 0.17789 Acc 0.92708 | Val acc 0.91210 | Model saved to /tmp/model-lstm, global_step 1016\n",
|
"Epoch 1 Batch 189 ( 390) Loss 0.18891 Acc 0.96875 | Val acc 0.89608 | Model saved to /tmp/model-lstm, global_step 1016\n",
|
||||||
"Epoch 1 Batch 193 ( 390) Loss 0.17308 Acc 0.93750 | Val acc 0.90737 | Model saved to /tmp/model-lstm, global_step 1017\n",
|
"Epoch 1 Batch 189 ( 390) Loss 0.17931 Acc 0.96875 | Val acc 0.90456 | Model saved to /tmp/model-lstm, global_step 1017\n",
|
||||||
"Epoch 1 Batch 193 ( 390) Loss 0.16609 Acc 0.93750 | Val acc 0.91682 | Model saved to /tmp/model-lstm, global_step 1018\n",
|
"Epoch 1 Batch 189 ( 390) Loss 0.17625 Acc 0.96875 | Val acc 0.90138 | Model saved to /tmp/model-lstm, global_step 1018\n",
|
||||||
"Epoch 1 Batch 193 ( 390) Loss 0.16253 Acc 0.93750 | Val acc 0.91115 | Model saved to /tmp/model-lstm, global_step 1019\n",
|
"Epoch 1 Batch 189 ( 390) Loss 0.16728 Acc 0.96875 | Val acc 0.93319 | Model saved to /tmp/model-lstm, global_step 1019\n"
|
||||||
"INFO:tensorflow:Restoring parameters from /tmp/model-lstm-1018\n"
|
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
|
@ -394,10 +416,10 @@
|
||||||
"name": "stdout",
|
"name": "stdout",
|
||||||
"output_type": "stream",
|
"output_type": "stream",
|
||||||
"text": [
|
"text": [
|
||||||
"Accuracy at k = 2: 0.894130\n",
|
"Accuracy at k = 2: 0.898880\n",
|
||||||
"Savings due to MI-RNN : 0.625000\n",
|
"Savings due to MI-RNN : 0.625000\n",
|
||||||
"Savings due to Early prediction: 0.696645\n",
|
"Savings due to Early prediction: 0.623507\n",
|
||||||
"Total Savings: 0.886242\n"
|
"Total Savings: 0.858815\n"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
|
@ -431,27 +453,27 @@
|
||||||
"output_type": "stream",
|
"output_type": "stream",
|
||||||
"text": [
|
"text": [
|
||||||
" len acc macro-fsc macro-pre macro-rec micro-fsc micro-pre \\\n",
|
" len acc macro-fsc macro-pre macro-rec micro-fsc micro-pre \\\n",
|
||||||
"0 1 0.869019 0.868122 0.878439 0.871618 0.869019 0.869019 \n",
|
"0 1 0.881235 0.881088 0.887293 0.885055 0.881235 0.881235 \n",
|
||||||
"1 2 0.894130 0.894577 0.898733 0.896412 0.894130 0.894130 \n",
|
"1 2 0.898880 0.899793 0.901265 0.902492 0.898880 0.898880 \n",
|
||||||
"2 3 0.893451 0.893584 0.897599 0.894880 0.893451 0.893451 \n",
|
"2 3 0.902952 0.904039 0.903778 0.906140 0.902952 0.902952 \n",
|
||||||
"3 4 0.873770 0.873280 0.882517 0.873642 0.873770 0.873770 \n",
|
"3 4 0.888700 0.889662 0.892359 0.890378 0.888700 0.888700 \n",
|
||||||
"4 5 0.853410 0.853243 0.870547 0.851575 0.853410 0.853410 \n",
|
"4 5 0.873431 0.874679 0.882768 0.874046 0.873431 0.873431 \n",
|
||||||
"5 6 0.836105 0.836891 0.863634 0.833071 0.836105 0.836105 \n",
|
"5 6 0.860197 0.862230 0.877873 0.859560 0.860197 0.860197 \n",
|
||||||
"\n",
|
"\n",
|
||||||
" micro-rec \n",
|
" micro-rec \n",
|
||||||
"0 0.869019 \n",
|
"0 0.881235 \n",
|
||||||
"1 0.894130 \n",
|
"1 0.898880 \n",
|
||||||
"2 0.893451 \n",
|
"2 0.902952 \n",
|
||||||
"3 0.873770 \n",
|
"3 0.888700 \n",
|
||||||
"4 0.853410 \n",
|
"4 0.873431 \n",
|
||||||
"5 0.836105 \n",
|
"5 0.860197 \n",
|
||||||
"Max accuracy 0.894130 at subsequencelength 2\n",
|
"Max accuracy 0.902952 at subsequencelength 3\n",
|
||||||
"Max micro-f 0.894130 at subsequencelength 2\n",
|
"Max micro-f 0.902952 at subsequencelength 3\n",
|
||||||
"Micro-precision 0.894130 at subsequencelength 2\n",
|
"Micro-precision 0.902952 at subsequencelength 3\n",
|
||||||
"Micro-recall 0.894130 at subsequencelength 2\n",
|
"Micro-recall 0.902952 at subsequencelength 3\n",
|
||||||
"Max macro-f 0.894577 at subsequencelength 2\n",
|
"Max macro-f 0.904039 at subsequencelength 3\n",
|
||||||
"macro-precision 0.898733 at subsequencelength 2\n",
|
"macro-precision 0.903778 at subsequencelength 3\n",
|
||||||
"macro-recall 0.896412 at subsequencelength 2\n"
|
"macro-recall 0.906140 at subsequencelength 3\n"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
|
@ -483,16 +505,11 @@
|
||||||
"name": "stdout",
|
"name": "stdout",
|
||||||
"output_type": "stream",
|
"output_type": "stream",
|
||||||
"text": [
|
"text": [
|
||||||
"INFO:tensorflow:Restoring parameters from /tmp/model-lstm-1003\n",
|
"Round: 0, Validation accuracy: 0.9183, Test Accuracy (k = 2): 0.916865, Total Savings: 0.765207\n",
|
||||||
"Round: 0, Validation accuracy: 0.9234, Test Accuracy (k = 2): 0.899559, Total Savings: 0.790902\n",
|
"Round: 1, Validation accuracy: 0.9141, Test Accuracy (k = 2): 0.915507, Total Savings: 0.789403\n",
|
||||||
"INFO:tensorflow:Restoring parameters from /tmp/model-lstm-1006\n",
|
"Round: 2, Validation accuracy: 0.9152, Test Accuracy (k = 2): 0.908381, Total Savings: 0.799538\n",
|
||||||
"Round: 1, Validation accuracy: 0.9291, Test Accuracy (k = 2): 0.896844, Total Savings: 0.814705\n",
|
"Round: 3, Validation accuracy: 0.9056, Test Accuracy (k = 2): 0.903970, Total Savings: 0.844050\n",
|
||||||
"INFO:tensorflow:Restoring parameters from /tmp/model-lstm-1008\n",
|
"Round: 4, Validation accuracy: 0.9332, Test Accuracy (k = 2): 0.898880, Total Savings: 0.858815\n"
|
||||||
"Round: 2, Validation accuracy: 0.9149, Test Accuracy (k = 2): 0.894469, Total Savings: 0.821671\n",
|
|
||||||
"INFO:tensorflow:Restoring parameters from /tmp/model-lstm-1014\n",
|
|
||||||
"Round: 3, Validation accuracy: 0.9225, Test Accuracy (k = 2): 0.894130, Total Savings: 0.876447\n",
|
|
||||||
"INFO:tensorflow:Restoring parameters from /tmp/model-lstm-1018\n",
|
|
||||||
"Round: 4, Validation accuracy: 0.9168, Test Accuracy (k = 2): 0.894130, Total Savings: 0.886242\n"
|
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
|
@ -523,9 +540,9 @@
|
||||||
],
|
],
|
||||||
"metadata": {
|
"metadata": {
|
||||||
"kernelspec": {
|
"kernelspec": {
|
||||||
"display_name": "Python 3",
|
"display_name": ".tf",
|
||||||
"language": "python",
|
"language": "python",
|
||||||
"name": "python3"
|
"name": ".tf"
|
||||||
},
|
},
|
||||||
"language_info": {
|
"language_info": {
|
||||||
"codemirror_mode": {
|
"codemirror_mode": {
|
||||||
|
@ -537,7 +554,7 @@
|
||||||
"name": "python",
|
"name": "python",
|
||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.6.8"
|
"version": "3.5.2"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
|
@ -38,10 +38,10 @@
|
||||||
"tf.set_random_seed(42)\n",
|
"tf.set_random_seed(42)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# MI-RNN and EMI-RNN imports\n",
|
"# MI-RNN and EMI-RNN imports\n",
|
||||||
"from edgeml.graph.rnn import EMI_DataPipeline\n",
|
"from edgeml_tf.graph.rnn import EMI_DataPipeline\n",
|
||||||
"from edgeml.graph.rnn import EMI_BasicLSTM\n",
|
"from edgeml_tf.graph.rnn import EMI_BasicLSTM\n",
|
||||||
"from edgeml.trainer.emirnnTrainer import EMI_Trainer, EMI_Driver\n",
|
"from edgeml_tf.trainer.emirnnTrainer import EMI_Trainer, EMI_Driver\n",
|
||||||
"import edgeml.utils"
|
"import edgeml_tf.utils"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@ -598,7 +598,7 @@
|
||||||
"name": "python",
|
"name": "python",
|
||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.6.8"
|
"version": "3.5.2"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
|
@ -34,11 +34,11 @@
|
||||||
"os.environ['CUDA_VISIBLE_DEVICES'] ='1'\n",
|
"os.environ['CUDA_VISIBLE_DEVICES'] ='1'\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# FastGRNN and FastRNN imports\n",
|
"# FastGRNN and FastRNN imports\n",
|
||||||
"from edgeml.graph.rnn import EMI_DataPipeline\n",
|
"from edgeml_tf.graph.rnn import EMI_DataPipeline\n",
|
||||||
"from edgeml.graph.rnn import EMI_FastGRNN\n",
|
"from edgeml_tf.graph.rnn import EMI_FastGRNN\n",
|
||||||
"from edgeml.graph.rnn import EMI_FastRNN\n",
|
"from edgeml_tf.graph.rnn import EMI_FastRNN\n",
|
||||||
"from edgeml.trainer.emirnnTrainer import EMI_Trainer, EMI_Driver\n",
|
"from edgeml_tf.trainer.emirnnTrainer import EMI_Trainer, EMI_Driver\n",
|
||||||
"import edgeml.utils"
|
"import edgeml_tf.utils"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
|
@ -35,10 +35,10 @@
|
||||||
"os.environ['CUDA_VISIBLE_DEVICES'] ='0'\n",
|
"os.environ['CUDA_VISIBLE_DEVICES'] ='0'\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# MI-RNN and EMI-RNN imports\n",
|
"# MI-RNN and EMI-RNN imports\n",
|
||||||
"from edgeml.graph.rnn import EMI_DataPipeline\n",
|
"from edgeml_tf.graph.rnn import EMI_DataPipeline\n",
|
||||||
"from edgeml.graph.rnn import EMI_BasicLSTM\n",
|
"from edgeml_tf.graph.rnn import EMI_BasicLSTM\n",
|
||||||
"from edgeml.trainer.emirnnTrainer import EMI_Trainer, EMI_Driver\n",
|
"from edgeml_tf.trainer.emirnnTrainer import EMI_Trainer, EMI_Driver\n",
|
||||||
"import edgeml.utils"
|
"import edgeml_tf.utils"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
@ -654,7 +654,7 @@
|
||||||
"name": "python",
|
"name": "python",
|
||||||
"nbconvert_exporter": "python",
|
"nbconvert_exporter": "python",
|
||||||
"pygments_lexer": "ipython3",
|
"pygments_lexer": "ipython3",
|
||||||
"version": "3.6.3"
|
"version": "3.5.2"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
|
@ -3,7 +3,7 @@
|
||||||
This directory includes example notebooks EMI-RNN developed as part of EdgeML.
|
This directory includes example notebooks EMI-RNN developed as part of EdgeML.
|
||||||
The example is based on the UCI Human Activity Recognition dataset.
|
The example is based on the UCI Human Activity Recognition dataset.
|
||||||
|
|
||||||
Please refer to `tf/docs/EMI-RNN.md` for detailed documentation of EMI-RNN.
|
Please refer to `docs/EMI-RNN.md` for detailed documentation of EMI-RNN.
|
||||||
|
|
||||||
Please refer to `00_emi_lstm_example.ipynb` for a quick and dirty getting
|
Please refer to `00_emi_lstm_example.ipynb` for a quick and dirty getting
|
||||||
started guide.
|
started guide.
|
0
tf/examples/EMI-RNN/img/3PartsGraph.png → examples/tf/EMI-RNN/img/3PartsGraph.png
Executable file → Normal file
До Ширина: | Высота: | Размер: 25 KiB После Ширина: | Высота: | Размер: 25 KiB |
0
tf/examples/EMI-RNN/img/MIML_illustration.png → examples/tf/EMI-RNN/img/MIML_illustration.png
Executable file → Normal file
До Ширина: | Высота: | Размер: 23 KiB После Ширина: | Высота: | Размер: 23 KiB |
|
@ -5,7 +5,7 @@ FastCells (FastRNN & FastGRNN) developed as part of EdgeML along with modified
|
||||||
UGRNN, GRU and LSTM to support the LSQ training routine.
|
UGRNN, GRU and LSTM to support the LSQ training routine.
|
||||||
Also, we include a sample cleanup and use-case on the USPS10 public dataset.
|
Also, we include a sample cleanup and use-case on the USPS10 public dataset.
|
||||||
|
|
||||||
`edgeml.graph.rnn` implements the custom RNN cells of **FastRNN** ([`FastRNNCell`](../../edgeml/graph/rnn.py#L215)) and **FastGRNN** ([`FastGRNNCell`](../../edgeml/graph/rnn.py#L40)) with
|
`edgeml_tf.graph.rnn` implements the custom RNN cells of **FastRNN** ([`FastRNNCell`](../../edgeml/graph/rnn.py#L215)) and **FastGRNN** ([`FastGRNNCell`](../../edgeml/graph/rnn.py#L40)) with
|
||||||
multiple additional features like Low-Rank parameterisation, custom
|
multiple additional features like Low-Rank parameterisation, custom
|
||||||
non-linearities etc., Similar to Bonsai and ProtoNN, the three-phase training
|
non-linearities etc., Similar to Bonsai and ProtoNN, the three-phase training
|
||||||
routine for FastRNN and FastGRNN is decoupled from the custom cells to
|
routine for FastRNN and FastGRNN is decoupled from the custom cells to
|
||||||
|
@ -14,9 +14,9 @@ architectures (NMT, Encoder-Decoder etc.,) in place of the inbuilt `RNNCell`, `G
|
||||||
`edgeml.graph.rnn` also contains modified RNN cells of **UGRNN** ([`UGRNNLRCell`](../../edgeml/graph/rnn.py#L862)),
|
`edgeml.graph.rnn` also contains modified RNN cells of **UGRNN** ([`UGRNNLRCell`](../../edgeml/graph/rnn.py#L862)),
|
||||||
**GRU** ([`GRULRCell`](../../edgeml/graph/rnn.py#L635)) and **LSTM** ([`LSTMLRCell`](../../edgeml/graph/rnn.py#L376)). These cells also can be substituted for FastCells where ever feasible.
|
**GRU** ([`GRULRCell`](../../edgeml/graph/rnn.py#L635)) and **LSTM** ([`LSTMLRCell`](../../edgeml/graph/rnn.py#L376)). These cells also can be substituted for FastCells where ever feasible.
|
||||||
|
|
||||||
For training FastCells, `edgeml.trainer.fastTrainer` implements the three-phase
|
For training FastCells, `edgeml_tf.trainer.fastTrainer` implements the three-phase
|
||||||
FastCell training routine in Tensorflow. A simple example,
|
FastCell training routine in Tensorflow. A simple example,
|
||||||
`examples/fastcell_example.py` is provided to illustrate its usage.
|
`examples/tf/fastcell_example.py` is provided to illustrate its usage.
|
||||||
|
|
||||||
Note that `fastcell_example.py` assumes that data is in a specific format. It
|
Note that `fastcell_example.py` assumes that data is in a specific format. It
|
||||||
is assumed that train and test data is contained in two files, `train.npy` and
|
is assumed that train and test data is contained in two files, `train.npy` and
|
|
@ -28,12 +28,12 @@
|
||||||
"os.environ['CUDA_VISIBLE_DEVICES'] =''\n",
|
"os.environ['CUDA_VISIBLE_DEVICES'] =''\n",
|
||||||
"\n",
|
"\n",
|
||||||
"#FastRNN and FastGRNN imports\n",
|
"#FastRNN and FastGRNN imports\n",
|
||||||
"from edgeml.trainer.fastTrainer import FastTrainer\n",
|
"from edgeml_tf.trainer.fastTrainer import FastTrainer\n",
|
||||||
"from edgeml.graph.rnn import FastGRNNCell\n",
|
"from edgeml_tf.graph.rnn import FastGRNNCell\n",
|
||||||
"from edgeml.graph.rnn import FastRNNCell\n",
|
"from edgeml_tf.graph.rnn import FastRNNCell\n",
|
||||||
"from edgeml.graph.rnn import UGRNNLRCell\n",
|
"from edgeml_tf.graph.rnn import UGRNNLRCell\n",
|
||||||
"from edgeml.graph.rnn import GRULRCell\n",
|
"from edgeml_tf.graph.rnn import GRULRCell\n",
|
||||||
"from edgeml.graph.rnn import LSTMLRCell\n",
|
"from edgeml_tf.graph.rnn import LSTMLRCell\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Fixing seeds for reproducibility\n",
|
"# Fixing seeds for reproducibility\n",
|
||||||
"tf.set_random_seed(42)\n",
|
"tf.set_random_seed(42)\n",
|
|
@ -6,12 +6,12 @@ import tensorflow as tf
|
||||||
import numpy as np
|
import numpy as np
|
||||||
import sys
|
import sys
|
||||||
|
|
||||||
from edgeml.trainer.fastTrainer import FastTrainer
|
from edgeml_tf.trainer.fastTrainer import FastTrainer
|
||||||
from edgeml.graph.rnn import FastGRNNCell
|
from edgeml_tf.graph.rnn import FastGRNNCell
|
||||||
from edgeml.graph.rnn import FastRNNCell
|
from edgeml_tf.graph.rnn import FastRNNCell
|
||||||
from edgeml.graph.rnn import UGRNNLRCell
|
from edgeml_tf.graph.rnn import UGRNNLRCell
|
||||||
from edgeml.graph.rnn import GRULRCell
|
from edgeml_tf.graph.rnn import GRULRCell
|
||||||
from edgeml.graph.rnn import LSTMLRCell
|
from edgeml_tf.graph.rnn import LSTMLRCell
|
||||||
|
|
||||||
|
|
||||||
def main():
|
def main():
|
|
@@ -4,11 +4,11 @@ This directory includes an example [notebook](protoNN_example.ipynb) and a
 command line execution script of ProtoNN developed as part of EdgeML. The
 example is based on the USPS dataset.

-`edgeml.graph.protoNN` implements the ProtoNN prediction graph in Tensorflow.
+`edgeml_tf.graph.protoNN` implements the ProtoNN prediction graph in Tensorflow.
 The training routine for ProtoNN is decoupled from the forward graph to
 facilitate a plug and play behaviour wherein ProtoNN can be combined with or
 used as a final layer classifier for other architectures (RNNs, CNNs). The
-training routine is implemented in `edgeml.trainer.protoNNTrainer`.
+training routine is implemented in `edgeml_tf.trainer.protoNNTrainer`.

 Note that, `protoNN_example.py` assumes the data to be in a specific format. It
 is assumed that train and test data is contained in two files, `train.npy` and
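Since the hunk above describes the prediction graph only abstractly, a from-scratch NumPy sketch of the ProtoNN scoring rule from the paper may help: project the input to a low dimension, compare it to learned prototypes with an RBF kernel, and aggregate per-prototype label vectors. Illustrative only; this is not the `edgeml_tf` graph code, and the dimensions are made up:

```
import numpy as np

np.random.seed(0)
D, d, m, L = 256, 15, 20, 10        # input dim, projection dim, prototypes, labels
W = np.random.randn(d, D) * 0.01    # low-dimensional projection matrix
B = np.random.randn(d, m)           # prototypes in the projected space
Z = np.random.randn(L, m)           # per-prototype label vectors
gamma = 1.5                         # RBF kernel width parameter

def protonn_predict(x):
    p = W @ x                                                    # project input
    sim = np.exp(-(gamma ** 2) * ((B - p[:, None]) ** 2).sum(axis=0))
    return Z @ sim                                               # label scores

scores = protonn_predict(np.random.randn(D))
print(scores.argmax())              # predicted class
```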
@@ -6,7 +6,7 @@ import sys
 import os
 import numpy as np
 import tensorflow as tf
-import edgeml.utils as utils
+import edgeml_tf.utils as utils
 import argparse


@@ -29,9 +29,9 @@
 "import numpy as np\n",
 "import tensorflow as tf\n",
 "\n",
-"from edgeml.trainer.protoNNTrainer import ProtoNNTrainer\n",
-"from edgeml.graph.protoNN import ProtoNN\n",
-"import edgeml.utils as utils\n",
+"from edgeml_tf.trainer.protoNNTrainer import ProtoNNTrainer\n",
+"from edgeml_tf.graph.protoNN import ProtoNN\n",
+"import edgeml_tf.utils as utils\n",
 "import helpermethods as helper"
 ]
 },
@@ -6,9 +6,9 @@ import sys
 import os
 import numpy as np
 import tensorflow as tf
-from edgeml.trainer.protoNNTrainer import ProtoNNTrainer
-from edgeml.graph.protoNN import ProtoNN
-import edgeml.utils as utils
+from edgeml_tf.trainer.protoNNTrainer import ProtoNNTrainer
+from edgeml_tf.graph.protoNN import ProtoNN
+import edgeml_tf.utils as utils
 import helpermethods as helper


@@ -1,27 +1,48 @@
 ## Edge Machine Learning: PyTorch Library

-This directory includes, PyTorch implementations of various techniques and
-algorithms developed as part of EdgeML. Currently, the following algorithms are
-available in PyTorch:
-
-1. [Bonsai](../docs/publications/Bonsai.pdf)
-2. [FastRNN & FastGRNN](../docs/publications/FastGRNN.pdf)
-
-The PyTorch compute graphs for these algoriths are packaged as
-`pytorch_edgeml.graph`. Trainers for these algorithms are in `pytorch_edgeml.trainer`. Usage
-directions and examples for these algorithms are provided in `examples`
-directory. To get started with any of the provided algorithms, please follow
-the notebooks in the the `examples` directory.
+This package includes PyTorch implementations of the following algorithms and
+training techniques developed as part of EdgeML. The PyTorch graphs for the
+forward/backward pass of these algorithms are packaged as `edgeml_pytorch.graph`
+and the trainers for these algorithms are in `edgeml_pytorch.trainer`.
+
+1. [Bonsai](https://github.com/microsoft/EdgeML/blob/master/docs/publications/Bonsai.pdf): `edgeml_pytorch.graph.bonsai` implements
+the Bonsai prediction graph. The three-phase training routine for Bonsai is decoupled
+from the forward graph to facilitate a plug and play behaviour wherein Bonsai can be
+combined with or used as a final layer classifier for other architectures (RNNs, CNNs).
+See `edgeml_pytorch.trainer.bonsaiTrainer` for 3-phase training.
+2. [ProtoNN](https://github.com/microsoft/EdgeML/blob/master/docs/publications/ProtoNN.pdf): `edgeml_pytorch.graph.protoNN` implements the
+ProtoNN prediction functions. The training routine for ProtoNN is decoupled from the forward
+graph to facilitate a plug and play behaviour wherein ProtoNN can be combined with or used
+as a final layer classifier for other architectures (RNNs, CNNs). The training routine is
+implemented in `edgeml_pytorch.trainer.protoNNTrainer`.
+3. [FastRNN & FastGRNN](https://github.com/microsoft/EdgeML/blob/master/docs/publications/FastGRNN.pdf): `edgeml_pytorch.graph.rnn` provides
+various RNN cells --- including the new cells `FastRNNCell` and `FastGRNNCell` as well as
+`UGRNNCell`, `GRUCell`, and `LSTMCell` --- with features like low-rank parameterisation
+of weight matrices and custom non-linearities. Akin to Bonsai and ProtoNN, the three-phase
+training routine for FastRNN and FastGRNN is decoupled from the custom cells to enable plug and
+play behaviour of the custom RNN cells in other architectures (NMT, Encoder-Decoder etc.).
+Additionally, numerically equivalent CUDA-based implementations `FastRNNCUDACell` and
+`FastGRNNCUDACell` are provided for faster training.
+`edgeml_pytorch.graph.rnn.Fast(G)RNN(CUDA)` provides unrolled RNNs equivalent to `nn.LSTM` and `nn.GRU`,
+and `edgeml_pytorch.trainer.fastmodel` presents a sample multi-layer RNN + multi-class classifier model.
+4. [S-RNN](https://github.com/microsoft/EdgeML/blob/master/docs/publications/SRNN.pdf): `edgeml_pytorch.graph.rnn.SRNN2` implements a
+2-layer SRNN network which can be instantiated with a choice of RNN cell. The training
+routine for SRNN is in `edgeml_pytorch.trainer.srnnTrainer`.
+
+Usage directions and example notebooks for this package are provided [here](https://github.com/microsoft/EdgeML/blob/master/examples/pytorch).

 ## Installation

-Use pip and the provided requirements file to first install required
-dependencies before installing the `pytorch_edgeml` library. Details for installation provided below.
-
-It is highly recommended that EdgeML be installed in a virtual environment. Please create
-a new virtual environment using your environment manager ([virtualenv](https://virtualenv.pypa.io/en/stable/userguide/#usage) or [Anaconda](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-with-commands)).
+It is highly recommended that EdgeML be installed in a virtual environment.
+Please create a new virtual environment using your environment manager
+([virtualenv](https://virtualenv.pypa.io/en/stable/userguide/#usage) or
+[Anaconda](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-with-commands)).
 Make sure the new environment is active before running the below mentioned commands.

+Use pip to install requirements before installing the `edgeml_pytorch` library.
+Details for CPU-based installation and GPU-based installation are provided below.
+
 ### CPU

 ```
@@ -29,18 +50,16 @@ pip install -r requirements-cpu.txt
 pip install -e .
 ```

-Tested on Python 3.6 with PyTorch 1.1.
+Tested on Python 3.6 with PyTorch >= 1.1.0.

 ### GPU

-Install appropriate CUDA and cuDNN [Tested with >= CUDA 9.0 and cuDNN >= 7.0]
+Install appropriate CUDA and cuDNN [Tested with >= CUDA 8.1 and cuDNN >= 6.1]

 ```
 pip install -r requirements-gpu.txt
 pip install -e .
 ```

-Note: If the above commands don't go through for PyTorch installation on CPU and GPU, please follow this [link](https://pytorch.org/get-started/locally/).
-
 Copyright (c) Microsoft Corporation. All rights reserved.
 Licensed under the MIT license.

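To ground the plug-and-play behaviour that the PyTorch README above keeps returning to, here is a minimal sketch of the pattern: an RNN cell stepped manually over time, with a linear classifier on the final hidden state. `torch.nn.GRUCell` stands in for the EdgeML cells; that the `edgeml_pytorch` cells expose a similar step interface is an assumption, not something this diff states.

```
import torch
import torch.nn as nn

class CellClassifier(nn.Module):
    def __init__(self, cell, hidden_dim, num_classes):
        super().__init__()
        self.cell = cell
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                      # x: [batch, time, features]
        h = x.new_zeros(x.size(0), self.fc.in_features)
        for t in range(x.size(1)):             # manual unroll over time
            h = self.cell(x[:, t, :], h)
        return self.fc(h)                      # classify the final hidden state

# GRUCell as a stand-in; a cell with the same step signature would slot in here.
model = CellClassifier(nn.GRUCell(8, 16), hidden_dim=16, num_classes=3)
logits = model(torch.randn(4, 10, 8))          # -> [4, 3]
```

Under that assumption, swapping in a `FastGRNNCell` (or one of the CUDA variants) would leave the surrounding model untouched, which is the point of decoupling the cells from the trainers.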