Example of using HyperDrive to tune a regular ML learner.

Author: Mario Bourgoin

# Tuning Python models on a Batch AI cluster

## Overview

This scenario shows how to tune a Frequently Asked Questions (FAQ) matching model that can be deployed as a web service to provide predictions for user questions. For this scenario, “Input Data” in the architecture diagram refers to text strings containing the user questions to match with a list of FAQs. The scenario is designed for the Scikit-Learn machine learning library for Python but can be generalized to any scenario that uses Python models to make real-time predictions.
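To illustrate the shape of the matching task (a user question in, FAQs ranked by match score out), here is a stdlib-only sketch using Jaccard token overlap. The sample questions and the scoring rule are invented for illustration; the repository's actual model is a tuned Scikit-Learn pipeline.

```python
def tokens(text):
    """Lowercase word set for a piece of text."""
    return set(text.lower().split())

def rank_faqs(question, faqs):
    """Return (faq, score) pairs sorted by Jaccard similarity to the question."""
    q = tokens(question)
    scored = []
    for faq in faqs:
        f = tokens(faq)
        score = len(q & f) / len(q | f) if q | f else 0.0
        scored.append((faq, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical FAQ list, for illustration only.
faqs = [
    "How do I sort an array in JavaScript?",
    "How do I remove a property from a JavaScript object?",
]
ranked = rank_faqs("What is the way to sort an array?", faqs)
print(ranked[0][0])  # the sort question should rank first
```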

## Design

![Design](Design.png)

The scenario uses a subset of Stack Overflow question data that includes original questions tagged as JavaScript, their duplicate questions, and their answers. It tunes a Scikit-Learn pipeline to predict the probability that a duplicate question matches each of the original questions. The application flow for this architecture is shown in the diagram above.
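The hyperparameter search notebook uses a random search; the core idea can be sketched in plain Python. The parameter space and the toy objective below are made up for illustration — in the repository, the score comes from evaluating the Scikit-Learn pipeline, and the candidate runs execute on a Batch AI cluster.

```python
import random

def random_search(space, score, n_iter, seed=0):
    """Sample n_iter settings from `space`, keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in space.items()}
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

# Hypothetical search space, for illustration only.
space = {"ngram_range": [1, 2, 3], "min_df": [1, 5, 10]}

def score(params):
    # Toy objective: pretend 2-grams with min_df=5 work best.
    return -abs(params["ngram_range"] - 2) - abs(params["min_df"] - 5) / 10

best, best_score = random_search(space, score, n_iter=50)
print(best, best_score)
```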

## Prerequisites

  1. Linux (Ubuntu).
  2. Anaconda Python installed.
  3. Docker installed.
  4. DockerHub account.
  5. Azure account.

The tutorial was developed on an Azure Ubuntu DSVM, which addresses the first three prerequisites.
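A quick way to confirm the software prerequisites on your machine is to check which of the expected command-line tools are on the PATH. The tool names below are assumptions implied by the list above (conda for Anaconda, az for the Azure CLI); adjust them for your setup.

```python
import shutil

def check_tools(tools):
    """Map each tool name to whether it is found on the PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}

# Tools implied by the prerequisites above.
report = check_tools(["conda", "docker", "az"])
for tool, found in report.items():
    print(f"{tool}: {'found' if found else 'MISSING'}")
```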

## Setup

To set up your environment to run these notebooks, follow the steps below. They set up the notebooks to use Docker and Azure seamlessly.

  1. Create a Linux DSVM.
  2. In a bash shell on the DSVM, add your login to the `docker` group:
     ```
     sudo usermod -a -G docker <login>
     ```
  3. Log in to your DockerHub account:
     ```
     docker login
     ```
  4. Clone, fork, or download the zip file for this repository:
     ```
     git clone https://github.com/Azure/MLBatchAIHyperparameterTuning.git
     ```
  5. Create the MLBatchAIHyperparameterTuning conda environment from `environment.yml`:
     ```
     conda env create -f environment.yml
     ```
  6. Activate the environment:
     ```
     source activate MLBatchAIHyperparameterTuning
     ```
  7. Log in to Azure:
     ```
     az login
     ```
  8. If you have more than one Azure subscription, select the one to use:
     ```
     az account set --subscription "<Your Azure Subscription>"
     ```
  9. Start the Jupyter notebook server in the environment:
     ```
     jupyter notebook
     ```

## Steps

After following the setup instructions above, run the Jupyter notebooks in order, starting with the data preparation notebook, [00_Data_Prep.ipynb](00_Data_Prep.ipynb).

## Cleaning up

To remove the conda environment, run `conda env remove -n MLBatchAIHyperparameterTuning`. The last Jupyter notebook also gives details on deleting the Azure resources associated with this repository.

## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.