Example of using HyperDrive to tune a regular ML learner.

Author: Mario Bourgoin

Training of Python scikit-learn models on Azure

Overview

This scenario shows how to tune a Frequently Asked Questions (FAQ) matching model that can be deployed as a web service to provide predictions for user questions. "Input Data" in the architecture diagram refers to text strings containing the user questions to match against a list of FAQs. The scenario uses the Scikit-Learn machine learning library for Python, but the approach generalizes to any Python model used to make real-time predictions.
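
As a rough illustration of the kind of Scikit-Learn model being tuned, the sketch below builds a minimal text-matching pipeline. The features, estimator, hyperparameter values, and toy data are placeholders, not what the tutorial's training script actually uses.

    # Illustrative only: a TF-IDF + logistic regression pipeline whose
    # hyperparameters (e.g. n-gram range, regularization strength C) could be tuned.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    # Toy data: duplicate question texts labeled with the id of the original
    # question they duplicate (0 = "How do I sort an array?", 1 = "How do I parse JSON?").
    duplicates = ["sort array of numbers in javascript",
                  "sorting a js array",
                  "parse a json string in javascript",
                  "convert json text to an object"]
    original_ids = [0, 0, 1, 1]

    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ("clf", LogisticRegression(C=1.0, solver="liblinear")),
    ])
    pipeline.fit(duplicates, original_ids)

    # Match probability of a new question with each original question.
    print(pipeline.predict_proba(["how to sort a javascript array"]))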

Design

(Architecture diagram: Design.png)

The scenario uses a subset of Stack Overflow question data that includes original questions tagged as JavaScript, their duplicate questions, and their answers. It tunes a Scikit-Learn pipeline to predict the match probability of a duplicate question with each of the original questions. The application flow for this architecture is as follows (a code sketch of steps 4-8 appears after the list):

  1. Create an Azure ML Service workspace.
  2. Create an Azure ML Compute cluster.
  3. Upload training, tuning, and testing data to Azure Storage.
  4. Configure a HyperDrive random parameter search.
  5. Submit the search.
  6. Monitor until complete.
  7. Retrieve the best set of hyperparameters.
  8. Register the best model.
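
The notebooks walk through these steps interactively. As a rough sketch of what steps 4-8 look like in the Azure ML Python SDK, the snippet below configures and submits a HyperDrive random search. The experiment name, script path, hyperparameter names, and the primary metric name ("gain") are illustrative assumptions, and `ws` and `cluster` stand for the workspace and compute cluster created in steps 1 and 2.

    # Hedged sketch of steps 4-8 (Azure ML SDK v1); names below are assumptions,
    # not necessarily those used by the tutorial's notebooks.
    from azureml.core import Experiment
    from azureml.train.estimator import Estimator
    from azureml.train.hyperdrive import (HyperDriveConfig, RandomParameterSampling,
                                          BanditPolicy, PrimaryMetricGoal, choice)

    # Script that trains one model per sampled hyperparameter set (hypothetical path).
    estimator = Estimator(source_directory="scripts",
                          entry_script="TrainClassifier.py",
                          compute_target=cluster,
                          conda_packages=["scikit-learn", "pandas"])

    # Step 4: random search over illustrative hyperparameters, with early termination.
    sampling = RandomParameterSampling({
        "--estimators": choice(100, 200, 400),
        "--min_samples_leaf": choice(5, 10, 20),
    })
    hyperdrive_config = HyperDriveConfig(
        estimator=estimator,
        hyperparameter_sampling=sampling,
        policy=BanditPolicy(evaluation_interval=1, slack_factor=0.2),
        primary_metric_name="gain",  # must match the metric the training script logs
        primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
        max_total_runs=20,
        max_concurrent_runs=4)

    # Steps 5-8: submit, monitor, then retrieve and register the best model.
    run = Experiment(ws, "hypertuning").submit(hyperdrive_config)
    run.wait_for_completion(show_output=True)
    best_run = run.get_best_run_by_primary_metric()
    best_run.register_model(model_name="question_match_model", model_path="outputs/model.pkl")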

Prerequisites

  1. Linux (Ubuntu).
  2. Anaconda Python installed.
  3. An Azure account.

The tutorial was developed on an Azure Ubuntu DSVM, which addresses the first three prerequisites.

Setup

To set up your environment to run these notebooks, follow the steps below; they configure the notebooks to use Azure seamlessly.

  1. Create a Linux Ubuntu DSVM.
  2. Clone, fork, or download the zip file for this repository:
    git clone https://github.com/Microsoft/MLHyperparameterTuning.git
    
  3. Enter the local repository:
    cd MLHyperparameterTuning
    
  4. Create the MLHyperparameterTuning conda virtual environment using environment.yml:
    conda env create -f environment.yml
    
  5. Activate the virtual environment:
    source activate MLHyperparameterTuning
    
  6. Log in to Azure:
    az login
    
  7. If you have more than one Azure subscription, select the one to use:
    az account set --subscription <Your Azure Subscription>
    
  8. Start the Jupyter notebook server in the virtual environment:
    jupyter notebook
    

Steps

After following the setup instructions above, run the Jupyter notebooks in order, starting with the data preparation notebook (00_Data_Prep.ipynb).

Cleaning up

The last Jupyter notebook describes how to delete the Azure resources created for running the tutorial. Consult the conda documentation for information on how to remove the conda environment created during the setup.
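
For example, after deactivating the environment, it can be removed with:

    conda env remove -n MLHyperparameterTuning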

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.