Running the most popular deep learning frameworks on Azure Batch AI
msalvaris dec79e118b Updates for final working version 2018-05-04 17:48:28 +00:00
docker Updates caffe dockerfile 2018-05-04 15:31:57 +01:00
exec_src Updates for final working version 2018-05-04 17:48:28 +00:00
local_test updates dockerfile 2017-12-24 14:24:15 +00:00
.gitignore Initial commit 2017-11-28 03:30:57 -06:00
ExploringBatchAI.ipynb Updates for final working version 2018-05-04 17:48:28 +00:00
LICENSE Initial commit 2017-11-28 01:31:00 -08:00
Makefile Updates for final working version 2018-05-04 17:48:28 +00:00
README.md Merge pull request #4 from aniedea/patch-1 2017-12-05 00:50:58 +00:00
anaconda-project.yml Updates for final working version 2018-05-04 17:48:28 +00:00
process_cifar.py Adds project to github 2017-11-28 09:44:24 +00:00
setup_bait.py Updates for final working version 2018-05-04 17:48:28 +00:00
utilities.py Updates for final working version 2018-05-04 17:48:28 +00:00

Deep Learning Frameworks using Azure Batch AI

Introduction

This repo contains everything you need to run some of the most popular deep learning frameworks on Batch AI. Batch AI is a service that allows you to run various machine learning workloads on clusters of VMs. For more details on the service, please see the Azure Batch AI documentation.

This project uses anaconda-project and Makefiles to create the environment, download the data and prepare all necessary artifacts.

The frameworks are:

Each of the notebooks trains a simple convolutional neural network (CNN) on the CIFAR10 dataset.

Setup

This project was developed and tested on an Azure Ubuntu DSVM but should be compatible with any Linux distribution. The prerequisites are Docker, anaconda-project and jq. jq can be installed with:

sudo apt-get install jq
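To confirm that the command-line tools this README relies on are available before going further, a small check like the following can help. It is a minimal sketch and not part of the project: the tool list (jq, docker, anaconda-project) is an assumption based on the commands used in this README, and `missing_prerequisites` is a hypothetical helper name.

```python
import shutil

# Assumed tool list: jq is installed above; Docker and anaconda-project
# are used by the later steps of this README.
REQUIRED_TOOLS = ("jq", "docker", "anaconda-project")


def missing_prerequisites(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```

Run as a script, it prints any of the assumed tools that are not on the PATH.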
Optional

If you want to run docker without prefixing every command with sudo, run the following:

sudo groupadd docker
sudo usermod -aG docker $USER

You may need to log out and log back in for the changes to take effect. These instructions are taken from https://docs.docker.com/engine/installation/linux/linux-postinstall/#manage-docker-as-a-non-root-user
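The log-out step is needed because a session's group list is fixed when the session starts. The hypothetical helper below, a sketch assuming Python 3 on Linux, compares the group database (which `usermod -aG` updates immediately) with the groups of the current process (which only change for sessions started afterwards).

```python
import grp
import os
import pwd


def docker_group_status(group_name="docker"):
    """Return (listed_in_group_db, active_in_this_session) for the current user.

    listed_in_group_db: the user is a member of the group in the group
        database, which is what `usermod -aG` changes.
    active_in_this_session: the group's GID is among this process's groups,
        which is only true for sessions started after the change.
    """
    try:
        group = grp.getgrnam(group_name)
    except KeyError:
        # The group does not exist, e.g. `groupadd docker` has not been run.
        return False, False
    user = pwd.getpwuid(os.getuid())
    # gr_mem lists supplementary members only, so also accept the group
    # being the user's primary group.
    listed = user.pw_name in group.gr_mem or user.pw_gid == group.gr_gid
    active = group.gr_gid in os.getgroups() or os.getgid() == group.gr_gid
    return listed, active
```

A result of (True, False) suggests the change has been applied but the current session predates it, so logging out and back in should fix it.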

*** Tip ***

anaconda-project sets a number of environment variables, but these are not available inside the sudo environment when you run a command with sudo. To make them available, run your sudo commands with the -E switch.
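To see why -E matters, the sketch below mimics the effect with a child Python process instead of sudo itself. It is an analogy under stated assumptions: on typical configurations sudo starts the command with a reset environment that keeps only a small set of variables, which the sketch approximates by passing PATH alone, while `sudo -E` corresponds to passing the caller's full environment. The helper `child_sees` and the variable name DEMO_PROJECT_DIR are hypothetical.

```python
import os
import subprocess
import sys


def child_sees(var_name, preserve_env):
    """Return the value of `var_name` as seen by a child Python process.

    preserve_env=True passes the full parent environment (like `sudo -E`).
    preserve_env=False passes only PATH, a rough stand-in for sudo's default
    environment reset, which keeps just a small whitelist of variables.
    """
    if preserve_env:
        env = dict(os.environ)
    else:
        env = {"PATH": os.environ.get("PATH", "")}
    code = f"import os; print(os.environ.get({var_name!r}, ''))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

After setting `os.environ["DEMO_PROJECT_DIR"] = "/tmp/demo"`, `child_sees("DEMO_PROJECT_DIR", True)` returns "/tmp/demo" while `child_sees("DEMO_PROJECT_DIR", False)` returns an empty string.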

Setting up project

Once you have Docker and anaconda-project installed, you can start setting up the project. Many of the interactions happen through anaconda-project; to list the available commands, simply run:

anaconda-project list-commands

When you first set up the project, a number of things need to be configured, and anaconda-project handles them: it creates the appropriate environment, installs the required packages, and downloads and prepares the data. Run the following to prepare the environment:

anaconda-project prepare

It will ask you for a number of variables, including where to store the data locally. If nothing is specified, it uses the local folder data.
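Scripts that need the data location commonly read it from an environment variable and fall back to a default. The sketch below shows that pattern only; the variable name DATA_DIR and the helper `resolve_data_dir` are hypothetical placeholders (the real name is whatever anaconda-project prompts you for), and only the `data` default comes from this README.

```python
import os
from pathlib import Path


def resolve_data_dir(environ=os.environ, var_name="DATA_DIR"):
    """Return the data directory.

    Uses the value of `var_name` (a hypothetical variable name) if it is set
    and non-blank, otherwise falls back to the local folder `data`.
    """
    value = environ.get(var_name, "").strip()
    return Path(value) if value else Path("data")
```

Passing the mapping explicitly keeps the function easy to test without touching the real environment.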

Once the environment is set up we need to install some further packages as well as create some Azure resources. The following command will:

  • Install the Azure CLI
  • Install Blobxfer
  • Log you in to Azure and select the appropriate subscription
  • Register for the appropriate services
  • Create a service principal
  • Create a storage account and fileshare
  • Download the CIFAR data and upload it to the fileshare
anaconda-project run setup

The command can take a while. For more information on what is going on, add the --verbose flag (anaconda-project --verbose run setup). Keep an eye on the terminal, since it will ask you to log in to your Azure account. For a better understanding of what the command does, have a look at the Makefile.

Run Batch AI

Instructions on how to set up the cluster and start submitting jobs are detailed in the ExploringBatchAI notebook. To start the notebook server, run the following command:

anaconda-project run notebook --no-browser

We assume you are executing on a VM, which is the reason for the --no-browser switch. This starts a Jupyter notebook server that should be reachable in the same way as your standard Jupyter notebook server. The command above assumes that the appropriate ports are open on the VM and that the server has been set up to accept connections from any IP address. Other switches and arguments are passed through as well, so to specify the IP address or port simply append them to the end of the command (for example --ip=0.0.0.0 --port=8888).
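Before opening the notebook in a browser, it can help to confirm from your local machine that the notebook port on the VM is reachable. The sketch below is a plain TCP connection check using Python's standard socket module; `port_is_open` is a hypothetical helper, and the host and port you pass are your own assumptions (Jupyter's default port is 8888).

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and opens a TCP connection;
        # the socket is closed again as soon as the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, `port_is_open("<vm-ip>", 8888)` returning False usually means the port is blocked on the VM or the server is not listening on an external interface.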

You can also interact with Batch AI by running:

anaconda-project run ipython -r setup_bait.py

Local Development

When executing jobs on services such as Batch AI, it is important to iron out as many bugs as possible before running on the cluster. For this reason the project includes a folder called local_test with a Makefile that lets you run notebook servers inside the containers as well as test the execution of the containers.

To run any of the notebooks:

anaconda-project --verbose run bash
cd local_test
make cntk-nb-server

The commands above set up the necessary environment variables and then start the notebook server inside the CNTK Docker container. As before, they assume that the appropriate ports are open on the VM and that the server has been set up to accept connections from any IP address.

Docker

The Dockerfiles used to create the Docker images can be found in the docker folder. That folder also contains a Makefile with all the commands for building the images.

Clean/Delete project

To clean the project, whether to start from scratch or simply to remove the environment and data files, run:

make clean

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.