deep_bait/{{cookiecutter.repo_name}}/ExploringBatchAI.ipynb

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Batch AI\n",
"In this notebook we will go through the steps of setting up the cluster executing the notebooks and pulling the executed notebooks locally. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We have defined a setup script called setup.py. Here we are simply executing it which will also bring all the varialbes and methods into the notebook namespace. You can also use the setup script inside an ipython environment simply execute anaconda-project run ipython-bait"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from dotenv import load_dotenv\n",
"load_dotenv(override=True)"
]
},
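{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check we can confirm that the .env values are now visible to the notebook. The key name used below (SUBSCRIPTION_ID) is hypothetical; substitute whatever keys your .env file actually defines."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"# SUBSCRIPTION_ID is a hypothetical key; replace it with one your .env actually defines\n",
"print('SUBSCRIPTION_ID' in os.environ)"
]
},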
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"WORKSPACE='workspace'\n",
"EXPERIMENT='multiframework'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%run setup_bait.py"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below we setup the cluster and wait for the VMs to be allocated"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"create_workspace(WORKSPACE)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"setup_cluster(WORKSPACE)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"wait_for_cluster(WORKSPACE)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below we print the status of the cluster. We can see many details of the cluster we created including its name and the docker images for the various DL frameworks."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print_cluster_list(WORKSPACE)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can submit all the of the jobs with the submit_all function. We also have a submit function for each of the DL frameworks if you wish to execute one seperately."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"create_experiment(WORKSPACE, EXPERIMENT)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"submit_all(WORKSPACE, EXPERIMENT, epochs=10)"
]
},
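{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you only want to run a single framework, you can call its submit function instead of submit_all. The function name below is hypothetical; check setup_bait.py for the per-framework submit functions it actually defines."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical per-framework submit; see setup_bait.py for the real function names\n",
"# submit_pytorch(WORKSPACE, EXPERIMENT, epochs=10)"
]
},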
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can periodically execute the command below to observe the status of the jobs. Under the current subscription we only have 2 nodes so 2 nodes will be executing in parallel. If the exit-code is anything other than 0 then there has been a problem with the job."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print_jobs_summary(WORKSPACE, EXPERIMENT)"
]
},
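{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead of re-running the cell above by hand, you can poll in a loop. This is just a minimal sketch: it reprints the summary every 30 seconds for up to 10 iterations, using only the print_jobs_summary helper defined in setup_bait.py."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"# Reprint the job summary periodically; interrupt the kernel to stop early\n",
"for _ in range(10):\n",
"    print_jobs_summary(WORKSPACE, EXPERIMENT)\n",
"    time.sleep(30)"
]
},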
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can use the wait_for_job function to wait for the completion of the job. Once it is completed then the stdout is printed out. Let's take a look at the tf job. We can tell the name of the job from the output of the print_jobs_summary as well as the log messages when we submitted the job."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"wait_for_job(WORKSPACE, EXPERIMENT,'run_tf')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now lets download one of the notebooks we ran."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"download_files(WORKSPACE, EXPERIMENT, 'run_tf', 'notebooks')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Open the notebook and you can compare the output we printed out from the stdout of the job when we executed the command wait_for_job. We can see that the outputs in the cells are identical. You can download the other notebooks as well by simply supplying the name of the job."
]
},
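{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, to fetch the notebook from a hypothetical PyTorch job (the job name run_pytorch below is illustrative; use the names shown by print_jobs_summary):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# 'run_pytorch' is an illustrative job name; take the real one from print_jobs_summary\n",
"# download_files(WORKSPACE, EXPERIMENT, 'run_pytorch', 'notebooks')"
]
},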
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once all the jobs are complete we can delete them and delete the cluster."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"delete_all_jobs(WORKSPACE, EXPERIMENT)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"delete_cluster(WORKSPACE)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print_status(WORKSPACE)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These simple methods make it very convenient but may not be suitable for each use case. For more details check out the Batch AI documentation as well as the setup script."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}