Azure AI Camp - 2 day workshop on Databricks and Azure ML

Through the Azure AI Camp, the ML practitioner will learn how to use Azure ML, Databricks, ML on the Edge and other Microsoft AI technologies to unlock insights on big datasets and deploy AI services to the cloud and edge. It is designed as a hands-on workshop, delivered either in an instructor-led format or as on-demand learning guided by the documentation and resources provided.

Prerequisites

Required

  1. Python proficiency - Resources
  2. Azure Subscription
  3. Git installed locally and proficiency using it - Git Handbook
  4. VSCode (for IoT section) - Download Visual Studio Code

Recommended

  1. Machine learning and computer vision basics - Course material on image classification
  2. Python 3.6+ installed locally - Installation of Anaconda

Resources provisioned

This workshop provisions the following resources. In practice, most are shared across an organization or group; whether they are shared in this workshop depends on how the Azure Subscription is set up.

  1. Azure Storage Account - Docs
  2. Azure ML Workspace - Docs
  3. Azure Databricks Workspace (Docs) including:
    • ML runtime cluster
    • Non-ML runtime cluster
  4. Ubuntu Data Science Virtual Machine - Docs

Agenda

Day 1


  1. AI at Microsoft Overview
    • Azure ML overview
    • Cognitive Services overview
    • Data in Azure and Databricks overview
    • AI and ML on Azure overview
  2. Azure ML deep dive and hands-on labs with the DSVM
    • Image classification with PyTorch estimator hands-on lab
    • Object detection with YOLO walkthrough
    • Azure ML with IoT with hands-on lab

Day 2


  1. Databricks deep dive with ETL hands-on lab
  2. Auto ML with Databricks walkthrough
  3. Parallel and distributed training with Horovod walkthrough
  4. Live Video Analytics discussion

Technologies

  1. Azure Databricks
  2. Azure ML
  3. Azure Storage
  4. IoT Edge
  5. Data Science Virtual Machine

Setup on day-of

  1. Complete the prerequisites above and provision the necessary resources. If provisioning a DSVM for on-demand learning, use the instructions found in the instructor DSVM setup folder of this repository.

  2. Clone the repository with Git:

    git clone https://github.com/Azure/Azure-AI-Camp.git

  3. Create or download the Azure ML Workspace configuration file (config.json) locally - Doc
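
As a minimal sketch, the config.json from step 3 can be sanity-checked before it is used. This assumes the file holds the three keys `subscription_id`, `resource_group` and `workspace_name`; the helper name `load_workspace_config` is hypothetical and only for illustration.

```python
import json
from pathlib import Path

# Keys assumed to be present in the Azure ML workspace config.json.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path="config.json"):
    """Read config.json and fail early if a required key is missing or empty.

    Hypothetical helper for illustration; it is not part of the Azure ML SDK.
    """
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"config.json is missing: {', '.join(missing)}")
    return config


# Example usage (assumes config.json sits in the current directory):
#   cfg = load_workspace_config()
#   print("Workspace:", cfg["workspace_name"])
```

With the Azure ML Python SDK (`azureml-core`) installed, `Workspace.from_config()` reads the same file to connect to the workspace.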

On-demand learning

Setup

Follow the process above in the Setup on day-of section.

IMPORTANT NOTES:

  • The JupyterHub system on the DSVM does not work well with Safari, but should work fine in Chrome, Firefox and Edge. A security/certificate warning may appear. Click the Advanced link and continue to the site, as it is a trusted Azure site. In Chrome, if clicking through is not an option, a workaround for the warning page is described in this SO post, which states: "Note as now mentioned in the question itself, the passphrase has changed again. As of Chrome 65 it is thisisunsafe". Typing that phrase with the warning page open will allow passage through the warning. This is a known issue with JupyterHub, Chrome and the DSVM.
  • When setting up Databricks clusters, do not use ML-type clusters with the Azure ML SDK Python package, as they are not compatible. For the ETL notebooks and some of the others, where Azure ML is not used, an ML-type Spark cluster will be fine.
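
The cluster note above can be turned into a small guard, sketched here under assumptions. It assumes that ML-type runtime version keys contain an `-ml-` segment (for example `6.4.x-cpu-ml-scala2.11`) while standard ones do not (for example `6.4.x-scala2.11`); both function names are hypothetical.

```python
def is_ml_runtime(spark_version):
    """Return True if a Databricks runtime version key looks like an ML runtime.

    Assumption: ML runtime keys contain an "-ml-" segment, standard ones do not.
    """
    return "-ml-" in spark_version.lower()


def check_cluster_for_azureml(spark_version):
    """Raise if the runtime looks ML-type, which the note above describes as
    incompatible with the Azure ML SDK Python package."""
    if is_ml_runtime(spark_version):
        raise RuntimeError(
            f"Runtime '{spark_version}' looks like an ML-type cluster; "
            "use a non-ML runtime cluster for Azure ML SDK notebooks."
        )
    return spark_version


# Example usage with illustrative version keys:
#   check_cluster_for_azureml("6.4.x-scala2.11")         # passes
#   check_cluster_for_azureml("6.4.x-cpu-ml-scala2.11")  # raises RuntimeError
```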

Instructions

Browse through the day 1 and day 2 folders, noting that there are individual Readme.md documents in each section. The day 1 platform is an Azure Data Science Virtual Machine, and for day 2 the work will be done on a Databricks Workspace. Various datasets in computer vision and related fields are used in conjunction with tools like Jupyter notebooks, Databricks notebooks, the Azure ML Python SDK and more.

For day 1, most of the hands-on work will be in Jupyter notebooks run locally or on an Azure Data Science Virtual Machine. Whether working locally or on the VM, learners will need to set up the environment themselves. More information on provisioning the Ubuntu Data Science Virtual Machine can be found here, and on using JupyterHub in the section on that tool here.

For day 2, the hands-on work will be in the form of Databricks notebooks, which are very similar to Jupyter notebooks and run on a cluster in an Azure Databricks Workspace. The notebooks for day 2 are all stored as archives with the .dbc extension. It is straightforward to import these notebooks into the Databricks workspace; instructions can be found here (import under workspace or individual user).
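
Besides the UI import described above, a .dbc archive could also be imported programmatically. The following is a minimal sketch, assuming the Databricks Workspace REST API import endpoint (`/api/2.0/workspace/import`) with the `DBC` format and a personal access token. The function name, host, token and paths are hypothetical placeholders, and the request is only built, not sent.

```python
import base64
import json
import urllib.request


def build_dbc_import_request(host, token, dbc_bytes, workspace_path):
    """Build (but do not send) a Workspace API request that imports a .dbc archive.

    All arguments are placeholders supplied by the caller:
    host is the workspace URL, token a personal access token,
    workspace_path the target folder in the Databricks workspace.
    """
    payload = {
        "path": workspace_path,
        "format": "DBC",
        # The API expects the archive content base64-encoded.
        "content": base64.b64encode(dbc_bytes).decode("ascii"),
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/workspace/import",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Example usage (hypothetical values; sending requires a real workspace and token):
#   with open("notebooks.dbc", "rb") as f:
#       req = build_dbc_import_request(
#           "https://<workspace-url>", "<token>", f.read(), "/Users/<user>/day2")
#   urllib.request.urlopen(req)
```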

Additional notes

When using and contributing to this repo, please adhere to the Microsoft Open Source Code of Conduct.

Contributing

Contribution guidelines can be found in CONTRIBUTING.md.