More additions to setup. Some reorganization

This commit is contained in:
Joel Hulen 2018-06-06 18:14:39 -04:00
Parent b13012d2be
Commit da0852f897
16 changed files: 193 additions and 98 deletions

77
HOL/README.md Normal file
View file

@ -0,0 +1,77 @@
# Big data and visualization hands-on lab
AdventureWorks Travel (AWT) provides concierge services for business travelers. In an increasingly crowded market, they are always looking for ways to differentiate themselves, and provide added value to their corporate customers.
They are looking to pilot a web app that their internal customer service agents can use to provide additional information useful to the traveler during the flight booking process. They want to enable their agents to enter the flight information and produce a prediction as to whether the departing flight will encounter a 15-minute or longer delay, considering the weather forecasted for the departure hour.
## Contents
* [Abstract](#abstract)
* [Solution architecture](#solution-architecture)
* [Requirements](#requirements)
* [Before the Hands-on Lab](#before-the-hands-on-lab)
* [Hands-on Lab](#hands-on-lab)
* [After the Hands-on Lab](#after-the-hands-on-lab)
## Abstract
In this workshop, you will deploy a web app that uses Machine Learning (ML) to predict travel delays given flight delay data and weather conditions. You will plan a bulk data import operation, followed by preparation such as cleaning and manipulating the data, and then test and train your Machine Learning model.
By attending this workshop, you will be better able to build a complete Azure Machine Learning (ML) model for predicting if an upcoming flight will experience delays. In addition, you will learn to:
* Integrate the Azure ML web service in a Web App for both one at a time and batch predictions
* Use Azure Data Factory (ADF) for data movement and operationalizing ML scoring
* Summarize data with HDInsight and Spark SQL
* Visualize batch predictions on a map using Power BI
This hands-on lab is designed to provide exposure to many of Microsoft's transformative line of business applications built using Microsoft big data and advanced analytics. The goal is to show an end-to-end solution, leveraging many of these technologies, but not necessarily doing work in every component possible. The lab architecture is below and includes:
* Azure Machine Learning (Azure ML)
* Azure Data Factory (ADF)
* Azure Storage
* HDInsight Spark
* Power BI Desktop
* Azure App Service
## Solution architecture
Below is a diagram of the solution architecture you will build in this lab. Please study this carefully so you understand the whole of the solution as you are working on the various components.
![The Solution Architecture diagram begins with Lab VM, then flows to Data Factory File Copy Pipeline, which flows to Storage for copied, raw file. This flows to Data Factory Batch Scoring pipeline, which includes Deployed ML Predictive Model (Batch). The pipeline flows to Storage for scored data, which flows to Spark for data processing. Power BI Report reads data from Spark, then sends the data on to Flight Booking Web App. Deployed ML Predictive Model (Request/Response) real-time scoring also sends data to the Flight Booking Web App, which then flows to the End User.](media/image2.png 'Solution Architecture diagram')
The solution begins with loading AWT's historical data into blob storage using Azure Data Factory (ADF). By setting up a pipeline containing a copy activity configured to copy time partitioned source data, they could pull all their historical information, as well as ingest any future data, into Azure blob storage through a scheduled, continuously running pipeline. Because their historical data is stored on-premises, AWT would need to install and configure an Azure Data Factory Integration Runtime (formerly known as a Data Management Gateway). Azure Machine Learning (Azure ML) would be used to develop a two-class classification machine learning model, which would then be operationalized as a Predictive Web Service using ML Studio. After operationalizing the ML model, a second ADF pipeline, using a Linked Service pointing to Azure ML's Batch Execution API and an AzureMLBatchExecution activity, would be used to apply the operational model to data as it is moved to the proper location in Azure storage. The scored data in Azure storage can be explored and prepared using Spark SQL on HDInsight, and the results visualized using a map visualization in Power BI.
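To make the Spark SQL exploration step concrete, the following is a minimal PySpark sketch of summarizing the scored output for a map visualization; the container name (`scoreddata`) and the column names (`OriginAirportCode`, `Scored_Probabilities`) are illustrative placeholders, not the lab's actual schema.

```python
# Illustrative sketch only: summarize batch-scored flight delay predictions
# with Spark SQL. Paths and column names are placeholders, not the lab schema.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ScoredFlightDelays").getOrCreate()

# Read the CSV output written by the ADF batch-scoring pipeline.
scored = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("wasbs://scoreddata@<STORAGE_ACCOUNT_NAME>.blob.core.windows.net/"))

scored.createOrReplaceTempView("scored_flights")

# Average predicted delay probability per origin airport, ready for a Power BI map.
spark.sql("""
    SELECT OriginAirportCode,
           AVG(Scored_Probabilities) AS AvgDelayProbability
    FROM scored_flights
    GROUP BY OriginAirportCode
    ORDER BY AvgDelayProbability DESC
""").show()
```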
## Requirements
* Microsoft Azure subscription must be pay-as-you-go or MSDN
* Trial subscriptions will not work
## Before the Hands-on Lab
Before attending the hands-on lab workshop, you should set up your environment for use in the rest of the hands-on lab.
You should follow all the steps provided in the [Before the Hands-on Lab](./Setup.md) section to prepare your environment before attending the hands-on lab. Failure to complete the Before the Hands-on Lab setup may result in an inability to complete the lab within the time allowed.
## Hands-on Lab
Select the guide below that you will use to complete the Hands-on lab.
* [Step-by-step guide](./Step-by-step.md)
* Provides detailed, step-by-step instructions for completing the lab.
* [Unguided](./Unguided.md)
* This guide provides minimal instruction and assumes a high level of knowledge about the technologies used in this lab. This should typically only be used if you are doing this as part of a group or hackathon.
* [Hackathon](./Hack.md)
## After the Hands-on Lab
After completing the hands-on lab, you should delete any Azure resources that were created in support of the lab.
You should follow all the steps in the [After the Hands-on Lab](./clean-up.md) section after completing the Hands-on lab.

View file

@ -22,7 +22,7 @@ In this exercise, you will set up your environment for use in the rest of the ha
![Select the create button at the bottom of the blade that follows.](media/create-resource-manager.png)
Set the following configuration on the Basics tab.
Set the following configuration on the Basics tab:
* Name: Enter **LabDSVM**
@ -91,7 +91,7 @@ Azure Databricks is an Apache Spark-based analytics platform optimized for Azure
2. Select Create on the bottom of the blade that follows.
3. Set the following configuration on the Azure Databricks Service creation form.
3. Set the following configuration on the Azure Databricks Service creation form:
* Name: Enter a unique name as indicated by a green checkmark.
@ -117,7 +117,7 @@ Create a new Azure Storage account that will be used to store historic and score
2. Select Create on the bottom of the blade that follows.
3. Set the following configuration on the Azure Databricks Service creation form.
3. Set the following configuration on the Storage account creation form:
* Name: Enter a unique name as indicated by a green checkmark.
@ -145,7 +145,25 @@ Create a new Azure Storage account that will be used to store historic and score
4. Select **Create** to finish and submit.
### Task 5: Provision Azure Data Factory
### Task 5: Retrieve Azure Storage account information and create container
You will need to have the Azure Storage account name and access key when you create your Azure Databricks cluster during the lab. You will also need to create storage containers in which you will store your flight and weather data files.
1. From the left side menu in the Azure portal, click on **Resource groups**, then enter your resource group name into the filter box, and select it from the list.
2. Next, select your lab Azure Storage account from the list.
![Select the lab Azure Storage account from within your lab resource group](media/select-azure-storage-account.png)
3. Select **Access keys** (1) from the left-hand menu. Copy the **storage account name** (2) and the **key1** key (3), and paste the values into a text editor such as Notepad for later use.
![Select Access keys from left-hand menu - copy storage account name - copy key](media/azure-storage-access-keys.png)
4. Select **Containers** (1) from the left-hand menu. Select **+ Container** (2) on the Containers blade, enter **sparkcontainer** for the name (3), leaving the public access level set to Private. Select **OK** (4) to create the container.
![Screenshot showing the steps to create a new storage container](media/azure-storage-create-container.png)
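If you prefer to script this step instead of using the portal, the following is a minimal sketch using the azure-storage-blob Python package (v12 or later); it assumes you have already copied the storage account name and key1 value in step 3, and it is an optional alternative to the portal steps above.

```python
# Optional, scripted alternative to the portal steps above (sketch only):
# create the sparkcontainer container with the azure-storage-blob SDK (v12+).
from azure.storage.blob import BlobServiceClient

ACCOUNT_NAME = "<STORAGE_ACCOUNT_NAME>"  # the storage account name copied in step 3
ACCOUNT_KEY = "<ACCESS_KEY>"             # the key1 value from the Access keys blade

service = BlobServiceClient(
    account_url=f"https://{ACCOUNT_NAME}.blob.core.windows.net",
    credential=ACCOUNT_KEY,
)

# New containers default to private access, matching the setting above.
service.create_container("sparkcontainer")
```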
### Task 6: Provision Azure Data Factory
Create a new Azure Data Factory instance that will be used to orchestrate data transfers for analysis.
@ -155,7 +173,7 @@ Create a new Azure Data Factory instance that will be used to orchestrate data t
2. Select Create on the bottom of the blade that follows.
3. Set the following configuration on the Data Factory creation form.
3. Set the following configuration on the Data Factory creation form:
* Name: Enter a unique name as indicated by a green checkmark.
@ -173,7 +191,7 @@ Create a new Azure Data Factory instance that will be used to orchestrate data t
4. Select **Create** to finish and submit.
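For reference only, the same Data Factory can be provisioned from Python with the azure-mgmt-datafactory management SDK; the sketch below uses placeholder values for the subscription, resource group, factory name, and region, and the lab itself expects the portal steps above.

```python
# Sketch only: provision a Data Factory programmatically instead of via the
# portal form. All identifiers below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

adf_client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<SUBSCRIPTION_ID>",
)

factory = adf_client.factories.create_or_update(
    resource_group_name="<RESOURCE_GROUP>",
    factory_name="<UNIQUE_FACTORY_NAME>",
    factory=Factory(location="eastus"),
)
print(factory.provisioning_state)
```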
### Task 6: Initialize Azure Machine Learning Workbench on the Lab DSVM
### Task 7: Initialize Azure Machine Learning Workbench on the Lab DSVM
Before using the Azure Machine Learning Workbench on the Data Science VM, you will need to take the one-time action of double-clicking on the AzureML Workbench Setup icon on the desktop to install your instance of the workbench.
@ -215,7 +233,7 @@ Before using the Azure Machine Learning Workbench on the Data Science VM, you wi
![Let the Azure ML Workbench installation run to completion, then use the X to close the install](media/azure-ml-workbench-install-successful.png)
### Task 7: Provision Azure Machine Learning Experimentation service
### Task 8: Provision Azure Machine Learning Experimentation service
In this exercise, you will set up your Azure Machine Learning Experimentation and Model Management Accounts and prepare your project environment.
@ -261,4 +279,54 @@ In this exercise, you will setup your Azure Machine Learning Experimentation and
![You should see both the Machine Learning Experimentation and Model Management services in your resource group](media/machine-learning-experimentation-and-model-management.png)
### Task 9: Create an Azure Databricks cluster
You have provisioned an Azure Databricks workspace, and now you need to create a new cluster within the workspace. Part of the cluster configuration includes setting up an account access key to your Azure Storage account, using the Spark Config within the new cluster form. This will allow your cluster to access the lab files.
1. From the left side menu in the Azure portal, click on **Resource groups**, then enter your resource group name into the filter box, and select it from the list.
2. Next, select your Azure Databricks service from the list.
![Select the Azure Databricks service from within your lab resource group](media/select-azure-databricks-service.png)
3. In the Overview pane of the Azure Databricks service, select **Launch Workspace**.
![Select Launch Workspace within the Azure Databricks service overview pane](media/azure-databricks-launch-workspace.png)
Azure Databricks will automatically log you in using Azure Active Directory Single Sign On.
![Azure Databricks Azure Active Directory Single Sign On](media/azure-databricks-aad.png)
4. Select **Clusters** (1) from the left-hand menu, then select **Create Cluster** (2).
![Select Clusters from left-hand menu then select Create Cluster](media/azure-databricks-create-cluster-button.png)
5. On the Create New Cluster form, provide the following:
* Cluster Type: Standard.
* Cluster Name: lab.
* Databricks Runtime Version: 4.1 (includes Apache Spark 2.3.0, Scala 2.11).
* Python Version: 2.
* Driver Type: Same as worker.
* Worker Type: Standard_F4s.
* Min Workers: 2.
* Max Workers: 8.
* Enable Autoscaling: Leave checked.
* Auto Termination: Check the box and enter 120.
* Spark Config: Edit the Spark Config by entering the connection information for your Azure Storage account that you copied earlier in Task 5. This will allow your cluster to access the lab files. Enter the following: `spark.hadoop.fs.azure.account.key.<STORAGE_ACCOUNT_NAME>.blob.core.windows.net <ACCESS_KEY>`, where `<STORAGE_ACCOUNT_NAME>` is your Azure Storage account name, and `<ACCESS_KEY>` is your storage access key (a quick notebook check of this setting is sketched after these steps). **Example:** `spark.hadoop.fs.azure.account.key.bigdatalabstore.blob.core.windows.net HD+91Y77b+TezEu1lh9QXXU2Va6Cjg9bu0RRpb/KtBj8lWQa6jwyA0OGTDmSNVFr8iSlkytIFONEHLdl67Fgxg==`
![Complete the form using the options as outlined above](media/azure-databricks-create-cluster-form.png)
6. Select **Create Cluster**.
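Once the cluster is running, the notebook cell sketched below (referenced in the Spark Config step above) verifies that the cluster can reach your storage account; the account name, key, and container are placeholders matching the values used earlier.

```python
# Databricks notebook cell (spark, dbutils, and display are predefined there).
# Setting the key here is redundant if the cluster-level Spark Config is in
# place; it is shown for completeness.
spark.conf.set(
    "fs.azure.account.key.<STORAGE_ACCOUNT_NAME>.blob.core.windows.net",
    "<ACCESS_KEY>",
)

# List the files in the sparkcontainer container created in Task 5.
files = dbutils.fs.ls(
    "wasbs://sparkcontainer@<STORAGE_ACCOUNT_NAME>.blob.core.windows.net/"
)
display(files)
```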
You should follow all these steps provided _before_ attending the Hands-on lab.

View file

@ -1,27 +1,20 @@
![](images/HeaderPic.png "Microsoft Cloud Workshops")
![Microsoft Cloud Workshop](../media/ms-cloud-workshop.png "Microsoft Cloud Workshop")
<div class="MCWHeader1">
Big data and visualization
</div>
[Legal notice](../legal.md)
<div class="MCWHeader2">
Hands-on lab step-by-step
</div>
Updated May 2018
<div class="MCWHeader3">
April 2018
</div>
# Big data and visualization hands-on lab step-by-step
Information in this document, including URL and other Internet Web site references, is subject to change without notice. Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, place or event is intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.
AdventureWorks Travel (AWT) provides concierge services for business travelers. In an increasingly crowded market, they are always looking for ways to differentiate themselves, and provide added value to their corporate customers.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.
They are looking to pilot a web app that their internal customer service agents can use to provide additional information useful to the traveler during the flight booking process. They want to enable their agents to enter the flight information and produce a prediction as to whether the departing flight will encounter a 15-minute or longer delay, considering the weather forecasted for the departure hour.
The names of manufacturers, products, or URLs are provided for informational purposes only and Microsoft makes no representations and warranties, either expressed, implied, or statutory, regarding these manufacturers or the use of the products with any Microsoft technologies. The inclusion of a manufacturer or product does not imply endorsement of Microsoft of the manufacturer or product. Links may be provided to third party sites. Such sites are not under the control of Microsoft and Microsoft is not responsible for the contents of any linked site or any link contained in a linked site, or any changes or updates to such sites. Microsoft is not responsible for webcasting or any other form of transmission received from any linked site. Microsoft is providing these links to you only as a convenience, and the inclusion of any link does not imply endorsement of Microsoft of the site or the products contained therein.
© 2018 Microsoft Corporation. All rights reserved.
In this workshop, you will deploy a web app that uses Machine Learning (ML) to predict travel delays given flight delay data and weather conditions. You will plan a bulk data import operation, followed by preparation such as cleaning and manipulating the data, and then test and train your Machine Learning model.
Microsoft and the trademarks listed at https://www.microsoft.com/en-us/legal/intellectualproperty/Trademarks/Usage/General.aspx are trademarks of the Microsoft group of companies. All other trademarks are property of their respective owners.
If you have not yet completed the steps to set up your environment in [Before the hands-on lab](./Setup.md), you will need to do that before proceeding.
**Contents**
## Contents
<!-- TOC -->
@ -66,60 +59,6 @@ Microsoft and the trademarks listed at https://www.microsoft.com/en-us/legal/int
<!-- /TOC -->
# Big data and visualization hands-on lab step-by-step
## Abstract and learning objectives
In this workshop, you will deploy a web app that uses Machine Learning (ML) to predict travel delays given flight delay data and weather conditions. You will plan a bulk data import operation, followed by preparation such as cleaning and manipulating the data, and then test and train your Machine Learning model.
By attending this workshop, you will be better able to build a complete Azure Machine Learning (ML) model for predicting if an upcoming flight will experience delays. In addition, you will learn to:
- Integrate the Azure ML web service in a Web App for both one at a time and batch predictions
- Use Azure Data Factory (ADF) for data movement and operationalizing ML scoring
- Summarize data with HDInsight and Spark SQL
- Visualize batch predictions on a map using Power BI
This hands-on lab is designed to provide exposure to many of Microsoft's transformative line of business applications built using Microsoft big data and advanced analytics. The goal is to show an end-to-end solution, leveraging many of these technologies, but not necessarily doing work in every component possible. The lab architecture is below and includes:
- Azure Machine Learning (Azure ML)
- Azure Data Factory (ADF)
- Azure Storage
- HDInsight Spark
- Power BI Desktop
- Azure App Service
## Overview
AdventureWorks Travel (AWT) provides concierge services for business travelers. In an increasingly crowded market, they are always looking for ways to differentiate themselves, and provide added value to their corporate customers.
They are looking to pilot a web app that their internal customer service agents can use to provide additional information useful to the traveler during the flight booking process. They want to enable their agents to enter in the flight information and produce a prediction as to whether the departing flight will encounter a 15-minute or longer delay, considering the weather forecasted for the departure hour.
In this hands-on lab, attendees will build an end-to-end solution to predict flight delays, accounting for the weather forecast.
## Solution architecture
Below is a diagram of the solution architecture you will build in this lab. Please study this carefully so you understand the whole of the solution as you are working on the various components.
![The Solution Architecture diagram begins with Lab VM, then flows to Data Factory File Copy Pipeline, which flows to Storage for copied, raw file. This flows to Data Factory Batch Scoring pipeline, which includes Deployed ML Predictive Model (Batch). The pipeline flows to Storage for scored data, which flows to Spark for data processing. Power BI Report reads data from Spark, then sends the data on to Flight Booking Web App. Deployed ML Predictive Model (Request/Response) real-time scoring also sends data to the Flight Booking Web App, which then flows to the End User.](media/image2.png "Solution Architecture diagram")
The solution begins with loading AWT's historical data into blob storage using Azure Data Factory (ADF). By setting up a pipeline containing a copy activity configured to copy time partitioned source data, they could pull all their historical information, as well as ingest any future data, into Azure blob storage through a scheduled, continuously running pipeline. Because their historical data is stored on-premises, AWT would need to install and configure an Azure Data Factory Integration Runtime (formerly known as a Data Management Gateway). Azure Machine Learning (Azure ML) would be used to develop a two-class classification machine learning model, which would then be operationalized as a Predictive Web Service using ML Studio. After operationalizing the ML model, a second ADF pipeline, using a Linked Service pointing to Azure ML's Batch Execution API and an AzureMLBatchExecution activity, would be used to apply the operational model to data as it is moved to the proper location in Azure storage. The scored data in Azure storage can be explored and prepared using Spark SQL on HDInsight, and the results visualized using a map visualization in Power BI.
## Requirements
1. Microsoft Azure subscription must be pay-as-you-go or MSDN
a. Trial subscriptions will not work
## Exercise 1: Build a Machine Learning Model
Duration: 60 minutes

Binary files added (not shown):

* HOL/media/azure-databricks-aad.png (45 KiB)
* HOL/media/azure-databricks-create-cluster-button.png (40 KiB)
* HOL/media/azure-databricks-create-cluster-form.png (157 KiB)
* HOL/media/azure-databricks-launch-workspace.png (16 KiB)
* HOL/media/azure-storage-access-keys.png (62 KiB)
* HOL/media/azure-storage-account-overview.png (55 KiB)
* HOL/media/azure-storage-create-container.png (48 KiB)
* HOL/media/select-azure-databricks-service.png (38 KiB)
* HOL/media/select-azure-storage-account.png (40 KiB)

View file

@ -6,45 +6,47 @@ In this workshop, you will deploy a web app using Machine Learning (ML) to predi
By attending this workshop, you will be better able to build a complete Azure Machine Learning (ML) model for predicting if an upcoming flight will experience delays. In addition, you will learn to:
- Integrate the Azure ML web service in a Web App for both one at a time and batch predictions
* Integrate the Azure ML web service in a Web App for both one at a time and batch predictions
- Use Azure Data Factory (ADF) for data movement and operationalizing ML scoring
* Use Azure Data Factory (ADF) for data movement and operationalizing ML scoring
- Summarize data with HDInsight and Spark SQL
* Summarize data with HDInsight and Spark SQL
- Visualize batch predictions on a map using Power BI
* Visualize batch predictions on a map using Power BI
## Whiteboard Design Session
TBD
## Hand-on Lab
## Hands-on Lab
This hands-on lab is designed to provide exposure to many of Microsoft's transformative line of business applications built using Microsoft big data and advanced analytics. The goal is to show an end-to-end solution, leveraging many of these technologies, but not necessarily doing work in every component possible. The lab architecture is below and includes:
- Azure Machine Learning (Azure ML)
* Azure Machine Learning (Azure ML)
- Azure Data Factory (ADF)
* Azure Data Factory (ADF)
- Azure Storage
* Azure Storage
- HDInsight Spark
* HDInsight Spark
- Power BI Desktop
- Azure App Service
* Power BI Desktop
* Azure App Service
## Azure services and related products
- Azure SQL Data Warehouse
- Azure ML
- Azure Storage
- Azure Active Directory
- Power BI
- HDInsight Spark
- Web Apps
* Azure SQL Data Warehouse
* Azure ML
* Azure Storage
* Azure Active Directory
* Power BI
* HDInsight Spark
* Web Apps
# Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a
This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.microsoft.com.
@ -55,4 +57,3 @@ provided by the bot. You will only need to do this once across all repos using o
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

10
legal.md Normal file
View file

@ -0,0 +1,10 @@
# Legal notice
Information in this document, including URL and other Internet Web site references, is subject to change without notice. Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places, and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, e-mail address, logo, person, place or event is intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.
The names of manufacturers, products, or URLs are provided for informational purposes only and Microsoft makes no representations and warranties, either expressed, implied, or statutory, regarding these manufacturers or the use of the products with any Microsoft technologies. The inclusion of a manufacturer or product does not imply endorsement of Microsoft of the manufacturer or product. Links may be provided to third party sites. Such sites are not under the control of Microsoft and Microsoft is not responsible for the contents of any linked site or any link contained in a linked site, or any changes or updates to such sites. Microsoft is not responsible for webcasting or any other form of transmission received from any linked site. Microsoft is providing these links to you only as a convenience, and the inclusion of any link does not imply endorsement of Microsoft of the site or the products contained therein.
© 2018 Microsoft Corporation. All rights reserved.
Microsoft and the trademarks listed at https://www.microsoft.com/en-us/legal/intellectualproperty/Trademarks/Usage/General.aspx are trademarks of the Microsoft group of companies. All other trademarks are property of their respective owners.

Binary file added (not shown): media/ms-cloud-workshop.png (91 KiB)