Merge branch 'master' into chrisworking

This commit is contained in:
Buck Woody 2020-01-31 17:34:01 -05:00 коммит произвёл GitHub
Родитель bfafe3c588 9a392bcace
Коммит fc03bf17f4
Не найден ключ, соответствующий данной подписи
Идентификатор ключа GPG: 4AEE18F83AFDEB23
54 изменённых файлов: 237 добавлений и 103 удалений

Двоичные данные
.DS_Store поставляемый

Двоичный файл не отображается.

Просмотреть файл

@ -246,7 +246,7 @@ Open the **03_WorkingWithData.py** file and enter the code you find for section
Python has many ways to read data in (*sometimes into memory, sometimes streaming as it reads it*) built right in to the standard libraries. Other Libraries, such as Pandas and NumPy, have their own way of reading in data.
In any case, the data is assigned to a data family or *structure*, which you learned about earlier. Depending on which Library you are using, you'll pick a data structure that makes the most sense for how you want to work with it. For instance, Pandas uses a dataframe as the primary data structure it works with. This is why it's important to know the data types, so that you understand what stucture you need to perform your desired operations.
In any case, the data is assigned to a data family or *structure*, which you learned about earlier. Depending on which Library you are using, you'll pick a data structure that makes the most sense for how you want to work with it. For instance, Pandas uses a dataframe as the primary data structure it works with. This is why it's important to know the data types, so that you understand what structure you need to perform your desired operations.
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="./graphics/checkbox.png"><b>Reading from Files</b></p>
@ -465,7 +465,7 @@ Read the [Documentation Reference here](https://docs.microsoft.com/en-us/azure/m
Read the [Documentation Reference here](https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/lifecycle-data)
The Data Aquisition and Understanding phase of the TDSP you ingest or access data from various locations to answer the questions the organization has asked. In most cases, this data will be in multiple locations. Once the data is ingested into the system, youll need to examine it to see what it holds. All data needs cleaning, so after the inspection phase, youll replace missing values, add and change columns. Youve already seen the Libraries you'll need to work with for Data Wrangling - Pandas being the most common in use.
The Data Acquisition and Understanding phase of the TDSP you ingest or access data from various locations to answer the questions the organization has asked. In most cases, this data will be in multiple locations. Once the data is ingested into the system, youll need to examine it to see what it holds. All data needs cleaning, so after the inspection phase, youll replace missing values, add and change columns. Youve already seen the Libraries you'll need to work with for Data Wrangling - Pandas being the most common in use.
<p style="border-bottom: 1px solid lightgrey;"></p>
<h4>Phase Three - Modeling</h4>

Просмотреть файл

@ -61,7 +61,7 @@ The entire repository can be [downloaded as a single ZIP file here](https://gith
### Clone all Workshops using git
You can [clone the entire respository using `git` here](https://github.com/Microsoft/sqlworkshops.git).
You can [clone the entire repository using `git` here](https://github.com/Microsoft/sqlworkshops.git).
### Get only one Workshop
You can follow the steps below to clone individual files from a git repo using a git client.

Двоичные данные
SQLGroundToCloud/.DS_Store поставляемый Normal file

Двоичный файл не отображается.

Двоичные данные
k8stobdc/.DS_Store поставляемый

Двоичный файл не отображается.

Двоичные данные
k8stobdc/KubernetesToBDC/.DS_Store поставляемый

Двоичный файл не отображается.

Просмотреть файл

@ -1,4 +1,4 @@
![](../graphics/microsoftlogo.png)
![](https://github.com/microsoft/sqlworkshops/blob/master/graphics/microsoftlogo.png?raw=true)
# Workshop: <TODO: Enter workshop name>
@ -6,7 +6,7 @@
<p style="border-bottom: 1px solid lightgrey;"></p>
<img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/textbubble.png"> <h2>00 prerequisites</h2>
<img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/textbubble.png?raw=true"> <h2>00 prerequisites</h2>
This workshop is taught using the following components, which you will install and configure in the sections that follow.
@ -26,37 +26,37 @@ The other requirements are:
*Note that all following activities must be completed prior to class - there will not be time to perform these operations during the workshop.*
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/point1.png"><b>Activity 1: Set up a Microsoft Azure Account</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/point1.png?raw=true"><b>Activity 1: Set up a Microsoft Azure Account</b></p>
You have multiple options for setting up Microsoft Azure account to complete this workshop. You can use a Microsoft Developer Network (MSDN) account, a personal or corporate account, or in some cases a pass may be provided by the instructor. (Note: for most classes, the MSDN account is best)
**If you are attending this course in-person:**
Unless you are explicitly told you will be provided an account by the instructor in the invitation to this workshop, you must have your Microsoft Azure account and Data Science Virtual Machine set up before you arrive at class. There will NOT be time to configure these resources during the course.
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"><b>Option 1 - Microsoft Developer Network Account (MSDN) Account</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true"><b>Option 1 - Microsoft Developer Network Account (MSDN) Account</b></p>
The best way to take this workshop is to use your [Microsoft Developer Network (MSDN) benefits if you have a subscription](https://marketplace.visualstudio.com/subscriptions).
- [Open this resource and click the "Activate your monthly Azure credit" button](https://azure.microsoft.com/en-us/pricing/member-offers/credit-for-visual-studio-subscribers/)
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"><b>Option 2 - Use Your Own Account</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true"><b>Option 2 - Use Your Own Account</b></p>
You can also use your own account or one provided to you by your organization, but you must be able to create a resource group and create, start, and manage a Virtual Machine and an Azure AKS cluster.
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"><b>Option 3 - Use an account provided by your instructor</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true"><b>Option 3 - Use an account provided by your instructor</b></p>
Your workshop invitation may have instructed you that they will provide a Microsoft Azure account for you to use. If so, you will receive instructions that it will be provided.
**Unless you received explicit instructions in your workshop invitations, you much create either an MSDN or Personal account. You must have an account prior to the workshop.**
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/point1.png"><b>Activity 2: Prepare Your Workstation</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/point1.png?raw=true"><b>Activity 2: Prepare Your Workstation</b></p>
<br>
The instructions that follow are the same for either a "base metal" workstation or laptop, or a Virtual Machine. It's best to have at least 4MB of RAM on the management system, and these instructions assume that you are not planning to run the database server or any Containers on the workstation. It's also assumed that you are using a current version of Windows, either desktop or server.
<br>
*(You can copy and paste all of the commands that follow in a PowerShell window that you run as the system Administrator)*
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png">Updates<p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true">Updates<p>
First, ensure all of your updates are current. You can use the following commands to do that in an Administrator-level PowerShell session:
@ -73,12 +73,12 @@ Install-WindowsUpdate
*Note: If you get an error during this update process, evaluate it to see if it is fatal. You may recieve certain driver errors if you are using a Virtual Machine, this can be safely ignored.*
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png">Install Big Data Cluster Tools</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true">Install Big Data Cluster Tools</p>
Next, install the tools to work with Big Data Clusters:
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/point1.png"><b>Activity 3: Install BDC Tools</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/point1.png?raw=true"><b>Activity 3: Install BDC Tools</b></p>
Open this resource, and follow all instructions for the Microsoft Windows operating system
@ -87,7 +87,7 @@ Open this resource, and follow all instructions for the Microsoft Windows operat
- [https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-big-data-tools?view=sql-server-ver15](https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-big-data-tools?view=sql-server-ver15)
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/point1.png"><b>Activity 4: Re-Update Your Workstation</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/point1.png?raw=true"><b>Activity 4: Re-Update Your Workstation</b></p>
Once again, download the MSI and run it from there. It's always a good idea after this many installations to run Windows Update again:
@ -101,11 +101,11 @@ Install-WindowsUpdate
**Note 2: If you are using a Virtual Machine in Azure, power off the Virtual Machine using the Azure Portal every time you are done with it. Turning off the VM using just the Windows power off in the VM only stops it running, but you are still charged for the VM if you do not stop it from the Portal. Stop the VM from the Portal unless you are actively using it.**
<p><img style="margin: 0px 15px 15px 0px;" src="../graphics/owl.png"><b>For Further Study</b></p>
<p><img style="margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/owl.png?raw=true"><b>For Further Study</b></p>
<ul>
<li><a href="https://docs.microsoft.com/en-us/azure/aks/concepts-clusters-workloads" target="_blank">Official Documentation for this section</a></li>
</ul>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/geopin.png"><b >Next Steps</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/geopin.png?raw=true"><b >Next Steps</b></p>
Next, Continue to <a href="https://github.com/microsoft/sqlworkshops/blob/master/k8stobdc/KubernetesToBDC/01-introduction.md" target="_blank"><i> Module 1 - Introduction</i></a>.

Просмотреть файл

@ -14,7 +14,7 @@ This module covers Container technologies and how they are different than Virtua
<p style="border-bottom: 1px solid lightgrey;"></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/point1.png"><b><a name="aks">Activity: Install Class Environment on AKS (Optional)</a></b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/point1.png?raw=true"><b><a name="aks">Activity: Install Class Environment on AKS (Optional)</a></b></p>
*(If you are taking this course on-line and not with an instructor-provided Kubernetes environment, you can use a Microsoft Azure subscription to deploy a Kubernetes Environment, complete with the SQL Server big data clusters feature. Your instructor may also have you use this deployment mechanism if in-class hardware is not practical or available)*
@ -26,15 +26,15 @@ Using the following steps, you will create a Resource Group in Azure that will h
<p><b>Steps</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"> <a href="https://github.com/Microsoft/sqlworkshops/blob/master/sqlserver2019bigdataclusters/SQL2019BDC/00%20-%20Prerequisites.md" target="_blank"> Ensure that you have completed all prerequisites</a>.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true"> <a href="https://github.com/Microsoft/sqlworkshops/blob/master/sqlserver2019bigdataclusters/SQL2019BDC/00%20-%20Prerequisites.md" target="_blank"> Ensure that you have completed all prerequisites</a>.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"> <a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-big-data-tools?view=sqlallproducts-allversions" target="_blank"> Read the following article to install the big data cluster Tools, ensuring that you carefully follow each step</a>. Note that if you followed the pre-requisites properly, you will already have <i>Python</i>, <i>kubectl</i>, and <i>Azure Data Studio</i> installed, so those may be skipped. Follow all other instructions.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true"> <a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-big-data-tools?view=sqlallproducts-allversions" target="_blank"> Read the following article to install the big data cluster Tools, ensuring that you carefully follow each step</a>. Note that if you followed the pre-requisites properly, you will already have <i>Python</i>, <i>kubectl</i>, and <i>Azure Data Studio</i> installed, so those may be skipped. Follow all other instructions.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"> <a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/quickstart-big-data-cluster-deploy?view=sqlallproducts-allversions" target="_blank"> Read the following article to deploy the bdc to AKS, ensuring that you carefully follow each step</a>. Stop at the section marked <b>Connect to the cluster</b>.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true"> <a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/quickstart-big-data-cluster-deploy?view=sqlallproducts-allversions" target="_blank"> Read the following article to deploy the bdc to AKS, ensuring that you carefully follow each step</a>. Stop at the section marked <b>Connect to the cluster</b>.</p>
<p style="border-bottom: 1px solid lightgrey;"></p>
<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/pencil2.png"><a name="1-3">1.1 Big Data Technologies: Operating Systems</a></h2>
<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/pencil2.png?raw=true"><a name="1-3">1.1 Big Data Technologies: Operating Systems</a></h2>
In this section you will learn more about the design the primary operating system (Linux) used with a Kubernetes Cluster.
@ -135,21 +135,21 @@ The essential commands you should know for this workshop are below. In Linux you
A <a href="https://opensourceforu.com/2016/07/introduction-linux-system-administration/" target="_blank">longer explanation of system administration for Linux is here</a>.
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/point1.png"><b>Activity: Work with Linux Commands</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/point1.png?raw=true"><b>Activity: Work with Linux Commands</b></p>
<p><b>Steps</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"><a href="https://bellard.org/jslinux/vm.html?url=https://bellard.org/jslinux/buildroot-x86.cfg" target="_blank">Open this link to run a Linux Emulator in a browser</a></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png">Find the mounted file systems, and then show the free space in them.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png">Show the current directory.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png">Show the files in the current directory. </p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png">Create a new directory, navigate to it, and create a file called <i>test.txt</i> with the words <i>This is a test</i> in it. (hint: us the <b>nano</b> editor or the <b>echo</b> command)</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png">Display the contents of that file.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png">Show the help for the <b>cat</b> command.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true"><a href="https://bellard.org/jslinux/vm.html?url=https://bellard.org/jslinux/buildroot-x86.cfg" target="_blank">Open this link to run a Linux Emulator in a browser</a></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true">Find the mounted file systems, and then show the free space in them.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true">Show the current directory.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true">Show the files in the current directory. </p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true">Create a new directory, navigate to it, and create a file called <i>test.txt</i> with the words <i>This is a test</i> in it. (hint: us the <b>nano</b> editor or the <b>echo</b> command)</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true">Display the contents of that file.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true">Show the help for the <b>cat</b> command.</p>
<p style="border-bottom: 1px solid lightgrey;"></p>
<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/pencil2.png"><a name="1-4">1.2 Big Data Technologies: Containers and Controllers</a></h2>
<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/pencil2.png?raw=true"><a name="1-4">1.2 Big Data Technologies: Containers and Controllers</a></h2>
Bare-metal installations of an operating system such as Windows are deployed on hardware using a <i>Kernel</i>, and additional software to bring all of the hardware into a set of calls.
@ -158,7 +158,7 @@ Bare-metal installations of an operating system such as Windows are deployed on
One abstraction layer above installing software directly on hardware is using a <i>Hypervisor</i>. In essence, this layer uses the base operating system to emulate hardware. You install an operating system (called a *Guest* OS) on the Hypervisor (called the *Host*), and the Guest OS acts as if it is on bare-metal.
<br>
<img style="height: 300;" src="https://docs.docker.com/images/VM%402x.png">
<img style="height: 300;" src="https://docs.docker.com/images/VM%402x.png?raw=true">
<br>
In this abstraction level, you have full control (and responsibility) for the entire operating system, but not the hardware. This isolates all process space and provides an entire "Virtual Machine" to applications. For scale-out systems, a Virtual Machine allows for a distribution and control of complete computer environments using only software.
@ -174,7 +174,7 @@ A Container is provided by the Container Runtime (Such as [containerd](https://c
<i>(NOTE: The Container Image Kernel can run on Windows or Linux, but you will focus on the Linux Kernel Containers in this workshop.)</i>
<br>
<img style="height: 300;" src="https://docs.docker.com/images/Container%402x.png">
<img style="height: 300;" src="https://docs.docker.com/images/Container%402x.png?raw=true">
<br>
This abstraction holds everything for an application to isolate it from other running processes. It is also completely portable - you can create an image on one system, and another system can run it so long as the Container Runtimes (Such as Docker) Runtime is installed. Containers also start very quickly, are easy to create (called <i>Composing</i>) using a simple text file with instructions of what to install on the image. The instructions pull the base Kernel, and then any binaries you want to install. Several pre-built Containers are already available, SQL Server is one of these. <a href="https://docs.microsoft.com/en-us/sql/linux/quickstart-install-connect-docker?view=sql-server-2017" target="_blank">You can read more about installing SQL Server on Container Runtimes (Such as Docker) here</a>.
@ -198,34 +198,36 @@ For Big Data systems, having lots of Containers is very advantageous to segment
</table>
<br>
<p><img style="height: 400; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="../graphics/KubernetesCluster.png"></p>
<p><img style="height: 400; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/KubernetesCluster.png?raw=true"></p>
<br>
You can <a href="https://kubernetes.io/docs/tutorials/kubernetes-basics/" target="_blank">learn much more about Container Orchestration systems here</a>. We're using the Azure Kubernetes Service (AKS) in this workshop, and <a href="https://aksworkshop.io/" target="_blank">they have a great set of tutorials for you to learn more here</a>.
In SQL Server Big Data Clusters, the Container Orchestration system (Such as Kubernetes or OpenShift) is responsible for the state of the BDC; it is reponsible for building and configurint the Nodes, assigns Pods to Nodes,creates and manages the Persistent Voumes (durable storage), and manages the operation of the Cluster.
In SQL Server Big Data Clusters, the Container Orchestration system (Such as Kubernetes or OpenShift) is responsible for the state of the BDC; it is responsible for building and configuring the Nodes, assigns Pods to Nodes,creates and manages the Persistent Volumes (durable storage), and manages the operation of the Cluster.
> NOTE: The OpenShift Container Platform is a commercially supported Platform as a Service (PaaS) based on Kubernetes from RedHat. Many shops require a commercial vendor to implement and support Kubernetes.
(You'll cover the storage aspects of Container Orchestration in more detail in a moment.)
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/point1.png"><b>Activity: Familiarize Yourself with Container Orchestration using minikube</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/point1.png?raw=true"><b>Activity: Familiarize Yourself with Container Orchestration using minikube</b></p>
To practice with Kubernetes, you will use an online emulator to work with the `minikube` platform.
<p><b>Steps</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"><a href="https://kubernetes.io/docs/tutorials/kubernetes-basics/create-cluster/cluster-interactive/
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true"><a href="https://kubernetes.io/docs/tutorials/kubernetes-basics/create-cluster/cluster-interactive/
" target="_blank">Open this resource, and complete the first module</a>. (You can return to it later to complete all exercises if you wish)</p>
<br>
<p style="border-bottom: 1px solid lightgrey;"></p>
<br>
<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/pencil2.png"><a name="1-5">1.3 Big Data Technologies: Distributed Data Storage</a></h2>
<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/pencil2.png?raw=true"><a name="1-5">1.3 Big Data Technologies: Distributed Data Storage</a></h2>
Traditional storage uses a call from the operating system to an underlying I/O system, as you learned earlier. These file systems are either directly connected to the operating system or appear to be connected directly using a Storage Area Network. The blocks of data are stored and managed by the operating system.
For large scale-out data systems, the mounting point for an I/O is another abstraction. For SQL Server BDC, the most commonly used scale-out file system is the <a href="https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html" target="_blank">Hadoop Data File System</a>, or <i>HDFS</i>. HDFS is a set of Java code that gathers disparate disk subsystems into a <i>Cluster</i> which is comprised of various <i>Nodes</i> - a <i>NameNode</i>, which manages the cluster's metadata, and <i>DataNodes</i> that physically store the data. Files and directories are represented on the NameNode by a structure called <i>inodes</i>. Inodes record attributes such as permissions, modification and access times, and namespace and diskspace quotas.
<p><img style="height: 300; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="../graphics/hdfs.png"></p>
<p><img style="height: 300; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/hdfs.png?raw=true"></p>
With an abstraction such as Containers, storage becomes an issue for two reasons: The storage can disappear when the Container is removed, and other Containers and technologies can't access storage easily within a Container.
@ -239,7 +241,7 @@ You <a href="https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#Introduction
<p style="border-bottom: 1px solid lightgrey;"></p>
<br>
<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/pencil2.png"><a name="1-6">1.4 Big Data Technologies: Command and Control</a></h2>
<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/pencil2.png?raw=true"><a name="1-6">1.4 Big Data Technologies: Command and Control</a></h2>
There are three primary tools and utilities you will use to control the SQL Server big data cluster:
@ -267,28 +269,28 @@ You can <a href="https://docs.microsoft.com/en-us/sql/azure-data-studio/what-is?
" target="_blank">learn more about Azure Data Studio here</a>.
<br>
<p><img style="height: 300; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="../graphics/ads.png"></p>
<p><img style="height: 300; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/ads.png?raw=true"></p>
<br>
You'll explore further operations with the Azure Data Studio in the final module of this course.
<br>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/point1.png"><b>Activity: Practice with Notebooks</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/point1.png?raw=true"><b>Activity: Practice with Notebooks</b></p>
<p><b>Steps</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"><a href="https://notebooks.azure.com/BuckWoodyNoteBooks/projects/AzureNotebooks" target="_blank">Open this reference, and review the instructions you see there</a>. You can clone this Notebook to work with it later.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true"><a href="https://notebooks.azure.com/BuckWoodyNoteBooks/projects/AzureNotebooks" target="_blank">Open this reference, and review the instructions you see there</a>. You can clone this Notebook to work with it later.</p>
<br>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/point1.png"><b>Activity: Azure Data Studio Notebooks Overview</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/point1.png?raw=true"><b>Activity: Azure Data Studio Notebooks Overview</b></p>
<p><b>Steps</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"><a href="https://docs.microsoft.com/en-us/sql/azure-data-studio/sql-notebooks?view=sql-server-2017" target="_blank">Open this reference, and read the tutorial - you do not have to follow the steps, but you can if time permist.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true"><a href="https://docs.microsoft.com/en-us/sql/azure-data-studio/sql-notebooks?view=sql-server-2017" target="_blank">Open this reference, and read the tutorial - you do not have to follow the steps, but you can if time permits.</p>
<br>
<p style="border-bottom: 1px solid lightgrey;"></p>
<br>
<p><img style="margin: 0px 15px 15px 0px;" src="../graphics/owl.png"><b>For Further Study</b></p>
<p><img style="margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/owl.png?raw=true"><b>For Further Study</b></p>
<br>
<ul>
@ -302,6 +304,6 @@ You'll explore further operations with the Azure Data Studio in the final module
<li><a href="https://realpython.com/jupyter-notebook-introduction/" target="_blank">Full tutorial on Jupyter Notebooks</a></li>
</ul>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/geopin.png"><b >Next Steps</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/geopin.png?raw=true"><b >Next Steps</b></p>
Next, Continue to <a href="https://github.com/microsoft/sqlworkshops/blob/master/k8stobdc/KubernetesToBDC/02-hardware.md" target="_blank"><i> 02 - Hardware and Virtualization environment for Kubernetes </i></a>.

Просмотреть файл

@ -148,7 +148,7 @@ We'll begin with a set of definitions. These aren't all the terms used in Kubern
<tr style="vertical-align:top;">
<td> </td>
<td><a href="https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/"><i>etcd</i></a> </td>
<td>A high performance key value store that stores the clusters state. Since <i>etcd</i> is light-weight, each instance can generally share resources with other Nodes in the cluster. The Hardware recommendations section of the official etcd.io site provides a detailed breakdown of the hardware requirement for <i>etcd</i>. </td>
<td>A high performance key value store that stores the clusters state. Since <i>etcd</i> is light-weight, each instance can generally share resources with other Nodes in the cluster. The Hardware recommendations section of the official http://etcd.io site provides a detailed breakdown of the hardware requirement for <i>etcd</i>. </td>
</tr>
<tr style="vertical-align:top;">
<td> </td>
@ -246,6 +246,7 @@ Luckily help is at hand in the form of a tool that leverages kubeadm in order to
- Add nodes to existing clusters
Kubespray is a Cloud Native Computing Foundation project and with its own [GitHub repository](https://github.com/kubernetes-sigs/kubespray).
### 3.2.5 What Is Ansible? ###
@ -268,6 +269,9 @@ In order to carry out the deployment of the Kubernetes cluster, a basic understa
### 3.2.7 Kubespray Workflow ###
Unlike other available deployment tools, Kubespray does everything for you in “One shot”. For example, Kubeadm requires that certificates on nodes are created manually, Kubespray not only leverages Kubeadm but it also looks after everything including certificate creation for you. Kubespray works against most of the popular public cloud providers and has been tested for the deployment of clusters with thousands of nodes. The real elegance of Kubespray is the reuse it promotes. If an organization has a requirement to deploy multiple clusters, once Kubespray is setup, for every new cluster that needs to be created, the only prerequisite is to create a new inventory file for the nodes the new cluster will use.
3.2.5 High Level Kubespray Workflow
The deployment of a Kubernetes cluster via Kubespray follows this workflow:
- Preinstall step
@ -299,7 +303,6 @@ Note:
### 3.2.8 Requirements ###
Refer to the [requirements](https://github.com/kubernetes-sigs/kubespray#requirements) section in the Kubespray GitHub repo.
### 3.2.9 Post Cluster Deployment Activities ###

Просмотреть файл

@ -19,7 +19,7 @@ This module covers Container technologies and how they are different than Virtua
SQL Server (starting with version 2019) provides three ways to work with large sets of data:
- **Data Virtualization**: Query multiple sources of data technologies using the Polybase SQL Server feature <i>(data left at source)</i>
- **Data Virtualization**: Query multiple sources of data technologies using the PolyBase SQL Server feature <i>(data left at source)</i>
- **Storage Pools**: Create sets of disparate data sources that can be queried from Distributed Data sets <i>(data ingested into sharded databases using PolyBase)</i>
- **SQL Server Big Data Clusters**: Create, manage and control clusters of SQL Server Instances that co-exist in a Kubernetes cluster with Apache Spark and other technologies to access and process large sets of data <i>(Data left in place, ingested through PolyBase, and into/through HDFS)</i>
@ -45,12 +45,12 @@ To leverage PolyBase, you first define the external table using a specific set o
</table>
<br>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/point1.png?raw=true"><b>Activity: Review PolyBase Solution</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/point1.png?raw=true"><b>Activity: Review PolyBase Solution</b></p>
In this section you will review a solution tutorial similar to one you will perform later. You'll see how to create a reference to an HDFS file store and query it within SQL Server as if it were a standard internal table.
<br>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png">Open <a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-query-hdfs-storage-pool?view=sqlallproducts-allversions" target="_blank">this reference and locate numbers 4-5 of the steps in the tutorial</a>. This explains the two steps required to create and query an External table. *Only review this information; you will perform these steps in another Module*.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true">Open <a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-query-hdfs-storage-pool?view=sqlallproducts-allversions" target="_blank">this reference and locate numbers 4-5 of the steps in the tutorial</a>. This explains the two steps required to create and query an External table. *Only review this information; you will perform these steps in another Module*.</p>
<br>
<p style="border-bottom: 1px solid lightgrey;"></p>
@ -148,7 +148,7 @@ These components are used in the Compute Pool of the BDC:
<h3>BDC: App Pool</h3>
The App Pool is a set of Pods within a Node that hold multiple types of end-points into the system. SQL Server Integration Services lives in the App Pool, and other Job systems are possible. You could instatiate a long-running job (such as IoT streaming) or Machine Learning (ML) endpoints used for scoring a prediction or returning a classification.
The App Pool is a set of Pods within a Node that hold multiple types of end-points into the system. SQL Server Integration Services lives in the App Pool, and other Job systems are possible. You could instantiate a long-running job (such as IoT streaming) or Machine Learning (ML) endpoints used for scoring a prediction or returning a classification.
These components are used in the Compute Pool of the BDC:
@ -195,18 +195,18 @@ These components are used in the Storage Pool of the BDC:
</table>
<br>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/point1.png"><b>Activity: Review Data Pool Solution</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/point1.png?raw=true"><b>Activity: Review Data Pool Solution</b></p>
In this section you will review the solution tutorial similar to the one you will perform in a future step. You'll see how to load data into the Data Pool.
<br>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png">Open <a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-data-pool-ingest-sql?view=sqlallproducts-allversions" target="_blank">this reference and review the steps in the tutorial</a>. This explains the two steps required to create and load an External table in the Data Pool. You'll perform these steps in the <i>Operationalization</i> Module later. *Only review this information at this time. You will perform these steps in another Module.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true">Open <a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-data-pool-ingest-sql?view=sqlallproducts-allversions" target="_blank">this reference and review the steps in the tutorial</a>. This explains the two steps required to create and load an External table in the Data Pool. You'll perform these steps in the <i>Operationalization</i> Module later. *Only review this information at this time. You will perform these steps in another Module.</p>
<br>
<p style="border-bottom: 1px solid lightgrey;"></p>
<br>
<p><img style="margin: 0px 15px 15px 0px;" src="../graphics/owl.png"><b>For Further Study</b></p>
<p><img style="margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/owl.png?raw=true"><b>For Further Study</b></p>
<ul>
<li><a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/big-data-cluster-overview?view=sqlallproducts-allversions" target="_blank">Official Documentation for this section</a></li>
<li><a href = "https://cloudblogs.microsoft.com/sqlserver/2018/09/26/sql-server-2019-celebrating-25-years-of-sql-server-database-engine-and-the-path-forward/" target="_blank">Update on 2019 Blog</a></li>

Просмотреть файл

@ -14,18 +14,18 @@ This module covers Container technologies and how they are different than Virtua
<p style="border-bottom: 1px solid lightgrey;"></p>
<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/pencil2.png"><a name="4-0">4.0 End-To-End Solution with BDC</a></h2>
<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/pencil2.png?raw=true"><a name="4-0">4.0 End-To-End Solution with BDC</a></h2>
Recall from <i>The Big Data Landscape</i> module that you learned about the Wide World Importers company. <a href="https://azure-scenarios-experience.azurewebsites.net/big-data.html" target="_blank">Wide World Importers </a> (WWI) is a traditional brick and mortar business with a long track record of success, generating profits through strong retail store sales of their unique offering of affordable products from around the world. They have a traditional N-tier application that uses a front-end (mobile, web and installed) that interacts with a scale-out middle-tier software product, which in turn stores data in a large SQL Server database that has been scaled-up to meet demand.
<br>
<img style="height: 150; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="../graphics/WWI-002.png">
<img style="height: 150; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/WWI-002.png?raw=true">
<br>
WWI has now added web and mobile commerce to their platform, which has generated a significant amount of additional data, and data formats. These new platforms were added without integrating into the OLTP system data or Business Intelligence infrastructures. As a result, "silos" of data stores have developed, and ingesting all of this data exceeds the scale of their current RDBMS server:
<br>
<img style="height: 300; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="../graphics/WWI-003.png">
<img style="height: 300; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/WWI-003.png?raw=true">
<br>
This presented the following four challenges - the IT team at WWI needs to:
@ -49,89 +49,89 @@ To meet these challenges, the following solution is proposed. Using the BDC plat
The following diagram illustrates the complete solution that you can use to brief your audience with:
<br>
<img style="height: 400; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="../graphics/bdcsolution1.png">
<img style="height: 400; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/bdcsolution1.png?raw=true">
<br>
In the following sections you'll dive deeper into how this scale is used to solve the rest of the challenges.
<p style="border-bottom: 1px solid lightgrey;"></p>
<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/pencil2.png"><a name="4-1">4.1 Data Virtualization - <i>Challenge 2: Multiple Data Sources</i></a></h2>
<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/pencil2.png?raw=true"><a name="4-1">4.1 Data Virtualization - <i>Challenge 2: Multiple Data Sources</i></a></h2>
The next challenge the IT team must solve is to enable a single data query to work across multiple disparate systems, optionally joining to internal SQL Server Tables, and also at scale.
Using the Data Virtualization capability you saw in the <i>02 - SQL Server BDC Components</i> Module, the IT team creates External Tables using the PolyBase feature. These External Table definitions are stored in the database on the SQL Server Master Instance within the cluster. When queried by the user, the queries are engaged from the SQL Server Master Instance through the Compute Pool in the SQL Server BDC, which holds Kubernetes Nodes containing the Pods running SQL Server Instances. These Instances send the query to the PolyBase Connector at the target data system, which processes the query based on the type of target system. The results are processed and returned through the PolyBase Connector to the Compute Pool and then on to the Master Instance, and then on to the user.
<br>
<img style="height: 250; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="../graphics/bdcsolution2.png">
<img style="height: 250; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/bdcsolution2.png?raw=true">
<br>
This process allows not only a query to disparate systems, but also those remote systems can hold extremely large sets of data. Normally you are querying a subset of that data, so the results are all that are sent back over the network. These results can be joined with internal tables for a single view, and all from within the same Transact-SQL statements.
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/point1.png"><b>Activity: Load and query data in an External Table</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/point1.png?raw=true"><b>Activity: Load and query data in an External Table</b></p>
In this activity, you will load the sample data into your big data cluster environment, and then create and use an External table to query the data in HDFS. This process is similar to connecting to any Polybase target.
In this activity, you will load the sample data into your big data cluster environment, and then create and use an External table to query the data in HDFS. This process is similar to connecting to any PolyBase target.
<b>Steps</b>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"><a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-load-sample-data?view=sqlallproducts-allversions" target="_blank">Open this reference, and perform all of the instructions you see there</a>. This loads your data in preparattion for the next Activity.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"><a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-query-hdfs-storage-pool?view=sqlallproducts-allversions" target="_blank">Open this reference, and perform all of the instructions you see there</a>. This step shows you how to create and query an External table.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"><a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-query-oracle?view=sqlallproducts-allversions" target="_blank">(Optional) Open this reference, and review the instructions you see there</a>. (You You must have an Oracle server that your BDC can reach to perform these steps, although you can review them if you do not)</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true"><a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-load-sample-data?view=sqlallproducts-allversions" target="_blank">Open this reference, and perform all of the instructions you see there</a>. This loads your data in preparation for the next Activity.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true"><a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-query-hdfs-storage-pool?view=sqlallproducts-allversions" target="_blank">Open this reference, and perform all of the instructions you see there</a>. This step shows you how to create and query an External table.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true"><a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-query-oracle?view=sqlallproducts-allversions" target="_blank">(Optional) Open this reference, and review the instructions you see there</a>. (You You must have an Oracle server that your BDC can reach to perform these steps, although you can review them if you do not)</p>
<br>
<p style="border-bottom: 1px solid lightgrey;"></p>
<br>
<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/pencil2.png"><a name="4-2">4.2 Creating a Distributed Data solution using big data cluster - <i>Challenge 3: Deep Analytics</i></a></h2>
<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/pencil2.png?raw=true"><a name="4-2">4.2 Creating a Distributed Data solution using big data cluster - <i>Challenge 3: Deep Analytics</i></a></h2>
Ad-hoc queries are very useful for many scenarios. There are times when you would like to bring the data into storage, so that you can create denormalized representations of datasets, aggregated data, and other purpose-specific data tasks.
<br>
<img style="height: 250; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="../graphics/bdcsolution3.png">
<img style="height: 250; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/bdcsolution3.png?raw=true">
<br>
Using the Data Virtualization capability you saw in the <i>02 - BDC Components</i> Module, the IT team creates External Tables using PolyBase statements. These External Table definitions are stored in the database on the SQL Server Master Instance within the cluster. When queried by the user, the queries are engaged from the SQL Server Master Instance through the Compute Pool in the SQL Server BDC, which holds Kubernetes Nodes containing the Pods running SQL Server Instances. These Instances send the query to the PolyBase Connector at the target data system, which processes the query based on the type of target system. The results are processed and returned through the PolyBase Connector to the Compute Pool and then on to the Master Instance, and the PolyBase statements can specify the target of the Data Pool. The SQL Server Instances in the Data Pool store the data in a distributed fashion across multiple databases, called <i>Shards</i>.
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/point1.png"><b>Activity: Load and query data into the Data Pool</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/point1.png?raw=true"><b>Activity: Load and query data into the Data Pool</b></p>
In this activity, you will load the sample data into your big data cluster environment, and then create and use an External table to load data into the Data Pool.
<b>Steps</b>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"><a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-data-pool-ingest-sql?view=sqlallproducts-allversions" target="_blank">Open this reference, and perform the instructions you see there</a>. This loads data into the Data Pool.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true"><a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-data-pool-ingest-sql?view=sqlallproducts-allversions" target="_blank">Open this reference, and perform the instructions you see there</a>. This loads data into the Data Pool.</p>
<br>
<p style="border-bottom: 1px solid lightgrey;"></p>
<br>
<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/pencil2.png"><a name="4-3">4.3 Querying HDFS Data using big data cluster - <i>Challenge 4: Enable AI</i></a></h2>
<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/pencil2.png?raw=true"><a name="4-3">4.3 Querying HDFS Data using big data cluster - <i>Challenge 4: Enable AI</i></a></h2>
There are three primary uses for a large cluster of data processing systems for Machine Learning and AI applications. The first is that the users will involved in the creation of the <a href="https://www.codeingschool.com/2018/09/what-are-features-and-labels-in-machine-learning.html" target="_blank">Features used in various ML and AI algorithms, and are often tasked to Label</a> the data. These users can access the Data Pool and Data Storage data stores directly to query and assist with this task.
The SQL Server Master Instance in the BDC installs with <a href="https://docs.microsoft.com/en-us/sql/advanced-analytics/what-is-sql-server-machine-learning?view=sql-server-ver15" target="_blank">Machine Learning Services</a>, which allow creation, training, evaluation and presisting of Machine Learning Models. Data from all parts of the BDC are available, and Data Science oriented languages and libraries in R, Python and Java are enabled. In this scenario, the Data Scientist creates the R or Python code, and the Transact-SQL Developer wraps that code in a Stored Procedure. This code can be used to train, evaluate and create Machine Learning Models. The Models can be stored in the Master Instance for scoring, or sent on to the App Pool where the Machine Learning Server is running, waiting to accept REST-based calls from applications.
The SQL Server Master Instance in the BDC installs with <a href="https://docs.microsoft.com/en-us/sql/advanced-analytics/what-is-sql-server-machine-learning?view=sql-server-ver15" target="_blank">Machine Learning Services</a>, which allow creation, training, evaluation and persisting of Machine Learning Models. Data from all parts of the BDC are available, and Data Science oriented languages and libraries in R, Python and Java are enabled. In this scenario, the Data Scientist creates the R or Python code, and the Transact-SQL Developer wraps that code in a Stored Procedure. This code can be used to train, evaluate and create Machine Learning Models. The Models can be stored in the Master Instance for scoring, or sent on to the App Pool where the Machine Learning Server is running, waiting to accept REST-based calls from applications.
<br>
<img style="height: 400; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="../graphics/bdcsolution4.png">
<img style="height: 400; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/bdcsolution4.png?raw=true">
<br>
The Data Scientist has another option to create and train ML and AI models. The Spark platform within the Storage Pool is accessible through the Knox gateway, using Livy to send Spark Jobs as you learned about in the <i>02 - SQL Server BDC Components</i> Module. This gives access to the full Spark platform, using Jupyter Notebooks (included in <i>Azure Data Studio</i>) or any other standard tools that can access Spark through REST calls.
<br>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/point1.png"><b>Activity: Load data with Spark, run a Spark Notebook</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/point1.png?raw=true"><b>Activity: Load data with Spark, run a Spark Notebook</b></p>
<br>
In this activity, you will load the sample data into your big data cluster environment using Spark, and use a Notebook in Azure Data Studio to work with it.
<b>Steps</b>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"><a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-data-pool-ingest-spark?view=sqlallproducts-allversions" target="_blank">Open this reference, and follow the instructions you see there</a>. This loads the data in preparation for the Notebook operations.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"><a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-notebook-spark?view=sqlallproducts-allversions" target="_blank">Open this reference, and follow the instructions you see there</a>. This simple example shows you how to work with the data you ingested into the Storage Pool using Spark.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true"><a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-data-pool-ingest-spark?view=sqlallproducts-allversions" target="_blank">Open this reference, and follow the instructions you see there</a>. This loads the data in preparation for the Notebook operations.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true"><a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-notebook-spark?view=sqlallproducts-allversions" target="_blank">Open this reference, and follow the instructions you see there</a>. This simple example shows you how to work with the data you ingested into the Storage Pool using Spark.</p>
<br>
<p style="border-bottom: 1px solid lightgrey;"></p>
<br>
<p><img style="margin: 0px 15px 15px 0px;" src="../graphics/owl.png"><b>For Further Study</b></p>
<p><img style="margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/owl.png?raw=true"><b>For Further Study</b></p>
<ul>
<li><a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/big-data-cluster-overview?view=sqlallproducts-allversions" target="_blank">Official Documentation for this section</a></li>
<li><a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/data-ingestion-curl?view=sqlallproducts-allversions" target="_blank">Use curl to load data into HDFS on SQL Server 2019 big data clusters</a></li>

Просмотреть файл

@ -0,0 +1,124 @@
# SQL Server 2019 big data cluster
Single-Node Cluster on an Azure Virtual Machine (Unsupported for production - classroom only)
In this set of instructions you'll set up a SQL Server 2019 big data cluster using Ubuntu on
a single-Node using a Microsoft Azure Virtual Machine.
NOTE: This is an unsupported configuration, and should be used only for classroom purposes.
Carefully read the instructions for the parameters you need to replace for your specific
subscription and parameters.
-------------------------------------------------------------------------------------------------------------
## Running these Instructions
These instructions use shell commands, such as PowerShell, bash, or the CMD window from a
system that has the Secure Shell software installed (SSH). You can type:
ssh -h
To see if this tool is installed.
You can copy-and-paste from the lines that show the commands, or you can set your IDE to run the current line
in a Terminal window. (In Visual Studio Code or Azure Data Studio, these are called "Keybindings"):
https://code.visualstudio.com/docs/getstarted/keybindings
You can set this to any key you like:
(Preferences | Keyboard Shortcuts | Terminal: Run Selected Text in Active Terminal)
-------------------------------------------------------------------------------------------------------------
## References
This Notebook uses the script located here:
https://docs.microsoft.com/en-us/sql/big-data-cluster/deployment-script-single-node-kubeadm?view=sql-server-ver15
and that reference supersedes the information in the steps listed below.
You can also create a SQL Server Big Data Cluster on the Azure Kubernetes Service (AKS):
Those instructions are located here: https://docs.microsoft.com/en-us/sql/big-data-cluster/quickstart-big-data-cluster-deploy?view=sql-server-ver15
For a complete workshop on SQL Server 2019's big data clusters, see this reference:
https://github.com/Microsoft/sqlworkshops/tree/master/sqlserver2019bigdataclusters
-------------------------------------------------------------------------------------------------------------
### Step 1: Log in to Azure
az login
-------------------------------------------------------------------------------------------------------------
### Step 2: Set your account - show the accounts, replace <YourAccountNameHere> with your account name
az account list --output table
az account set --subscription "<YourAccountNameHere>"
-------------------------------------------------------------------------------------------------------------
### Step 3: Create a Resource Group, and a Virtual Machine - Look for values with the <Replace> characters to change to your values
#### (Note: Needs a machine large enough to run BDC and also have Nested Virtualization)
az group create -n <ResourceGroupName> -l eastus2
az vm create -n <VMName> -g <ResourceGroupName> -l eastus2 --image UbuntuLTS --os-disk-size-gb 200 --storage-sku Premium_LRS --admin-username bdcadmin --admin-password <ReplaceWithPassword> --size Standard_D8s_v3 --public-ip-address-allocation static
ssh -X bdcadmin@<ReplaceWithIPAddressThatReturnsFromLastCommand>
### Step 4: Update and Upgrade VM
sudo apt-get update
sudo apt-get upgrade
sudo apt autoremove
-------------------------------------------------------------------------------------------------------------
### Step 5: (Optional) Install an XWindows server
sudo apt-get install xorg openbox
sudo reboot
#### After about 5 minutes:
ssh -X bdcadmin@<ReplaceWithVMIP>
sudo apt-get install gnome-core
sudo reboot
#### After about 5 minutes:
ssh -X bdcadmin@<ReplaceWithVMIP>
sudo sed -i 's/allowed_users=console/allowed_users=anybody/' /etc/X11/Xwrapper.config
mkdir /home/bdcadmin/.config/nautilus
cd ~
mkdir ./Downloads
touch  /home/bdcadmin/.gtk-bookmarks
wget https://cs7a9736a9346a1x44c6xb00.blob.core.windows.net/backups/ads.deb
#### Check XWindows - Note, requires that you have XWindows software installed on your laptop
nautilus &
-------------------------------------------------------------------------------------------------------------
### Step 6: Install BDC Single Node - Pre-requisites (Current as of 1/31/2020)
sudo apt update && sudo apt upgrade -y
sudo reboot
#### After about 5 minutes:
ssh -X bdcadmin@<ReplaceWithVMIP>
-------------------------------------------------------------------------------------------------------------
### Step 7: Download and mark BDC Setup script
curl --output setup-bdc.sh https://raw.githubusercontent.com/microsoft/sql-server-samples/master/samples/features/sql-big-data-cluster/deployment/kubeadm/ubuntu-single-node-vm/setup-bdc.sh
chmod +x setup-bdc.sh
sudo ./setup-bdc.sh
-------------------------------------------------------------------------------------------------------------
### Step 8: Setup path and Check
source ~/.bashrc
azdata --version
kubectl get pods
#### You can now use the system.
-------------------------------------------------------------------------------------------------------------
## Cleanup - Erase everything
### Only perform this step when you are done experimenting with the system...
az group delete --name <ResourceGroupName>

Просмотреть файл

@ -8,7 +8,7 @@
<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/textbubble.png?raw=true"><b> About this Workshop</b></h2>
Welcome to this Microsoft solutions workshop on *Kubernetes - From Bare Metal to SQL Server Big Data Clusters*. In this workshop, you'll learn about setting up a production grade SQL Server 2019 big data cluster environment on Kubernetes. Topics covered include: hardware, virtualization, and Kubernetes, with a full deployment of SQL Server's Big Data Cluster on the environment that you will use in the class. You'll then walk through a set of Jupyter Notebooks in Azure Data Studio to run T-SQL, Spark, and Machine Learning workloads on the cluster. You'll also receive valuable resources to learn more and go deeper on Linux, Containers, Kubernetes and SQL Server big data clusters.
Welcome to this Microsoft solutions workshop on *Kubernetes - From Bare Metal to SQL Server Big Data Clusters*. In this workshop, you'll learn about setting up a production grade SQL Server 2019 big data cluster environment on Kubernetes. Topics covered include: hardware, virtualization, and Kubernetes, with a full deployment of SQL Server's Big Data Cluster on the environment that you will use in the class. You'll then walk through a set of [Jupyter Notebooks](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html) in Microsoft's [Azure Data Studio](https://docs.microsoft.com/en-us/sql/azure-data-studio/what-is?view=sql-server-ver15) tool to run T-SQL, Spark, and Machine Learning workloads on the cluster. You'll also receive valuable resources to learn more and go deeper on Linux, Containers, Kubernetes and SQL Server big data clusters.
The focus of this workshop is to understand the hardware, software, and environment you need to work with [SQL Server 2019's big data clusters](https://docs.microsoft.com/en-us/sql/big-data-cluster/big-data-cluster-overview?view=sql-server-ver15) on a Kubernetes platform.
@ -86,7 +86,7 @@ or
<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/education1.png?raw=true"><b> Workshop Details</b></h2>
This workshop uses <TODO: enter main technologies used to solve the sceanrio>, with a focus on <TODO: architecture and implementation, development and use, etc>.
This workshop uses Kubernetes to deploy a workload, with a focus on Microsoft SQL Server's big data clusters deployment for Big Data and Data Science workloads.
<table style="tr:nth-child(even) {background-color: #f2f2f2;}; text-align: left; display: table; border-collapse: collapse; border-spacing: 5px; border-color: gray;">
@ -128,6 +128,11 @@ This is a modular workshop, and in each section, you'll learn concepts, technolo
Next, Continue to <a href="00-prerequisites.md" target="_blank"><i> Pre-Requisites</i></a>
**Workshop Authors and Contributors**
- [The Microsoft SQL Server Team](http://microsoft.com/sql)
- [Chris Adkin](https://www.linkedin.com/in/wollatondba/), Pure Storage
**Legal Notice**
*Kubernetes and the Kubernetes logo are trademarks or registered trademarks of The Linux Foundation. in the United States and/or other countries. The Linux Foundation and other parties may also have trademark rights in other terms used herein. This course is not certified, accredited, affiliated with, nor endorsed by Kubernetes or The Linux Foundation.*
*Kubernetes and the Kubernetes logo are trademarks or registered trademarks of The Linux Foundation. in the United States and/or other countries. The Linux Foundation and other parties may also have trademark rights in other terms used herein. This Workshop is not certified, accredited, affiliated with, nor endorsed by Kubernetes or The Linux Foundation.*

Двоичные данные
k8stobdc/graphics/.DS_Store поставляемый Normal file

Двоичный файл не отображается.

Двоичные данные
k8stobdc/graphics/KubernetesCluster.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 313 KiB

Двоичные данные
k8stobdc/graphics/WWI-001.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 73 KiB

Двоичные данные
k8stobdc/graphics/WWI-002.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 23 KiB

Двоичные данные
k8stobdc/graphics/WWI-003.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 224 KiB

Двоичные данные
k8stobdc/graphics/WWI-logo.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 15 KiB

Двоичные данные
k8stobdc/graphics/adf.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 41 KiB

Двоичные данные
k8stobdc/graphics/ads.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 239 KiB

Двоичные данные
k8stobdc/graphics/bookpencil.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 1.8 KiB

Двоичные данные
k8stobdc/graphics/building1.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 4.3 KiB

Двоичные данные
k8stobdc/graphics/bulletlist.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 1.1 KiB

Двоичные данные
k8stobdc/graphics/checkbox.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 180 B

Двоичные данные
k8stobdc/graphics/checkmark.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 1.7 KiB

Двоичные данные
k8stobdc/graphics/clipboardcheck.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 2.6 KiB

Двоичные данные
k8stobdc/graphics/cloud1.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 3.2 KiB

Двоичные данные
k8stobdc/graphics/datamart.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 79 KiB

Двоичные данные
k8stobdc/graphics/datavirtualization.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 149 KiB

Двоичные данные
k8stobdc/graphics/education1.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 3.5 KiB

Двоичные данные
k8stobdc/graphics/factory.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 2.6 KiB

Двоичные данные
k8stobdc/graphics/geopin.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 3.2 KiB

Двоичные данные
k8stobdc/graphics/hdfs.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 240 KiB

Двоичные данные
k8stobdc/graphics/kubectl.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 383 KiB

Двоичные данные
k8stobdc/graphics/listcheck.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 2.0 KiB

Двоичные данные
k8stobdc/graphics/microsoftlogo.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 2.4 KiB

Двоичные данные
k8stobdc/graphics/owl.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 3.9 KiB

Двоичные данные
k8stobdc/graphics/paperclip1.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 1.6 KiB

Двоичные данные
k8stobdc/graphics/pencil2.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 2.7 KiB

Двоичные данные
k8stobdc/graphics/pinmap.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 3.6 KiB

Двоичные данные
k8stobdc/graphics/point1.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 2.9 KiB

Двоичные данные
k8stobdc/graphics/solutiondiagram.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 19 KiB

Двоичные данные
k8stobdc/graphics/spark.jpg

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 216 KiB

Двоичные данные
k8stobdc/graphics/spark.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 166 KiB

Двоичные данные
k8stobdc/graphics/sqlbdc.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 467 KiB

Двоичные данные
k8stobdc/graphics/textbubble.png

Двоичный файл не отображается.

До

Ширина:  |  Высота:  |  Размер: 1.5 KiB

Двоичные данные
sqlserver2019bigdataclusters/.DS_Store поставляемый Normal file

Двоичный файл не отображается.

Просмотреть файл

@ -22,7 +22,7 @@ The other requirements are:
- **The pip3 Package**: The Python package manager *pip3* is used to install various BDC deployment and configuration tools.
- **The kubectl program**: The *kubectl* program is the command-line control feature for Kubernetes.
- **The azdata utility**: The *azdata* program is the deployment and configuration tool for BDC.
- **Azure Data Studio**: The *Azure Data Studio* IDE, along with various Extensions, is used for deploying the system, and querying and management of the BDC. In addition, you will use this tool to participate in the workshop. Note: You can connect to a SQL Server 2019 Big Data Cluster using any SQL Server connection tool or applicaiton, such as SQL Server Management Studio, but this course will use Microsoft Azure Data Studio for cluster management, Jupyter Notebooks and other capabilities.
- **Azure Data Studio**: The *Azure Data Studio* IDE, along with various Extensions, is used for deploying the system, and querying and management of the BDC. In addition, you will use this tool to participate in the workshop. Note: You can connect to a SQL Server 2019 Big Data Cluster using any SQL Server connection tool or application, such as SQL Server Management Studio, but this course will use Microsoft Azure Data Studio for cluster management, Jupyter Notebooks and other capabilities.
*Note that all following activities must be completed prior to class - there will not be time to perform these operations during the workshop.*
@ -71,7 +71,7 @@ Get-WindowsUpdate
Install-WindowsUpdate
</pre>
*Note: If you get an error during this update process, evaluate it to see if it is fatal. You may recieve certain driver errors if you are using a Virtual Machine, this can be safely ignored.*
*Note: If you get an error during this update process, evaluate it to see if it is fatal. You may receive certain driver errors if you are using a Virtual Machine, this can be safely ignored.*
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png">Install Big Data Cluster Tools</p>
@ -97,7 +97,7 @@ Get-WindowsUpdate
Install-WindowsUpdate
</pre>
*Note 1: If you get an error during this update process, evaluate it to see if it is fatal. You may recieve certain driver errors if you are using a Virtual Machine, this can be safely ignored.*
*Note 1: If you get an error during this update process, evaluate it to see if it is fatal. You may receive certain driver errors if you are using a Virtual Machine, this can be safely ignored.*
**Note 2: If you are using a Virtual Machine in Azure, power off the Virtual Machine using the Azure Portal every time you are done with it. Turning off the VM using just the Windows power off in the VM only stops it running, but you are still charged for the VM if you do not stop it from the Portal. Stop the VM from the Portal unless you are actively using it.**

Просмотреть файл

@ -92,7 +92,7 @@ This solution uses an example of a retail organization that has multiple data so
<img style="height: 25;" src="../graphics/WWI-logo.png">
Wide World Importeres (WWI) is a traditional brick and mortar business with a long track record of success, generating profits through strong retail store sales of their unique offering of affordable products from around the world. They have a great training program for new employees, that focuses on connecting with their customers and providing great face-to-face customer service. This strong focus on customer relationships has helped set WWI apart from their competitors.
Wide World Importers (WWI) is a traditional brick and mortar business with a long track record of success, generating profits through strong retail store sales of their unique offering of affordable products from around the world. They have a great training program for new employees, that focuses on connecting with their customers and providing great face-to-face customer service. This strong focus on customer relationships has helped set WWI apart from their competitors.
WWI has now added web and mobile commerce to their platform, which has generated a significant amount of additional data, and data formats. These new platforms have been added without integrating into the OLTP system data or Business Intelligence infrastructures. As a result, "silos" of data stores have developed.
@ -144,7 +144,7 @@ Using the following steps, you will create a Resource Group in Azure that will h
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"> <a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-big-data-tools?view=sqlallproducts-allversions" target="_blank"> Read the following article to install the big data cluster Tools, ensuring that you carefully follow each step</a>. Note that if you followed the pre-requisites properly, you will already have <i>Python</i>, <i>kubectl</i>, and <i>Azure Data Studio</i> installed, so those may be skipped. Follow all other instructions.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"> <a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/quickstart-big-data-cluster-deploy?view=sqlallproducts-allversions" target="_blank"> Read the following article to deploy the bdc to AKS, ensuring that you carefully follow each step</a>. Stop at the section marked <b>Connect to the cluster</b>.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"> <a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/quickstart-big-data-cluster-deploy?view=sqlallproducts-allversions" target="_blank"> Read the following article to deploy the Big Data Cluster to the Azure Kubernetes Service, ensuring that you carefully follow each step</a>. Stop at the section marked <b>Connect to the cluster</b>.</p>
<p style="border-bottom: 1px solid lightgrey;"></p>
@ -309,7 +309,7 @@ You can <a href="https://hackernoon.com/docker-commands-the-ultimate-cheat-sheet
<h3>Container Orchestration <i>(Kubernetes)</i></h3>
For Big Data systems, having lots of Containers is very advantageous to segment purpose and performance profiles. However, dealing with many Container Images, allowing persisted storage, and interconnecting them for network and internetwork communications is a complex task. One such Container Prchestration tool is <i>Kubernetes</i>, an open source Container orchestrator, which can scale Container deployments according to need. The following table defines some important Container Orchestration Tools (Such as Kubernetes or OpenShift) terminology:
For Big Data systems, having lots of Containers is very advantageous to segment purpose and performance profiles. However, dealing with many Container Images, allowing persisted storage, and interconnecting them for network and internetwork communications is a complex task. One such Container Orchestration tool is <i>Kubernetes</i>, an open source Container orchestrator, which can scale Container deployments according to need. The following table defines some important Container Orchestration Tools (Such as Kubernetes or OpenShift) terminology:
<table style="tr:nth-child(even) {background-color: #f2f2f2;}; text-align: left; display: table; border-collapse: collapse; border-spacing: 5px; border-color: gray;">
@ -327,7 +327,7 @@ For Big Data systems, having lots of Containers is very advantageous to segment
You can <a href="https://kubernetes.io/docs/tutorials/kubernetes-basics/" target="_blank">learn much more about Container Orchestration systems here</a>. We're using the Azure Kubernetes Service (AKS) in this workshop, and <a href="https://aksworkshop.io/" target="_blank">they have a great set of tutorials for you to learn more here</a>.
In SQL Server Big Data Clusters, the Container Orchestration system (Such as Kubernetes or OpenShift) is responsible for the state of the BDC; it is reponsible for building and configurint the Nodes, assigns Pods to Nodes,creates and manages the Persistent Voumes (durable storage), and manages the operation of the Cluster.
In SQL Server Big Data Clusters, the Container Orchestration system (Such as Kubernetes or OpenShift) is responsible for the state of the BDC; it is responsible for building and configuring the Nodes, assigns Pods to Nodes,creates and manages the Persistent Volumes (durable storage), and manages the operation of the Cluster.
(You'll cover the storage aspects of Container Orchestration in more detail in a moment.)
@ -440,7 +440,7 @@ You'll explore further operations with the Azure Data Studio in the <i>Operation
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/point1.png"><b>Activity: Azure Data Studio Notebooks Overview</b></p>
<p><b>Steps</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"><a href="https://docs.microsoft.com/en-us/sql/azure-data-studio/sql-notebooks?view=sql-server-2017" target="_blank">Open this reference, and read the tutorial - you do not have to follow the steps, but you can if time permist.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"><a href="https://docs.microsoft.com/en-us/sql/azure-data-studio/sql-notebooks?view=sql-server-2017" target="_blank">Open this reference, and read the tutorial - you do not have to follow the steps, but you can if time permits.</p>
<br>
@ -486,7 +486,7 @@ Since HDFS is a file-system, data transfer is largely a matter of using it as a
<h3>Data Pipelines using <i>Azure Data Factory</i></h3>
As described earlier, you can use various methods to ingest data ad-hoc and as-needed for your two data targets (HDFS and SQL Server Tables. A more holistic archicture is to use a <i>Pipeline</i> system that can define sources, triggers and events, transforms, targets, and has logging and tracking capabilities. The Microsoft Azure Data Factory provides all of the capabilities, and often serves as the mechanism to transfer data to and from on-premises, in-cloud, and other sources and targets. <a href="https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities" target="_blank">ADF can serve as a full data pipeline system, as described here</a>.
As described earlier, you can use various methods to ingest data ad-hoc and as-needed for your two data targets (HDFS and SQL Server Tables. A more holistic architecture is to use a <i>Pipeline</i> system that can define sources, triggers and events, transforms, targets, and has logging and tracking capabilities. The Microsoft Azure Data Factory provides all of the capabilities, and often serves as the mechanism to transfer data to and from on-premises, in-cloud, and other sources and targets. <a href="https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipelines-activities" target="_blank">ADF can serve as a full data pipeline system, as described here</a>.
<br>
<img style="height: 75;" src="../graphics/adf.png">

Просмотреть файл

@ -28,7 +28,7 @@ You'll cover the following topics in this Module:
SQL Server (starting with version 2019) provides three ways to work with large sets of data:
- **Data Virtualization**: Query multiple sources of data technologies using the Polybase SQL Server feature <i>(data left at source)</i>
- **Data Virtualization**: Query multiple sources of data technologies using the PolyBase SQL Server feature <i>(data left at source)</i>
- **Storage Pools**: Create sets of disparate data sources that can be queried from Distributed Data sets <i>(data ingested into sharded databases using PolyBase)</i>
- **SQL Server Big Data Clusters**: Create, manage and control clusters of SQL Server Instances that co-exist in a Kubernetes cluster with Apache Spark and other technologies to access and process large sets of data <i>(Data left in place, ingested through PolyBase, and into/through HDFS)</i>
@ -228,4 +228,4 @@ In this section you will review the solution tutorial you will perform in the <i
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/geopin.png"><b> Next Steps</b></p>
Next, Continue to <a href="03%20-%20Planning,%20Installation%20and%20Configuration.md" target="_blank"><i> Planning, Installation and Configuration</i></a>.
Next, Continue to <a href="03%20-%20Planning,%20Installation%20and%20Configuration.md" target="_blank"><i> Planning, Installation and Configuration</i></a>.

Просмотреть файл

@ -29,28 +29,28 @@ You'll cover the following topics in this Module:
<i>NOTE: The following Module is based on the Public Preview of the Microsoft SQL Server 2019 big data cluster feature. These instructions will change as the product is updated. <a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-get-started?view=sqlallproducts-allversions" target="_blank">The latest installation instructions are located here</a>.</i>
A Big Data Cluster for SQL Server (BDC) is deployed onto a Cluster Orechestration system (such as Kubernetes or OpenShift) using the `azdata` utility which creates the appropriate Nodes, Pods, Containers and other constructs for the system. The installation uses various switches on the `azdata` utility, and reads from several variables contianed within <a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/reference-deployment-config?view=sqlallproducts-allversions" target="_blank">an internal JSON document</a> when you run the command. Using a switch, you can change these variables. You can also dump the enitre document to a file, edit it, and then call the installation that uses that file with the `azdata` command. <a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/deployment-custom-configuration?view=sqlallproducts-allversions" target="_blank">More detail on that process is located here.</a>
A Big Data Cluster for SQL Server (BDC) is deployed onto a Cluster Orchestration system (such as Kubernetes or OpenShift) using the `azdata` utility which creates the appropriate Nodes, Pods, Containers and other constructs for the system. The installation uses various switches on the `azdata` utility, and reads from several variables contained within <a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/reference-deployment-config?view=sqlallproducts-allversions" target="_blank">an internal JSON document</a> when you run the command. Using a switch, you can change these variables. You can also dump the entire document to a file, edit it, and then call the installation that uses that file with the `azdata` command. <a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/deployment-custom-configuration?view=sqlallproducts-allversions" target="_blank">More detail on that process is located here.</a>
For planning, it is essential that you understand the SQL Server BDC components, and have a firm understanding of Kubernetes and TCP/IP networking. You should also have an understanding of how SQL Server and Apache Spark use the "Big Four" (*CPU, I/O, Memory and Networking*).
Since the Cluster Orechestration system is often made up of Virtual Machines that host the Container Images, they must be as large as possible. For the best possible performance, large physical machines that are tuned for optimal performance is a recommended physical architecture. The least viable production system is a Minimum of 3 Linux physical machines or virtual machines. The recommended configuration per machine is 8 CPUs, 32 GB of memory and 100GB of storage. This configuration would support only one or two users with a standard workload, and you would want to increase the system for each additional user or heavier workload.
Since the Cluster Orchestration system is often made up of Virtual Machines that host the Container Images, they must be as large as possible. For the best possible performance, large physical machines that are tuned for optimal performance is a recommended physical architecture. The least viable production system is a Minimum of 3 Linux physical machines or virtual machines. The recommended configuration per machine is 8 CPUs, 32 GB of memory and 100GB of storage. This configuration would support only one or two users with a standard workload, and you would want to increase the system for each additional user or heavier workload.
You can deploy Kubernetes in a few ways:
- In a Cloud Platform such as Azure Kubernetes Service (AKS)
- In your own Cluster Orechestration system deployment using the appropirate tools such as `KubeADM`
- In your own Cluster Orchestration system deployment using the appropriate tools such as `KubeADM`
Regardless of the Cluster Orechestration system target, the general steps for setting up the system are:
Regardless of the Cluster Orchestration system target, the general steps for setting up the system are:
- Set up Cluster Orechestration system with a Cluster target
- Set up Cluster Orchestration system with a Cluster target
- Install the cluster tools on the administration machine
- Deploy the BDC onto the Cluster Orechestration system
- Deploy the BDC onto the Cluster Orchestration system
In the sections that follow, you'll cover the general process for each of these deployments. The official documentation referenced above have the specific steps for each deployment, and the *Activity* section of this Module has the steps for deploying the BDC on AKS for the classroom enviornment.
In the sections that follow, you'll cover the general process for each of these deployments. The official documentation referenced above have the specific steps for each deployment, and the *Activity* section of this Module has the steps for deploying the BDC on AKS for the classroom environment.
<p style="border-bottom: 1px solid lightgrey;"></p>
@ -94,7 +94,7 @@ With this background, you can find the <a href="https://docs.microsoft.com/en-us
<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/pencil2.png"><a name="3-2">3.2 Installing Locally Using KubeADM</h2>
If you choose Kubernetes as your Cluster Orechestration system, the <a href="https://kubernetes.io/docs/setup/independent/install-kubeadm/" target="_blank">kubeadm toolbox</a> helps you bootstrap a Kubernetes cluster that conforms to best practices. Kubeadm also supports other cluster lifecycle functions, such as upgrades, downgrade, and managing bootstrap tokens.
If you choose Kubernetes as your Cluster Orchestration system, the <a href="https://kubernetes.io/docs/setup/independent/install-kubeadm/" target="_blank">kubeadm toolbox</a> helps you bootstrap a Kubernetes cluster that conforms to best practices. Kubeadm also supports other cluster lifecycle functions, such as upgrades, downgrade, and managing bootstrap tokens.
The kubeadm toolbox can deploy a Kubernetes cluster to physical or virtual machines. It works by specifying the TCP/IP addresses of the targets.

Просмотреть файл

@ -81,11 +81,11 @@ This process allows not only a query to disparate systems, but also those remote
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/point1.png"><b>Activity: Load and query data in an External Table</b></p>
In this activity, you will load the sample data into your big data cluster environment, and then create and use an External table to query the data in HDFS. This process is similar to connecting to any Polybase target.
In this activity, you will load the sample data into your big data cluster environment, and then create and use an External table to query the data in HDFS. This process is similar to connecting to any PolyBase target.
<b>Steps</b>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"><a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-load-sample-data?view=sqlallproducts-allversions" target="_blank">Open this reference, and perform all of the instructions you see there</a>. This loads your data in preparattion for the next Activity.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"><a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-load-sample-data?view=sqlallproducts-allversions" target="_blank">Open this reference, and perform all of the instructions you see there</a>. This loads your data in preparation for the next Activity.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"><a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-query-hdfs-storage-pool?view=sqlallproducts-allversions" target="_blank">Open this reference, and perform all of the instructions you see there</a>. This step shows you how to create and query an External table.</p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="../graphics/checkbox.png"><a href="https://docs.microsoft.com/en-us/sql/big-data-cluster/tutorial-query-oracle?view=sqlallproducts-allversions" target="_blank">(Optional) Open this reference, and review the instructions you see there</a>. (You You must have an Oracle server that your BDC can reach to perform these steps, although you can review them if you do not)</p>
@ -118,7 +118,7 @@ In this activity, you will load the sample data into your big data cluster envir
There are three primary uses for a large cluster of data processing systems for Machine Learning and AI applications. The first is that the users will involved in the creation of the <a href="https://www.codeingschool.com/2018/09/what-are-features-and-labels-in-machine-learning.html" target="_blank">Features used in various ML and AI algorithms, and are often tasked to Label</a> the data. These users can access the Data Pool and Data Storage data stores directly to query and assist with this task.
The SQL Server Master Instance in the BDC installs with <a href="https://docs.microsoft.com/en-us/sql/advanced-analytics/what-is-sql-server-machine-learning?view=sql-server-ver15" target="_blank">Machine Learning Services</a>, which allow creation, training, evaluation and presisting of Machine Learning Models. Data from all parts of the BDC are available, and Data Science oriented languages and libraries in R, Python and Java are enabled. In this scenario, the Data Scientist creates the R or Python code, and the Transact-SQL Developer wraps that code in a Stored Procedure. This code can be used to train, evaluate and create Machine Learning Models. The Models can be stored in the Master Instance for scoring, or sent on to the App Pool where the Machine Learning Server is running, waiting to accept REST-based calls from applications.
The SQL Server Master Instance in the BDC installs with <a href="https://docs.microsoft.com/en-us/sql/advanced-analytics/what-is-sql-server-machine-learning?view=sql-server-ver15" target="_blank">Machine Learning Services</a>, which allow creation, training, evaluation and persisting of Machine Learning Models. Data from all parts of the BDC are available, and Data Science oriented languages and libraries in R, Python and Java are enabled. In this scenario, the Data Scientist creates the R or Python code, and the Transact-SQL Developer wraps that code in a Stored Procedure. This code can be used to train, evaluate and create Machine Learning Models. The Models can be stored in the Master Instance for scoring, or sent on to the App Pool where the Machine Learning Server is running, waiting to accept REST-based calls from applications.
<br>
<img style="height: 400; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="../graphics/bdcsolution4.png">

Просмотреть файл

@ -29,7 +29,7 @@ You'll cover the following topics in this Module:
There are two primary areas for monitoring your BDC deployment. The first deals with SQL Server 2019, and the second deals with the set of elements in the Cluster.
For SQL Server, <a href="https://docs.microsoft.com/en-us/sql/relational-databases/database-lifecycle-management?view=sql-server-ver15" target="_blank">management is much as you would normally perform for any SQL Server system</a>. You have the same type of services, surface points, security areas and other control vectors as in a stand-alone installation of SQL Server. The tools you have avalaible for managing the Master Instance in the BDC are the same as managing a stand-alone installation, including SQL Server Management Studio, command-line interfaces, Azure Data Studio, and third party tools.
For SQL Server, <a href="https://docs.microsoft.com/en-us/sql/relational-databases/database-lifecycle-management?view=sql-server-ver15" target="_blank">management is much as you would normally perform for any SQL Server system</a>. You have the same type of services, surface points, security areas and other control vectors as in a stand-alone installation of SQL Server. The tools you have available for managing the Master Instance in the BDC are the same as managing a stand-alone installation, including SQL Server Management Studio, command-line interfaces, Azure Data Studio, and third party tools.
For the cluster components, you have three primary interfaces to use, which you will review next.