This commit is contained in:
Buck Woody 2020-03-04 12:53:22 -05:00
Parent a3ab2ad5b3
Commit 3088b2e9e3
5 changed files: 373 additions and 436 deletions

View file

@ -10,9 +10,9 @@
This workshop is taught using various components, which you will install and configure in the sections that follow.
> Note: Due to the nature of working with large-scale systems, it may not be possible for you to set up everything you need to perform each lab exercise. Participation in each Activity is optional - we will be working through the exercises together, but if you cannot install any software or don't have an Azure account, the instructor will work through each exercise in the workshop. You will also have full access to these materials so that you can work through them later when you have more time and resources.
> Note: Due to the nature of working with large-scale systems, it may not be possible for you to set up everything you need to perform each lab exercise. Participation in each Activity is optional - we will be working through the exercises together, but if you cannot install any software or don't have an Azure account, the instructor will work through each exercise in the workshop. You will also have full access to these materials so that you can work through them later when you have more time and resources. Another option is to "pair-up" with a fellow student and work together on a single environment.
For this workshop, you will use an instructor-provided Virtual Machine environment with Ubuntu as the operating system, with Docker, Kubernetes and Microsoft's SQL Server Big Data Cluster pre-installed. The intention of the course is that these systems are provided to you in class. However, to ensure that you have an environment available should something go wrong with the in-class system, or if you want to take this course outside of a classroom, you need to set up a Virtual Machine with these software components.
For this workshop, you will use an instructor-provided Virtual Machine environment with Ubuntu as the operating system, along with Docker, and Kubernetes pre-installed. The intention of the course is that these systems are provided to you in class. However, to ensure that you have an environment available should something go wrong with the in-class system, or if you want to take this course outside of a classroom, you need to set up a Virtual Machine with these software components.
> Note: If you wish to use another cloud service or a local Virtual Machine, it needs to match or exceed the [Microsoft Azure Standard_D8s_v3 Virtual Machine](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-general#dsv3-series-1) which has the following requirements:
- Ubuntu 16.04
@ -22,7 +22,7 @@ For this workshop, you will use an instructor-provided Virtual Machine environme
Whether you use the provided classroom system, the Microsoft Azure Virtual Machine, or another Virtual Machine system, the requirements for your local laptop are (*You will get instructions for setting these up in a moment*):
- **A Microsoft Azure Account**: This workshop uses the Microsoft Azure platform to host the Kubernetes cluster (using an appropriately sized Virtual Machine). You can use an MSDN account for Azure, your own Azure account, or potentially one provided for you, as long as you can create about $50.00 (U.S.) worth of Microsoft Azure assets.
- **A Microsoft Azure Account**: This workshop uses the Microsoft Azure platform to host the backup Kubernetes cluster (using an appropriately sized Virtual Machine). You can use an MSDN account for Azure, your own Azure account, or potentially one provided for you, as long as you can create about $75.00 (U.S.) worth of Microsoft Azure assets.
- **The Azure Command Line Interface**: The Azure CLI allows you to work from the command line on multiple platforms to interact with your Azure subscription, and also has control statements for AKS.
- **Microsoft Azure Data Studio**: The *Azure Data Studio* IDE, along with various Extensions, is used for deploying the system, and querying and management of the BDC. In addition, you will use this tool to participate in the workshop. Note: You can connect to a SQL Server 2019 Big Data Cluster using any SQL Server connection tool or application, such as SQL Server Management Studio, but this course will use Microsoft Azure Data Studio for cluster management, Jupyter Notebooks and other capabilities.
@ -49,7 +49,7 @@ You can also use your own account or one provided to you by your organization, b
Your workshop invitation may have instructed you that a Microsoft Azure account will be provided for you to use. If so, you will receive instructions on how it will be provided.
**Unless you received explicit instructions in your workshop invitations, you much create either an MSDN or Personal account. You must have an account prior to the workshop.**
**Unless you received explicit instructions in your workshop invitations, you must create either an MSDN or Personal Microsoft Azure account. You must have an account prior to the workshop. No Microsoft Azure assets will be provided the day of class unless instructed otherwise.**
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/point1.png?raw=true"><b>Activity 2: Prepare Your Workstation</b></p>
@ -165,7 +165,7 @@ The system is now ready for class. Use the Linux <pre>shutdown -h</pre> command
Next, run the following command to de-allocate the machine safely so that you are no longer charged (except for the storage) until class starts.
>Note: This command does not remove the Virtual Machine. You will start it again once you start the class.
>Note: This command does not remove the Virtual Machine. You will start it again once you start the class. Do NOT leave the system running in the Azure Portal or it will use all your credits, and potentially result in additional charges.
<pre>az vm deallocate --name ReplaceWithVMName</pre>
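To confirm the machine is no longer accruing compute charges, you can check its power state with the Azure CLI. This is an illustrative sketch: `ReplaceWithVMName` and `ReplaceWithGroupName` are placeholders for your own Virtual Machine and resource group names.

```shell
# Show the VM's power state; a deallocated VM reports "VM deallocated"
az vm show --show-details --name ReplaceWithVMName \
  --resource-group ReplaceWithGroupName --query powerState --output tsv
```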

View file

@ -12,7 +12,7 @@ In this workshop you'll cover using a Process and various Platform components to
This module covers Container technologies and how they differ from Virtual Machines. Then you'll learn about the need for container orchestration using Kubernetes. We'll start with some computer architecture basics, and introduce the Linux operating system in case you're new to that topic. There are many references and links throughout the module, and a further set of references at the bottom of this module.
You can now start the Virtual Machine you created in the pre-requisites. You will run all commands from that environment.
You may now start the Virtual Machine you created in the pre-requisites. You will run all commands from that environment.
<p style="border-bottom: 1px solid lightgrey;"></p>
@ -40,7 +40,7 @@ In general, Linux treats almost everything as a file system or a process. For ge
Both Windows and Linux (in the x86 architecture) are *Symmetric Multiprocessing* systems, which means that all processors are addressed as a single unit. In general, distributed processing systems should have larger and more numerous processors at the "head" node. General Purpose (GP) processors are normally used in these systems, but for certain uses such as Deep Learning or Artificial Intelligence, Purpose-Built processors such as Graphics Processing Units (GPUs) are used within the environment, and in some cases, Advanced RISC Machine (ARM) chips can be brought into play for specific tasks. For the purposes of this workshop, you will not need these latter two technologies, although both Windows and Linux support them.
<i>NOTE: Linux can be installed on chipsets other than x86 compatible systems. In this workshop, you will focus on x86 systems.</i>
> NOTE: Linux can be installed on chipsets other than x86 compatible systems. In this workshop, you will focus on x86 systems.
<ul>
<li><a href="https://www.tecmint.com/check-linux-cpu-information/" target="_blank">Getting Processor Information using Linux utilities</a></li>
@ -48,12 +48,11 @@ Both Windows and Linux (in the x86 architecture) are *Symmetric Multiprocessing*
<h3>Memory</h3>
Storage Nodes in a distributed processing system need a nominal amount of memory, and the processing nodes in the framework (Spark) included with BDC need more.
Both Linux and Windows support large amounts of memory (most often as much as the system can hold) natively, and no special configuration for memory is needed.
Storage Nodes in a distributed processing system need a nominal amount of memory, and the processing nodes in the framework (Spark) included with SQL Server big data clusters need much more. Both Linux and Windows support large amounts of memory (most often as much as the system can hold) natively, and no special configuration for memory is needed.
Modern operating systems use a temporary area on storage ("swap") to act as memory, both for light caching and to extend RAM. Relying on this area should be avoided as much as possible in both Windows and Linux.
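As a quick check, the swap configured on a Linux host can be read from the `/proc` pseudo-filesystem (a Linux-only sketch; on a system with no swap configured, `SwapTotal` reports 0):

```shell
# Inspect configured and free swap space on a Linux host
grep -E '^Swap(Total|Free)' /proc/meminfo
```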
While technically a Processor system, NUMA (Non-Uniform Memory Access) systems allow for special memory configurations to be mixed in a single server by a processor set. Both Linux and Windows support NUMA access.
While technically a Processor system, [NUMA (Non-Uniform Memory Access) systems](https://www.kernel.org/doc/html/latest/vm/numa.html) allow for special memory configurations to be mixed in a single server by a processor set. Both Linux and Windows support NUMA access.
<ul>
<li><a href="https://www.linux.com/news/all-about-linux-swap-space" target="_blank">Monitoring the Swap File in Linux</a></li>
@ -63,7 +62,7 @@ While technically a Processor system, NUMA (Non-Uniform Memory Access) systems a
The general guidance for large-scale distributed processing systems is that the network between them be as fast and as collision-free (in the case of TCP/IP) as possible. The networking stacks in both Windows and Linux require no special settings within the operating system, but you should thoroughly understand the driver settings for each NIC installed in your system, ensure that each setting is optimized for your networking hardware and topology, and keep the drivers at the latest tested versions.
Although a full discussion of the TCP/IP protocol is beyond the scope of this workshop, it's important that you have a good understanding of how it works, since it permeates every part of the BDC architecture. <a href="https://support.microsoft.com/en-us/help/164015/understanding-tcp-ip-addressing-and-subnetting-basics" target="_blank">You can get a quick overview of TCP/IP here</a>.
Although a full discussion of the TCP/IP protocol is beyond the scope of this workshop, it's important that you have a good understanding of how it works, since it permeates every part of a Kubernetes architecture. <a href="https://support.microsoft.com/en-us/help/164015/understanding-tcp-ip-addressing-and-subnetting-basics" target="_blank">You can get a quick overview of TCP/IP here</a>.
The other rule you should keep in mind is that in general you should have only the presentation of the data handled by the workstation or device that accesses the solution. Do not move large amounts of data back and forth over the network to have the data processed locally. That being said, there are times (such as certain IoT scenarios) where subsets of data should be moved to the client, but you will not face those scenarios in this workshop.
@ -82,7 +81,7 @@ The best way to learn an operating system is to install it and perform real-worl
</table>
The essential commands you should know for this workshop are below. In Linux you can often send the results of one command to another using the "pipe" symbol, similar to PowerShell: <b>|</b>.
The essential commands you should know for this workshop are listed below, with links to learn more. In Linux you can often send the results of one command to another using the "pipe" symbol, similar to PowerShell: <b>|</b>.
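As a quick illustration of the pipe, the output of one command can be fed into another (a generic sketch, not specific to this workshop's environment):

```shell
# Count the lines containing "error" by piping printf output into grep
printf 'ok\nerror: disk full\nerror: network down\n' | grep -c 'error'   # prints 2
```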
<table style="tr:nth-child(even) {background-color: #f2f2f2;}; text-align: left; display: table; border-collapse: collapse; border-spacing: 2px; border-color: gray;">
@ -145,15 +144,15 @@ One abstraction layer above installing software directly on hardware is using a
In this abstraction level, you have full control (and responsibility) for the entire operating system, but not the hardware. This isolates all process space and provides an entire "Virtual Machine" to applications. For scale-out systems, a Virtual Machine allows for a distribution and control of complete computer environments using only software.
You can <a href="https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs" target="_blank">read the details of Microsoft's Hypervisor here</a>.
You can <a href="https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/reference/tlfs" target="_blank">read the details of Microsoft's Hypervisor here</a>, and [the VMWare approach is detailed here](https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.vcenterhost.doc/GUID-302A4F73-CA2D-49DC-8727-81052727A763.html). Another popular Hypervisor is [VirtualBox, which you can read more about here](https://www.virtualbox.org/manual/UserManual.html).
<h3>Containers <i>(Docker)</i></h3>
The next level of Abstraction is a <i>Container</i>. There are various types of Container technologies; in this workshop, you will focus on the Docker container format, as implemented in the <a href="https://mobyproject.org/" target="_blank">Moby framework</a>.
A Container is provided by the Container Runtime (Such as [containerd](https://containerd.io/)]) runtime engine, which sits above the operating system (Windows or Linux). In this abstraction, you do not control the hardware <i>or</i> the operating system. The Container has a very small Kernel in it, and can contain binaries such as Python, R, SQL Server, or other binaries. A Container with all its binaries is called an <i>Image</i>. You create a container from a file.
A Container is provided by a Container Runtime (such as [containerd](https://containerd.io/)), which sits on the operating system (Windows or Linux). In this abstraction, you do not control the hardware <i>or</i> the operating system. In general, you create a manifest (in the case of Docker a "dockerfile"), which is a text file describing the binaries, code, files or other assets you want to run in a Container. You then "build" that file into a binary object called an "Image", which you can store in a "Repository". You can then use the runtime commands to download (pull) that Image from the Repository and run it on your system. This running, protected memory space is now a "Container".
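The manifest → Image → Repository → Container flow can be sketched with Docker's CLI. This is illustrative only; the image tag and registry name are placeholders, not part of this workshop's deployment:

```shell
docker build -t myregistry/myapp:1.0 .   # "build" the dockerfile into an Image
docker push myregistry/myapp:1.0         # store the Image in a Repository
docker pull myregistry/myapp:1.0         # download (pull) the Image
docker run -d myregistry/myapp:1.0       # run it as a Container
```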
<i>(NOTE: The Container Image Kernel can run on Windows or Linux, but you will focus on the Linux Kernel Containers in this workshop.)</i>
> NOTE: The Container service runs natively on Linux, but in Windows it often runs in a Virtual Machine environment running Linux as a VM. So in Windows you may see several layers of abstraction for Containers to run.
<br>
<img style="height: 300;" src="https://docs.docker.com/images/Container%402x.png?raw=true">
@ -173,8 +172,8 @@ For Big Data systems, having lots of Containers is very advantageous to segment
<tr><td style="background-color: AliceBlue; color: black;"><b>Component</b></td><td style="background-color: AliceBlue; color: black;"><b>Used for</b></td></tr>
<tr><td>Cluster</td><td> Container Orchestration (Such as Kubernetes or OpenShift) cluster is a set of machines, known as nodes. One node controls the cluster and is designated the master node; the remaining nodes are worker nodes. The Container Orchestration *master* is responsible for distributing work between the workers, and for monitoring the health of the cluster.</td></tr>
<tr><td style="background-color: AliceBlue; color: black;">Node</td><td td style="background-color: AliceBlue; color: black;"> A node runs containerized applications. It can be either a physical machine or a virtual machine. A Cluster can contain a mixture of physical machine and virtual machines as Nodes.</td></tr>
<tr><td>Cluster</td><td> A Container Orchestration (such as Kubernetes or OpenShift) cluster is a set of machines, known as Nodes. One node controls the cluster and is designated the master node; the remaining nodes are worker nodes. The Container Orchestration *master* is responsible for distributing work between the workers, and for monitoring the health of the cluster.</td></tr>
<tr><td style="background-color: AliceBlue; color: black;">Node</td><td td style="background-color: AliceBlue; color: black;"> A Node runs containerized applications. It can be either a physical machine or a virtual machine. A Cluster can contain a mixture of physical machine and virtual machines as Nodes.</td></tr>
<tr><td>Pod</td><td> A Pod is the atomic deployment unit of a Cluster. A pod is a logical group of one or more Containers and associated resources needed to run an application. Each Pod runs on a Node; a Node can run one or more Pods. The Cluster master automatically assigns Pods to Nodes in the Cluster.</td></tr>
</table>
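Once a cluster is running, each of these three levels can be inspected with `kubectl`. This is a sketch that assumes a configured cluster and credentials; `<pod-name>` is a placeholder:

```shell
kubectl get nodes                    # the Nodes that make up the Cluster
kubectl get pods --all-namespaces    # the Pods scheduled onto those Nodes
kubectl describe pod <pod-name>      # details for a single Pod
```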
@ -183,7 +182,7 @@ For Big Data systems, having lots of Containers is very advantageous to segment
<p><img style="height: 400; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/KubernetesCluster.png?raw=true"></p>
<br>
You can <a href="https://kubernetes.io/docs/tutorials/kubernetes-basics/" target="_blank">learn much more about Container Orchestration systems here</a>. We're using the Azure Kubernetes Service (AKS) in this workshop, and <a href="https://aksworkshop.io/" target="_blank">they have a great set of tutorials for you to learn more here</a>.
You can <a href="https://kubernetes.io/docs/tutorials/kubernetes-basics/" target="_blank">learn much more about Container Orchestration systems here</a>. We're using the Azure Kubernetes Service (AKS) as a backup in this workshop, and <a href="https://aksworkshop.io/" target="_blank">they have a great set of tutorials for you to learn more here</a>.
In SQL Server Big Data Clusters, the Container Orchestration system (such as Kubernetes or OpenShift) is responsible for the state of the BDC; it builds and configures the Nodes, assigns Pods to Nodes, creates and manages the Persistent Volumes (durable storage), and manages the operation of the Cluster.
@ -191,9 +190,11 @@ In SQL Server Big Data Clusters, the Container Orchestration system (Such as Kub
(You'll cover the storage aspects of Container Orchestration in more detail in a moment.)
All of this orchestration is controlled by another set of text files, called "Manifests", which you will learn more about in the Modules that follow. You declare the state of the layout, networking, storage, redundancy and more in these files, load them to the Kubernetes environment, and it is responsible for instantiating and maintaining a Cluster with that configuration.
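As a tiny illustration of what such a Manifest looks like, the sketch below writes a minimal Pod declaration to a file. The Pod name and image are hypothetical placeholders, not part of this workshop's deployment; loading the file into a cluster would be done with `kubectl apply -f pod.yaml`.

```shell
# Write a minimal (hypothetical) Pod manifest to pod.yaml
cat <<'EOF' > pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello-pod
spec:
  containers:
  - name: hello
    image: nginx:1.25
EOF
```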
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/point1.png?raw=true"><b>Activity: Familiarize Yourself with Container Orchestration using minikube</b></p>
To practice with Kubernetes, you will use an online emulator to work with the `minikube` platform.
To practice with Kubernetes, you will use an online emulator to work with the `minikube` platform. This is a small Virtual Machine with a single Node acting as a full Cluster.
<p><b>Steps</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkbox.png?raw=true"><a href="https://kubernetes.io/docs/tutorials/kubernetes-basics/create-cluster/cluster-interactive/
@ -207,7 +208,7 @@ To practice with Kubernetes, you will use an online emulator to work with the `m
Traditional storage uses a call from the operating system to an underlying I/O system, as you learned earlier. These file systems are either directly connected to the operating system or appear to be connected directly using a Storage Area Network. The blocks of data are stored and managed by the operating system.
For large scale-out data systems, the mounting point for an I/O is another abstraction. For SQL Server BDC, the most commonly used scale-out file system is the <a href="https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html" target="_blank">Hadoop Data File System</a>, or <i>HDFS</i>. HDFS is a set of Java code that gathers disparate disk subsystems into a <i>Cluster</i> which is comprised of various <i>Nodes</i> - a <i>NameNode</i>, which manages the cluster's metadata, and <i>DataNodes</i> that physically store the data. Files and directories are represented on the NameNode by a structure called <i>inodes</i>. Inodes record attributes such as permissions, modification and access times, and namespace and diskspace quotas.
For large scale-out data systems, the mounting point for an I/O is another abstraction. For SQL Server BDC, the most commonly used scale-out file system is the <a href="https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html" target="_blank">Hadoop Data File System</a>, or <i>HDFS</i>. HDFS is a set of Java code that gathers disparate disk subsystems into a <i>Cluster</i> which is comprised of various <i>Nodes</i> - a <i>NameNode</i>, which manages the cluster's metadata, and <i>DataNodes</i> that physically store the data. Files and directories are represented on the NameNode by a structure called <i>inodes</i>. Inodes record attributes such as permissions, modification and access times, and namespace and disk space quotas.
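Working with HDFS typically uses the `hdfs dfs` client, whose verbs mirror POSIX file commands. This is a sketch that assumes an HDFS client is installed and configured; the paths and file name are placeholders:

```shell
hdfs dfs -mkdir /data                # create a directory in HDFS
hdfs dfs -put localfile.csv /data/   # copy a local file into HDFS
hdfs dfs -ls /data                   # list the directory's contents
```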
<p><img style="height: 300; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/hdfs.png?raw=true"></p>
@ -247,8 +248,7 @@ You'll explore further operations with these tools in the rest of the course.
If you have not completed the prerequisites for this workshop you can <a href="https://docs.microsoft.com/en-us/sql/azure-data-studio/download?view=sql-server-2017" target="_blank">install Azure Data Studio from this location</a>, and you will install the Extension to work with SQL Server big data clusters in a future module.
You can <a href="https://docs.microsoft.com/en-us/sql/azure-data-studio/what-is?view=sql-server-2017
" target="_blank">learn more about Azure Data Studio here</a>.
You can <a href="https://docs.microsoft.com/en-us/sql/azure-data-studio/what-is?view=sql-server-2017" target="_blank">learn more about Azure Data Studio here</a>.
<br>
<p><img style="height: 300; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/ads.png?raw=true"></p>

View file

@ -9,7 +9,7 @@
# 02 - Hardware and Virtualization environment for Kubernetes #
In this workshop you have covered the fundamentals behind contains and container orchestration. The end of this Module contains several helpful references you can use in these exercises and in production.
In this workshop you have covered the fundamentals behind containers and container orchestration. The end of this Module contains several helpful references you can use in these exercises and in production.
This module covers the infrastructure foundation for building a Kubernetes cluster. You will examine the cluster installation on a Hypervisor environment for the in-person class, and you will also get references for other architectures.
@ -19,8 +19,7 @@ This module covers the infrastructure foundation for building a Kubernetes clust
## 2.1 Kubernetes Targets ##
Kubernetes and it's variants, can run on "bare metal" - a server-hardware system. You can also use a Hypervisor, such as VMWare or Hyper-V. In many companies, the IT infrastructure is completely virtualized, so this module covers the installation on that platform. Most of the principles for deploying Kubernetes apply when carrying out the activity on non-virtualized hardware, and you'll learn the differences as you progress through the exercises.
Kubernetes and its variants can run on "bare metal" - a server hardware system. You can also use a Hypervisor, such as VMWare or Hyper-V. In many companies, the IT infrastructure is completely virtualized, so this module covers the installation using that platform. Most of the principles for deploying Kubernetes apply when carrying out the activity on non-virtualized hardware, and you'll learn the differences as you progress through the exercises.
<p style="border-bottom: 1px solid lightgrey;"></p>
@ -30,31 +29,31 @@ In this instructor-led session, the hardware for the hands on lab exercises cons
<img style="width=80; float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/k8stobdc/graphics/2_2_1_workshop_hw.PNG?raw=true">
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/point1.png?raw=true"><b>Activity: Basic Sandbox Environment Familiarisation</b></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/point1.png?raw=true"><b>Activity: Basic Sandbox Environment Familiarization</b></p>
1. Connect to your sandbox environment using a ssh client. macOS comes with a built in client. Windows users may wish to user either [Putty](https://www.putty.org/) or a DOS command shell window. Your workshops tutors will make login credentials available to each attendee.
1. Connect to your sandbox environment using an ssh client. Apple and Linux systems come with a built-in client, as does the latest Windows operating system. You may also wish to use the graphical [Putty](https://www.putty.org/) tool. Your workshop tutors will make login credentials available to each attendee.
2. Install the virt-what package using apt-get:
2. Install the `virt-what` package using the `apt-get` command in Linux, once you have connected to your environment:
```sudo apt-get install virt-what```
`sudo apt-get install virt-what`
3. Execute virt-what to verify the platform that your sandbox environment is running on:
3. Execute `virt-what` to verify the platform that your sandbox environment is running on:
```sudo virt-what```
`sudo virt-what`
4. Obtain the amount of memory your sandbox environment has by running the following command:
```lsmem```
`lsmem`
5. Obtain the processor information for your sandbox environment by running the following command:
```lscpu```
`lscpu`
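The processor and memory facts from steps 4 and 5 can also be read directly from the `/proc` pseudo-filesystem, combined with the pipe technique from Module 1 (a Linux-only sketch):

```shell
# Logical CPU count and total RAM, read directly from /proc
grep -c '^processor' /proc/cpuinfo    # one line per logical CPU
grep '^MemTotal' /proc/meminfo        # total RAM in kB
```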
<p style="border-bottom: 1px solid lightgrey;"></p>
## 2.3 Virtualised Infrastructure Setup and Configuration ##
With the hardware and layout in place, you'll now turn to the configuration of the cluster environment. The operating system for each master and worker node is Ubuntu 18.04.3 LTS, and the storage orchestrator requires the open-iscsi package. To begin with, a virtual machine is required to base the template off for both the master and worker nodes:
With the hardware and layout in place, you'll now turn to the configuration of the cluster environment. The operating system for each master and worker node is Ubuntu 18.04.3 LTS, and the storage orchestrator requires the `open-iscsi` package. To begin with, a virtual machine is required on which to base the template for both the master and worker nodes:
1. First start by creating a virtual machine in VMware vCenter:
@ -90,7 +89,7 @@ With the hardware and layout in place, you'll now turn to the configuration of t
<img style="width=80; float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/k8stobdc/graphics/2_3_7_vcenter.PNG?raw=true">
8. Review the configuration for the virtual machine and then hit FINISH:
8. Review the configuration for the virtual machine and then hit `FINISH`:
<img style="width=80; float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/k8stobdc/graphics/2_3_8_vcenter.PNG?raw=true">
@ -141,15 +140,15 @@ With the hardware and layout in place, you'll now turn to the configuration of t
<img style="width=80; float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/k8stobdc/graphics/2_3_21_vcenter.PNG?raw=true">
19. Select the default of /dev/sda as the device to install Ubuntu on:
19. Select the default of `/dev/sda` as the device to install Ubuntu on:
<img style="width=80; float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/k8stobdc/graphics/2_3_22_vcenter.PNG?raw=true">
20. Select 'Done' to confirm the filesystem configuration:
20. Select `Done` to confirm the filesystem configuration:
<img style="width=80; float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/k8stobdc/graphics/2_3_23_vcenter.PNG?raw=true">
21. Select 'Continue' to confirm that the target disk of the installation will be formatted (destructively):
21. Select `Continue` to confirm that the target disk of the installation will be formatted (destructively):
<img style="width=80; float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/k8stobdc/graphics/2_3_24_vcenter.PNG?raw=true">
@ -157,11 +156,11 @@ With the hardware and layout in place, you'll now turn to the configuration of t
<img style="width=80; float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/k8stobdc/graphics/2_3_25_vcenter.PNG?raw=true">
23. Install the OpenSSH server:
23. Install the `OpenSSH` server:
<img style="width=80; float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/k8stobdc/graphics/2_3_26_vcenter.PNG?raw=true">
24. Hit 'Done' to confirm that no featured server snaps are to be installed, the single node cluster script will install everything that is required:
24. Hit `Done` to confirm that no featured server snaps are to be installed; the single node cluster script will install everything that is required:
<img style="width=80; float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/k8stobdc/graphics/2_3_27_vcenter.PNG?raw=true">
@ -185,7 +184,7 @@ With the hardware and layout in place, you'll now turn to the configuration of t
<img style="width=80; float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/k8stobdc/graphics/2_3_32_vcenter.PNG?raw=true">
30. **We now have a virtual machine that a single node SQL Server 2019 Big Data Cluster can be delpoyed to using [this script](https://docs.microsoft.com/en-us/sql/big-data-cluster/deployment-script-single-node-kubeadm?view=sql-server-ver15)**. If the objective is to deplopy a production grade cluster, the following commands should be executed against the guest operating system, after which the virtual machine can be converted into a template:
30. **We now have a virtual machine that a single node SQL Server 2019 Big Data Cluster can be deployed to using [this script](https://docs.microsoft.com/en-us/sql/big-data-cluster/deployment-script-single-node-kubeadm?view=sql-server-ver15)**. If the objective is to deploy a production grade cluster, the following commands should be executed against the guest operating system, after which the virtual machine can be converted into a template:
```
apt-get install -q -y ebtables ethtool
@ -201,17 +200,17 @@ echo net.ipv6.conf.lo.disable_ipv6=1 > /etc/sysctl.conf
sysctl net.bridge.bridge-nf-call-iptables=1
```
31. To create the virtual machine template, right click on the virtual machine, select 'Power' and then "Power Off":
31. To create the virtual machine template, right click on the virtual machine, select `Power` and then `Power Off`:
<img style="width=80; float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/k8stobdc/graphics/2_3_33_vcenter.PNG?raw=true">
32. Right-click on the virtual machine, select `Template` and then `Convert to Template`.
33. Deployment of a production-grade Kubernetes cluster can be carried out using:
- `Kubeadm`, as per the instructions in [Microsoft's online documentation](https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-with-kubeadm?view=sql-server-ver15).
- Or for an approach that leverages `Kubeadm` **and** automates the entire deployment process, [`Kubespray`](https://kubespray.io/#/) can be used.
<p style="border-bottom: 1px solid lightgrey;"></p>
<p><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/point1.png?raw=true"><b>Activity: Installing The Storage Plugin </b></p>
The activity covers the installation of a Container Storage Interface compliant Kubernetes storage plugin using [Helm](https://helm.sh/).
<p><img style="margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/checkmark.png?raw=true"><b>Steps</b></p>
1. List the currently installed storage classes:
```
kubectl get sc
```
2. Verify that the iSCSI target endpoints are reachable:
`ping <iSCSI target ip address(es)>`
In this example, each IP address associated with the interfaces `ct0` and `ct1` should be reachable from each node host in the cluster:
<img style="width=80; float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/k8stobdc/graphics/2_4_2_purity.PNG?raw=true">
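The per-endpoint reachability check can be scripted so it is easy to repeat on every node host. A minimal sketch - the two addresses below are placeholders for your array's `ct0`/`ct1` interface addresses:

```shell
# Ping each iSCSI target endpoint once and record the result.
# 192.0.2.10 / 192.0.2.11 are placeholder addresses - substitute the
# ct0/ct1 addresses shown in your array's network configuration.
for ip in 192.0.2.10 192.0.2.11; do
  if ping -c 1 -W 2 "$ip" > /dev/null 2>&1; then
    echo "$ip reachable"
  else
    echo "$ip NOT reachable"
  fi
done | tee ping-results.txt
```

Run this from each node host in the cluster; every address should report reachable before continuing.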
3. Confirm the version of Helm that is installed by executing the command:
`helm version`
4. Download the YAML template for the storage plugin configuration:
`curl --output pso-values.yaml https://raw.githubusercontent.com/purestorage/helm-charts/master/pure-csi/values.yaml`
5. Using a text editor such as VI or nano, open the `pso-values.yaml` file.
6. Uncomment lines 82 through 84 by removing the hash symbol (`#`) from each line.
7. On line 83, replace the template IP address with the management endpoint IP address of the array that persistent volumes are to be created on.
8. On line 84, replace the template API token with the API token for the array that persistent volumes are to be created on.
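The uncomment-and-substitute edits above can also be scripted with `sed`. The sketch below runs against an illustrative stand-in for the commented arrays section of `pso-values.yaml` - the key names, the sample endpoint, and the sample token are assumptions, so check them against your downloaded copy before adapting this:

```shell
# Illustrative stand-in for the commented arrays section of pso-values.yaml;
# the key names and sample values here are assumptions, not the real file.
cat > pso-values-sample.yaml <<'EOF'
arrays:
  #FlashArrays:
  #  - MgmtEndPoint: "1.2.3.4"
  #    APIToken: "a526a4c6-18b0-a8c9-1afa-3499293574bb"
EOF

# Uncomment the array entry by stripping the leading hash from each line:
sed -i 's/^\([[:space:]]*\)#/\1/' pso-values-sample.yaml

# Substitute the placeholder management endpoint and API token
# (10.0.0.50 and <your-api-token> are stand-ins for your own values):
sed -i -e 's/1\.2\.3\.4/10.0.0.50/' \
       -e 's/a526a4c6-18b0-a8c9-1afa-3499293574bb/<your-api-token>/' \
       pso-values-sample.yaml

cat pso-values-sample.yaml
```

Scripting the edits this way is useful if you need to prepare the same values file for several clusters.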
9. Add the repo containing the Helm chart for the storage plugin:
- For all versions of Helm run:
```
helm repo add pure https://purestorage.github.io/helm-charts
helm repo update
```
- For Helm version 2, also run:
```
helm search pure-csi
```
- For Helm version 3, also run:
```
helm search repo pure-csi
```
10. Perform a dry run install of the plugin; this will verify that the contents of the `pso-values.yaml` file are correct:
- For Helm version 2, run:
```
helm install --name pure-storage-driver pure/pure-csi --namespace <namespace> -f <your_own_dir>/pso-values.yaml --dry-run --debug
```
- For Helm version 3, run:
```
helm install pure-storage-driver pure/pure-csi --namespace <namespace> -f <your_own_dir>/pso-values.yaml --dry-run --debug
```
11. If the dry run of the installation completed successfully, the actual install of the plugin can be performed; otherwise, the `pso-values.yaml` file needs to be corrected:
- For Helm version 2, run:
```
helm install --name pure-storage-driver pure/pure-csi --namespace <namespace> -f <your_own_dir>/pso-values.yaml
```
- For Helm version 3, run:
```
helm install pure-storage-driver pure/pure-csi --namespace <namespace> -f <your_own_dir>/pso-values.yaml
```
12. List the storage classes that are now installed; a new storage class should be present:
```
kubectl get sc
```
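To confirm the plugin works end to end, you can exercise the new storage class with a small test volume. The sketch below assumes the chart's storage class is named `pure` - replace it with whatever name `kubectl get sc` actually reports on your cluster:

```shell
# Write a minimal PersistentVolumeClaim manifest against the new class.
# "pure" is an assumed storage class name - replace it with the name
# reported by `kubectl get sc` on your cluster.
cat > test-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pso-test-pvc
spec:
  storageClassName: pure
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF

# Apply the claim and check that it reaches the Bound state
# (these require the running cluster, so they are left commented here):
# kubectl apply -f test-pvc.yaml
# kubectl get pvc pso-test-pvc
```

If the claim binds, the plugin is provisioning volumes on the array correctly; delete the test claim afterwards with `kubectl delete -f test-pvc.yaml`.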

<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/textbubble.png?raw=true"><b> About this Workshop</b></h2>
Welcome to this Microsoft solutions workshop on *Kubernetes - From Bare Metal to SQL Server Big Data Clusters*. In this workshop, you'll learn about setting up a production-grade SQL Server 2019 big data cluster environment on Kubernetes. Topics covered include: hardware, virtualization, and Kubernetes, with a full deployment of SQL Server's Big Data Cluster on the environment that you will use in the class. You'll then walk through a set of [Jupyter Notebooks](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html) in Microsoft's [Azure Data Studio](https://docs.microsoft.com/en-us/sql/azure-data-studio/what-is?view=sql-server-ver15) tool to run T-SQL, Spark, and Machine Learning workloads on the cluster. You'll also receive valuable resources to learn more and go deeper on Linux, Containers, Kubernetes and SQL Server big data clusters.
The focus of this workshop is to understand the hardware, software, and environment you need to work with [SQL Server 2019's big data clusters](https://docs.microsoft.com/en-us/sql/big-data-cluster/big-data-cluster-overview?view=sql-server-ver15) on a Kubernetes platform.
You'll start by understanding Containers and Kubernetes, moving on to a discussion of the hardware and software environment for Kubernetes, and then to more in-depth Kubernetes concepts. You'll follow-on with the SQL Server 2019 big data clusters architecture, and then how to use the entire system in a practical application, all with a focus on how to extrapolate what you have learned to create other solutions for your organization.
> NOTE: This course is designed to be taught in-person with hardware or virtual environments provided by the instructional team. You will also get details for setting up your own hardware, virtual or Cloud environments for Kubernetes for a workshop backup or if you are not attending in-person.
This [github README.MD file](https://lab.github.com/githubtraining/introduction-to-github) explains how the workshop is laid out, what you will learn, and the technologies you will use in this solution.
<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/building1.png?raw=true"><b> Business Applications of this Workshop</b></h2>
Businesses require stable, secure environments at scale that work in both on-premises and in-cloud configurations. Using Kubernetes and Containers allows for manifest-driven DevOps practices, which further streamline IT processes.
<p style="border-bottom: 1px solid lightgrey;"></p>
There are a few requirements for attending the workshop, listed below:
- You'll need a local system that you are able to install software on. The workshop demonstrations use Microsoft Windows as an operating system and all examples use Windows for the workshop. Optionally, you can use a Microsoft Azure Virtual Machine (VM) to install the software on and work with the solution.
- You must have a Microsoft Azure account with the ability to create assets for the "backup" or self-taught path.
- This workshop expects that you understand computer technologies, networking, the basics of SQL Server, HDFS, Spark, and general use of Hypervisors.
- The **Setup** section below explains the steps you should take prior to coming to the workshop
If you are new to any of these, here are a few references you can complete prior to class:
- [Microsoft SQL Server Administration and Use](https://www.microsoft.com/en-us/learning/course.aspx?cid=OD20764)
- [HDFS](https://data-flair.training/blogs/hadoop-hdfs-tutorial/)
<h2><img style="float: left; margin: 0px 15px 15px 0px;" src="https://github.com/microsoft/sqlworkshops/blob/master/graphics/education1.png?raw=true"><b> Workshop Details</b></h2>
This workshop uses Kubernetes to deploy a workload, with a focus on Microsoft SQL Server's big data clusters deployment for advanced analytics over large sets of data and Data Science workloads.
<table style="tr:nth-child(even) {background-color: #f2f2f2;}; text-align: left; display: table; border-collapse: collapse; border-spacing: 5px; border-color: gray;">