sqlworkshops/k8stobdc
Buck Woody 2bd744686a Latest Edits 2020-01-21 13:23:48 -05:00

Workshop: Kubernetes - From Bare Metal to SQL Server Big Data Clusters

A Microsoft Course from the SQL Server team

About this Workshop

Welcome to this Microsoft solutions workshop on Kubernetes - From Bare Metal to SQL Server Big Data Clusters. In this workshop, you'll learn about setting up a production-grade SQL Server 2019 big data cluster environment on Kubernetes. Topics covered include hardware, virtualization, and Kubernetes, with a full deployment of a SQL Server big data cluster on the environment that you will use in the class. You'll then walk through a set of Jupyter Notebooks in Azure Data Studio to run T-SQL, Spark, and Machine Learning workloads on the cluster. You'll also receive valuable resources to learn more and go deeper on Linux, Containers, Kubernetes, and SQL Server big data clusters.

The focus of this workshop is to understand the hardware, software, and environment you need to work with SQL Server 2019's big data clusters on a Kubernetes platform.

You'll start by understanding Containers and Kubernetes, move on to a discussion of the hardware and software environment for Kubernetes, and then to more in-depth Kubernetes concepts. You'll follow on with the SQL Server 2019 big data clusters architecture, and then learn how to use the entire system in a practical application, all with a focus on how to extrapolate what you have learned to create other solutions for your organization.

NOTE: This course is designed to be taught in-person with hardware provided by the instructional team. There are instructions for setting up your own hardware, virtual, or Cloud environments for Kubernetes, but they are pointers to a more involved process that you will carry out on your own if you are not attending in-person.

This README.MD file explains how the workshop is laid out, what you will learn, and the technologies you will use in this solution.

(You can view all of the source files for this workshop on this GitHub site, along with other workshops as well. Open this link in a new tab to find out more.)

Learning Objectives

In this workshop you'll learn:

  • How Containers and Kubernetes work and where you can use them
  • Hardware considerations for setting up a production Kubernetes Cluster on-premises
  • Considerations for Virtual and Cloud-based environments for a production Kubernetes Cluster

The concepts and skills taught in this workshop form the starting points for:

  • Solution Architects, to understand how to design an end-to-end solution.
  • System Administrators, Database Administrators, or Data Engineers, to understand how to put together an end-to-end solution.

Business Applications of this Workshop

Businesses require stable, secure environments at scale that work in both on-premises and in-cloud configurations. Using Kubernetes and Containers allows for manifest-driven DevOps practices, which further streamline IT processes.
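To make "manifest-driven" concrete, here is a minimal sketch in Python that builds a Kubernetes Deployment manifest as plain data and serializes it to JSON, which `kubectl apply -f` accepts alongside YAML. The function name `build_deployment`, the resource name `demo-sql`, and the replica count are hypothetical and chosen only for illustration. This is not how a SQL Server big data cluster is deployed, and a working SQL Server container would need additional environment settings that are omitted here.

```python
import json


def build_deployment(name: str, image: str, replicas: int) -> dict:
    """Return a Kubernetes Deployment manifest as a plain dictionary."""
    # The same labels tie the Deployment's selector to its Pod template.
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical values: the resource name and replica count are assumptions.
manifest = build_deployment(
    "demo-sql", "mcr.microsoft.com/mssql/server:2019-latest", 1
)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Because the desired state lives in a file rather than in a sequence of manual steps, it can be reviewed, versioned, and reapplied like any other source artifact.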

Technologies used in this Workshop

The solution includes the following technologies - although you are not limited to these, they form the basis of the workshop. At the end of the workshop you will learn how to extrapolate these components into other solutions. You will cover these at an overview level, with references to much deeper training provided.

Technology | Description
Linux | The primary operating system used in and by Containers and Kubernetes
Containers | The atomic layer of a Kubernetes Cluster
Kubernetes | The primary clustering technology for manifest-driven environments
SQL Server Big Data Clusters | Relational and non-relational data at scale with Spark, HDFS and application deployment capabilities

Before Taking this Workshop

You'll need a local system that you are able to install software on. The workshop demonstrations use Microsoft Windows as an operating system and all examples use Windows for the workshop. Optionally, you can use a Microsoft Azure Virtual Machine (VM) to install the software on and work with the solution.

You must have a Microsoft Azure account with the ability to create assets.

This workshop expects that you understand computer technologies, networking, SQL Server, HDFS, Spark, and general use of Hypervisors.

If you are new to these, here are a few references you can complete prior to class:

Setup

A full pre-requisites document is located here. These instructions should be completed before the workshop starts, since you will not have time to cover these in class. Remember to turn off any Virtual Machines from the Azure Portal when not taking the class so that you do not incur charges (shutting down the machine in the VM itself is not sufficient).
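If you prefer the command line to the Azure Portal, the Azure CLI command `az vm deallocate` also stops a VM and releases its compute resources. The sketch below only assembles and prints that command and does not run it. The helper name `deallocate_command`, the resource group `workshop-rg`, and the VM name `workshop-vm` are hypothetical placeholders, not names used by this workshop.

```python
import shlex
from typing import List


def deallocate_command(resource_group: str, vm_name: str) -> List[str]:
    """Build the Azure CLI arguments that deallocate a single VM."""
    return [
        "az", "vm", "deallocate",
        "--resource-group", resource_group,
        "--name", vm_name,
    ]


# Hypothetical names: substitute your own resource group and VM.
cmd = deallocate_command("workshop-rg", "workshop-vm")
print(shlex.join(cmd))
```

Keeping the arguments in a list means the command could later be passed to `subprocess.run` without any shell quoting concerns.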

Workshop Details

This workshop uses <TODO: enter main technologies used to solve the scenario>, with a focus on <TODO: architecture and implementation, development and use, etc>.

Primary Audience: Technical professionals tasked with configuring, deploying and managing large-scale clustering systems
Secondary Audience: Data professionals tasked with working with data at scale
Level: 300
Type: TODO: In-Person (self-guided possible)
Length: 8

Related Workshops

Workshop Modules

This is a modular workshop, and in each section, you'll learn concepts, technologies and processes to help you complete the solution.

Module | Topics
01 - An introduction to Linux, Containers and Kubernetes | This module covers Container technologies and how they differ from Virtual Machines. You'll learn about the need for container orchestration using Kubernetes.
02 - Hardware and Virtualization environment for Kubernetes | This module explains how to make a production-grade environment using "bare metal" computer hardware or a virtualized platform, and most importantly the storage hardware aspects.
03 - Kubernetes Concepts and Implementation | This module covers deploying Kubernetes, Kubernetes contexts, cluster troubleshooting and management, services (load balancing versus node ports), understanding storage from a Kubernetes perspective, and making your cluster secure.
04 - SQL Server Big Data Clusters Architecture | This module digs deep into the anatomy of a big data cluster, covering the data pool, storage pool, compute pool and cluster control plane, Active Directory integration, development versus production configurations, and the tools required for deploying and managing a big data cluster.
05 - Using the SQL Server big data cluster on Kubernetes for Data Science | Now that your big data cluster is up, it's ready for data science workloads. This Jupyter Notebook and Azure Data Studio based module covers the use of Python and PySpark, T-SQL, and the execution of Spark and Machine Learning workloads.

Next Steps

Next, continue to Pre-Requisites