Rajesh Rao 2021-01-15 14:05:58 +05:30
Parent 6f17454757
Commit 2b86c14dc6
3 changed files: 35 additions and 27 deletions

View file

@@ -1,40 +1,38 @@
[![Build Status](https://microsoftit.visualstudio.com/OneITVSO/_apis/build/status/Compliant/Core%20Services%20Engineering%20and%20Operations/Corporate%20Functions%20Engineering/Professional%20Services/PS%20Data%20And%20Insights/Data%20and%20Integration%20Platforms/PSDI%20Data%20Processing/PS-OMI-DAIP-DProc-MtStr-MetaStore_Build?branchName=master)](https://microsoftit.visualstudio.com/OneITVSO/_build/latest?definitionId=29958&branchName=master)
[![MIT License](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/microsoft/Spark-SQL-Deployment-Manager/blob/main/LICENSE)
# Build and Deploy Spark SQL Tables
## Overview
- Build and deploy a project of Spark SQL schemas and tables incrementally.
- Checks for syntax errors using Continuous Integration, preventing incorrect table schemas from being merged into the deployment branch.
- Focus on your table schema changes and let this project handle the deployment.
- The project currently supports Azure Data Lake as the storage source for Delta tables, and the Databricks implementation of Spark.
## How to use
### Create a Spark SQL Project
- Create a Spark SQL project that contains details about the project files, such as schema and table scripts.
- Add pre- and post-deployment scripts if needed. These scripts are Scala notebooks; execute the pre-deployment notebook before the deployment and the post-deployment notebook after it (a sketch follows this list).
- Add the Azure Data Lake path to the values.json file.
- Refer to the exampleSqlProject Spark SQL project as a starting point.
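Below is a minimal sketch of what an idempotent pre-deployment notebook might contain. It is an illustration under stated assumptions: the database name is hypothetical, and `spark` is the SparkSession that Databricks Scala notebooks provide automatically.

```scala
// Hypothetical pre-deployment notebook (Scala).
// Runs before the DeploymentManager jar; it must be idempotent, i.e. safe to re-run.

// Hypothetical database used by later table scripts.
val stagingDb = "staging"

// The IF NOT EXISTS guard keeps the notebook idempotent across re-runs.
spark.sql(s"CREATE DATABASE IF NOT EXISTS $stagingDb")

println("executing pre deployment script")
```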
### Building the Project
#### Features
- The BuildSql jar builds the Spark SQL project.
- The build checks the project for syntax errors.
- Once the build succeeds, it creates a build artifact that can be used to deploy the changes (by invoking DeploymentManager).
#### How to build
- The BuildSql project produces the BuildSql.jar file.
- Use BuildSql.jar like an executable to build the Spark SQL project: run the jar, passing the .sparkSql project file as a command-line argument (see the example below).
- Once the build succeeds, the build artifact can be found in the bin folder created in the project root directory.
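A hypothetical invocation, assuming BuildSql.jar is a plain executable jar (the project path below is illustrative, not from the original): `java -jar BuildSql.jar exampleSqlProject/exampleSqlProject.sparkSql`. The bin folder containing output.json then appears under the project root.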
### Deploying the Project
#### Features
- Based on changes in the Spark SQL project file, it creates or modifies tables and schemas (see the note after this list).
- Currently supports Delta table deployment.
- Executes the pre- and post-deployment notebooks (used for one-time manual changes).
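For instance, adding a new column to a table's script should surface at deployment time as a schema change applied to the existing Delta table. How the change is applied (for example, via an `ALTER TABLE ... ADD COLUMNS` statement) is an assumption about DeploymentManager's internals, not something documented here.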
#### How to deploy
- Modify the config file in the DeploymentManager project to pass the relevant Azure Data Lake details and Databricks scope.
- The DeploymentManager project produces the DeploymentManager.jar file.
- Execute the DeploymentManager jar on the Spark cluster, passing the path to output.json (the build artifact) as the jar argument (see the example below).
- Execute the pre- and post-deployment notebooks on the cluster before and after executing the DeploymentManager jar, respectively.
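A hypothetical submission on a generic Spark cluster (the entry class is a placeholder, since the jar's main class is not documented here): `spark-submit --class <DeploymentManager entry class> DeploymentManager.jar /path/to/bin/output.json`. On Databricks, the equivalent is a JAR task configured with the same output.json path as its argument.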
## Trademarks

View file

@@ -0,0 +1,7 @@
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
// POST DEPLOYMENT SCRIPT
// PREFER ONLY SCALA
// MAKE SURE THE CODE IS IDEMPOTENT.
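// For example, guard any DDL with IF EXISTS / IF NOT EXISTS so re-runs are safe (suggested practice, not part of the original script).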
println("executing pre deployment script")

View file

@@ -1,3 +1,6 @@
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.
// PRE DEPLOYMENT SCRIPT
// PREFER ONLY SCALA
// MAKE SURE THE CODE IS IDEMPOTENT.