Updated Readme
Parent: 6f17454757
Commit: 2b86c14dc6
52
README.md
@ -1,40 +1,38 @@

[![Build Status](https://microsoftit.visualstudio.com/OneITVSO/_apis/build/status/Compliant/Core%20Services%20Engineering%20and%20Operations/Corporate%20Functions%20Engineering/Professional%20Services/PS%20Data%20And%20Insights/Data%20and%20Integration%20Platforms/PSDI%20Data%20Processing/PS-OMI-DAIP-DProc-MtStr-MetaStore_Build?branchName=master)](https://microsoftit.visualstudio.com/OneITVSO/_build/latest?definitionId=29958&branchName=master)
[![MIT License](https://img.shields.io/badge/license-MIT-green.svg)](https://github.com/microsoft/Spark-SQL-Deployment-Manager/blob/main/LICENSE)

# Build and Deploy Spark SQL Tables

- Build and deploy Spark SQL tables incrementally. Syntax errors are checked before code is checked in to main, and the changes are deployed using Continuous Deployment.
- This project aims to provide a way to deploy Spark SQL tables using CI/CD, so that you can focus on table schema changes rather than on how to deploy them.
- This project currently supports Azure Data Lake as the storage source for Delta tables.

## Overview

- Build and deploy a project containing Spark SQL schemas and tables incrementally.
- Check for syntax errors during Continuous Integration, preventing an incorrect table schema from being merged into the deployment branch.
- Focus on table schema changes and let this project handle the deployment.
- This project currently supports Azure Data Lake as the storage source for Delta tables, and the Databricks implementation of Spark.

# Spark SQL Project

## How to use

### Create a Spark SQL Project

- Create a Spark SQL project that describes the project files, such as schema and table scripts.
- Add pre- and post-deployment scripts if needed. These scripts are Scala notebooks: run the pre-deployment notebook before the deployment and the post-deployment notebook after it.
- Add the Azure Data Lake path to the values.json file.
- Refer to the exampleSqlProject Spark SQL project as a starting point.

# Build

## Features

- Refer to the exampleSqlProject Spark SQL project as a starting point.
- The SqlBuild jar builds the Spark SQL project.
- The build checks the project for syntax errors.
- Once the build succeeds, it creates a build artifact that can be used to deploy the changes (by invoking the Deployment Manager).

## How to build

### Building the Project

#### Features

- The build checks the project for syntax errors.
- Once the build succeeds, it creates a build artifact that can be used to deploy the changes (by invoking DeploymentManager).

#### How to build

- The BuildSql project produces the BuildSql.jar file.
- Use BuildSql.jar as an executable to build a Spark SQL project.
- Run the jar, passing the .sparkSql project file as a command-line argument.
- A build artifact is generated once the build succeeds. You can find it in the bin folder created in the project root directory.
- Use BuildSql.jar as an executable to build a Spark SQL project: run the jar, passing the .sparkSql project file as a command-line argument.
- Once the build succeeds, the build artifact can be found in the bin folder created in the project root directory.

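
The argument handling and artifact location described above could look roughly like the following Scala sketch. It is not BuildSql's actual entry point: the `BuildArgs` object and its methods are hypothetical, and the sketch assumes that the project root is the directory containing the .sparkSql file and that the artifact is the output.json file consumed by the deployment steps.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical sketch, not BuildSql's real entry point.
// Assumptions: exactly one argument (the .sparkSql project file); the project root
// is the directory containing that file; the artifact is bin/output.json.
object BuildArgs {
  // Validate the command-line arguments and return the absolute project file path.
  def projectFile(args: Array[String]): Either[String, Path] =
    args.toList match {
      case file :: Nil if file.endsWith(".sparkSql") =>
        Right(Paths.get(file).toAbsolutePath.normalize)
      case other :: Nil =>
        Left(s"Expected a .sparkSql project file, got: $other")
      case _ =>
        Left("Usage: pass exactly one .sparkSql project file")
    }

  // Location of the build artifact in the bin folder under the project root.
  def artifactPath(projectFile: Path): Path =
    projectFile.toAbsolutePath.getParent.resolve("bin").resolve("output.json")
}
```

Returning an `Either` instead of throwing lets a `main` method print the usage message and exit with a non-zero status, which is what a CI step needs in order to fail the build.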

# Deploy

## Features

- Currently supports Delta table deployment and schema changes.
- Executes pre- and post-deployment notebooks (typically used to make manual changes or to create source data).

## How to deploy

- Modify the config file in the DeploymentManager project to pass the relevant Azure Data Lake details and Databricks scope.
- Build the DeploymentManager jar.
- Execute the DeploymentManager jar on the Spark cluster, passing the path to output.json (the build artifact) as the jar argument.
- Make sure to execute the pre-deployment notebook on the cluster before running the DeploymentManager jar, and the post-deployment notebook after it.

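
The required order (pre-deployment notebook, then DeploymentManager, then post-deployment notebook) can be sketched in Scala as a fail-fast sequence. The `DeploymentOrder` object and its `Step` type are hypothetical; on a real cluster each step would be a notebook run or the DeploymentManager jar invocation rather than a local function.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

// Hypothetical sketch: each step stands in for a notebook run or the
// DeploymentManager jar; here a step is just a named function.
object DeploymentOrder {
  final case class Step(name: String, run: () => Unit)

  // Run steps in the given order and stop at the first failure.
  // Right: names of all completed steps. Left: failing step name and its error.
  @tailrec
  def runInOrder(steps: List[Step], completed: List[String] = Nil): Either[(String, Throwable), List[String]] =
    steps match {
      case Nil => Right(completed.reverse)
      case step :: rest =>
        Try(step.run()) match {
          case Success(_)     => runInOrder(rest, step.name :: completed)
          case Failure(error) => Left((step.name, error))
        }
    }
}
```

Stopping at the first failure keeps the post-deployment notebook from running against a deployment that did not complete.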

### Deploying the project

#### Features

- Based on changes in the Spark SQL project file, it creates or modifies tables and schemas.
- Currently supports Delta table deployment.
- Executes pre- and post-deployment notebooks (used for one-time manual changes).

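
To illustrate what incremental deployment means, here is a hypothetical Scala sketch that compares the tables declared in a project with those already deployed and plans only the missing schemas, tables, and columns. It is not DeploymentManager's algorithm: the `SchemaDiff` types are invented for illustration, and changes to existing columns (type changes, drops, renames) are deliberately left out.

```scala
// Hypothetical sketch of incremental deployment planning, not DeploymentManager's
// actual algorithm. Only missing schemas, tables, and columns are planned;
// changes to existing columns are out of scope.
object SchemaDiff {
  final case class Column(name: String, dataType: String)
  final case class Table(schema: String, name: String, columns: List[Column]) {
    def fullName: String = s"$schema.$name"
  }

  sealed trait Action
  final case class CreateSchema(schema: String) extends Action
  final case class CreateTable(table: Table) extends Action
  final case class AddColumns(table: String, columns: List[Column]) extends Action

  // Compare the tables declared in the project (desired) with the deployed ones.
  def plan(desired: List[Table], deployed: List[Table]): List[Action] = {
    val deployedSchemas = deployed.map(_.schema).toSet
    val deployedTables = deployed.map(t => t.fullName -> t).toMap

    // Schemas that do not exist yet are planned first.
    val schemaActions: List[Action] =
      desired.map(_.schema).distinct.filterNot(deployedSchemas.contains).map(CreateSchema(_))

    // New tables are created; existing tables only get their missing columns.
    val tableActions: List[Action] = desired.flatMap { table =>
      deployedTables.get(table.fullName) match {
        case None => List(CreateTable(table))
        case Some(existing) =>
          val existingColumns = existing.columns.map(_.name).toSet
          val missing = table.columns.filterNot(c => existingColumns.contains(c.name))
          if (missing.isEmpty) Nil else List(AddColumns(table.fullName, missing))
      }
    }

    schemaActions ++ tableActions
  }
}
```

Running the planner against an unchanged project yields an empty plan, which is what allows the same artifact to be deployed repeatedly.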

#### How to deploy

- The DeploymentManager project produces the DeploymentManager.jar file.
- Execute the DeploymentManager jar on the Spark cluster, passing the path to output.json (the build artifact) as the jar argument.
- Execute the pre-deployment notebook on the cluster before running the DeploymentManager jar, and the post-deployment notebook after it.

## Trademarks


@ -0,0 +1,7 @@
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.

// POST DEPLOYMENT SCRIPT
// PREFER ONLY SCALA
// MAKE SURE THE CODE IS IDEMPOTENT.
println("executing post deployment script")

@ -1,3 +1,6 @@
// Copyright (c) Microsoft Corporation.
// Licensed under the MIT License.

// PRE DEPLOYMENT SCRIPT
// PREFER ONLY SCALA
// MAKE SURE THE CODE IS IDEMPOTENT.
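
Both deployment scripts ask for idempotent code, meaning a second run must leave the same state as the first. The hypothetical Scala sketch below uses an in-memory Map as a stand-in for a reference-data table to contrast an append, which duplicates rows on every re-run, with a keyed upsert, which does not. In a notebook the same idea would typically be expressed with statements such as CREATE TABLE IF NOT EXISTS or a Delta MERGE.

```scala
// Hypothetical sketch: a Map keyed by id stands in for a reference-data table.
object IdempotentSeed {
  type Row = (Int, String)

  // NOT idempotent: every re-run appends the seed rows again.
  def appendSeed(table: Vector[Row], seed: Seq[Row]): Vector[Row] =
    table ++ seed

  // Idempotent: rows are upserted by key, so a second run changes nothing.
  def upsertSeed(table: Map[Int, String], seed: Seq[Row]): Map[Int, String] =
    table ++ seed
}
```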