This project provides a way to deploy Spark SQL tables through CI/CD, letting you focus on table schema changes rather than on how to deploy them.

Overview

  • Build and incrementally deploy a project containing Spark SQL schemas and tables.
  • Checks for syntax errors during Continuous Integration, preventing an incorrect table schema from being merged into the deployment branch.
  • Lets you focus on table schema changes while this project handles the deployment.
  • Currently supports Azure Data Lake as the storage source for Delta tables, and the Databricks implementation of Spark.

How to use

Create a Spark SQL Project

  • Create a Spark SQL project that contains the project files, such as schema and table scripts (a sketch of such scripts follows this list).
  • Add pre- and post-deployment scripts if needed. These scripts are Scala notebooks; run the pre-deployment notebook before the deployment and the post-deployment notebook after it.
  • Add the Azure Data Lake path to the values.json file.
  • Refer to the exampleSqlProject Spark SQL project as a starting point.
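
The schema and table scripts are plain Spark SQL statements. A minimal sketch, assuming Delta tables backed by Azure Data Lake (the schema, table, and storage path names here are hypothetical; see exampleSqlProject for the actual layout):

```sql
-- Hypothetical schema script
CREATE SCHEMA IF NOT EXISTS sales;

-- Hypothetical table script for a Delta table stored in Azure Data Lake
CREATE TABLE IF NOT EXISTS sales.orders (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(18, 2),
  order_date  DATE
) USING DELTA
LOCATION 'abfss://container@account.dfs.core.windows.net/delta/sales/orders';
```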

Building the Project

Features

  • The build checks the project for syntax errors.
  • Once the build succeeds, it creates a build artifact that can be used to deploy the changes (by invoking DeploymentManager).

How to build

  • The BuildSql project produces the BuildSql.jar file.
  • Use BuildSql.jar as an executable to build a Spark SQL project: run the jar, passing the .sparkSql project file as a command-line argument (see the example after this list).
  • Once the build succeeds, the build artifact can be found in the bin folder created in the project root directory.
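
For example, assuming the jar is runnable with java -jar and the project file is named MyProject.sparkSql (both the runner and the file name are assumptions here):

```sh
# Build the project; on success the build artifact is written to the
# bin folder created in the project root directory.
java -jar BuildSql.jar path/to/MyProject.sparkSql
```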

Deploying the Project

Features

  • Creates or modifies schemas and tables based on changes in the Spark SQL project file.
  • Currently supports Delta table deployment.
  • Executes pre- and post-deployment notebooks (used for one-time manual changes).

How to deploy

  • The DeploymentManager project produces the DeploymentManager.jar file.
  • Execute the DeploymentManager jar on the Spark cluster, passing the path to output.json (the build artifact) as the jar argument (see the example after this list).
  • Run the pre-deployment notebook on the cluster before executing the DeploymentManager jar, and the post-deployment notebook after it.
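
One way to run the jar is with spark-submit; a minimal sketch, where the main class is a placeholder (take the actual entry point from the jar's manifest) and the artifact path is hypothetical. On Databricks, a Jar job with the same argument is the equivalent:

```sh
# The --class value is a placeholder; use the jar's actual entry point.
# The argument is the path to output.json produced by the build.
spark-submit \
  --class <main-class> \
  DeploymentManager.jar \
  /path/to/project/bin/output.json
```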

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.