# Azure Databricks operator (for Kubernetes)
[![Build Status](https://dev.azure.com/ms/azure-databricks-operator/_apis/build/status/microsoft.azure-databricks-operator?branchName=master)](https://dev.azure.com/ms/azure-databricks-operator/_build/latest?definitionId=254&branchName=master)
[![Go Report Card](https://goreportcard.com/badge/github.com/microsoft/azure-databricks-operator)](https://goreportcard.com/report/github.com/microsoft/azure-databricks-operator)
[![License: MIT](https://img.shields.io/github/license/microsoft/azure-databricks-operator)](https://github.com/microsoft/azure-databricks-operator/blob/master/LICENSE)
> This project is experimental. Expect the API to change. It is not recommended for production environments.
## Introduction
Kubernetes offers the facility of extending its API through the concept of [Operators](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/). This repository contains the resources and code to deploy an Azure Databricks Operator for Kubernetes.
The Databricks operator is useful when applications hosted on Kubernetes need to launch and use Databricks data engineering and machine learning tasks.
### Key benefits of using Azure Databricks operator
1. Easy to use: Azure Databricks operations can be performed with `kubectl`; there is no need to learn or install the Databricks command-line utilities and their Python dependencies.
2. Security: There is no need to distribute Databricks tokens; the token is used only by the operator.
3. Version control: All YAML manifests or Helm charts that describe Azure Databricks operations (clusters, jobs, …) can be tracked.
4. Automation: Replicate Azure Databricks operations on any Databricks workspace by applying the same manifests or Helm charts.
![alt text](docs/images/azure-databricks-operator-highlevel.jpg "high level architecture")
![alt text](docs/images/azure-databricks-operator.jpg "high level architecture")
The project was built using
1. [Kubebuilder](https://book.kubebuilder.io/)
2. [Golang SDK for DataBricks](https://github.com/xinsnake/databricks-sdk-golang)
## How to use Azure Databricks operator
1. Download [the latest release manifests](https://github.com/microsoft/azure-databricks-operator/releases):
```sh
wget https://github.com/microsoft/azure-databricks-operator/releases/latest/download/release.zip
unzip release.zip
```
2. Create the `azure-databricks-operator-system` namespace:
```sh
kubectl create namespace azure-databricks-operator-system
```
3. Create Kubernetes secrets with values for `DATABRICKS_HOST` and `DATABRICKS_TOKEN`:
```shell
kubectl --namespace azure-databricks-operator-system \
create secret generic dbrickssettings \
--from-literal=DatabricksHost="https://xxxx.azuredatabricks.net" \
--from-literal=DatabricksToken="xxxxx"
```
4. Apply the manifests for the Operator and CRDs in `release/config`:
```sh
kubectl apply -f release/config
```
For detailed deployment guides, please see [deploy.md](https://github.com/microsoft/azure-databricks-operator/blob/master/docs/deploy.md).
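As a rough sketch, steps 2 to 4 above could be wrapped in a small shell function. The helper names `check_settings` and `install_operator` are hypothetical and not part of the operator. The sketch assumes that `kubectl` already points at the target cluster, that `release.zip` has been unpacked into the current directory, and that the Databricks values are supplied through the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables.

```sh
#!/bin/sh
# Sketch only: a scripted version of steps 2-4.
# Assumes kubectl is configured and ./release/config exists.

NAMESPACE="azure-databricks-operator-system"

# check_settings (hypothetical helper): fail early if either value is
# missing or the host is not an https URL.
check_settings() {
  if [ -z "${DATABRICKS_HOST:-}" ] || [ -z "${DATABRICKS_TOKEN:-}" ]; then
    echo "DATABRICKS_HOST and DATABRICKS_TOKEN must be set" >&2
    return 1
  fi
  case "$DATABRICKS_HOST" in
    https://*) return 0 ;;
    *) echo "DATABRICKS_HOST must start with https://" >&2; return 1 ;;
  esac
}

# install_operator (hypothetical helper): namespace, secret, manifests.
install_operator() {
  check_settings || return 1

  # Create the namespace only if it does not exist yet.
  kubectl get namespace "$NAMESPACE" >/dev/null 2>&1 ||
    kubectl create namespace "$NAMESPACE" || return 1

  # Same secret name and keys as in step 3.
  kubectl --namespace "$NAMESPACE" create secret generic dbrickssettings \
    --from-literal=DatabricksHost="$DATABRICKS_HOST" \
    --from-literal=DatabricksToken="$DATABRICKS_TOKEN" || return 1

  kubectl apply -f release/config || return 1

  # List the pods in the namespace to check that the operator started.
  kubectl --namespace "$NAMESPACE" get pods
}
```

To use it, source the file, export the two variables, and call `install_operator`. The early validation is a design choice of this sketch: it avoids creating a secret with empty values, which would otherwise only surface later as errors from the operator.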
## Samples
1. Create a Spark cluster on demand and run a Databricks notebook.
![alt text](docs/images/sample1.gif "sample1")
2. Create an interactive Spark cluster and run a Databricks job on the existing cluster.
![alt text](docs/images/sample2.gif "sample2")
3. Create an Azure Databricks secret scope by using Kubernetes secrets.
![alt text](docs/images/sample3.gif "sample3")
For samples and simple use cases showing how to use the operator, please see [samples.md](docs/samples.md).
## Quick start
One-click start using [VS Code](https://code.visualstudio.com/):
![alt text](docs/images/devcontainer.gif "devcontainer")
For more details please see
[contributing.md](https://github.com/microsoft/azure-databricks-operator/blob/master/docs/contributing.md)
## Roadmap
Check [roadmap.md](https://github.com/microsoft/azure-databricks-operator/blob/master/docs/roadmap.md) for what has been supported and what's coming.
## Resources
A few topics are discussed in [resources.md](https://github.com/microsoft/azure-databricks-operator/blob/master/docs/resources.md):
- Dev container
- Build pipelines
- Operator metrics
- Kubernetes on WSL
## Contributing
For instructions about setting up your environment to develop and extend the operator, please see
[contributing.md](https://github.com/microsoft/azure-databricks-operator/blob/master/docs/contributing.md)
This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.