Build Status

Azure Databricks operator

Introduction

Azure Databricks operator contains two projects: a Go application, which is a Kubernetes controller that watches CRDs that define Databricks jobs, and a Python Flask app, which sends commands to Azure Databricks.
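Conceptually, the controller reconciles each NotebookJob toward its desired state: if the job has not been submitted yet, it asks the component that talks to Databricks to start a run and records the result. The following is a minimal plain-Go sketch of that idea, not the project's code. NotebookJob, RunSubmitter, fakeAPI and the notebook path /Shared/example are hypothetical, and in this project the submission is handled by the Flask app over HTTP rather than by an in-process interface.

```go
package main

import (
	"errors"
	"fmt"
)

// NotebookJob is a hypothetical, trimmed-down stand-in for the CRD object.
type NotebookJob struct {
	Name         string
	NotebookPath string
	RunID        int64 // 0 means "not submitted yet" (hypothetical status field)
}

// RunSubmitter is a hypothetical interface for whatever talks to Databricks;
// in this project that role is played by the Flask API app.
type RunSubmitter interface {
	SubmitRun(notebookPath string) (int64, error)
}

// reconcile drives one job toward its desired state. It is idempotent:
// a job that already has a run ID is left untouched.
func reconcile(job *NotebookJob, api RunSubmitter) error {
	if job.RunID != 0 {
		return nil // already submitted; nothing to do
	}
	if job.NotebookPath == "" {
		return errors.New("notebook path is required")
	}
	id, err := api.SubmitRun(job.NotebookPath)
	if err != nil {
		return fmt.Errorf("submitting %s: %w", job.Name, err)
	}
	job.RunID = id
	return nil
}

// fakeAPI counts calls so the sketch runs without a cluster or Databricks.
type fakeAPI struct{ calls int }

func (f *fakeAPI) SubmitRun(path string) (int64, error) {
	f.calls++
	return 42, nil
}

func main() {
	api := &fakeAPI{}
	job := &NotebookJob{Name: "sample1run1", NotebookPath: "/Shared/example"}
	// A controller may reconcile the same object many times.
	for i := 0; i < 3; i++ {
		if err := reconcile(job, api); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println(job.RunID, api.calls) // prints "42 1"
}
```

Keeping the reconcile step idempotent matters because Kubernetes can trigger a controller repeatedly for the same object; only the first pass should submit a run.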

The project was built using:

  1. Kubebuilder
  2. Swagger Codegen
  3. Flask-RESTPlus
  4. Flask

Prerequisites And Assumptions

  1. You have Minikube, Kind, or Docker Desktop installed on your local computer with RBAC enabled.
  2. You have a Kubernetes cluster running.
  3. You have the kubectl command line (kubectl CLI) installed.
  4. You have Helm and Tiller installed.
  • Configure a Kubernetes cluster on your machine

    You need to make sure a kubeconfig file is configured. If you opt for AKS, you can use: az aks get-credentials --resource-group $RG_NAME --name $CLUSTER_NAME
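When no --kubeconfig flag is passed, kubectl reads the files listed in the KUBECONFIG environment variable, or $HOME/.kube/config if KUBECONFIG is unset. The small Go sketch below mirrors that lookup so you can check which file will be used and whether it exists. The helper name kubeconfigPaths is hypothetical, and the sketch does not reproduce how kubectl merges several files.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// kubeconfigPaths returns the files kubectl would read when no --kubeconfig
// flag is given: the entries of KUBECONFIG if it is set (separated by the OS
// path-list separator, ':' on Linux/macOS), otherwise $HOME/.kube/config.
func kubeconfigPaths(kubeconfigEnv, home string) []string {
	if kubeconfigEnv != "" {
		var paths []string
		for _, p := range filepath.SplitList(kubeconfigEnv) {
			if p != "" { // skip empty entries such as a trailing ':'
				paths = append(paths, p)
			}
		}
		return paths
	}
	return []string{filepath.Join(home, ".kube", "config")}
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine home directory:", err)
		return
	}
	for _, p := range kubeconfigPaths(os.Getenv("KUBECONFIG"), home) {
		if _, err := os.Stat(p); err != nil {
			fmt.Println("missing:", p)
		} else {
			fmt.Println("found:  ", p)
		}
	}
}
```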

Basic commands to check your cluster

    kubectl config get-contexts
    kubectl cluster-info
    kubectl version
    kubectl get pods -n kube-system

Kubernetes on WSL

On the Windows command line, run kubectl config view to find the values of <windows-user-name>, <minikubeip> and <port>.

mkdir -p ~/.kube \
&& cp /mnt/c/Users/<windows-user-name>/.kube/config ~/.kube

kubectl config set-cluster minikube --server=https://<minikubeip>:<port> --certificate-authority=/mnt/c/Users/<windows-user-name>/.minikube/ca.crt
kubectl config set-credentials minikube --client-certificate=/mnt/c/Users/<windows-user-name>/.minikube/client.crt --client-key=/mnt/c/Users/<windows-user-name>/.minikube/client.key
kubectl config set-context minikube --cluster=minikube --user=minikube
kubectl config use-context minikube
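The commands above rely on WSL exposing Windows drives under /mnt, so C:\Users\<windows-user-name> becomes /mnt/c/Users/<windows-user-name>. This Go sketch shows that mapping, assuming the default /mnt automount root; windowsToWSLPath and the user name "me" are hypothetical placeholders. Inside WSL, the built-in wslpath utility performs the same conversion.

```go
package main

import (
	"fmt"
	"strings"
)

// windowsToWSLPath maps an absolute Windows path such as
// C:\Users\me\.minikube\ca.crt to /mnt/c/Users/me/.minikube/ca.crt.
// It assumes WSL's default automount root of /mnt.
func windowsToWSLPath(p string) (string, error) {
	if len(p) < 3 || p[1] != ':' || (p[2] != '\\' && p[2] != '/') {
		return "", fmt.Errorf("not an absolute Windows path: %q", p)
	}
	d := p[0]
	if !(d >= 'a' && d <= 'z') && !(d >= 'A' && d <= 'Z') {
		return "", fmt.Errorf("invalid drive letter in %q", p)
	}
	rest := strings.ReplaceAll(p[2:], `\`, "/") // convert separators
	return "/mnt/" + strings.ToLower(p[:1]) + rest, nil
}

func main() {
	// "me" stands in for <windows-user-name>.
	for _, name := range []string{"ca.crt", "client.crt", "client.key"} {
		wsl, err := windowsToWSLPath(`C:\Users\me\.minikube\` + name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(wsl) // e.g. /mnt/c/Users/me/.minikube/ca.crt
	}
}
```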

More info:

  1. https://devkimchi.com/2018/06/05/running-kubernetes-on-wsl/
  2. https://www.jamessturtevant.com/posts/Running-Kubernetes-Minikube-on-Windows-10-with-WSL/

How to use the operator

Docs are a work in progress.

  1. Create a secret that holds the values for DATABRICKS_HOST and DATABRICKS_TOKEN:

    kubectl create secret generic testdatabricks --from-literal=DatabricksHost="https://xxxx.azuredatabricks.net" --from-literal=DatabricksToken="xxxxx"
    

    Make sure your secret name is set correctly in databricks-operator/config/default/azure_databricks_api_image_patch.yaml

  2. To install the NotebookJob CRD in the cluster configured in ~/.kube/config, run kubectl apply -f databricks-operator/config/crds or make install -C databricks-operator.

  3. To deploy the controller to the cluster configured in ~/.kube/config, run kustomize build databricks-operator/config | kubectl apply -f -

  4. In microsoft_v1beta2_notebookjob.yaml, change the NotebookJob name from sample1run1 to your desired name, set the Databricks notebook path and update the remaining values. Then apply the sample:

    kubectl apply -f databricks-operator/config/samples/microsoft_v1beta2_notebookjob.yaml
    
  5. Basic commands to check the new NotebookJob:

    kubectl get crd
    kubectl -n databricks-operator-system get svc
    kubectl -n databricks-operator-system get pod
    kubectl -n databricks-operator-system describe pod databricks-operator-controller-manager-0
    kubectl -n databricks-operator-system logs databricks-operator-controller-manager-0 -c dbricks -f
    kubectl get notebookjob
    kubectl describe notebookjob sample1run1
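If the controller cannot reach Databricks, check that the secret from step 1 holds the values you expect. Kubernetes stores each --from-literal value base64-encoded under the secret's data field, so the output of kubectl get secret testdatabricks -o json needs decoding. The Go sketch below does that; decodeSecretData is a hypothetical helper, and it prints only the length of non-host values so the token is not echoed to the terminal.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"sort"
)

// secret models only the part of `kubectl get secret <name> -o json` we need:
// the "data" map of base64-encoded values.
type secret struct {
	Data map[string]string `json:"data"`
}

// decodeSecretData parses kubectl's JSON output and base64-decodes every value.
func decodeSecretData(raw []byte) (map[string]string, error) {
	var s secret
	if err := json.Unmarshal(raw, &s); err != nil {
		return nil, err
	}
	out := make(map[string]string, len(s.Data))
	for k, v := range s.Data {
		b, err := base64.StdEncoding.DecodeString(v)
		if err != nil {
			return nil, fmt.Errorf("key %s: %w", k, err)
		}
		out[k] = string(b)
	}
	return out, nil
}

func main() {
	// Usage: kubectl get secret testdatabricks -o json | go run main.go
	raw, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading stdin:", err)
		return
	}
	values, err := decodeSecretData(raw)
	if err != nil {
		fmt.Fprintln(os.Stderr, "decoding secret:", err)
		return
	}
	keys := make([]string, 0, len(values))
	for k := range values {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		if k == "DatabricksHost" {
			fmt.Printf("%s = %s\n", k, values[k])
		} else {
			// Avoid printing credentials such as the token; show only their length.
			fmt.Printf("%s = <%d bytes>\n", k, len(values[k]))
		}
	}
}
```

For a quick one-off check of a single key, kubectl get secret testdatabricks -o jsonpath='{.data.DatabricksHost}' | base64 --decode gives the same result without Go.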
    

How to extend the operator and build your own images

Updating databricks operator:

This repo was generated by Kubebuilder.

To extend databricks-operator:

  1. Run dep ensure to download dependencies. It doesn't show a progress bar and can take a while to download all of the dependencies.

  2. Update pkg/apis/microsoft/v1beta1/notebookjob_types.go.

  3. Regenerate the CRD: make manifests

  4. Install the updated CRD: make install

  5. Generate code: make generate

  6. Update the controller in pkg/controller/notebookjob/notebookjob_controller.go.

  7. Update the tests and run make test.

  8. Build: make build

  9. Deploy:

    make docker-build IMG=azadehkhojandi/databricks-operator
    make docker-push IMG=azadehkhojandi/databricks-operator
    make deploy
    

Main Contributors

  1. Jordan Knight Github, Linkedin
  2. Paul Bouwer Github, Linkedin
  3. Lace Lofranco Github, Linkedin
  4. Allan Targino Github, Linkedin
  5. Rian Finnegan Github, Linkedin
  6. Jason Goodselli Github, Linkedin
  7. Craig Rodger Github, Linkedin
  8. Justin Chizer Github, Linkedin
  9. Azadeh Khojandi Github, Linkedin

Resources

Build pipelines

  1. Create a pipeline and add a status badge to Github
  2. Customize status badge with shields.io

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.