# Azure Databricks Client Library
The Azure Databricks Client Library offers a convenient interface for automating your Azure Databricks workspace through the Azure Databricks REST API.

The implementation of this library is based on REST API version 2.0 and above.
## Requirements
You must have a personal access token (PAT) or an Azure Active Directory (AAD) token to access the Databricks REST API.

- To generate a PAT, follow the steps listed in this document.
- To generate an AAD token, follow the steps listed in this document.
## Supported APIs
API | Version | Description |
---|---|---|
Clusters | 2.0 | The Clusters API allows you to create, start, edit, list, terminate, and delete clusters. |
Jobs | 2.1 | The Jobs API allows you to programmatically manage Azure Databricks jobs. |
Dbfs | 2.0 | The DBFS API is a Databricks API that makes it simple to interact with various data sources without having to include your credentials every time you read a file. |
Secrets | 2.0 | The Secrets API allows you to manage secrets, secret scopes, and access permissions. |
Groups | 2.0 | The Groups API allows you to manage groups of users. |
Libraries | 2.0 | The Libraries API allows you to install and uninstall libraries and get the status of libraries on a cluster. |
Token | 2.0 | The Token API allows you to create, list, and revoke tokens that can be used to authenticate and access Azure Databricks REST APIs. |
Workspace | 2.0 | The Workspace API allows you to list, import, export, and delete notebooks and folders. |
InstancePool | 2.0 | The Instance Pools API allows you to create, edit, delete and list instance pools. |
Permissions | 2.0 | The Permissions API lets you manage permissions for Token, Cluster, Pool, Job, Delta Live Tables pipeline, Notebook, Directory, MLflow experiment, MLflow registered model, SQL warehouse, and Repo. |
## Usage

Check out the Sample project for more detailed usage examples.
In the following examples, the `baseUrl` variable should be set to the workspace base URL, which looks like `https://adb-<workspace-id>.<random-number>.azuredatabricks.net`, and the `token` variable should be set to your Databricks personal access token.
### Creating the client

```csharp
using (var client = DatabricksClient.CreateClient(baseUrl, token))
{
    // ...
}
```
### Cluster API

#### Create a standard cluster

```csharp
var clusterConfig = ClusterInfo.GetNewClusterConfiguration("Sample cluster")
    .WithRuntimeVersion(RuntimeVersions.Runtime_4_2_Scala_2_11)
    .WithAutoScale(3, 7)
    .WithAutoTermination(30)
    .WithClusterLogConf("dbfs:/logs/")
    .WithNodeType(NodeTypes.Standard_D3_v2)
    .WithPython3(true);

var clusterId = await client.Clusters.Create(clusterConfig);
```
#### Delete a cluster

```csharp
await client.Clusters.Delete(clusterId);
```
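The Clusters client also exposes the other lifecycle operations from the REST API. As a hedged sketch (the `List` and `Get` method names and the `ClusterState` enum are assumptions drawn from the pattern above; verify them against your library version), enumerating clusters and waiting for one to finish provisioning might look like:

```csharp
// List all clusters in the workspace and print their states.
var clusters = await client.Clusters.List();
foreach (var cluster in clusters)
{
    Console.WriteLine($"{cluster.ClusterId}: {cluster.State}");
}

// Poll the newly created cluster until it leaves the PENDING state.
// Get and ClusterState are assumed names; check your library version.
ClusterInfo info;
do
{
    await Task.Delay(TimeSpan.FromSeconds(15));
    info = await client.Clusters.Get(clusterId);
} while (info.State == ClusterState.PENDING);
```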
### Jobs API

```csharp
// Job schedule
var schedule = new CronSchedule
{
    QuartzCronExpression = "0 0 9 ? * MON-FRI",
    TimezoneId = "Europe/London",
    PauseStatus = PauseStatus.UNPAUSED
};

// Run with a job cluster
var newCluster = ClusterAttributes.GetNewClusterConfiguration()
    .WithClusterMode(ClusterMode.SingleNode)
    .WithNodeType(NodeTypes.Standard_D3_v2)
    .WithRuntimeVersion(RuntimeVersions.Runtime_10_4);

// Create job settings
var jobSettings = new JobSettings
{
    MaxConcurrentRuns = 1,
    Schedule = schedule,
    Name = "Sample Job"
};

// Add three tasks to the job settings; task3 depends on task1 and task2.
var task1 = jobSettings.AddTask("task1", new NotebookTask { NotebookPath = SampleNotebookPath })
    .WithDescription("Sample Job - task1")
    .WithNewCluster(newCluster);
var task2 = jobSettings.AddTask("task2", new NotebookTask { NotebookPath = SampleNotebookPath })
    .WithDescription("Sample Job - task2")
    .WithNewCluster(newCluster);
jobSettings.AddTask("task3", new NotebookTask { NotebookPath = SampleNotebookPath }, new[] { task1, task2 })
    .WithDescription("Sample Job - task3")
    .WithNewCluster(newCluster);

// Create the job.
Console.WriteLine("Creating new job");
var jobId = await client.Jobs.Create(jobSettings);
Console.WriteLine("Job created: {0}", jobId);

// Start the job and retrieve the run id.
Console.WriteLine("Run now: {0}", jobId);
var runId = await client.Jobs.RunNow(jobId);

// Keep polling the run by calling RunsGet until the run terminates:
// var run = await client.Jobs.RunsGet(runId);
```
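The commented-out `RunsGet` call above can be expanded into a simple polling loop. A sketch, with the caveat that the exact shape of the run object varies by library version (the `State`, `LifeCycleState`, and `ResultState` property names and the `RunLifeCycleState` enum here mirror the REST API's `life_cycle_state`/`result_state` fields and are assumptions to verify):

```csharp
// Poll the run every 15 seconds until it reaches a terminal state.
var run = await client.Jobs.RunsGet(runId);
while (run.State.LifeCycleState == RunLifeCycleState.PENDING ||
       run.State.LifeCycleState == RunLifeCycleState.RUNNING)
{
    await Task.Delay(TimeSpan.FromSeconds(15));
    run = await client.Jobs.RunsGet(runId);
}

// ResultState is only populated once the run has terminated.
Console.WriteLine("Run terminated. Result state: {0}", run.State.ResultState);
```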
### Secrets API

#### Create a secret scope

```csharp
const string scope = "SampleScope";
await client.Secrets.CreateScope(scope, null);
```
#### Create a text secret

```csharp
var secretName = "secretkey.text";
await client.Secrets.PutSecret("secret text", scope, secretName);
```
#### Create a binary secret

```csharp
var secretName = "secretkey.bin";
await client.Secrets.PutSecret(new byte[] { 0x01, 0x02, 0x03, 0x04 }, scope, secretName);
```
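The Secrets client also supports enumerating and removing secrets. Note that the REST API never returns secret values; only the keys can be listed. A sketch (the `ListSecrets`, `DeleteSecret`, and `DeleteScope` method names are assumptions drawn from the REST API surface; verify them against your library version):

```csharp
// List the secret keys in the scope (values are never returned by the API).
var secrets = await client.Secrets.ListSecrets(scope);
foreach (var secret in secrets)
{
    Console.WriteLine(secret.Key);
}

// Clean up: remove the secret, then the whole scope.
await client.Secrets.DeleteSecret(scope, secretName);
await client.Secrets.DeleteScope(scope);
```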
## Breaking changes

- The v2 of the library targets the .NET 6 runtime.
- The Jobs API was redesigned to align with version 2.1 of the REST API.
  - In the previous version, the Jobs API supported only a single task per job. The new Jobs API supports multiple tasks per job, where the tasks are represented as a DAG.
  - The new version supports two more types of task: the Python Wheel task and the Delta Live Tables pipeline task.
## Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.