Table of Contents

Introduction
Provisioning ADB: Guidelines for Networking and Security
Developing applications on ADB: Guidelines for selecting clusters
- Sub-sub-heading
Monitoring
- Collect resource utilization metrics across Azure Databricks cluster in a Log Analytics workspace
Appendix A
- Installation for being able to capture VM metrics in Log Analytics

Table of Figures

Figure 1: Databricks user menu
Figure 2: Business Unit Subscription Design Pattern
Figure 3: Azure Databricks Isolation Domains Workspace
Figure 4: Hub and Spoke Model
Figure 5: Interactive clusters
Figure 6: Ephemeral Job Cluster
Figure 7: Shuffle vs. no-shuffle

Table of Tables

Table 1: CIDR Ranges
Table 2: Cluster modes and their characteristics
Table 3: Batch vs Interactive Workloads

Table of x - save for later

Heading
- Sub-heading
  - Sub-sub-heading

Heading levels

This is a fixture to test heading levels

Introduction

Planning, deploying, and running Azure Databricks (ADB) at scale requires one to make many architectural decisions.

While each ADB deployment is unique to an organization's needs, we have found that some patterns are common across most successful ADB projects. Unsurprisingly, these patterns are also in line with modern cloud-centric development best practices.

This short guide summarizes these patterns into prescriptive and actionable best practices for Azure Databricks. We follow a logical path of planning the infrastructure, provisioning the workspaces, developing Azure Databricks applications, and finally, running Azure Databricks in production.

The audience for this guide is system architects, field engineers, and development teams of customers, Microsoft, and Databricks. Since the Azure Databricks product goes through fast iteration cycles, we have avoided recommendations based on roadmap or Private Preview features.

Our recommendations should apply to a typical Fortune 500 enterprise with at least an intermediate level of Azure and Databricks knowledge. We've also classified each recommendation according to its likely impact on the solution's quality attributes. Using the Impact factor, you can weigh the recommendation against other competing choices. For example, if the impact is classified as “Very High”, not adopting the best practice can have a significant impact on your deployment.

As ardent cloud proponents, we value agility and bringing value quickly to our customers. Hence, we’re releasing the first version somewhat quickly, omitting some important but advanced topics in the interest of time. We will cover the missing topics and add more details in the next round, while sincerely hoping that this version is still useful to you.

Provisioning ADB: Guidelines for Networking and Security

Azure Databricks (ADB) deployments for very small organizations, PoC applications, or personal education hardly require any planning. You can spin up a Workspace using the Azure Portal in a matter of minutes, create a Notebook, and start writing code.

Enterprise-grade, large-scale deployments are a different story altogether. Some upfront planning is necessary to avoid cost overruns, throttling issues, etc. In particular, you need to understand:

● Networking requirements of Databricks

● The number and the type of Azure networking resources required to launch clusters

● The relationship between Azure and Databricks terminology: Subscription, VNet, Workspaces, Clusters, Subnets, etc.

● The overall capacity planning process: where to begin and what to consider

Let’s start with a short Azure Databricks 101 and then discuss some best practices for scalable and secure deployments.
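Before moving on, the capacity-planning point above can be made concrete with a little subnet arithmetic. The following is a minimal sketch under assumptions this guide does not state: the workspace is deployed into your own VNet with a host (public) and a container (private) subnet, every cluster node consumes one IP address in each of them, and Azure reserves five addresses in every subnet. The function name `max_cluster_nodes` and the CIDR ranges are hypothetical.

```python
import ipaddress

# Assumption: Azure keeps 5 addresses of every subnet for itself
# (network address, default gateway, two for Azure DNS, broadcast).
AZURE_RESERVED_PER_SUBNET = 5


def max_cluster_nodes(vnet: str, host_subnet: str, container_subnet: str) -> int:
    """Rough upper bound on concurrent cluster nodes for a VNet-injected workspace.

    Assumption: every node needs one IP in the host (public) subnet and one
    in the container (private) subnet, so the smaller subnet is the limit.
    """
    vnet_net = ipaddress.ip_network(vnet)
    subnets = [ipaddress.ip_network(host_subnet), ipaddress.ip_network(container_subnet)]

    # Both subnets must be carved out of the VNet address space...
    if not all(s.subnet_of(vnet_net) for s in subnets):
        raise ValueError("both subnets must lie inside the VNet range")
    # ...and must not overlap each other.
    if subnets[0].overlaps(subnets[1]):
        raise ValueError("host and container subnets must not overlap")

    usable = [s.num_addresses - AZURE_RESERVED_PER_SUBNET for s in subnets]
    return max(0, min(usable))


if __name__ == "__main__":
    # Two /26 subnets inside a /24 VNet: 64 - 5 = 59 nodes.
    print(max_cluster_nodes("10.28.0.0/24", "10.28.0.0/26", "10.28.0.64/26"))
```

The point of the exercise is that the smaller of the two subnets, not the VNet, caps cluster size, so subnet ranges are worth sizing before the first large cluster is launched.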

Azure Databricks 101

ADB is a Big Data analytics service. As a cloud-optimized, managed PaaS offering, it is designed to hide the underlying distributed systems and networking complexity from the end user as much as possible. It is backed by a support team that monitors its health, debugs tickets filed via Azure, etc. This allows ADB users to focus on developing value-generating apps rather than stressing over infrastructure management.

You can deploy ADB using the Azure Portal or ARM templates. One successful ADB deployment produces exactly one Workspace, a space where users can log in and author analytics apps. It comprises the file browser, notebooks, tables, clusters, DBFS storage, etc. More importantly, the Workspace is a fundamental isolation unit in Databricks. All workspaces are expected to be completely isolated from each other; that is, no action in one workspace should noticeably impact another workspace.

Each workspace is identified by a globally unique 53-bit number, called Workspace ID or Organization ID. The URL that a customer sees after logging in always uniquely identifies the workspace they are using:

https://regionName.azuredatabricks.net/?o=workspaceId
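Because the workspace ID travels in the `o` query parameter, a script can recover it from the URL. A minimal Python sketch, assuming the URL follows exactly the `https://regionName.azuredatabricks.net/?o=workspaceId` pattern shown above; the helper `workspace_from_url` and the sample ID are hypothetical.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlparse

MAX_WORKSPACE_ID = 2**53 - 1  # largest value that fits in 53 bits


def workspace_from_url(url: str) -> tuple[str, int]:
    """Split a URL like https://regionName.azuredatabricks.net/?o=workspaceId
    into (regionName, workspaceId)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if not host.endswith(".azuredatabricks.net"):
        raise ValueError(f"not an Azure Databricks URL: {url!r}")

    ids = parse_qs(parsed.query).get("o")
    if not ids:
        raise ValueError("URL has no ?o=<workspaceId> query parameter")

    workspace_id = int(ids[0])  # raises ValueError if not numeric
    if not 0 < workspace_id <= MAX_WORKSPACE_ID:
        raise ValueError("workspace ID must be a positive 53-bit integer")

    region = host.split(".")[0]
    return region, workspace_id


if __name__ == "__main__":
    print(workspace_from_url("https://westeurope.azuredatabricks.net/?o=1234567890123456"))
    # ('westeurope', 1234567890123456)
```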

Azure Databricks uses Azure Active Directory (AAD) as its exclusive identity provider, and the two integrate seamlessly out of the box. Any AAD member belonging to the Owner or Contributor role can deploy Databricks and is automatically added to the ADB members list upon first login. If a user is not a member of the Active Directory tenant, they can’t log in to the workspace.

Azure Databricks comes with its own user management interface. You can create users and groups in a workspace, assign them certain privileges, etc. While users in AAD are equivalent to Databricks users, by default AAD roles have no relationship with groups created inside ADB. ADB also has a special group called Admin, not to be confused with AAD’s admin.
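Groups can also be created programmatically instead of through the UI. The sketch below only builds (and does not send) a request that would create a workspace group. It assumes the Databricks SCIM endpoint `/api/2.0/preview/scim/v2/Groups` and a personal access token sent as a Bearer token; the helper name, token, and group name are hypothetical. Such a group lives inside ADB only and, as noted above, has no relationship to AAD roles by default.

```python
import json
import urllib.request

SCIM_GROUP_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:Group"


def build_create_group_request(workspace_url: str, token: str, group_name: str) -> urllib.request.Request:
    """Build (but do not send) a SCIM POST that creates a workspace group.

    Assumptions: the /api/2.0/preview/scim/v2/Groups path and Bearer-token auth.
    """
    body = json.dumps({"schemas": [SCIM_GROUP_SCHEMA], "displayName": group_name}).encode("utf-8")
    return urllib.request.Request(
        url=workspace_url.rstrip("/") + "/api/2.0/preview/scim/v2/Groups",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/scim+json",
        },
    )


if __name__ == "__main__":
    req = build_create_group_request(
        "https://westeurope.azuredatabricks.net", "<personal-access-token>", "data-engineers"
    )
    # Passing req to urllib.request.urlopen would actually send it.
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect; check the current Databricks REST API reference before relying on the path or schema.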

The first user to log in and initialize the workspace is the workspace owner. This person can invite other users to the workspace, create groups, etc. The logged-in user’s identity is provided by AAD and shows up under the user menu in the Workspace:

Figure 1: Databricks user menu

Sub-heading

This is an h2 heading

Sub-sub-heading

This is an h3 heading

Heading

This is an h1 heading

Sub-heading

This is an h2 heading

Sub-sub-heading

This is an h3 heading

Heading

This is an h1 heading

Sub-heading

This is an h2 heading

Sub-sub-heading

This is an h3 heading
