Azure ARM Template for Lustre filesystem deployment


azlustre

Deploy a Lustre filesystem on Azure

Click the button below to get started:

Deploy to Azure

General Parameters

| Parameter | Description |
| --- | --- |
| name | The name for the Lustre filesystem |
| mdsSku | The SKU for the MDS VMs |
| ossSku | The SKU for the OSS VMs |
| instanceCount | The number of OSS VMs |
| rsaPublicKey | The RSA public key to access the VMs |
| existingVnetResourceGroupName | The resource group containing the VNET where Lustre is to be deployed |
| existingVnetName | The name of the VNET where Lustre is to be deployed |
| existingSubnetName | The name of the subnet where Lustre is to be deployed |
| mdtStorageSku | The SKU to use for the MDT disks |
| mdtCacheOption | The caching option for the MDT disks (e.g. None or ReadWrite) |
| mdtDiskSize | The size of each MDT disk |
| mdtNumDisks | The number of disks in the MDT RAID (set to 0 to use the VM ephemeral disks) |
| ostStorageSku | The SKU to use for OST disks |
| ostCacheOption | The caching option for the OST disks (e.g. None or ReadWrite) |
| ostDiskSize | The size of each OST disk |
| ostNumDisks | The number of OST disks per OSS (set to 0 to use the VM ephemeral disks) |
| ossDiskSetup | Either separate where each disk is an OST or raid to combine into a single OST |
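
As an alternative to the portal button, the parameters above can also be passed on the command line. The following is a minimal sketch, assuming the Azure CLI is installed and logged in and that azuredeploy.json is in the current directory. The deploy_lustre helper, the DRY_RUN and TEMPLATE_FILE variables and the example values are illustrative and not part of this repository.

```shell
# deploy_lustre RESOURCE_GROUP key=value [key=value ...]
# Hypothetical helper: wraps "az deployment group create" for the ARM template.
# Set DRY_RUN=1 to print the command instead of running it.
deploy_lustre() {
    resource_group=$1
    shift
    # Rebuild the argument list as the full az command line.
    set -- az deployment group create \
        --resource-group "$resource_group" \
        --template-file "${TEMPLATE_FILE:-azuredeploy.json}" \
        --parameters "$@"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, `DRY_RUN=1 deploy_lustre my-rg name=lustre instanceCount=4` prints the az command without running it. Any parameter from the table can be appended as a further key=value pair.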

Hierarchical Storage Management (HSM) Parameters

The following additional parameters can be used to enable HSM for the Lustre deployment.

| Parameter | Description |
| --- | --- |
| storageAccount | The storage account to use for HSM |
| storageContainer | The container name to use |
| storageSas | The SAS key for the storage account |

The SAS key requires read, write, list and delete permissions. When choosing an expiry time, be aware that the key is stored on the OSS VMs and used for all HSM operations, so HSM will stop working once it expires. Here is an example of creating a SAS key with a one-month expiry using the Azure CLI:

# TODO: set the account name and container name below
account_name=
container_name=

# Use UTC (-u) so the timestamps match their "Z" suffix
start_date=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
expiry_date=$(date -u +"%Y-%m-%dT%H:%M:%SZ" --date "next month")

az storage container generate-sas \
   --account-name "$account_name" \
   --name "$container_name" \
   --permissions rwld \
   --start "$start_date" \
   --expiry "$expiry_date" \
   -o tsv
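
Because HSM stops working when the SAS key expires, it can help to check the expiry recorded in the key itself. The sketch below assumes the token has the usual query-string form with a URL-encoded `se` (signed expiry) field in the same UTC format as used above. The function names sas_expiry and sas_expired are illustrative and not part of this repository.

```shell
# sas_expiry TOKEN
# Print the expiry timestamp stored in the "se" field of a SAS token.
sas_expiry() {
    token=${1#\?}    # tolerate a leading "?"
    printf '%s\n' "$token" | tr '&' '\n' | sed -n 's/^se=//p' | sed 's/%3[Aa]/:/g'
}

# sas_expired TOKEN [NOW]
# Succeed if the token expiry is at or before NOW (default: current UTC time).
sas_expired() {
    expiry=$(sas_expiry "$1")
    now=${2:-$(date -u +"%Y-%m-%dT%H:%M:%SZ")}
    # UTC timestamps in this fixed format sort chronologically as plain strings.
    first=$(printf '%s\n%s\n' "$expiry" "$now" | LC_ALL=C sort | head -n 1)
    [ "$first" = "$expiry" ]
}
```

A check such as `sas_expired "$token" && echo "renew the SAS key"` could then be run before the key lapses.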

Logging Parameters

The following additional parameters can be used to log metrics for the Lustre deployment. This requires a Log Analytics workspace to be set up. Leave these parameters empty to disable logging.

| Parameter | Description |
| --- | --- |
| logAnalyticsWorkspaceId | The log analytics workspace id to use |
| logAnalyticsKey | The key for the log analytics account |

Example configurations

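When creating a Lustre configuration, pay attention to the type of storage used, since it determines the cost, the capacity and throughput per OSS, and whether data survives a VM failure. This section provides options for three types of setup: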

  1. Ephemeral: This is the cheapest option and uses disks local to the VMs. It can also provide the lowest latency because the physical storage resides on the host. Any VM failure will result in data loss, but this is a good option for scratch storage.

    Size: 7.6 TB per OSS

    Expected performance: 1600 MB/s per OSS (limited by NIC on VM)

  2. Persistent Premium: This option uses premium disks attached to the VMs. A VM failure will not result in data loss.

    Size: 6 TB per OSS

    Expected performance: 1152 MB/s per OSS (limited by uncached disk throughput)

  3. Persistent Standard: This option uses standard disks attached to the VMs. It requires more storage per OSS because larger disks are needed to maximise the bandwidth to storage for a VM.

    Size: 32 TB per OSS

    Expected performance: 1152 MB/s per OSS (limited by uncached disk throughput)

These are the parameter values that can be used when deploying each type of setup:

| Parameter | Ephemeral | Persistent Premium | Persistent Standard |
| --- | --- | --- | --- |
| mdsSku | Standard_L8s_v2 | Standard_D8_v3 | Standard_D8_v3 |
| ossSku | Standard_L48s_v2 | Standard_D48_v3 | Standard_D48_v3 |
| mdtStorageSku | Ephemeral | Premium_LRS | Standard_LRS |
| mdtCacheOption | None | ReadWrite | ReadWrite |
| mdtDiskSize | 0 | 1024 | 1024 |
| mdtNumDisks | 0 | 2 | 2 |
| ostStorageSku | Ephemeral | Premium_LRS | Standard_LRS |
| ostCacheOption | None | None | None |
| ostDiskSize | 0 | 1024 | 8192 |
| ostNumDisks | 0 | 6 | 4 |
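
The per-OSS sizes quoted for the persistent setups follow from ostNumDisks multiplied by ostDiskSize. The sketch below reproduces that arithmetic and gives a rough upper bound on aggregate throughput. It assumes disk sizes are in GB, ignores RAID and filesystem overhead, and assumes throughput scales linearly with instanceCount, which real workloads may not achieve. The function names are illustrative.

```shell
# oss_capacity NUM_DISKS DISK_SIZE
# Raw capacity of one OSS, in the same unit as DISK_SIZE (GB assumed).
oss_capacity() {
    echo $(( $1 * $2 ))
}

# aggregate_throughput INSTANCE_COUNT PER_OSS_MBS
# Upper bound on total throughput in MB/s, assuming linear scaling across OSS VMs.
aggregate_throughput() {
    echo $(( $1 * $2 ))
}
```

With the table values, `oss_capacity 6 1024` gives 6144 (the 6 TB Persistent Premium figure) and `oss_capacity 4 8192` gives 32768 (the 32 TB Persistent Standard figure). The Ephemeral setup sets both values to 0, so its size comes from the local disks of the VM SKU instead.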

Developer notes

This project includes the following:

  • An ARM template to deploy a Lustre cluster using the image.
  • A packer script to build an image with the Lustre packages installed.
  • A Terraform template to deploy a Lustre cluster automatically based on a marketplace image.

The Lustre setup scripts are modified versions from the AzureHPC project.

Lustre ARM Template

The ARM template performs the installation with cloud-init, with the installation scripts embedded in the template. The azuredeploy.json file contains the embedded scripts, and the repository includes a script to generate it from azuredeploy_template.json.

The create_ci.sh script performs this step, and build.sh runs it with the parameters used for the ARM template currently distributed in the repository. The scripts are embedded as a self-extracting compressed tar archive to be run by cloud-init. The makeself tool is required for this step.

Rebuilding is only required when making changes to the scripts.
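
To illustrate what a self-extracting compressed tar archive is, the sketch below builds one by hand with tar and base64. It is a simplified stand-in for makeself, not what create_ci.sh does. The make_selfextract function and the `__PAYLOAD__` marker are assumptions made for this example.

```shell
# make_selfextract SRC_DIR OUT_FILE START_SCRIPT
# Write OUT_FILE: a shell stub followed by a base64-encoded tar.gz of SRC_DIR.
# When run, the stub unpacks the payload to a temporary directory and runs
# START_SCRIPT from there, passing on any arguments.
make_selfextract() {
    src_dir=$1
    out_file=$2
    start_script=$3
    {
        # The stub exits before the shell ever reads the payload lines.
        cat <<EOF
#!/bin/sh
tmp=\$(mktemp -d)
sed '1,/^__PAYLOAD__\$/d' "\$0" | base64 -d | tar -xzf - -C "\$tmp"
(cd "\$tmp" && sh "./$start_script" "\$@")
status=\$?
rm -rf "\$tmp"
exit \$status
__PAYLOAD__
EOF
        tar -czf - -C "$src_dir" . | base64
    } > "$out_file"
    chmod +x "$out_file"
}
```

Keeping the payload as text is what allows a single file to be embedded in a template and executed later. makeself provides the same idea with checksums, compression choices and other options.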

Packer scripts

The ARM templates in this repository use the AzureHPC Lustre marketplace image. This image has been created with the packer scripts provided in this repository.

Packer is required for the build, so download the latest version for your operating system from https://www.packer.io. It is distributed as a single file, so place it somewhere on your PATH. Then go into the packer directory:

cd azlustre/packer

The following options are required to build:

| Variable | Description |
| --- | --- |
| var_subscription_id | Azure subscription ID |
| var_tenant_id | Tenant ID for the service principal |
| var_client_id | Client ID for the service principal |
| var_client_secret | Client password for the service principal |
| var_resource_group | The resource group to put the image in (must exist) |
| var_image | The image name to create |

These can be read by packer from a JSON file. Use this template to create options.json and populate the fields:

{
    "var_subscription_id": "",
    "var_tenant_id": "",
    "var_client_id": "",
    "var_client_secret": "",
    "var_resource_group": "",
    "var_image": "azurehpc-lustre-2.12.5"
}
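
Before building, it may be worth checking that no field in options.json was left empty. The sketch below assumes the file keeps the one-option-per-line layout of the template above. The missing_options function is illustrative and not part of this repository.

```shell
# missing_options FILE
# Print the name of every var_* option whose value is still the empty string.
# Assumes one "name": "value" pair per line, as in the template.
missing_options() {
    sed -n 's/^[[:space:]]*"\(var_[a-z_]*\)"[[:space:]]*:[[:space:]]*""[[:space:]]*,\{0,1\}[[:space:]]*$/\1/p' "$1"
}
```

Running `missing_options options.json` on the unmodified template lists the five empty fields. No output means every option has a value.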

Use the following command to build with packer:

packer build -var-file=options.json azurehpc-lustre-2.12.5.json

Once this completes successfully, the image will be available, although the ARM template will need to be modified to use the new image.