This commit is contained in:
Paul Edwards 2020-12-09 22:06:14 +00:00
Commit 9325f939d7
34 changed files with 1972 additions and 0 deletions

157
README.md Normal file
View file

@ -0,0 +1,157 @@
# azlustre
[![Deploy to Azure](https://azuredeploy.net/deploybutton.png)](https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAzure%2Faz-lustre%2Fmaster%2Fazuredeploy.json)
This project provisions a Lustre cluster as quickly as possible. All of the Lustre setup scripting is taken from [AzureHPC](https://github.com/Azure/azurehpc); the difference in this project is that the Lustre cluster is provisioned through an [ARM template](https://docs.microsoft.com/en-us/azure/azure-resource-manager/templates/) using a custom image.
This project includes the following:
* A [packer](https://www.packer.io/) script to build an image with the Lustre packages installed.
* An [ARM template](https://docs.microsoft.com/en-us/azure/azure-resource-manager/templates/) to deploy a Lustre cluster using the image.
The ARM template performs the installation with [cloud-init](https://docs.microsoft.com/en-us/azure/virtual-machines/linux/using-cloud-init), with the installation scripts embedded in the template. The `azuredeploy.json` in the repo already contains the embedded scripts, and a script is provided to regenerate it from `azuredeploy_template.json`.
## Getting Started
Check out the repository:
```
git clone https://github.com/Azure/az-lustre
```
### Building the image
Packer is required for the build, so download the latest version for your operating system from https://www.packer.io (one way to install it on Linux is sketched below). It is distributed as a single binary, so just put it somewhere on your `PATH`. Then go into the packer directory:
```
cd az-lustre/packer
```
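If packer is not already installed, a minimal sketch of fetching it on Linux follows; the version number is only an example, so check the releases page for the current one:
```
# download a packer release, unpack it, and put the binary on the PATH
wget https://releases.hashicorp.com/packer/1.6.6/packer_1.6.6_linux_amd64.zip
unzip packer_1.6.6_linux_amd64.zip
sudo mv packer /usr/local/bin/
packer --version
```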
The following options are required to build:
| Variable | Description |
|---------------------|-----------------------------------------------------|
| var_subscription_id | Azure subscription ID |
| var_tenant_id | Tenant ID for the service principal |
| var_client_id | Client ID for the service principal |
| var_client_secret | Client password for the service principal |
| var_resource_group | The resource group to put the image in (must exist) |
| var_image | The image name to create |
These can be read by packer from a JSON file. Use this template to create `options.json` and populate the fields (a sketch for creating a service principal follows the template):
```
{
"var_subscription_id": "",
"var_tenant_id": "",
"var_client_id": "",
"var_client_secret": "",
"var_resource_group": "",
"var_image": "lustre-7.8-lustre-2.13.5"
}
```
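If you do not already have a service principal, one can be created with the Azure CLI; the name and subscription scope below are placeholders. The `appId`, `password`, and `tenant` fields of the output correspond to `var_client_id`, `var_client_secret`, and `var_tenant_id`:
```
az ad sp create-for-rbac --name packer-lustre --role Contributor \
    --scopes /subscriptions/<subscription-id>
```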
Use the following command to build with packer:
```
packer build -var-file=options.json centos-7.8-lustre-2.12.5.json
```
Once the build completes successfully, the image will be available in the resource group specified in `options.json`.
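For example, it can be verified with the Azure CLI (the resource group and image name are those from `options.json`):
```
az image show \
    --resource-group <var_resource_group> \
    --name centos-7.8-lustre-2.12.5 \
    --query "{name:name, state:provisioningState}" --output table
```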
### Deploying the Lustre cluster
The "Deploy to Azure" button can be used once the image is available (alternatively the CLI can be used with `az deployment group create`). Below is a description of the parameters:
| Parameter | Description |
|-------------------------------|------------------------------------------------------------------------------------|
| name | The name for the Lustre filesystem |
| mdsSku | The SKU for the MDS VMs |
| ossSku | The SKU for the OSS VMs |
| instanceCount | The number of OSS VMs |
| rsaPublicKey | The RSA public key to access the VMs |
| imageResourceGroup | The name of the resource group containing the image |
| imageName | The name of the Lustre image to use |
| existingVnetResourceGroupName | The resource group containing the VNET where Lustre is to be deployed |
| existingVnetName | The name of the VNET where Lustre is to be deployed |
| existingSubnetName | The name of the subnet where Lustre is to be deployed |
| mdtStorageSku | The SKU to use for the MDT disks |
| mdtCacheOption | The caching option for the MDT disks (e.g. `None` or `ReadWrite`) |
| mdtDiskSize | The size of each MDT disk |
| mdtNumDisks | The number of disks in the MDT RAID (set to `0` to use the VM ephemeral disks) |
| ostStorageSku | The SKU to use for OST disks |
| ostCacheOption | The caching option for the OST disks (e.g. `None` or `ReadWrite`) |
| ostDiskSize | The size of each OST disk |
| ostNumDisks | The number of OST disks per OSS (set to `0` to use the VM ephemeral disks) |
| ossDiskSetup | Either `separate` where each disk is an OST or `raid` to combine into a single OST |
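A minimal sketch of a CLI deployment into an existing resource group and VNET is shown below; all resource names are placeholders, and any parameters not listed fall back to the defaults in the template:
```
az deployment group create \
    --resource-group my-lustre-rg \
    --template-file azuredeploy.json \
    --parameters \
        name=lustre \
        instanceCount=4 \
        rsaPublicKey="$(cat ~/.ssh/id_rsa.pub)" \
        imageResourceGroup=my-image-rg \
        imageName=centos-7.8-lustre-2.12.5 \
        existingVnetResourceGroupName=my-network-rg \
        existingVnetName=hpcvnet \
        existingSubnetName=compute
```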
#### Options for Lustre Hierarchical Storage Management (HSM)
The following additional parameters can be used to enable HSM for the Lustre deployment.
| Parameter | Description |
|------------------|------------------------------------|
| storageAccount | The storage account to use for HSM |
| storageContainer | The container name to use |
| storageKey | The key for the storage account |
#### Options for Logging with Log Analytics
The following additional parameters can be used to send Lustre metrics to Log Analytics.
| Parameter | Description |
|-------------------------|---------------------------------------|
| logAnalyticsWorkspaceId | The log analytics workspace id to use |
| logAnalyticsKey | The key for the log analytics account |
## Example configurations
When creating a Lustre configuration, pay attention to the following:
* The [expected network bandwidth](https://docs.microsoft.com/en-us/azure/virtual-network/virtual-machine-network-throughput) for the VM type
* The max uncached disk throughput when using managed disks
* The throughput for the [managed disks](https://azure.microsoft.com/en-gb/pricing/details/managed-disks/)
This section provides options for three types of setup:
1. **Ephemeral**
This is the cheapest option and uses the VMs' local disks. It can also provide the lowest latency because the physical storage resides on the host. Any VM failure will result in data loss, so this option is best suited to scratch storage.
Size: 7.6 TB per OSS
Expected performance: 1600 MB/s per OSS (limited by NIC on VM)
2. **Persistent Premium**
This option uses premium disks attached to the VMs. A VM failing will not result in data loss.
Size: 6 TB per OSS
Expected performance: 1152 MB/s per OSS (limited by uncached disk throughput)
3. **Persistent Standard**
This option uses standard disks attached to the VMs. It requires relatively more storage per OSS because larger disks are needed to maximise the storage bandwidth available to a VM.
Size: 32 TB per OSS
Expected performance: 1152 MB/s per OSS (limited by uncached disk throughput)
These are the parameters to use for each setup when deploying (a sample parameters file for the ephemeral setup follows the table):
| Parameter | Ephemeral | Persistent Premium | Persistent Standard |
|----------------|-----------------|--------------------|---------------------|
| mdsSku         | Standard_L8s_v2  | Standard_D8s_v3    | Standard_D8s_v3     |
| ossSku         | Standard_L48s_v2 | Standard_D48s_v3   | Standard_D48s_v3    |
| mdtStorageSku | Premium_LRS | Premium_LRS | Standard_LRS |
| mdtCacheOption | None | ReadWrite | ReadWrite |
| mdtDiskSize | 0 | 1024 | 1024 |
| mdtNumDisks | 0 | 2 | 2 |
| ostStorageSku | Premium_LRS | Premium_LRS | Standard_LRS |
| ostCacheOption | None | None | None |
| ostDiskSize | 0 | 1024 | 8192 |
| ostNumDisks | 0 | 6 | 4 |
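As an illustration, the ephemeral column could be captured in an ARM parameters file and passed with `--parameters @lustre-eph.parameters.json`; the file name and the `name`/`instanceCount` values are placeholders, and the remaining required parameters (public key, image, and VNET settings) still have to be supplied:
```
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "name": { "value": "lustre" },
    "instanceCount": { "value": 4 },
    "mdsSku": { "value": "Standard_L8s_v2" },
    "ossSku": { "value": "Standard_L48s_v2" },
    "mdtStorageSku": { "value": "Premium_LRS" },
    "mdtCacheOption": { "value": "None" },
    "mdtDiskSize": { "value": 0 },
    "mdtNumDisks": { "value": 0 },
    "ostStorageSku": { "value": "Premium_LRS" },
    "ostCacheOption": { "value": "None" },
    "ostDiskSize": { "value": 0 },
    "ostNumDisks": { "value": 0 }
  }
}
```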
## Generating the embedded ARM template
*This is only required when making changes to the scripts.*
The scripts are packed into a self-extracting compressed tar archive and embedded into the ARM template so that cloud-init can execute them. The `create_ci.sh` script performs this step, and `build.sh` runs it with the parameters used for the ARM template currently distributed in the repository.
Note: The [makeself](https://makeself.io/) tool is required for this step.
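A sketch of the workflow, assuming makeself is installed (for example from EPEL on CentOS, or from the makeself releases page):
```
sudo yum -y install epel-release makeself   # one way to obtain makeself
rm azuredeploy.json                         # create_ci.sh refuses to overwrite an existing file
./build.sh
```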

355
azuredeploy.json Normal file

File diff is hidden because one or more lines are too long

354
azuredeploy_template.json Normal file
View file

@ -0,0 +1,354 @@
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"name": {
"type": "string",
"maxLength": 20,
"metadata": {
"description": "The name for the Lustre filesystem."
}
},
"mdsSku": {
"defaultValue": "Standard_D8s_v3",
"type": "string",
"metadata": "The SKU for the MDS"
},
"ossSku": {
"defaultValue": "Standard_L8s_v2",
"type": "string",
"metadata": {
"description": "The VM type for the Lustre nodes."
}
},
"instanceCount": {
"maxValue": 300,
"type": "int",
"metadata": {
"description": "Number of additional Lustre nodes."
}
},
"rsaPublicKey": {
"type": "string",
"metadata": {
"description": "The RSA public key to access the nodes."
}
},
"imageResourceGroup": {
"type": "string",
"metadata": {
"description": "Name of the the resource group containing the Lustre image"
}
},
"imageName": {
"type": "string",
"metadata": {
"description": "Name of the Lustre image to use"
}
},
"existingVnetResourceGroupName": {
"type": "string",
"metadata": {
"description": "Name of the resource group for the existing virtual network to deploy the scale set into."
}
},
"existingVnetName": {
"type": "string",
"metadata": {
"description": "Name of the existing virtual network to deploy the scale set into."
}
},
"existingSubnetName": {
"type": "string",
"metadata": {
"description": "Name of the existing subnet to deploy the scale set into."
}
},
"storageAccount": {
"type": "string",
"defaultValue": "",
"metadata": {
"description": "Optional. The storage account to use (leave blank to disable HSM)"
}
},
"storageContainer": {
"type": "string",
"defaultValue": "",
"metadata": {
"description": "The storage container to use for archive"
}
},
"storageKey": {
"type": "string",
"defaultValue": "",
"metadata": {
"description": "The storage account key"
}
},
"logAnalyticsAccount": {
"type": "string",
"defaultValue": "",
"metadata": {
"description": "Optional. The log analytics account to use (leave blank to disable logging)"
}
},
"logAnalyticsWorkspaceId": {
"type": "string",
"defaultValue": "",
"metadata": {
"description": " The log analytics workspace id"
}
},
"logAnalyticsKey": {
"type": "string",
"defaultValue": "",
"metadata": {
"description": "The log analytics account key"
}
},
"mdtStorageSku": {
"type": "string",
"defaultValue": "Premium_LRS",
"metadata": {
"description": "The size of the MDT disks"
}
},
"mdtCacheOption": {
"type": "string",
"defaultValue": "ReadWrite",
"metadata": {
"description": "The size of the MDT disks"
}
},
"mdtDiskSize": {
"type": "int",
"defaultValue": 1024,
"metadata": {
"description": "The size of the MDT disks"
}
},
"mdtNumDisks": {
"type": "int",
"defaultValue": 2,
"metadata": {
"description": "The number of disks in the MDT RAID"
}
},
"ostStorageSku": {
"type": "string",
"defaultValue": "Premium_LRS",
"metadata": {
"description": "The size of the MDT disks"
}
},
"ostCacheOption": {
"type": "string",
"defaultValue": "None",
"metadata": {
"description": "The size of the MDT disks"
}
},
"ostDiskSize": {
"type": "int",
"defaultValue": 1024,
"metadata": {
"description": "The size of the OSS disks"
}
},
"ostNumDisks": {
"type": "int",
"defaultValue": 6,
"metadata": {
"description": "The number of disks on each OSS"
}
},
"ossDiskSetup": {
"type": "string",
"defaultValue": "raid",
"allowedValues": [ "raid", "separate" ],
"metadata": {
"description": "Create a single RAID or use multiple OSTs"
}
}
},
"variables": {
"tagname": "[concat('LustreFS-', parameters('name'))]",
"subnet": "[resourceId(parameters('existingVnetResourceGroupName'), 'Microsoft.Network/virtualNetworks/subnets', parameters('existingVnetName'), parameters('existingSubNetName'))]",
"imageReference": {
"id": "[resourceId(parameters('imageResourceGroup'), 'Microsoft.Compute/images', parameters('imageName'))]"
},
"ciScript": "",
"copy": [
{
"name": "mdtDataDisks",
"count": "[parameters('mdtNumDisks')]",
"input": {
"caching": "[parameters('mdtCacheOption')]",
"managedDisk": {
"storageAccountType": "[parameters('mdtStorageSku')]"
},
"createOption": "Empty",
"lun": "[copyIndex('mdtDataDisks')]",
"diskSizeGB": "[parameters('mdtDiskSize')]"
}
},
{
"name": "ostDataDisks",
"count": "[parameters('ostNumDisks')]",
"input": {
"caching": "[parameters('ostCacheOption')]",
"managedDisk": {
"storageAccountType": "[parameters('ostStorageSku')]"
},
"createOption": "Empty",
"lun": "[copyIndex('ostDataDisks')]",
"diskSizeGB": "[parameters('ostDiskSize')]"
}
}
]
},
"resources": [
{
"name": "[concat(parameters('name'), '-NetworkInterface')]",
"type": "Microsoft.Network/networkInterfaces",
"apiVersion": "2018-08-01",
"location": "[resourceGroup().location]",
"tags": {
"filesystem": "[variables('tagname')]"
},
"properties": {
"enableAcceleratedNetworking": true,
"ipConfigurations": [
{
"name": "ipConfig",
"properties": {
"privateIPAllocationMethod": "Dynamic",
"subnet": {
"id": "[variables('subnet')]"
}
}
}
]
}
},
{
"name": "[parameters('name')]",
"type": "Microsoft.Compute/virtualMachines",
"apiVersion": "2017-03-30",
"location": "[resourceGroup().location]",
"dependsOn": [
"[resourceId('Microsoft.Network/networkInterfaces', concat(parameters('name'), '-NetworkInterface'))]"
],
"tags": {
"filesystem": "[variables('tagname')]"
},
"properties": {
"hardwareProfile": {
"vmSize": "[parameters('mdsSku')]"
},
"osProfile": {
"computerName": "[parameters('name')]",
"adminUsername": "lustre",
"customData": "[base64(variables('ciScript'))]",
"linuxConfiguration": {
"disablePasswordAuthentication": true,
"ssh": {
"publicKeys": [
{
"path": "/home/lustre/.ssh/authorized_keys",
"keyData": "[parameters('rsaPublicKey')]"
}
]
}
}
},
"storageProfile": {
"imageReference": "[variables('imageReference')]",
"osDisk": {
"createOption": "FromImage",
"caching": "ReadWrite"
},
"dataDisks": "[variables('mdtDataDisks')]"
},
"networkProfile": {
"networkInterfaces": [
{
"id": "[resourceId('Microsoft.Network/networkInterfaces', concat(parameters('name'), '-NetworkInterface'))]"
}
]
}
}
},
{
"name": "[concat(parameters('name'), '-vmss')]",
"type": "Microsoft.Compute/virtualMachineScaleSets",
"tags": {
"filesystem": "[variables('tagname')]"
},
"sku": {
"name": "[parameters('ossSku')]",
"tier": "Standard",
"capacity": "[parameters('instanceCount')]"
},
"apiVersion": "2018-10-01",
"location": "[resourceGroup().location]",
"properties": {
"overprovision": true,
"upgradePolicy": {
"mode": "Manual"
},
"virtualMachineProfile": {
"storageProfile": {
"imageReference": "[variables('imageReference')]",
"osDisk": {
"createOption": "FromImage",
"caching": "ReadWrite"
},
"dataDisks": "[variables('ostDataDisks')]"
},
"osProfile": {
"computerNamePrefix": "[parameters('name')]",
"adminUsername": "lustre",
"customData": "[base64(variables('ciScript'))]",
"linuxConfiguration": {
"disablePasswordAuthentication": true,
"ssh": {
"publicKeys": [
{
"path": "/home/lustre/.ssh/authorized_keys",
"keyData": "[parameters('rsaPublicKey')]"
}
]
}
}
},
"networkProfile": {
"networkInterfaceConfigurations": [
{
"name": "[concat(parameters('name'), '-nic')]",
"properties": {
"primary": true,
"enableAcceleratedNetworking": true,
"ipConfigurations": [
{
"name": "ipConfig-vmss",
"properties": {
"subnet": {
"id": "[variables('subnet')]"
}
}
}
]
}
}
]
}
}
}
}
],
"outputs": {
},
"functions": [
]
}

16
build.sh Executable file
View file

@ -0,0 +1,16 @@
#!/bin/bash
./create_ci.sh \
azuredeploy.json \
azuredeploy_template.json \
ciScript \
packer/lustre-setup-scripts \
setup_lustre.sh \
name \
storageAccount \
storageKey \
storageContainer \
logAnalyticsAccount \
logAnalyticsWorkspaceId \
logAnalyticsKey \
ossDiskSetup

51
create_ci.sh Executable file
View file

@ -0,0 +1,51 @@
#!/bin/bash
if [ "$1" = "-h" ]; then
echo "Usage:"
echo " $0 <new_azure_deploy> <old_azure_deploy> <variable_name> <script_dir> <entry_script> [<parameter>]*"
exit 0
fi
new_azure_deploy=$1
shift
old_azure_deploy=$1
shift
variable_name=$1
shift
script_dir=$1
shift
entry_script=$1
shift
echo "new_azure_deploy=$new_azure_deploy"
echo "old_azure_deploy=$old_azure_deploy"
echo "variable_name=$variable_name"
echo "script_dir=$script_dir"
echo "entry_script=$entry_script"
if [ -e $new_azure_deploy ]; then
echo "ERROR: new file already exists"
exit 1
fi
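# Bundle the script directory into a self-extracting makeself archive, then
# build an ARM concat() expression that prepends a shebang and a "set --"
# line carrying the template parameters (single quotes are doubled so they
# survive ARM string escaping), and finally inject the result into the
# named template variable with jq.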
script_name="cloudinit_$(date +"%Y-%m-%d_%H-%M-%S")"
makeself --base64 $script_dir ${script_name}.sh "Cloudinit script" ./$entry_script
sed -i '1d;4d' ${script_name}.sh
echo "[concat('#!/bin/bash" >${script_name}.str
echo -n "set --'," >>${script_name}.str
while test $# -gt 0
do
echo -n "' ',parameters('$1')," >>${script_name}.str
shift
done
echo "'" >>${script_name}.str
echo -n "','" >>${script_name}.str
sed "s/'/''/g" ${script_name}.sh >>${script_name}.str
echo -n "')]" >>${script_name}.str
jq ".variables.${variable_name} = $(jq -Rs '.' <${script_name}.str)" $old_azure_deploy >$new_azure_deploy
rm ${script_name}.sh ${script_name}.str

View file

@ -0,0 +1,45 @@
{
"builders": [
{
"type": "azure-arm",
"subscription_id": "{{user `var_subscription_id`}}",
"tenant_id": "{{user `var_tenant_id`}}",
"client_id": "{{user `var_client_id`}}",
"client_secret": "{{user `var_client_secret`}}",
"image_publisher": "OpenLogic",
"image_offer": "CentOS",
"image_sku": "7_8",
"image_version": "7.8.2020111300",
"managed_image_resource_group_name": "{{user `var_resource_group`}}",
"managed_image_name": "{{user `var_image`}}",
"os_type": "Linux",
"vm_size": "Standard_D8s_v3",
"ssh_pty": "true",
"build_resource_group_name": "{{user `var_resource_group`}}"
}
],
"provisioners": [
{
"type": "file",
"source": "lustre-setup-scripts",
"destination": "/tmp"
},
{
"execute_command": "chmod +x {{ .Path }}; {{ .Vars }} sudo -E sh '{{ .Path }}'",
"inline": [
"chmod +x /tmp/lustre-setup-scripts/*.sh",
"/tmp/lustre-setup-scripts/disable-selinux.sh",
"/tmp/lustre-setup-scripts/additional-pkgs.sh",
"/tmp/lustre-setup-scripts/lfsrepo.sh 2.12.5",
"/tmp/lustre-setup-scripts/lfspkgs.sh",
"rm -rf /tmp/lustre-setup-scripts",
"yum -y install https://azurehpc.azureedge.net/rpms/lemur-azure-hsm-agent-1.0.0-lustre_2.12.x86_64.rpm https://azurehpc.azureedge.net/rpms/lemur-azure-data-movers-1.0.0-lustre_2.12.x86_64.rpm",
"sed -i '/^ - disk_setup$/d;/^ - mounts$/d' /etc/cloud/cloud.cfg",
"/usr/sbin/waagent -force -deprovision+user && export HISTSIZE=0 && sync"
],
"inline_shebang": "/bin/sh -x",
"type": "shell",
"skip_clean": true
}
]
}

View file

@ -0,0 +1,4 @@
#!/bin/bash
yum install -y epel-release
yum install -y dstat

View file

@ -0,0 +1,44 @@
#!/bin/bash
# arg: $1 = raid_device (e.g. /dev/md10)
# arg: $* = devices to use (can use globbing)
raid_device=$1
shift
devices=
while (( "$#" )); do
devices="$devices $1"
shift
done
echo "devices=$devices"
# print partition information
parted -s --list 2>/dev/null
# creating the partitions
for disk in $devices; do
echo "partitioning $disk"
parted -s $disk "mklabel gpt"
parted -s $disk -a optimal "mkpart primary 1 -1"
parted -s $disk print
parted -s $disk "set 1 raid on"
done
# make sure all the partitions are ready
sleep 10
# get the partition names
partitions=
for disk in $devices; do
partitions="$partitions $(lsblk -no kname -p $disk | tail -n1)"
done
echo "partitions=$partitions"
ndevices=$(echo $partitions | wc -w)
echo "creating raid device"
mdadm --create $raid_device --level 0 --raid-devices $ndevices $partitions || exit 1
sleep 10
mdadm --verbose --detail --scan > /etc/mdadm.conf

View file

@ -0,0 +1,4 @@
#!/bin/bash
setenforce 0
sed -i 's/SELINUX=.*$/SELINUX=disabled/g' /etc/selinux/config

View file

@ -0,0 +1,3 @@
#!/bin/bash
ethtool -L eth1 tx 8 rx 8 && ifconfig eth1 down && ifconfig eth1 up

View file

@ -0,0 +1,16 @@
#!/bin/bash
# arg: $1 = lfsserver
# arg: $2 = mount point (default: /lustre)
# arg: $3 = lustre version (default: 2.10)
master=$1
lfs_mount=${2:-/lustre}
lustre_version=${3:-2.10}
if [ "$lustre_version" = "2.10" ]; then
yum install -y kmod-lustre-client
weak-modules --add-kernel $(uname -r)
fi
mkdir $lfs_mount
echo "${master}@tcp0:/LustreFS $lfs_mount lustre defaults,_netdev 0 0" >> /etc/fstab
mount -a
chmod 777 $lfs_mount

View file

@ -0,0 +1,89 @@
#!/bin/bash
# arg: $1 = lfsserver
# arg: $2 = storage account
# arg: $3 = storage key
# arg: $4 = storage container
# arg: $5 = lustre version (default 2.10)
master=$1
storage_account=$2
storage_key=$3
storage_container=$4
lustre_version=${5-2.10}
# adding kernel module for lustre client
if [ "$lustre_version" = "2.10" ]; then
yum install -y kmod-lustre-client
weak-modules --add-kernel $(uname -r)
fi
if ! rpm -q lemur-azure-hsm-agent lemur-azure-data-movers; then
yum -y install \
https://azurehpc.azureedge.net/rpms/lemur-azure-hsm-agent-1.0.0-lustre_${lustre_version}.x86_64.rpm \
https://azurehpc.azureedge.net/rpms/lemur-azure-data-movers-1.0.0-lustre_${lustre_version}.x86_64.rpm
fi
mkdir -p /var/run/lhsmd
chmod 755 /var/run/lhsmd
mkdir -p /etc/lhsmd
chmod 755 /etc/lhsmd
cat <<EOF >/etc/lhsmd/agent
# Lustre NID and filesystem name for the front end filesystem, the agent will mount this
client_device="${master}@tcp:/LustreFS"
# Do you want to use S3 and POSIX, in this example we use POSIX
enabled_plugins=["lhsm-plugin-az"]
## Directory to look for the plugins
plugin_dir="/usr/libexec/lhsmd"
# TBD, I used 16
handler_count=16
# TBD
snapshots {
enabled = false
}
EOF
chmod 600 /etc/lhsmd/agent
cat <<EOF >/etc/lhsmd/lhsm-plugin-az
az_storage_account = "$storage_account"
az_storage_key = "$storage_key"
num_threads = 32
#
# One or more archive definition is required.
#
archive "az-blob" {
id = 1 # Must be unique to this endpoint
container = "$storage_container" # Container used for this archive
prefix = "" # Optional prefix
num_threads = 32
}
EOF
chmod 600 /etc/lhsmd/lhsm-plugin-az
cat <<EOF >/etc/systemd/system/lhsmd.service
[Unit]
Description=The lhsmd server
After=syslog.target network.target remote-fs.target nss-lookup.target
[Service]
Type=simple
PIDFile=/run/lhsmd.pid
ExecStartPre=/bin/mkdir -p /var/run/lhsmd
ExecStart=/sbin/lhsmd -config /etc/lhsmd/agent
Restart=always
[Install]
WantedBy=multi-user.target
EOF
chmod 600 /etc/systemd/system/lhsmd.service
systemctl daemon-reload
systemctl enable lhsmd
systemctl start lhsmd

View file

@ -0,0 +1,25 @@
#!/bin/bash
# arg: $1 = storage account
# arg: $2 = storage key
# arg: $3 = storage container
# arg: $4 = lustre mount (default=/lustre)
# arg: $5 = lustre version (default=2.10)
storage_account=$1
storage_key=$2
storage_container=$3
lfs_mount=${4:-/lustre}
lustre_version=${5-2.10}
if ! rpm -q lemur-azure-hsm-agent lemur-azure-data-movers; then
yum -y install \
https://azurehpc.azureedge.net/rpms/lemur-azure-hsm-agent-1.0.0-lustre_${lustre_version}.x86_64.rpm \
https://azurehpc.azureedge.net/rpms/lemur-azure-data-movers-1.0.0-lustre_${lustre_version}.x86_64.rpm
fi
cd $lfs_mount
export STORAGE_ACCOUNT=$storage_account
export STORAGE_KEY=$storage_key
/sbin/azure-import ${storage_container}

View file

@ -0,0 +1,31 @@
#!/bin/bash
# arg: $1 = name
# arg: $2 = log analytics workspace id
# arg: $3 = log analytics key
name=$1
log_analytics_workspace_id=$2
log_analytics_key=$3
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
sed "s#__FS_NAME__#${name}#g;s#__LOG_ANALYTICS_WORKSPACE_ID__#${log_analytics_workspace_id}#g;s#__LOG_ANALYTICS_KEY__#${log_analytics_key}#g" $DIR/lfsloganalyticsd.sh.in >/usr/bin/lfsloganalyticsd.sh
chmod +x /usr/bin/lfsloganalyticsd.sh
cat <<EOF >/lib/systemd/system/lfsloganalytics.service
[Unit]
Description=Lustre logging service to Log Analytics.
[Service]
Type=simple
ExecStart=/bin/bash /usr/bin/lfsloganalyticsd.sh
Restart=always
[Install]
WantedBy=multi-user.target
EOF
systemctl enable lfsloganalytics
systemctl start lfsloganalytics

View file

@ -0,0 +1,65 @@
#!/bin/bash
fs_name=__FS_NAME__
workspace_id=__LOG_ANALYTICS_WORKSPACE_ID__
key="__LOG_ANALYTICS_KEY__"
DATE=`date '+%Y-%m-%d %H:%M:%S'`
echo "Lustre Log Analytics service started at ${DATE}" | systemd-cat -p info
me=$(hostname)
node=$(ls /proc/fs/lustre/osd-ldiskfs | grep LustreFS)
eth0=$(grep eth0 /proc/net/dev | sed 's/ */ /g')
bytesrecv_last=$(cut -d' ' -f 3 <<<"$eth0")
bytessend_last=$(cut -d' ' -f 11 <<<"$eth0")
while true
do
sleep 60;
eth0=$(grep eth0 /proc/net/dev | sed 's/ */ /g')
bytesrecv=$(cut -d' ' -f 3 <<<"$eth0")
bytessend=$(cut -d' ' -f 11 <<<"$eth0")
bytesrecv_int=$(($bytesrecv - $bytesrecv_last))
bytessend_int=$(($bytessend - $bytessend_last))
bytesrecv_last=$bytesrecv
bytessend_last=$bytessend
loadavg=$(cut -f1 -d' ' < /proc/loadavg)
kbytesfree=$(</proc/fs/lustre/osd-ldiskfs/${node}/kbytesfree)
content=$(cat <<EOF
{
"fsname":"$fs_name",
"hostname":"$me",
"uuid":"$node",
"loadavg":$loadavg,
"kbytesfree":$kbytesfree,
"bytessend":$bytessend_int,
"bytesrecv":$bytesrecv_int
}
EOF
)
content_len=${#content}
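# sign the request for the Log Analytics HTTP Data Collector API:
# HMAC-SHA256 over the canonical request string, keyed with the
# base64-decoded workspace key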
rfc1123date="$(date -u +%a,\ %d\ %b\ %Y\ %H:%M:%S\ GMT)"
string_to_hash="POST\n${content_len}\napplication/json\nx-ms-date:${rfc1123date}\n/api/logs"
utf8_to_hash=$(echo -n "$string_to_hash" | iconv -t utf8)
decoded_hex_key="$(echo "$key" | base64 --decode --wrap=0 | xxd -p -c256)"
signature="$(echo -ne "$utf8_to_hash" | openssl dgst -sha256 -mac HMAC -macopt "hexkey:$decoded_hex_key" -binary | base64)"
auth_token="SharedKey $workspace_id:$signature"
curl -s -S \
-H "Content-Type: application/json" \
-H "Log-Type: $fs_name" \
-H "Authorization: $auth_token" \
-H "x-ms-date: $rfc1123date" \
-X POST \
--data "$content" \
https://$workspace_id.ods.opinsights.azure.com/api/logs?api-version=2016-04-01
done

View file

@ -0,0 +1,18 @@
#!/bin/bash
# arg: $1 = device (e.g. L=/dev/sdb Lv2=/dev/nvme0n1)
device=$1
mkfs.lustre --fsname=LustreFS --mgs --mdt --mountfsoptions="user_xattr,errors=remount-ro" --backfstype=ldiskfs --reformat $device --index 0
mkdir /mnt/mgsmds
echo "$device /mnt/mgsmds lustre noatime,nodiratime,nobarrier 0 2" >> /etc/fstab
mount -a
# set up hsm
lctl set_param -P mdt.*-MDT0000.hsm_control=enabled
lctl set_param -P mdt.*-MDT0000.hsm.default_archive_id=1
lctl set_param mdt.*-MDT0000.hsm.max_requests=128
# allow any user and group ids to write
lctl set_param mdt.*-MDT0000.identity_upcall=NONE

View file

@ -0,0 +1,31 @@
#!/bin/bash
# arg: $1 = lfsmaster
# arg: $2 = device (e.g. L=/dev/sdb Lv2=/dev/nvme0n1)
# arg: $3 = start index
master=$1
devices=$2
index=$3
ndevices=$(wc -w <<<$devices)
for device in $devices; do
mkfs.lustre \
--fsname=LustreFS \
--backfstype=ldiskfs \
--reformat \
--ost \
--mgsnode=$master \
--index=$index \
--mountfsoptions="errors=remount-ro" \
$device
mkdir /mnt/oss${index}
echo "$device /mnt/oss${index} lustre noatime,nodiratime,nobarrier 0 2" >> /etc/fstab
index=$(( $index + 1 ))
done
mount -a

View file

@ -0,0 +1,11 @@
#!/bin/bash
yum -y install lustre kmod-lustre-osd-ldiskfs lustre-osd-ldiskfs-mount lustre-resource-agents e2fsprogs lustre-tests
sed -i 's/ResourceDisk\.Format=y/ResourceDisk.Format=n/g' /etc/waagent.conf
systemctl restart waagent
weak-modules --add-kernel --no-initramfs
umount /mnt/resource

View file

@ -0,0 +1,22 @@
#!/bin/bash
lustre_version=${1-2.12.5}
cat << EOF >/etc/yum.repos.d/LustrePack.repo
[lustreserver]
name=lustreserver
baseurl=https://downloads.whamcloud.com/public/lustre/lustre-${lustre_version}/el7/patchless-ldiskfs-server/
enabled=1
gpgcheck=0
[e2fs]
name=e2fs
baseurl=https://downloads.whamcloud.com/public/e2fsprogs/latest/el7/
enabled=1
gpgcheck=0
[lustreclient]
name=lustreclient
baseurl=https://downloads.whamcloud.com/public/lustre/lustre-${lustre_version}/el7/client/
enabled=1
gpgcheck=0
EOF

View file

@ -0,0 +1,118 @@
#!/bin/bash
exec > /var/log/setup_lustre.log
exec 2>&1
script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
echo "script_dir = $script_dir"
mds="$1"
storage_account="$2"
storage_key="$3"
storage_container="$4"
log_analytics_name="$5"
log_analytics_workspace_id="$6"
log_analytics_key="$7"
oss_disk_setup="$8"
# vars used in script
if [ -e /dev/nvme0n1 ]; then
devices='/dev/nvme*n1'
n_devices=$(echo $devices | wc -w)
echo "Using $n_devices NVME devices"
elif [ -e /dev/sdc ]; then
devices='/dev/sd[c-m]'
n_devices=$(echo $devices | wc -w)
echo "Using $n_devices NVME devices"
else
echo "ERROR: cannot find devices for storage"
exit 1
fi
lustre_version=2.12.5
if [ "$storage_account" = "" ]; then
use_hsm=false
else
use_hsm=true
fi
if [ "$log_analytics_name" = "" ]; then
use_log_analytics=false
else
use_log_analytics=true
fi
if [[ "$n_devices" -gt "1" && ( "$oss_disk_setup" = "raid" || "$HOSTNAME" = "$mds" ) ]]; then
device=/dev/md10
echo "creating raid ($device) from $n_devices devices : $devices"
$script_dir/create_raid0.sh $device $devices
devices=$device
n_devices=1
fi
echo "using $n_devices device(s) : $devices"
# SETUP LUSTRE YUM REPO
#$script_dir/lfsrepo.sh $lustre_version
# INSTALL LUSTRE PACKAGES
#$script_dir/lfspkgs.sh
ost_index=1
if [ "$HOSTNAME" = "$mds" ]; then
# SETUP MDS
$script_dir/lfsmaster.sh $devices
else
echo "wait for the mds to start"
modprobe lustre
while ! lctl ping $mds@tcp; do
sleep 2
done
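# derive this node's OST index from the VMSS hostname suffix, which is the
# instance number encoded in base 36 and appended to the VM name prefix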
idx=0
for c in $(echo ${HOSTNAME##$mds} | grep -o .); do
echo $c
idx=$(($idx * 36))
if [ -z "${c##[0-9]}" ]; then
idx=$(($idx + $c))
else
idx=$(($(printf "$idx + 10 + %d - %d" "'${c^^}" "'A")))
fi
done
ost_index=$(( ( $idx * $n_devices ) + 1 ))
echo "starting ost index=$ost_index"
mds_ip=$(ping -c 1 $mds | head -1 | sed 's/^[^)]*(//g;s/).*$//g')
$script_dir/lfsoss.sh $mds_ip "$devices" $ost_index
fi
if [ "${use_hsm,,}" = "true" ]; then
$script_dir/lfshsm.sh "$mds_ip" "$storage_account" "$storage_key" "$storage_container" "$lustre_version"
if [ "$HOSTNAME" = "$mds" ]; then
# IMPORT CONTAINER
$script_dir/lfsclient.sh $mds_ip /lustre
$script_dir/lfsimport.sh "$storage_account" "$storage_key" "$storage_container" /lustre "$lustre_version"
fi
fi
if [ "${use_log_analytics,,}" = "true" ]; then
$script_dir/lfsloganalytics.sh $log_analytics_name $log_analytics_workspace_id "$log_analytics_key"
fi

207
test/config.json Normal file
View file

@ -0,0 +1,207 @@
{
"location": "variables.location",
"resource_group": "variables.resource_group",
"install_from": "headnode",
"admin_user": "hpcadmin",
"variables": {
"hpc_image": "OpenLogic:CentOS-HPC:7.7:7.7.2020062600",
"location": "westeurope",
"resource_group": "<NOT-SET>",
"lustre_image": "centos-7.8-lustre-2.13.5",
"compute_instances": 4,
"vm_type": "Standard_D48s_v3",
"vnet_resource_group": "variables.resource_group",
"lustre_name": "lustre",
"lustre_mount": "/lustre",
"lustre_tier": "prem",
"oss_disk_setup": "separate",
"ost_per_oss": 6,
"lustre_stripe": 1
},
"vnet": {
"resource_group": "variables.vnet_resource_group",
"name": "hpcvnet",
"address_prefix": "10.2.0.0/20",
"subnets": {
"compute": "10.2.4.0/22"
}
},
"resources": {
"headnode": {
"type": "vm",
"vm_type": "Standard_D8s_v3",
"accelerated_networking": true,
"public_ip": true,
"image": "variables.hpc_image",
"subnet": "compute",
"data_disks": [1024, 1024],
"storage_sku": "Premium_LRS",
"tags": [
"all",
"headnode"
]
},
"compute": {
"type": "vmss",
"vm_type": "variables.vm_type",
"instances": "variables.compute_instances",
"accelerated_networking": true,
"image": "variables.hpc_image",
"subnet": "compute",
"tags": [
"all",
"compute"
]
}
},
"install": [
{
"script": "disable-selinux.sh",
"tag": "all",
"sudo": true
},
{
"script": "cndefault.sh",
"tag": "all",
"sudo": true
},
{
"script": "create_raid0.sh",
"tag": "headnode",
"args": ["/dev/md10", "/dev/sd[c-d]"],
"sudo": true
},
{
"script": "make_filesystem.sh",
"tag": "headnode",
"args": ["/dev/md10", "xfs", "/share"],
"sudo": true
},
{
"script": "install-nfsserver.sh",
"tag": "headnode",
"args": ["/share"],
"sudo": true
},
{
"script": "nfsclient.sh",
"args": [
"$(<hostlists/tags/headnode)"
],
"tag": "compute",
"sudo": true
},
{
"script": "localuser.sh",
"args": [
"$(<hostlists/tags/headnode)"
],
"tag": "all",
"sudo": true
},
{
"script": "lustre_client_packages.sh",
"tag": "all",
"sudo": true
},
{
"script": "build_ior.sh",
"tag": "headnode"
},
{
"type": "local_script",
"script": "deploy_lustre.sh",
"args": [
"variables.lustre_name",
"variables.compute_instances",
"$(<../hpcadmin_id_rsa.pub)",
"variables.resource_group",
"variables.lustre_image",
"variables.resource_group",
"hpcvnet",
"compute",
"variables.lustre_tier",
"variables.oss_disk_setup"
],
"deps": [
"azuredeploy.json"
]
},
{
"script": "wait_for_lustre.sh",
"tag": "headnode",
"args": [
"variables.lustre_name"
],
"sudo": true
},
{
"script": "lustre_mount.sh",
"tag": "all",
"args": [
"variables.lustre_name",
"variables.lustre_mount"
],
"sudo": true
},
{
"script": "wait_for_all_oss.sh",
"tag": "headnode",
"args": [
"variables.lustre_mount",
"variables.compute_instances",
"variables.ost_per_oss"
],
"sudo": true
},
{
"script": "write_oss_hostfile.sh",
"tag": "headnode"
},
{
"script": "fix_lv2_network.sh",
"tag": "headnode",
"args": [
"variables.lustre_tier",
"lustre",
"oss"
]
},
{
"script": "check_an.sh",
"tag": "headnode",
"args": [
"lustre",
"oss"
]
},
{
"script": "check_an.sh",
"tag": "headnode",
"args": [
"hpcadmin",
"hostlists/compute"
]
},
{
"script": "run_ior.sh",
"tag": "headnode",
"args": [
"hostlists/tags/compute",
24,
16,
"lustre",
"oss",
"variables.lustre_tier",
"variables.oss_disk_setup",
"variables.lustre_stripe",
"variables.ost_per_oss"
]
},
{
"type": "local_script",
"script": "copy_back_results.sh",
"tag": "headnode"
}
]
}

View file

@ -0,0 +1 @@
../../azuredeploy.json

40
test/scripts/build_ior.sh Executable file
View file

@ -0,0 +1,40 @@
#!/bin/bash
APP_NAME=ior
SHARED_APP=${SHARED_APP:-/apps}
MODULE_DIR=${SHARED_APP}/modulefiles
MODULE_NAME=${APP_NAME}
PARALLEL_BUILD=8
IOR_VERSION=3.2.1
INSTALL_DIR=${SHARED_APP}/${APP_NAME}-$IOR_VERSION
source /etc/profile.d/modules.sh # so we can load modules
export MODULEPATH=/usr/share/Modules/modulefiles:$MODULE_DIR
module load gcc-9.2.0
module load mpi/impi_2018.4.274
module list
function create_modulefile {
mkdir -p ${MODULE_DIR}
cat << EOF > ${MODULE_DIR}/${MODULE_NAME}
#%Module
prepend-path PATH ${INSTALL_DIR}/bin;
prepend-path LD_LIBRARY_PATH ${INSTALL_DIR}/lib;
prepend-path MAN_PATH ${INSTALL_DIR}/share/man;
setenv IOR_BIN ${INSTALL_DIR}/bin
EOF
}
cd $SHARED_APP
IOR_PACKAGE=ior-$IOR_VERSION.tar.gz
wget https://github.com/hpc/ior/releases/download/$IOR_VERSION/$IOR_PACKAGE
tar xvf $IOR_PACKAGE
rm $IOR_PACKAGE
cd ior-$IOR_VERSION
CC=$(which mpicc)
export CC
./configure --prefix=${INSTALL_DIR}
make -j ${PARALLEL_BUILD}
make install
create_modulefile

6
test/scripts/check_an.sh Executable file
View file

@ -0,0 +1,6 @@
#!/bin/bash
user=$1
hostlist=$2
pssh -l $user -i -h $hostlist '/usr/sbin/lspci | grep Mellanox'

View file

@ -0,0 +1,12 @@
#!/bin/bash
source ~/azurehpc/install.sh
echo "running from $(pwd)"
echo "moving up one directory"
cd ..
echo "trying the copy"
azhpc-scp -- -r 'hpcadmin@headnode:results-*' .

72
test/scripts/deploy_lustre.sh Executable file
View file

@ -0,0 +1,72 @@
#!/bin/bash
name=$1
instanceCount=$2
rsaPublicKey=$3
imageResourceGroup=$4
imageName=$5
existingVnetResourceGroupName=$6
existingVnetName=$7
existingSubnetName=$8
lustreTier=$9
ossDiskSetup=${10}
if [ "$lustreTier" = "eph" ]; then
mdsSku=Standard_L8s_v2
ossSku=Standard_L48s_v2
mdtStorageSku=Premium_LRS
mdtCacheOption=None
mdtDiskSize=0
mdtNumDisks=0
ostStorageSku=Premium_LRS
ostCacheOption=None
ostDiskSize=0
ostNumDisks=0
elif [ "$lustreTier" = "prem" ]; then
mdsSku=Standard_D8s_v3
ossSku=Standard_D48s_v3
mdtStorageSku=Premium_LRS
mdtCacheOption=ReadWrite
mdtDiskSize=1024
mdtNumDisks=2
ostStorageSku=Premium_LRS
ostCacheOption=None
ostDiskSize=1024
ostNumDisks=6
elif [ "$lustreTier" = "std" ]; then
mdsSku=Standard_D8s_v3
ossSku=Standard_D48s_v3
mdtStorageSku=Standard_LRS
mdtCacheOption=ReadWrite
mdtDiskSize=1024
mdtNumDisks=4
ostStorageSku=Standard_LRS
ostCacheOption=None
ostDiskSize=8192
ostNumDisks=4
else
echo "Unknown lustre tier ($lustreTier)."
exit 1
fi
az deployment group create -g $imageResourceGroup --template-file scripts/azuredeploy.json --parameters \
name="$name" \
mdsSku="$mdsSku" \
ossSku="$ossSku" \
instanceCount="$instanceCount" \
rsaPublicKey="$rsaPublicKey" \
imageResourceGroup="$imageResourceGroup" \
imageName="$imageName" \
existingVnetResourceGroupName="$existingVnetResourceGroupName" \
existingVnetName="$existingVnetName" \
existingSubnetName="$existingSubnetName" \
mdtStorageSku="$mdtStorageSku" \
mdtCacheOption="$mdtCacheOption" \
mdtDiskSize="$mdtDiskSize" \
mdtNumDisks="$mdtNumDisks" \
ostStorageSku="$ostStorageSku" \
ostCacheOption="$ostCacheOption" \
ostDiskSize="$ostDiskSize" \
ostNumDisks="$ostNumDisks" \
ossDiskSetup="$ossDiskSetup"

13
test/scripts/fix_lv2_network.sh Executable file
View file

@ -0,0 +1,13 @@
#!/bin/bash
tier=$1
oss_user=$2
oss_hostfile=$3
if [ "$tier" = "eph" ]; then
echo "running 'sudo ethtool -L eth1 tx 8 rx 8 && sudo ifconfig eth1 down && sudo ifconfig eth1 up' on nodes"
pssh -t 0 -i -l $oss_user -h $oss_hostfile 'sudo ethtool -L eth1 tx 8 rx 8 && sudo ifconfig eth1 down && sudo ifconfig eth1 up'
fi

View file

@ -0,0 +1,7 @@
#!/bin/bash
resource_group=$1
vmss=$2
output_file=$3
az vmss list-instances --resource-group $resource_group --name $vmss -o tsv --query [].osProfile.computerName | tee $output_file

View file

@ -0,0 +1,37 @@
#!/bin/bash
lustre_dir=latest-2.12-release
cat << EOF >/etc/yum.repos.d/LustrePack.repo
[lustreserver]
name=lustreserver
baseurl=https://downloads.whamcloud.com/public/lustre/${lustre_dir}/el7/patchless-ldiskfs-server/
enabled=1
gpgcheck=0
[e2fs]
name=e2fs
baseurl=https://downloads.whamcloud.com/public/e2fsprogs/latest/el7/
enabled=1
gpgcheck=0
[lustreclient]
name=lustreclient
baseurl=https://downloads.whamcloud.com/public/lustre/${lustre_dir}/el7/client/
enabled=1
gpgcheck=0
EOF
# install the right kernel devel if not installed
release_version=$(cat /etc/redhat-release | cut -d' ' -f4)
kernel_version=$(uname -r)
if ! rpm -q kernel-devel-${kernel_version}; then
yum -y install http://olcentgbl.trafficmanager.net/centos/${release_version}/updates/x86_64/kernel-devel-${kernel_version}.rpm
fi
# install the client RPMs if not already installed
if ! rpm -q lustre-client lustre-client-dkms; then
yum -y install lustre-client lustre-client-dkms || exit 1
fi
weak-modules --add-kernel $(uname -r)

13
test/scripts/lustre_mount.sh Executable file
View file

@ -0,0 +1,13 @@
#!/bin/bash
# arg: $1 = lfsserver
# arg: $2 = mount point (default: /lustre)
master=$1
lfs_mount=${2:-/lustre}
mkdir $lfs_mount
echo "${master}@tcp0:/LustreFS $lfs_mount lustre flock,defaults,_netdev 0 0" >> /etc/fstab
mount -a
chmod 777 $lfs_mount
df -h

70
test/scripts/run_ior.sh Executable file
View file

@ -0,0 +1,70 @@
#!/bin/bash
hostfile=$1
nodes=$(wc -l <$hostfile)
ppn=$2
sz_in_gb=$3
cores=$(($nodes * $ppn))
oss_user=$4
oss_hostfile=$5
lustre_tier=$6
oss_disk_setup=$7
lustre_stripe=$8
ost_per_oss=$9
timestamp=$(date "+%Y%m%d-%H%M%S")
source /etc/profile.d/modules.sh
export MODULEPATH=/usr/share/Modules/modulefiles:/apps/modulefiles
module load gcc-9.2.0
module load mpi/impi_2018.4.274
module load ior
device_list=()
if [[ "$oss_disk_setup" = "raid" ]]; then
device_list+=(md10)
elif [[ "$lustre_tier" = "eph" ]]; then
for i in $(seq 0 $(( $ost_per_oss - 1 )) ); do
device_list+=(nvme${i}n1)
done
elif [[ "$lustre_tier" = "prem" || "$lustre_tier" = "std" ]]; then
for i in $( seq 0 $(( $ost_per_oss - 1 )) ); do
# get the letter for the device
c_dec=$(printf "%d" "'c")
dev_dec=$(( $c_dec + $i ))
dev_hex=$(printf "%x" $dev_dec)
dev_char=$(printf "\x$dev_hex")
device_list+=(sd${dev_char})
done
else
echo "unrecognised lustre type ($lustre_tier)."
exit 1
fi
devices=$(echo "${device_list[@]}" | tr ' ' ',')
echo "Monitoring devices: $devices"
pssh -t 0 -l $oss_user -h $oss_hostfile 'dstat -n -Neth0,eth1 -d -D'$devices' --output $(hostname)-'${timestamp}'.dstat' 2>&1 >/dev/null &
test_dir=/lustre/test-${timestamp}
lfs setstripe --stripe-count $lustre_stripe $test_dir
mpirun -np $cores -ppn $ppn -hostfile $hostfile ior -k -a POSIX -v -i 1 -B -m -d 1 -F -w -r -t 32M -b ${sz_in_gb}G -o $test_dir
lfs df -h
df -h /lustre
kill %1
for h in $(<${oss_hostfile}); do
scp ${oss_user}@${h}:'*'-${timestamp}.dstat .
done
results_dir=~/results-${lustre_tier}-${oss_disk_setup}-${lustre_stripe}-${timestamp}
mkdir $results_dir
for i in *-${timestamp}.dstat; do
# remove the first value as it is often very high
sed '1,5d' $i > ${results_dir}/${i%%-${timestamp}.dstat}.csv
done

View file

@ -0,0 +1,18 @@
#!/bin/bash
mount_point=$1
num_oss=$2
ost_per_oss=$3
# wait for all the OSS to be present first
while [ "$(lctl get_param osc.*.ost_conn_uuid | cut -d'=' -f2 | uniq | wc -l)" != "$num_oss" ]; do
echo " waiting for all $num_oss OSS"
sleep 5
done
echo "all $num_ost OSS have started"
while [ "$(lctl get_param osc.*.ost_conn_uuid | cut -d'=' -f2 | uniq -c | sed 's/^ *//g' | cut -f1 -d' ' | uniq -c | sed 's/^ *//g')" != "$num_oss $ost_per_oss" ]; do
echo " waiting for all $ost_per_oss OSTs on each OSS"
sleep 5
done

View file

@ -0,0 +1,8 @@
#!/bin/bash
mds=$1
modprobe lustre
while ! lctl ping $mds@tcp; do
sleep 2
done

View file

@ -0,0 +1,9 @@
#!/bin/bash
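# generate candidate VMSS hostnames (base-36 instance suffixes 000000-00003z),
# then keep only the hosts that actually respond via pssh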
for i in {0..9} {a..z}; do echo "lustre00000${i}"; done > oss
for i in {0..9} {a..z}; do echo "lustre00001${i}"; done >> oss
for i in {0..9} {a..z}; do echo "lustre00002${i}"; done >> oss
for i in {0..9} {a..z}; do echo "lustre00003${i}"; done >> oss
pssh -l lustre -h oss hostname 2>/dev/null | grep SUCCESS |sed 's/.*\[SUCCESS\] //g' | sort | tee oss.real && mv oss.real oss