azurehpc/examples/simple_hpc_pbs_docker
Xavier Pillons b468abb015 Merge branch 'fix-pbsinstall' into xps-healthchecks 2020-05-13 11:41:41 +02:00

Build a PBS compute cluster

Visualisation: config.json

This example creates an HPC cluster ready to run with PBS Pro and Docker.

Initialise the project

To start, copy this directory and update config.json. AzureHPC provides the azhpc-init command, which helps by copying the directory and substituting the unset variables. First run it with the -s parameter to see which variables need to be set:

azhpc-init -c $azhpc_dir/examples/simple_hpc_pbs_docker -d simple_hpc_pbs_docker -s

Set the variables with the -v option, passing them as a comma-separated list. Use the output from the previous command as a starting point. The -d option is required and names the new directory that will be created. Update resource_group to the resource group you want to deploy to:

azhpc-init -c $azhpc_dir/examples/simple_hpc_pbs_docker -d simple_hpc_pbs_docker -v resource_group=azurehpc-cluster,acr_repo=my_acr,docker_user=hpcuser,monitor_workspace="xxxx-xxxxx-xxxx-xxxx",key_vault=my_keyvault

Note: You can still update variables even if they are already set. For example, in the command below we change the region to westus2 and the SKU to Standard_HC44rs:

azhpc-init -c $azhpc_dir/examples/simple_hpc_pbs_docker -d simple_hpc_pbs_docker -v location=westus2,vm_type=Standard_HC44rs,resource_group=azhpc-cluster,acr_repo=my_acr,docker_user=hpcuser,monitor_workspace="xxxx-xxxxx-xxxx-xxxx",key_vault=my_keyvault
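
The comma-separated -v list becomes hard to read as it grows. The following is a minimal sketch, not part of AzureHPC, that assembles the list from one key=value pair per line. The join_vars helper is a hypothetical name introduced here, and the values are the placeholders used in this readme. The sketch only prints the resulting command rather than running it.

```shell
# Hypothetical helper (not an azhpc command): join its arguments with commas.
join_vars() {
    local IFS=,
    echo "$*"
}

# One key=value pair per line; values are the placeholders from this readme.
vars=$(join_vars \
    "resource_group=azurehpc-cluster" \
    "acr_repo=my_acr" \
    "docker_user=hpcuser" \
    "monitor_workspace=xxxx-xxxxx-xxxx-xxxx" \
    "key_vault=my_keyvault")

# Print the command instead of executing it, so the sketch is safe to try.
echo "azhpc-init -c \$azhpc_dir/examples/simple_hpc_pbs_docker -d simple_hpc_pbs_docker -v $vars"
```

Once the printed command looks right, it can be pasted into the terminal as-is.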

Create the cluster

cd simple_hpc_pbs_docker
azhpc-build

Allow ~10 minutes for deployment. You can view the status of the VMs being deployed by running azhpc-status in another terminal.

Log in to the cluster

Connect to the headnode and check PBS and NFS:

$ azhpc-connect -u hpcuser headnode
Fri Jun 28 09:18:04 UTC 2019 : logging in to headnode (via headnode6cfe86.westus2.cloudapp.azure.com)
[hpcuser@headnode ~]$ pbsnodes -avS
vnode           state           OS       hardware host            queue        mem     ncpus   nmics   ngpus  comment
--------------- --------------- -------- -------- --------------- ---------- -------- ------- ------- ------- ---------
compuc407000003 free            --       --       10.2.4.8        --            224gb      60       0       0 --
compuc407000002 free            --       --       10.2.4.7        --            224gb      60       0       0 --
[hpcuser@headnode ~]$ sudo exportfs -v
/share/apps     <world>(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,root_squash,no_all_squash)
/share/data     <world>(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,root_squash,no_all_squash)
/share/home     <world>(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,root_squash,no_all_squash)
/mnt/resource/scratch
                <world>(sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,root_squash,no_all_squash)
[hpcuser@headnode ~]$
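
To check the node state from a script rather than by eye, the pbsnodes -avS table can be filtered on its second column. The sketch below assumes the column layout shown in the session above (vnode name first, state second). count_free_nodes is a hypothetical helper name, and the sample input is copied from that session.

```shell
# Hypothetical helper: count vnodes whose state column is exactly "free".
# Reads pbsnodes -avS style output on stdin; the header and the dashed
# separator never match because their second field is not "free".
count_free_nodes() {
    awk '$2 == "free" { n++ } END { print n + 0 }'
}

# Sample input taken from the session shown in this readme.
count_free_nodes <<'EOF'
vnode           state           OS       hardware host            queue        mem     ncpus   nmics   ngpus  comment
--------------- --------------- -------- -------- --------------- ---------- -------- ------- ------- ------- ---------
compuc407000003 free            --       --       10.2.4.8        --            224gb      60       0       0 --
compuc407000002 free            --       --       10.2.4.7        --            224gb      60       0       0 --
EOF
# prints 2
```

On the headnode the same function can be fed live data with pbsnodes -avS | count_free_nodes.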

To check the state of the cluster, run the following commands:

azhpc-connect -u hpcuser headnode
qstat -Q
pbsnodes -avS
df -h

The docker command should now be available on the compute nodes (e.g. docker pull, docker run).
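
A quick way to confirm that the expected tools are installed on a node is to look each one up on PATH. This is a minimal sketch using only the POSIX command -v builtin. check_commands is a hypothetical helper name, not part of AzureHPC or PBS.

```shell
# Hypothetical helper: report whether each named command is on PATH.
# Returns non-zero if at least one command is missing.
check_commands() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "ok: $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check for docker and print a hint if it is absent.
check_commands docker || echo "docker is not available on this node"
```

The same function accepts several names at once, for example check_commands docker qstat pbsnodes.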