Add OpenFOAM-Infiniband-IntelMPI recipe

- Add real NAMD-Infiniband-IntelMPI image
- GlusterFS mountpoint now inside AZ_BATCH_NODE_SHARED_DIR
Fred Park 2016-09-28 20:50:53 -07:00
Parent 2ac48b846d
Commit be36e2face
25 changed files with 299 additions and 54 deletions

View file

@ -2,7 +2,12 @@
## [Unreleased]
### Added
- NAMD-GPU recipe
- NAMD-GPU, OpenFOAM-Infiniband-IntelMPI recipes
### Changed
- GlusterFS mountpoint is now within `$AZ_BATCH_NODE_SHARED_DIR` so files can
be viewed/downloaded with Batch APIs
- NAMD-Infiniband-IntelMPI recipe now contains a real Docker image link
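For reference, a minimal sketch of how a GlusterFS shared data volume is now declared, mirroring the sample recipe configs in this commit (`glustervol` is just the sample volume name; the host-side mountpoint lands under `$AZ_BATCH_NODE_SHARED_DIR/.gluster/gv0`):

```json
{
  "docker_volumes": {
    "shared_data_volumes": {
      "glustervol": {
        "volume_driver": "glusterfs",
        "container_path": "$AZ_BATCH_NODE_SHARED_DIR/gfs"
      }
    }
  }
}
```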
## [1.0.0] - 2016-09-22
### Added

View file

@ -19,7 +19,7 @@ a large number of VMs via private peer-to-peer distribution of Docker images
among the compute nodes
* Automated Docker Private Registry instance creation on compute nodes with
Docker images backed to Azure Storage if specified
* Automatic shared data volume support for:
* Automatic shared data volume support:
* [Azure File Docker Volume Driver](https://github.com/Azure/azurefile-dockervolumedriver)
installation and share setup for SMB/CIFS backed to Azure Storage if
specified
@ -40,9 +40,9 @@ on [Azure N-Series VM instances](https://azure.microsoft.com/en-us/blog/azure-n-
cluster applications on compute pools with automatic job cleanup
* Transparent assist for running Docker containers utilizing Infiniband/RDMA
for MPI on HPC low-latency Azure VM instances (see the pool sketch after this list):
* A-Series: STANDARD\_A8, STANDARD\_A9 ([info](https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-windows-a8-a9-a10-a11-specs/))
* H-Series: STANDARD\_H16R, STANDARD\_H16MR
* N-Series: STANDARD\_NC24R
* [A-Series](https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-windows-a8-a9-a10-a11-specs/): STANDARD\_A8, STANDARD\_A9
* [H-Series](https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-windows-sizes/#h-series): STANDARD\_H16R, STANDARD\_H16MR
* [N-Series](https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-windows-sizes/#n-series-preview): STANDARD\_NC24R
* Automatic setup of SSH tunneling to Docker Hosts on compute nodes if
specified
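For illustration, a minimal pool specification sketch targeting one of the RDMA-capable sizes above; the fields mirror the sample `pool.json` files in the recipes, and the `id` is arbitrary:

```json
{
  "pool_specification": {
    "id": "docker-rdma-pool",
    "vm_size": "STANDARD_H16R",
    "vm_count": 2,
    "inter_node_communication_enabled": true,
    "publisher": "OpenLogic",
    "offer": "CentOS-HPC",
    "sku": "7.1"
  }
}
```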

View file

@ -11,7 +11,7 @@
"shared_data_volumes": [
"glustervol"
],
"command": "mpirun --allow-run-as-root --host $AZ_BATCH_HOST_LIST,$AZ_BATCH_HOST_LIST,$AZ_BATCH_HOST_LIST,$AZ_BATCH_HOST_LIST --mca btl_tcp_if_exclude docker0 /bin/bash -c \"export LD_LIBRARY_PATH=/usr/local/openblas/lib:/usr/local/nvidia/lib64 && cp -r /cntk/Examples/Other/Simple2d/* . && /cntk/build/gpu/release/bin/cntk configFile=Config/Multigpu.cntk RootDir=. OutputDir=$AZ_BATCH_NODE_SHARED_DIR/azurefileshare/Output parallelTrain=true\"",
"command": "mpirun --allow-run-as-root --host $AZ_BATCH_HOST_LIST,$AZ_BATCH_HOST_LIST,$AZ_BATCH_HOST_LIST,$AZ_BATCH_HOST_LIST --mca btl_tcp_if_exclude docker0 /bin/bash -c \"export LD_LIBRARY_PATH=/usr/local/openblas/lib:/usr/local/nvidia/lib64 && cp -r /cntk/Examples/Other/Simple2d/* . && /cntk/build/gpu/release/bin/cntk configFile=Config/Multigpu.cntk RootDir=. OutputDir=$AZ_BATCH_NODE_SHARED_DIR/gfs/Output parallelTrain=true\"",
"multi_instance": {
"num_instances": "pool_specification_vm_count",
"coordination_command": null

View file

@ -1,20 +1,18 @@
# NAMD-Infiniband
# NAMD-Infiniband-IntelMPI
This recipe shows how to run [NAMD](http://www.ks.uiuc.edu/Research/namd/)
2.10 on Linux using the Intel MPI libraries over Infiniband/RDMA Azure VM
on Linux using the Intel MPI libraries over Infiniband/RDMA Azure VM
instances in an Azure Batch compute pool. Execution of this distributed
workload requires the use of
[multi-instance tasks](../docs/80-batch-shipyard-multi-instance-tasks.md).
Interested in a TCP/IP-enabled version of NAMD for use with Batch Shipyard
instead? Visit [this recipe](../NAMD-TCP).
## Configuration
Please refer to this [set of sample configuration files](./config) for
this recipe.
### Pool Configuration
The pool configuration should enable the following properties:
* `vm_size` must be either `STANDARD_A8` or `STANDARD_A9`
* `vm_size` must be one of `STANDARD_A8`, `STANDARD_A9`, `STANDARD_H16R`, or
`STANDARD_H16MR`
* `inter_node_communication_enabled` must be set to `true`
* `max_tasks_per_node` must be set to 1 or omitted
* `publisher` should be `OpenLogic`. `SUSE` will be supported in a future
@ -26,13 +24,16 @@ supported by the Azure Batch service.
### Global Configuration
The global configuration should set the following properties:
* `docker_images` array must have a reference to a valid NAMD-Infiniband
image compiled against Intel MPI.
* `docker_images` array must have a reference to a valid
NAMD-Infiniband-IntelMPI image compiled against Intel MPI. This
can be `alfpark/namd:2.11-icc-mkl-intelmpi` which is published on
[Docker Hub](https://hub.docker.com/r/alfpark/namd/).
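A minimal global configuration sketch for this image, patterned on the sample `config.json` files elsewhere in this commit (the storage settings placeholder must match an entry in your `credentials.json`):

```json
{
  "batch_shipyard": {
    "storage_account_settings": "<storage account specified in credentials.json>",
    "storage_entity_prefix": "shipyard"
  },
  "global_resources": {
    "docker_images": [
      "alfpark/namd:2.11-icc-mkl-intelmpi"
    ]
  }
}
```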
### Jobs Configuration
The jobs configuration should set the following properties within the `tasks`
array which should have a task definition containing:
* `image` should be the name of the Docker image for this container invocation.
* `image` should be the name of the Docker image for this container invocation,
e.g., `alfpark/namd:2.11-icc-mkl-intelmpi`
* `name` is a unique name given to the Docker container instance. This is
required for Multi-Instance tasks.
* `command` should contain the `mpirun` command. If using the sample
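For orientation, a jobs configuration sketch patterned on the NAMD-TCP `jobs.json` later in this commit; the job `id` is arbitrary, the `command` argument (`apoa1`) assumes the sample `run_namd.sh` input, and `infiniband` is set as in the OpenFOAM-Infiniband-IntelMPI recipe:

```json
{
  "job_specifications": [
    {
      "id": "namdjob",
      "multi_instance_auto_complete": true,
      "tasks": [
        {
          "image": "alfpark/namd:2.11-icc-mkl-intelmpi",
          "name": "namd",
          "remove_container_after_exit": true,
          "command": "/sw/run_namd.sh apoa1",
          "infiniband": true,
          "multi_instance": {
            "num_instances": "pool_specification_vm_count",
            "coordination_command": null
          }
        }
      ]
    }
  ]
}
```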

View file

@ -4,7 +4,7 @@ FROM centos:7.1.1503
MAINTAINER Fred Park <https://github.com/Azure/batch-shipyard>
# set up base
COPY ssh_config /root/.ssh/
COPY ssh_config /root/.ssh/config
RUN yum swap -y fakesystemd systemd \
&& yum install -y openssh-clients openssh-server net-tools libmlx4 librdmacm libibverbs dapl rdma \
&& yum clean all \
@ -15,20 +15,17 @@ RUN yum swap -y fakesystemd systemd \
&& sed -i 's/#RSAAuthentication yes/RSAAuthentication yes/g' /etc/ssh/sshd_config \
&& sed -i 's/#PubkeyAuthentication yes/PubkeyAuthentication yes/g' /etc/ssh/sshd_config \
&& ssh-keygen -f /root/.ssh/id_rsa -t rsa -N '' \
&& chmod 600 /root/.ssh/ssh_config \
&& chmod 600 /root/.ssh/config \
&& chmod 700 /root/.ssh \
&& cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys \
&& mv /root/.ssh/ssh_config /root/.ssh/config
&& cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
# add software
ADD NAMD_2.10_Linux-x86_64-MPI-icc-mkl.tar.gz /sw
ADD NAMD_2.11_Linux-x86_64-MPI-icc-mkl.tar.gz /sw
ADD apoa1.tar.gz stmv.tar.gz /sw/namd/
COPY run_namd.sh /sw/
# export environment
ENV NAMD_DIR=/sw/namd NAMD_SCRIPT=/sw/run_namd.sh
# intel mpi infiniband vars will be automatically set by Batch Shipyard
#ENV I_MPI_FABRICS=shm:dapl I_MPI_DAPL_PROVIDER=ofa-v2-ib0 I_MPI_DYNAMIC_CONNECTION=0 MANPATH=/usr/share/man:/usr/local/man
# set up sshd on port 23
EXPOSE 23

View file

@ -19,6 +19,6 @@ nodes=${#HOSTS[@]}
np=$(($nodes * $ppn))
# execute NAMD
source /opt/intel/compilers_and_libraries/linux/mpi/bin64/mpivars.sh
echo "Executing namd on $np processors (ppn=$ppn)..."
source $MPIVARS_SCRIPT
mpirun -np $np -ppn $ppn -hosts $AZ_BATCH_HOST_LIST $NAMD_DIR/namd2 $1.namd

View file

@ -1,15 +1,12 @@
# NAMD-TCP
This recipe shows how to run [NAMD](http://www.ks.uiuc.edu/Research/namd/)
2.10 on Linux using the
on Linux using the
[Charm++ runtime](http://charm.cs.illinois.edu/manuals/html/charm++/)
(as opposed to pure MPI) over TCP/IP-connected machines in an Azure Batch
compute pool. Regardless of the underlying parallel/distributed programming
paradigm, execution of this distributed workload requires the use of
[multi-instance tasks](../docs/80-batch-shipyard-multi-instance-tasks.md).
Interested in an Infiniband-enabled version of NAMD for use with Batch
Shipyard? Visit [this recipe](../NAMD-Infiniband-IntelMPI).
## Configuration
Please refer to this [set of sample configuration files](./config) for
this recipe.
@ -22,14 +19,14 @@ The pool configuration should enable the following properties:
### Global Configuration
The global configuration should set the following properties:
* `docker_images` array must have a reference to the NAMD-TCP image. This
can be `alfpark/namd:2.10-tcp` which is published on
can be `alfpark/namd:2.11-tcp` which is published on
[Docker Hub](https://hub.docker.com/r/alfpark/namd/).
### Jobs Configuration
The jobs configuration should set the following properties within the `tasks`
array which should have a task definition containing:
* `image` should be the name of the Docker image for this container invocation,
e.g., `alfpark/namd:2.10-tcp`
e.g., `alfpark/namd:2.11-tcp`
* `name` is a unique name given to the Docker container instance. This is
required for Multi-Instance tasks.
* `command` should contain the `mpirun` command. If using the sample NAMD-TCP

View file

@ -5,7 +5,7 @@
},
"global_resources": {
"docker_images": [
"alfpark/namd:2.10-tcp"
"alfpark/namd:2.11-tcp"
]
}
}

View file

@ -5,7 +5,7 @@
"multi_instance_auto_complete": true,
"tasks": [
{
"image": "alfpark/namd:2.10-tcp",
"image": "alfpark/namd:2.11-tcp",
"name": "namd",
"remove_container_after_exit": true,
"command": "/sw/run_namd.sh apoa1 100",

View file

@ -4,7 +4,7 @@ FROM centos:7.1.1503
MAINTAINER Fred Park <https://github.com/Azure/batch-shipyard>
# set up base and ssh keys
COPY ssh_config /root/.ssh/
COPY ssh_config /root/.ssh/config
RUN yum swap -y fakesystemd systemd \
&& yum install -y openssh-clients openssh-server net-tools \
&& yum clean all \
@ -15,13 +15,12 @@ RUN yum swap -y fakesystemd systemd \
&& sed -i 's/#RSAAuthentication yes/RSAAuthentication yes/g' /etc/ssh/sshd_config \
&& sed -i 's/#PubkeyAuthentication yes/PubkeyAuthentication yes/g' /etc/ssh/sshd_config \
&& ssh-keygen -f /root/.ssh/id_rsa -t rsa -N '' \
&& chmod 600 /root/.ssh/ssh_config \
&& chmod 600 /root/.ssh/config \
&& chmod 700 /root/.ssh \
&& cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys \
&& mv /root/.ssh/ssh_config /root/.ssh/config
&& cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
# export environment
ENV NAMD_VER=NAMD_2.10_Linux-x86_64-TCP
ENV NAMD_VER=NAMD_2.11_Linux-x86_64-TCP
ENV NAMD_DIR=/sw/$NAMD_VER NAMD_SCRIPT=/sw/run_namd.sh
# add software

View file

@ -0,0 +1,60 @@
# OpenFOAM-Infiniband-IntelMPI
This recipe shows how to run [OpenFOAM](http://www.openfoam.com/)
on Linux using Intel MPI over Infiniband/RDMA Azure VM instances in an Azure
Batch compute pool. Execution of this distributed workload requires the use of
[multi-instance tasks](../docs/80-batch-shipyard-multi-instance-tasks.md).
## Configuration
Please refer to this [set of sample configuration files](./config) for
this recipe.
### Pool Configuration
The pool configuration should enable the following properties:
* `vm_size` must be one of `STANDARD_A8`, `STANDARD_A9`, `STANDARD_H16R`, or
`STANDARD_H16MR`
* `inter_node_communication_enabled` must be set to `true`
* `max_tasks_per_node` must be set to 1 or omitted
* `publisher` should be `OpenLogic`. `SUSE` will be supported in a future
version of Batch Shipyard.
* `offer` should be `CentOS-HPC`. `SLES-HPC` will be supported in a future
version of Batch Shipyard.
* `sku` should be `7.1` for the current latest RDMA-enabled CentOS-HPC sku
supported by the Azure Batch service.
### Global Configuration
The global configuration should set the following properties:
* `docker_images` array must have a reference to a valid OpenFOAM image
that can be run with Intel MPI and Infiniband in a Docker container context
on Azure VM instances. This can be
`alfpark/openfoam:v1606plus-icc-intelmpi` which is published on
[Docker Hub](https://hub.docker.com/r/alfpark/openfoam).
* `docker_volumes` must be populated with the following:
* `shared_data_volumes` should contain an Azure File Docker volume driver,
a GlusterFS share, or a manually configured NFS share. Batch
Shipyard has automatic support for setting up Azure File Docker Volumes
and GlusterFS; please refer to the
[Batch Shipyard Configuration doc](../../docs/10-batch-shipyard-configuration.md).
### Jobs Configuration
The jobs configuration should set the following properties within the `tasks`
array which should have a task definition containing:
* `image` should be the name of the Docker image for this container invocation.
For this example, this should be `alfpark/openfoam:v1606plus-icc-intelmpi`.
* `name` is a unique name given to the Docker container instance. This is
required for Multi-Instance tasks.
* `command` should contain the `mpirun` command. If using the sample
`run_sample.sh` script then the command should be simply:
`/opt/OpenFOAM/run_sample.sh`
* `shared_data_volumes` should have a valid volume name as defined in the
global configuration file. Please see the previous section for details.
* `multi_instance` property must be defined
* `num_instances` should be set to `pool_specification_vm_count` or
`pool_current_dedicated`
* `coordination_command` should be unset or `null`
* `resource_files` array can be empty
## Dockerfile and supplementary files
The `Dockerfile` for the Docker image can be found [here](./docker). Please
note that you must agree with the
[OpenFOAM license](http://openfoam.org/licence/) before using this Docker
image.

View file

@ -0,0 +1,19 @@
{
"batch_shipyard": {
"storage_account_settings": "<storage account specified in credentials.json>",
"storage_entity_prefix": "shipyard"
},
"global_resources": {
"docker_images": [
"alfpark/openfoam:v1606plus-icc-intelmpi"
],
"docker_volumes": {
"shared_data_volumes": {
"glustervol": {
"volume_driver": "glusterfs",
"container_path": "$AZ_BATCH_NODE_SHARED_DIR/gfs"
}
}
}
}
}

View file

@ -0,0 +1,16 @@
{
"credentials": {
"batch": {
"account": "<batch account name>",
"account_key": "<batch account key>",
"account_service_url": "<batch account service url>"
},
"storage": {
"mystorageaccount": {
"account": "<storage account name>",
"account_key": "<storage account key>",
"endpoint": "core.windows.net"
}
}
}
}

View file

@ -0,0 +1,24 @@
{
"job_specifications": [
{
"id": "openfoamjob",
"multi_instance_auto_complete": true,
"tasks": [
{
"image": "alfpark/openfoam:v1606plus-icc-intelmpi",
"name": "openfoam",
"remove_container_after_exit": true,
"shared_data_volumes": [
"glustervol"
],
"command": "/opt/OpenFOAM/run_sample.sh",
"infiniband": true,
"multi_instance": {
"num_instances": "pool_specification_vm_count",
"coordination_command": null
}
}
]
}
]
}

View file

@ -0,0 +1,17 @@
{
"pool_specification": {
"id": "docker-openfoam-rdma",
"vm_size": "STANDARD_A9",
"vm_count": 2,
"inter_node_communication_enabled": true,
"publisher": "OpenLogic",
"offer": "CentOS-HPC",
"sku": "7.1",
"ssh_docker_tunnel": {
"username": "docker",
"generate_tunnel_script": true
},
"reboot_on_start_task_failed": true,
"block_until_all_global_resources_loaded": true
}
}

View file

@ -0,0 +1,45 @@
# Dockerfile for OpenFOAM-Infiniband-IntelMPI for use with Batch Shipyard on Azure Batch
FROM centos:7.1.1503
MAINTAINER Fred Park <https://github.com/Azure/batch-shipyard>
# set up base and ssh keys
COPY ssh_config /root/.ssh/config
RUN yum swap -y fakesystemd systemd \
&& yum install -y epel-release \
&& yum install -y \
openssh-clients openssh-server net-tools gnuplot mpfr-devel \
qt-devel qt-assistant qt-x11 qtwebkit-devel libGLU-devel \
libmlx4 librdmacm libibverbs dapl rdma \
&& yum clean all \
&& mkdir -p /var/run/sshd \
&& ssh-keygen -A \
&& sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config \
&& sed -i 's/#PermitRootLogin yes/PermitRootLogin yes/g' /etc/ssh/sshd_config \
&& sed -i 's/#RSAAuthentication yes/RSAAuthentication yes/g' /etc/ssh/sshd_config \
&& sed -i 's/#PubkeyAuthentication yes/PubkeyAuthentication yes/g' /etc/ssh/sshd_config \
&& ssh-keygen -f /root/.ssh/id_rsa -t rsa -N '' \
&& chmod 600 /root/.ssh/config \
&& chmod 700 /root/.ssh \
&& cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
# add intel redistributables
ADD l_comp_lib_2016.4.258_comp.cpp_redist.tgz l_comp_lib_2016.4.258_comp.for_redist.tgz /tmp/
RUN cd /tmp/l_comp_lib_2016.4.258_comp.cpp_redist \
&& ./install.sh -i /opt/intel2 -e \
&& cd /tmp/l_comp_lib_2016.4.258_comp.for_redist \
&& ./install.sh -i /opt/intel2 -e \
&& rm -rf /tmp/l_comp_lib_2016.4.258_comp.cpp_redist /tmp/l_comp_lib_2016.4.258_comp.for_redist
ENV INTELCOMPILERVARS=/opt/intel2/bin/compilervars.sh
# add openfoam with env vars
ADD openfoam-v1606plus-icc-intelmpi.tar.gz /opt
ENV OPENFOAM_VER=v1606+ FOAM_INST_DIR=/opt/OpenFOAM PATH=${PATH}:/usr/lib64/qt4/bin
ENV OPENFOAM_DIR=${FOAM_INST_DIR}/OpenFOAM-${OPENFOAM_VER}
# copy sample run script
COPY run_sample.sh ${FOAM_INST_DIR}
# set up sshd on port 23
EXPOSE 23
CMD ["/usr/sbin/sshd", "-D", "-p", "23"]

View file

@ -0,0 +1,3 @@
# Dockerfile for OpenFOAM-Infiniband-IntelMPI
You must agree to the [OpenFOAM license](http://openfoam.org/licence/)
prior to use.

View file

@ -0,0 +1,40 @@
#!/usr/bin/env bash
set -e
set -o pipefail
# set up mpi and set up openfoam env
source $INTELCOMPILERVARS intel64
source /opt/intel/compilers_and_libraries/linux/mpi/bin64/mpivars.sh
export MPI_ROOT=$I_MPI_ROOT
OPENFOAM_DIR=/opt/OpenFOAM/OpenFOAM-v1606+
source $OPENFOAM_DIR/etc/bashrc
# copy sample into glusterfs shared area
GFS_DIR=$AZ_BATCH_NODE_SHARED_DIR/gfs
cd $GFS_DIR
cp -r $OPENFOAM_DIR/tutorials/incompressible/simpleFoam/pitzDaily .
cp $OPENFOAM_DIR/tutorials/incompressible/simpleFoam/pitzDailyExptInlet/system/decomposeParDict pitzDaily/system/
# get nodes and compute number of processors
IFS=',' read -ra HOSTS <<< "$AZ_BATCH_HOST_LIST"
nodes=${#HOSTS[@]}
ppn=`nproc`
np=$(($nodes * $ppn))
# substitute proper number of subdomains
sed -i -e "s/^numberOfSubdomains 4/numberOfSubdomains $np;/" pitzDaily/system/decomposeParDict
root=`python -c "import math; x=int(math.sqrt($np)); print x if x*x==$np else -1"`
if [ $root -eq -1 ]; then
sed -i -e "s/\s*n\s*(2 2 1)/ n ($ppn $nodes 1)/g" pitzDaily/system/decomposeParDict
else
sed -i -e "s/\s*n\s*(2 2 1)/ n ($root $root 1)/g" pitzDaily/system/decomposeParDict
fi
# decompose
cd pitzDaily
blockMesh
decomposePar -force
# execute mpi job
mpirun -np $np -ppn $ppn -hosts $AZ_BATCH_HOST_LIST simpleFoam -parallel

View file

@ -0,0 +1,4 @@
Host *
Port 23
StrictHostKeyChecking no
UserKnownHostsFile /dev/null

View file

@ -22,7 +22,7 @@ values:
The global configuration should set the following properties:
* `docker_images` array must have a reference to a valid OpenFOAM image
that can be run with MPI in a Docker container context. This can be
`alfpark/openfoam:v1606plus-openmpi` which is published on
`alfpark/openfoam:v1606plus-gcc-openmpi` which is published on
[Docker Hub](https://hub.docker.com/r/alfpark/openfoam).
* `docker_volumes` must be populated with the following:
* `shared_data_volumes` should contain an Azure File Docker volume driver,
@ -35,7 +35,7 @@ that can be run with MPI in a Docker container context. This can be
The jobs configuration should set the following properties within the `tasks`
array which should have a task definition containing:
* `image` should be the name of the Docker image for this container invocation.
For this example, this should be `alfpark/openfoam:v1606+-openmpi`.
For this example, this should be `alfpark/openfoam:v1606plus-gcc-openmpi`.
* `name` is a unique name given to the Docker container instance. This is
required for Multi-Instance tasks.
* `command` should contain the `mpirun` command. If using the sample

View file

@ -5,7 +5,7 @@
},
"global_resources": {
"docker_images": [
"alfpark/openfoam:v1606plus-openmpi"
"alfpark/openfoam:v1606plus-gcc-openmpi"
],
"docker_volumes": {
"shared_data_volumes": {

View file

@ -5,7 +5,7 @@
"multi_instance_auto_complete": true,
"tasks": [
{
"image": "alfpark/openfoam:v1606plus-openmpi",
"image": "alfpark/openfoam:v1606plus-gcc-openmpi",
"name": "openfoam",
"remove_container_after_exit": true,
"shared_data_volumes": [

View file

@ -65,9 +65,10 @@ This NAMD-TCP recipe contains information on how to Dockerize distributed
[NAMD](http://www.ks.uiuc.edu/Research/namd/) across multiple Azure Batch
compute nodes using TCP.
### OpenFOAM-Infiniband-IntelMPI
TBC.
[OpenFoam](http://www.openfoam.com/)
### [OpenFOAM-Infiniband-IntelMPI](./OpenFOAM-Infiniband-IntelMPI)
This OpenFOAM-Infiniband-IntelMPI recipe contains information on how to
Dockerize distributed [OpenFOAM](http://www.openfoam.com/) across
Infiniband/RDMA Azure VMs with Intel MPI.
### [OpenFOAM-TCP-OpenMPI](./OpenFOAM-TCP-OpenMPI)
This OpenFOAM-TCP-OpenMPI recipe contains information on how to Dockerize

View file

@ -36,6 +36,7 @@ if [ $AZ_BATCH_IS_CURRENT_NODE_MASTER == "true" ]; then
done
set -e
echo "$numpeers joined peering"
# delay to wait for peers to connect
sleep 5
# create volume
echo "creating gv0 ($bricks)"
@ -52,6 +53,8 @@ while :
do
gluster volume info gv0
if [ $? -eq 0 ]; then
# delay to wait for subvolumes
sleep 5
break
fi
sleep 1
@ -59,14 +62,32 @@ done
set -e
# add gv0 to /etc/fstab for auto-mount on reboot
mountpoint=$1/gluster/gv0
mountpoint=$AZ_BATCH_NODE_SHARED_DIR/.gluster/gv0
mkdir -p $mountpoint
echo "adding $mountpoint to fstab"
echo "$ipaddress:/gv0 $mountpoint glusterfs defaults,_netdev 0 0" >> /etc/fstab
# mount it
echo "mounting $mountpoint"
mount $mountpoint
START=$(date -u +"%s")
set +e
while :
do
mount $mountpoint
if [ $? -eq 0 ]; then
break
else
NOW=$(date -u +"%s")
DIFF=$((($NOW-$START)/60))
# fail after 5 minutes of attempts
if [ $DIFF -ge 5 ]; then
echo "could not mount gluster volume: $mountpoint"
exit 1
fi
sleep 1
fi
done
set -e
# touch file noting success
touch .glusterfs_success

View file

@ -774,8 +774,9 @@ def _setup_glusterfs(batch_client, blob_client, config, nodes):
pool_id, node.id,
('workitems/{}/job-1/gluster-setup/wd/'
'.glusterfs_success').format(job_id))
except batchmodels.BatchErrorException as ex:
logger.exception(ex)
except batchmodels.BatchErrorException:
logger.error('gluster success file absent on node {}'.format(
node.id))
success = False
break
# delete job
@ -1452,19 +1453,14 @@ def add_jobs(batch_client, blob_client, config):
else:
if (shared_data_volumes is not None and
len(shared_data_volumes) > 0):
# get pool spec for gluster mount paths
if (config['pool_specification']['offer'].lower() ==
'ubuntuserver'):
gfspath = '/mnt/gluster/gv0'
else:
gfspath = '/mnt/resource/gluster/gv0'
for key in shared_data_volumes:
dvspec = config[
'global_resources']['docker_volumes'][
'shared_data_volumes'][key]
if dvspec['volume_driver'] == 'glusterfs':
run_opts.append('-v {}:{}'.format(
gfspath, dvspec['container_path']))
'$AZ_BATCH_NODE_SHARED_DIR/.gluster/gv0',
dvspec['container_path']))
else:
run_opts.append('-v {}:{}'.format(
key, dvspec['container_path']))