Add OpenFOAM-Infiniband-IntelMPI recipe
- Add real NAMD-Infiniband-IntelMPI image
- GlusterFS mountpoint now inside AZ_BATCH_NODE_SHARED_DIR
Parent: 2ac48b846d
Commit: be36e2face
@@ -2,7 +2,12 @@
 ## [Unreleased]

 ### Added
-- NAMD-GPU recipe
+- NAMD-GPU, OpenFOAM-Infiniband-IntelMPI recipe
+
+### Changed
+- GlusterFS mountpoint is now within `$AZ_BATCH_NODE_SHARED_DIR` so files can
+  be viewed/downloaded with Batch APIs
+- NAMD-Infiniband-IntelMPI recipe now contains a real Docker image link

 ## [1.0.0] - 2016-09-22
 ### Added
@@ -19,7 +19,7 @@ a large number of VMs via private peer-to-peer distribution of Docker images
 among the compute nodes
 * Automated Docker Private Registry instance creation on compute nodes with
   Docker images backed to Azure Storage if specified
-* Automatic shared data volume support for:
+* Automatic shared data volume support:
   * [Azure File Docker Volume Driver](https://github.com/Azure/azurefile-dockervolumedriver)
     installation and share setup for SMB/CIFS backed to Azure Storage if
     specified
@@ -40,9 +40,9 @@ on [Azure N-Series VM instances](https://azure.microsoft.com/en-us/blog/azure-n-
 cluster applications on compute pools with automatic job cleanup
 * Transparent assist for running Docker containers utilizing Infiniband/RDMA
   for MPI on HPC low-latency Azure VM instances:
-  * A-Series: STANDARD\_A8, STANDARD\_A9 ([info](https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-windows-a8-a9-a10-a11-specs/))
-  * H-Series: STANDARD\_H16R, STANDARD\_H16MR
-  * N-Series: STANDARD\_NC24R
+  * [A-Series](https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-windows-a8-a9-a10-a11-specs/): STANDARD\_A8, STANDARD\_A9
+  * [H-Series](https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-windows-sizes/#h-series): STANDARD\_H16R, STANDARD\_H16MR
+  * [N-Series](https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-windows-sizes/#n-series-preview): STANDARD\_NC24R
 * Automatic setup of SSH tunneling to Docker Hosts on compute nodes if
   specified
@@ -11,7 +11,7 @@
                     "shared_data_volumes": [
                         "glustervol"
                     ],
-                    "command": "mpirun --allow-run-as-root --host $AZ_BATCH_HOST_LIST,$AZ_BATCH_HOST_LIST,$AZ_BATCH_HOST_LIST,$AZ_BATCH_HOST_LIST --mca btl_tcp_if_exclude docker0 /bin/bash -c \"export LD_LIBRARY_PATH=/usr/local/openblas/lib:/usr/local/nvidia/lib64 && cp -r /cntk/Examples/Other/Simple2d/* . && /cntk/build/gpu/release/bin/cntk configFile=Config/Multigpu.cntk RootDir=. OutputDir=$AZ_BATCH_NODE_SHARED_DIR/azurefileshare/Output parallelTrain=true\"",
+                    "command": "mpirun --allow-run-as-root --host $AZ_BATCH_HOST_LIST,$AZ_BATCH_HOST_LIST,$AZ_BATCH_HOST_LIST,$AZ_BATCH_HOST_LIST --mca btl_tcp_if_exclude docker0 /bin/bash -c \"export LD_LIBRARY_PATH=/usr/local/openblas/lib:/usr/local/nvidia/lib64 && cp -r /cntk/Examples/Other/Simple2d/* . && /cntk/build/gpu/release/bin/cntk configFile=Config/Multigpu.cntk RootDir=. OutputDir=$AZ_BATCH_NODE_SHARED_DIR/gfs/Output parallelTrain=true\"",
                    "multi_instance": {
                        "num_instances": "pool_specification_vm_count",
                        "coordination_command": null
@@ -1,20 +1,18 @@
-# NAMD-Infiniband
+# NAMD-Infiniband-IntelMPI
 This recipe shows how to run [NAMD](http://www.ks.uiuc.edu/Research/namd/)
-2.10 on Linux using the Intel MPI libraries over Infiniband/RDMA Azure VM
+on Linux using the Intel MPI libraries over Infiniband/RDMA Azure VM
 instances in an Azure Batch compute pool. Execution of this distributed
 workload requires the use of
 [multi-instance tasks](../docs/80-batch-shipyard-multi-instance-tasks.md).

 Interested in a TCP/IP-enabled version of NAMD for use with Batch Shipyard
 instead? Visit [this recipe](../NAMD-TCP).

 ## Configuration
 Please refer to this [set of sample configuration files](./config) for
 this recipe.

 ### Pool Configuration
 The pool configuration should enable the following properties:
-* `vm_size` must be either `STANDARD_A8` or `STANDARD_A9`
+* `vm_size` must be either `STANDARD_A8`, `STANDARD_A9`, `STANDARD_H16R`,
+  `STANDARD_H16MR`
 * `inter_node_communication_enabled` must be set to `true`
 * `max_tasks_per_node` must be set to 1 or omitted
 * `publisher` should be `OpenLogic`. `SUSE` will be supported in a future
@@ -26,13 +24,16 @@ supported by the Azure Batch service.

 ### Global Configuration
 The global configuration should set the following properties:
-* `docker_images` array must have a reference to a valid NAMD-Infiniband
-  image compiled against Intel MPI.
+* `docker_images` array must have a reference to a valid
+  NAMD-Infiniband-IntelMPI image compiled against Intel MPI. This
+  can be `alfpark/namd:2.11-icc-mkl-intelmpi` which is published on
+  [Docker Hub](https://hub.docker.com/r/alfpark/namd/).

 ### Jobs Configuration
 The jobs configuration should set the following properties within the `tasks`
 array which should have a task definition containing:
-* `image` should be the name of the Docker image for this container invocation.
+* `image` should be the name of the Docker image for this container invocation,
+  e.g., `alfpark/namd:2.11-icc-mkl-intelmpi`
 * `name` is a unique name given to the Docker container instance. This is
   required for Multi-Instance tasks.
 * `command` should contain the `mpirun` command. If using the sample
@@ -4,7 +4,7 @@ FROM centos:7.1.1503
 MAINTAINER Fred Park <https://github.com/Azure/batch-shipyard>

 # set up base
-COPY ssh_config /root/.ssh/
+COPY ssh_config /root/.ssh/config
 RUN yum swap -y fakesystemd systemd \
     && yum install -y openssh-clients openssh-server net-tools libmlx4 librdmacm libibverbs dapl rdma \
     && yum clean all \
@@ -15,20 +15,17 @@ RUN yum swap -y fakesystemd systemd \
     && sed -i 's/#RSAAuthentication yes/RSAAuthentication yes/g' /etc/ssh/sshd_config \
     && sed -i 's/#PubkeyAuthentication yes/PubkeyAuthentication yes/g' /etc/ssh/sshd_config \
     && ssh-keygen -f /root/.ssh/id_rsa -t rsa -N '' \
-    && chmod 600 /root/.ssh/ssh_config \
+    && chmod 600 /root/.ssh/config \
     && chmod 700 /root/.ssh \
-    && cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys \
-    && mv /root/.ssh/ssh_config /root/.ssh/config
+    && cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys

 # add software
-ADD NAMD_2.10_Linux-x86_64-MPI-icc-mkl.tar.gz /sw
+ADD NAMD_2.11_Linux-x86_64-MPI-icc-mkl.tar.gz /sw
 ADD apoa1.tar.gz stmv.tar.gz /sw/namd/
 COPY run_namd.sh /sw/

 # export environment
 ENV NAMD_DIR=/sw/namd NAMD_SCRIPT=/sw/run_namd.sh
 # intel mpi infiniband vars will be automatically set by Batch Shipyard
 #ENV I_MPI_FABRICS=shm:dapl I_MPI_DAPL_PROVIDER=ofa-v2-ib0 I_MPI_DYNAMIC_CONNECTION=0 MANPATH=/usr/share/man:/usr/local/man

 # set up sshd on port 23
 EXPOSE 23
@@ -19,6 +19,6 @@ nodes=${#HOSTS[@]}
 np=$(($nodes * $ppn))

 # execute NAMD
-source /opt/intel/compilers_and_libraries/linux/mpi/bin64/mpivars.sh
 echo "Executing namd on $np processors (ppn=$ppn)..."
+source $MPIVARS_SCRIPT
 mpirun -np $np -ppn $ppn -hosts $AZ_BATCH_HOST_LIST $NAMD_DIR/namd2 $1.namd
@@ -1,15 +1,12 @@
 # NAMD-TCP
 This recipe shows how to run [NAMD](http://www.ks.uiuc.edu/Research/namd/)
-2.10 on Linux using the
+on Linux using the
 [Charm++ runtime](http://charm.cs.illinois.edu/manuals/html/charm++/)
 (as opposed to pure MPI) over TCP/IP-connected machines in an Azure Batch
 compute pool. Regardless of the underlying parallel/distributed programming
 paradigm, execution of this distributed workload requires the use of
 [multi-instance tasks](../docs/80-batch-shipyard-multi-instance-tasks.md).

 Interested in an Infiniband-enabled version of NAMD for use with Batch
 Shipyard? Visit [this recipe](../NAMD-Infiniband-IntelMPI).

 ## Configuration
 Please refer to this [set of sample configuration files](./config) for
 this recipe.
@@ -22,14 +19,14 @@ The pool configuration should enable the following properties:
 ### Global Configuration
 The global configuration should set the following properties:
 * `docker_images` array must have a reference to the NAMD-TCP image. This
-  can be `alfpark/namd:2.10-tcp` which is published on
+  can be `alfpark/namd:2.11-tcp` which is published on
   [Docker Hub](https://hub.docker.com/r/alfpark/namd/).

 ### Jobs Configuration
 The jobs configuration should set the following properties within the `tasks`
 array which should have a task definition containing:
 * `image` should be the name of the Docker image for this container invocation,
-  e.g., `alfpark/namd:2.10-tcp`
+  e.g., `alfpark/namd:2.11-tcp`
 * `name` is a unique name given to the Docker container instance. This is
   required for Multi-Instance tasks.
 * `command` should contain the `mpirun` command. If using the sample NAMD-TCP
@@ -5,7 +5,7 @@
     },
     "global_resources": {
         "docker_images": [
-            "alfpark/namd:2.10-tcp"
+            "alfpark/namd:2.11-tcp"
         ]
     }
 }
@@ -5,7 +5,7 @@
             "multi_instance_auto_complete": true,
             "tasks": [
                 {
-                    "image": "alfpark/namd:2.10-tcp",
+                    "image": "alfpark/namd:2.11-tcp",
                     "name": "namd",
                     "remove_container_after_exit": true,
                     "command": "/sw/run_namd.sh apoa1 100",
@@ -4,7 +4,7 @@ FROM centos:7.1.1503
 MAINTAINER Fred Park <https://github.com/Azure/batch-shipyard>

 # set up base and ssh keys
-COPY ssh_config /root/.ssh/
+COPY ssh_config /root/.ssh/config
 RUN yum swap -y fakesystemd systemd \
     && yum install -y openssh-clients openssh-server net-tools \
     && yum clean all \
@@ -15,13 +15,12 @@ RUN yum swap -y fakesystemd systemd \
     && sed -i 's/#RSAAuthentication yes/RSAAuthentication yes/g' /etc/ssh/sshd_config \
     && sed -i 's/#PubkeyAuthentication yes/PubkeyAuthentication yes/g' /etc/ssh/sshd_config \
     && ssh-keygen -f /root/.ssh/id_rsa -t rsa -N '' \
-    && chmod 600 /root/.ssh/ssh_config \
+    && chmod 600 /root/.ssh/config \
     && chmod 700 /root/.ssh \
-    && cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys \
-    && mv /root/.ssh/ssh_config /root/.ssh/config
+    && cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys

 # export environment
-ENV NAMD_VER=NAMD_2.10_Linux-x86_64-TCP
+ENV NAMD_VER=NAMD_2.11_Linux-x86_64-TCP
 ENV NAMD_DIR=/sw/$NAMD_VER NAMD_SCRIPT=/sw/run_namd.sh

 # add software
@@ -0,0 +1,60 @@
+# OpenFOAM-Infiniband-IntelMPI
+This recipe shows how to run [OpenFoam](http://www.openfoam.com/)
+on Linux using Intel MPI over Infiniband/RDMA Azure VM instances in an Azure
+Batch compute pool. Execution of this distributed workload requires the use of
+[multi-instance tasks](../docs/80-batch-shipyard-multi-instance-tasks.md).
+
+## Configuration
+Please refer to this [set of sample configuration files](./config) for
+this recipe.
+
+### Pool Configuration
+The pool configuration should enable the following properties:
+* `vm_size` must be either `STANDARD_A8`, `STANDARD_A9`, `STANDARD_H16R`,
+  `STANDARD_H16MR`
+* `inter_node_communication_enabled` must be set to `true`
+* `max_tasks_per_node` must be set to 1 or omitted
+* `publisher` should be `OpenLogic`. `SUSE` will be supported in a future
+  version of Batch Shipyard.
+* `offer` should be `CentOS-HPC`. `SLES-HPC` will be supported in a future
+  version of Batch Shipyard.
+* `sku` should be `7.1` for the current latest RDMA-enabled CentOS-HPC sku
+  supported by the Azure Batch service.
+
+### Global Configuration
+The global configuration should set the following properties:
+* `docker_images` array must have a reference to a valid OpenFOAM image
+  that can be run with Intel MPI and Infiniband in a Docker container context
+  on Azure VM instances. This can be
+  `alfpark/openfoam:v1606plus-icc-intelmpi` which is published on
+  [Docker Hub](https://hub.docker.com/r/alfpark/openfoam).
+* `docker_volumes` must be populated with the following:
+  * `shared_data_volumes` should contain an Azure File Docker volume driver,
+    a GlusterFS share or a manually configured NFS share. Batch
+    Shipyard has automatic support for setting up Azure File Docker Volumes
+    and GlusterFS, please refer to the
+    [Batch Shipyard Configuration doc](../../docs/10-batch-shipyard-configuration.md).
+
+### Jobs Configuration
+The jobs configuration should set the following properties within the `tasks`
+array which should have a task definition containing:
+* `image` should be the name of the Docker image for this container invocation.
+  For this example, this should be `alfpark/openfoam:v1606plus-icc-intelmpi`.
+* `name` is a unique name given to the Docker container instance. This is
+  required for Multi-Instance tasks.
+* `command` should contain the `mpirun` command. If using the sample
+  `run_sample.sh` script then the command should be simply:
+  `/opt/OpenFOAM/run_sample.sh`
+* `shared_data_volumes` should have a valid volume name as defined in the
+  global configuration file. Please see the previous section for details.
+* `multi_instance` property must be defined
+  * `num_instances` should be set to `pool_specification_vm_count` or
+    `pool_current_dedicated`
+  * `coordination_command` should be unset or `null`
+  * `resource_files` array can be empty
+
+## Dockerfile and supplementary files
+The `Dockerfile` for the Docker image can be found [here](./docker). Please
+note that you must agree with the
+[OpenFOAM license](http://openfoam.org/licence/) before using this Docker
+image.
@@ -0,0 +1,19 @@
+{
+    "batch_shipyard": {
+        "storage_account_settings": "<storage account specified in credentials.json>",
+        "storage_entity_prefix": "shipyard"
+    },
+    "global_resources": {
+        "docker_images": [
+            "alfpark/openfoam:v1606plus-icc-intelmpi"
+        ],
+        "docker_volumes": {
+            "shared_data_volumes": {
+                "glustervol": {
+                    "volume_driver": "glusterfs",
+                    "container_path": "$AZ_BATCH_NODE_SHARED_DIR/gfs"
+                }
+            }
+        }
+    }
+}
@@ -0,0 +1,16 @@
+{
+    "credentials": {
+        "batch": {
+            "account": "<batch account name>",
+            "account_key": "<batch account key>",
+            "account_service_url": "<batch account service url>"
+        },
+        "storage": {
+            "mystorageaccount": {
+                "account": "<storage account name>",
+                "account_key": "<storage account key>",
+                "endpoint": "core.windows.net"
+            }
+        }
+    }
+}
@@ -0,0 +1,24 @@
+{
+    "job_specifications": [
+        {
+            "id": "openfoamjob",
+            "multi_instance_auto_complete": true,
+            "tasks": [
+                {
+                    "image": "alfpark/openfoam:v1606plus-icc-intelmpi",
+                    "name": "openfoam",
+                    "remove_container_after_exit": true,
+                    "shared_data_volumes": [
+                        "glustervol"
+                    ],
+                    "command": "/opt/OpenFOAM/run_sample.sh",
+                    "infiniband": true,
+                    "multi_instance": {
+                        "num_instances": "pool_specification_vm_count",
+                        "coordination_command": null
+                    }
+                }
+            ]
+        }
+    ]
+}
@@ -0,0 +1,17 @@
+{
+    "pool_specification": {
+        "id": "docker-openfoam-rdma",
+        "vm_size": "STANDARD_A9",
+        "vm_count": 2,
+        "inter_node_communication_enabled": true,
+        "publisher": "OpenLogic",
+        "offer": "CentOS-HPC",
+        "sku": "7.1",
+        "ssh_docker_tunnel": {
+            "username": "docker",
+            "generate_tunnel_script": true
+        },
+        "reboot_on_start_task_failed": true,
+        "block_until_all_global_resources_loaded": true
+    }
+}
@@ -0,0 +1,45 @@
+# Dockerfile for OpenFOAM-Infiniband-IntelMPI for use with Batch Shipyard on Azure Batch
+
+FROM centos:7.1.1503
+MAINTAINER Fred Park <https://github.com/Azure/batch-shipyard>
+
+# set up base and ssh keys
+COPY ssh_config /root/.ssh/config
+RUN yum swap -y fakesystemd systemd \
+    && yum install -y epel-release \
+    && yum install -y \
+        openssh-clients openssh-server net-tools gnuplot mpfr-devel \
+        qt-devel qt-assistant qt-x11 qtwebkit-devel libGLU-devel \
+        libmlx4 librdmacm libibverbs dapl rdma \
+    && yum clean all \
+    && mkdir -p /var/run/sshd \
+    && ssh-keygen -A \
+    && sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config \
+    && sed -i 's/#PermitRootLogin yes/PermitRootLogin yes/g' /etc/ssh/sshd_config \
+    && sed -i 's/#RSAAuthentication yes/RSAAuthentication yes/g' /etc/ssh/sshd_config \
+    && sed -i 's/#PubkeyAuthentication yes/PubkeyAuthentication yes/g' /etc/ssh/sshd_config \
+    && ssh-keygen -f /root/.ssh/id_rsa -t rsa -N '' \
+    && chmod 600 /root/.ssh/config \
+    && chmod 700 /root/.ssh \
+    && cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
+
+# add intel redistributables
+ADD l_comp_lib_2016.4.258_comp.cpp_redist.tgz l_comp_lib_2016.4.258_comp.for_redist.tgz /tmp/
+RUN cd /tmp/l_comp_lib_2016.4.258_comp.cpp_redist \
+    && ./install.sh -i /opt/intel2 -e \
+    && cd /tmp/l_comp_lib_2016.4.258_comp.for_redist \
+    && ./install.sh -i /opt/intel2 -e \
+    && rm -rf /tmp/l_comp_lib_2016.4.258_comp.cpp_redist /tmp/l_comp_lib_2016.4.258_comp.for_redist
+ENV INTELCOMPILERVARS=/opt/intel2/bin/compilervars.sh
+
+# add openfoam with env vars
+ADD openfoam-v1606plus-icc-intelmpi.tar.gz /opt
+ENV OPENFOAM_VER=v1606+ FOAM_INST_DIR=/opt/OpenFOAM PATH=${PATH}:/usr/lib64/qt4/bin
+ENV OPENFOAM_DIR=${FOAM_INST_DIR}/OpenFOAM-${OPENFOAM_VER}
+
+# copy sample run script
+COPY run_sample.sh ${FOAM_INST_DIR}
+
+# set up sshd on port 23
+EXPOSE 23
+CMD ["/usr/sbin/sshd", "-D", "-p", "23"]
@@ -0,0 +1,3 @@
+# Dockerfile for OpenFOAM-Infiniband-IntelMPI
+You must agree to the [OpenFOAM license](http://openfoam.org/licence/)
+prior to use.
@@ -0,0 +1,40 @@
+#!/usr/bin/env bash
+
+set -e
+set -o pipefail
+
+# set up mpi and set up openfoam env
+source $INTELCOMPILERVARS intel64
+source /opt/intel/compilers_and_libraries/linux/mpi/bin64/mpivars.sh
+export MPI_ROOT=$I_MPI_ROOT
+OPENFOAM_DIR=/opt/OpenFOAM/OpenFOAM-v1606+
+source $OPENFOAM_DIR/etc/bashrc
+
+# copy sample into glusterfs shared area
+GFS_DIR=$AZ_BATCH_NODE_SHARED_DIR/gfs
+cd $GFS_DIR
+cp -r $OPENFOAM_DIR/tutorials/incompressible/simpleFoam/pitzDaily .
+cp $OPENFOAM_DIR/tutorials/incompressible/simpleFoam/pitzDailyExptInlet/system/decomposeParDict pitzDaily/system/
+
+# get nodes and compute number of processors
+IFS=',' read -ra HOSTS <<< "$AZ_BATCH_HOST_LIST"
+nodes=${#HOSTS[@]}
+ppn=`nproc`
+np=$(($nodes * $ppn))
+
+# substitute proper number of subdomains
+sed -i -e "s/^numberOfSubdomains 4/numberOfSubdomains $np;/" pitzDaily/system/decomposeParDict
+root=`python -c "import math; x=int(math.sqrt($np)); print x if x*x==$np else -1"`
+if [ $root -eq -1 ]; then
+    sed -i -e "s/\s*n\s*(2 2 1)/ n ($ppn $nodes 1)/g" pitzDaily/system/decomposeParDict
+else
+    sed -i -e "s/\s*n\s*(2 2 1)/ n ($root $root 1)/g" pitzDaily/system/decomposeParDict
+fi
+
+# decompose
+cd pitzDaily
+blockMesh
+decomposePar -force
+
+# execute mpi job
+mpirun -np $np -ppn $ppn -hosts $AZ_BATCH_HOST_LIST simpleFoam -parallel
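The `run_sample.sh` script above chooses an `(x y 1)` processor decomposition for `np = nodes * ppn`: a square `root x root` grid when `np` is a perfect square, otherwise `ppn x nodes`. A minimal Python sketch of that selection (the function name is illustrative; the script itself only shells out to a `python -c` one-liner):

```python
import math

def decompose(nodes, ppn):
    """Pick an (x, y, 1) decomposition for np = nodes * ppn.

    Prefers a square grid when np is a perfect square, otherwise
    falls back to a ppn-by-nodes grid, mirroring run_sample.sh.
    """
    np_ = nodes * ppn
    root = int(math.sqrt(np_))
    if root * root == np_:
        return (root, root, 1)
    return (ppn, nodes, 1)
```

For example, 2 nodes with 8 processors per node (np=16) yields a 4x4 grid, while 2 nodes with 6 processors per node (np=12) falls back to 6x2.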
@@ -0,0 +1,4 @@
+Host *
+    Port 23
+    StrictHostKeyChecking no
+    UserKnownHostsFile /dev/null
@@ -22,7 +22,7 @@ values:
 The global configuration should set the following properties:
 * `docker_images` array must have a reference to a valid OpenFOAM image
   that can be run with MPI in a Docker container context. This can be
-  `alfpark/openfoam:v1606plus-openmpi` which is published on
+  `alfpark/openfoam:v1606plus-gcc-openmpi` which is published on
   [Docker Hub](https://hub.docker.com/r/alfpark/openfoam).
 * `docker_volumes` must be populated with the following:
   * `shared_data_volumes` should contain an Azure File Docker volume driver,
@@ -35,7 +35,7 @@ that can be run with MPI in a Docker container context. This can be
 The jobs configuration should set the following properties within the `tasks`
 array which should have a task definition containing:
 * `image` should be the name of the Docker image for this container invocation.
-  For this example, this should be `alfpark/openfoam:v1606+-openmpi`.
+  For this example, this should be `alfpark/openfoam:v1606plus-gcc-openmpi`.
 * `name` is a unique name given to the Docker container instance. This is
   required for Multi-Instance tasks.
 * `command` should contain the `mpirun` command. If using the sample
@@ -5,7 +5,7 @@
     },
     "global_resources": {
         "docker_images": [
-            "alfpark/openfoam:v1606plus-openmpi"
+            "alfpark/openfoam:v1606plus-gcc-openmpi"
         ],
         "docker_volumes": {
             "shared_data_volumes": {
@@ -5,7 +5,7 @@
             "multi_instance_auto_complete": true,
             "tasks": [
                 {
-                    "image": "alfpark/openfoam:v1606plus-openmpi",
+                    "image": "alfpark/openfoam:v1606plus-gcc-openmpi",
                     "name": "openfoam",
                     "remove_container_after_exit": true,
                     "shared_data_volumes": [
@@ -65,9 +65,10 @@ This NAMD-TCP recipe contains information on how to Dockerize distributed
 [NAMD](http://www.ks.uiuc.edu/Research/namd/) across multiple Azure Batch
 compute nodes using TCP.

-### OpenFOAM-Infiniband-IntelMPI
-TBC.
-[OpenFoam](http://www.openfoam.com/)
+### [OpenFOAM-Infiniband-IntelMPI](./OpenFOAM-Infiniband-IntelMPI)
+This OpenFOAM-Infiniband-IntelMPI recipe contains information on how to
+Dockerize distributed [OpenFoam](http://www.openfoam.com/) across
+Infiniband/RDMA Azure VMs with Intel MPI.

 ### [OpenFOAM-TCP-OpenMPI](./OpenFOAM-TCP-OpenMPI)
 This OpenFOAM-TCP-OpenMPI recipe contains information on how to Dockerize
@@ -36,6 +36,7 @@ if [ $AZ_BATCH_IS_CURRENT_NODE_MASTER == "true" ]; then
     done
     set -e
    echo "$numpeers joined peering"
+    # delay to wait for peers to connect
     sleep 5
     # create volume
     echo "creating gv0 ($bricks)"
@@ -52,6 +53,8 @@ while :
 do
     gluster volume info gv0
     if [ $? -eq 0 ]; then
+        # delay to wait for subvolumes
+        sleep 5
         break
     fi
     sleep 1
@@ -59,14 +62,32 @@ done
 set -e

 # add gv0 to /etc/fstab for auto-mount on reboot
-mountpoint=$1/gluster/gv0
+mountpoint=$AZ_BATCH_NODE_SHARED_DIR/.gluster/gv0
 mkdir -p $mountpoint
 echo "adding $mountpoint to fstab"
 echo "$ipaddress:/gv0 $mountpoint glusterfs defaults,_netdev 0 0" >> /etc/fstab

 # mount it
 echo "mounting $mountpoint"
-mount $mountpoint
+START=$(date -u +"%s")
+set +e
+while :
+do
+    mount $mountpoint
+    if [ $? -eq 0 ]; then
+        break
+    else
+        NOW=$(date -u +"%s")
+        DIFF=$((($NOW-$START)/60))
+        # fail after 5 minutes of attempts
+        if [ $DIFF -ge 5 ]; then
+            echo "could not mount gluster volume: $mountpoint"
+            exit 1
+        fi
+        sleep 1
+    fi
+done
+set -e

 # touch file noting success
 touch .glusterfs_success
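The mount loop added above retries once per second and gives up after five minutes. A minimal Python sketch of the same retry policy (the function name and the injected `run_mount` callable are illustrative, not part of Batch Shipyard; the real script shells out to `mount` directly):

```python
import time

def mount_with_retry(mountpoint, run_mount, timeout_minutes=5):
    """Retry run_mount(mountpoint) once per second until it returns
    exit code 0, failing after timeout_minutes have elapsed."""
    start = time.time()
    while True:
        if run_mount(mountpoint) == 0:
            return True
        if (time.time() - start) / 60 >= timeout_minutes:
            raise RuntimeError(
                'could not mount gluster volume: {}'.format(mountpoint))
        time.sleep(1)
```

Injecting the mount command as a callable makes the policy easy to exercise without a real GlusterFS volume.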
shipyard.py
@@ -774,8 +774,9 @@ def _setup_glusterfs(batch_client, blob_client, config, nodes):
                 pool_id, node.id,
                 ('workitems/{}/job-1/gluster-setup/wd/'
                  '.glusterfs_success').format(job_id))
-        except batchmodels.BatchErrorException as ex:
-            logger.exception(ex)
+        except batchmodels.BatchErrorException:
+            logger.error('gluster success file absent on node {}'.format(
+                node.id))
             success = False
             break
     # delete job
@@ -1452,19 +1453,14 @@ def add_jobs(batch_client, blob_client, config):
             else:
                 if (shared_data_volumes is not None and
                         len(shared_data_volumes) > 0):
-                    # get pool spec for gluster mount paths
-                    if (config['pool_specification']['offer'].lower() ==
-                            'ubuntuserver'):
-                        gfspath = '/mnt/gluster/gv0'
-                    else:
-                        gfspath = '/mnt/resource/gluster/gv0'
                     for key in shared_data_volumes:
                         dvspec = config[
                             'global_resources']['docker_volumes'][
                             'shared_data_volumes'][key]
                         if dvspec['volume_driver'] == 'glusterfs':
                             run_opts.append('-v {}:{}'.format(
-                                gfspath, dvspec['container_path']))
+                                '$AZ_BATCH_NODE_SHARED_DIR/.gluster/gv0',
+                                dvspec['container_path']))
                         else:
                             run_opts.append('-v {}:{}'.format(
                                 key, dvspec['container_path']))
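The `add_jobs` change above drops the per-offer `gfspath` computation in favor of the fixed GlusterFS mountpoint under the Batch shared directory when building `docker run -v` options. A simplified sketch of that mapping (the function and constant names are illustrative, not from the Batch Shipyard source):

```python
# Fixed host path for the GlusterFS mountpoint after this commit
GLUSTER_HOST_PATH = '$AZ_BATCH_NODE_SHARED_DIR/.gluster/gv0'

def volume_run_opts(shared_data_volumes):
    """Build docker run -v host:container options for shared data volumes.

    GlusterFS volumes bind-mount the fixed host mountpoint; other drivers
    reference the named docker volume instead.
    """
    opts = []
    for name, spec in shared_data_volumes.items():
        if spec['volume_driver'] == 'glusterfs':
            opts.append('-v {}:{}'.format(
                GLUSTER_HOST_PATH, spec['container_path']))
        else:
            opts.append('-v {}:{}'.format(name, spec['container_path']))
    return opts
```

With the `glustervol` volume from the sample global configuration, this yields a single `-v` option mapping the host gluster mountpoint to `$AZ_BATCH_NODE_SHARED_DIR/gfs` inside the container.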