HPCG-Infiniband-IntelMPI

This recipe shows how to run the High Performance Conjugate Gradients (HPCG) benchmark on Linux using Intel MPI over InfiniBand/RDMA on Azure VM instances in an Azure Batch compute pool. Execution of this distributed workload requires the use of multi-instance tasks.

Execution under both Docker and Singularity is shown in this recipe.

Configuration

Please refer to the set of sample configuration files for this recipe. The docker directory contains the Docker-based execution configuration, while the singularity directory contains the Singularity-based execution configuration.

Pool Configuration

The pool configuration should enable the following properties:

  • vm_size should be a CPU-only RDMA-enabled instance.
  • vm_configuration is the VM configuration. Please select an appropriate platform_image with IB/RDMA as supported by Batch Shipyard.
  • inter_node_communication_enabled must be set to true.
  • max_tasks_per_node must be set to 1 or omitted.
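The pool properties above can be sketched in a pool configuration file roughly as follows. This is an illustrative fragment, not the recipe's actual file: the pool id, VM counts, VM size, and platform image values are example choices, and any RDMA-capable image and size supported by Batch Shipyard may be substituted.

```yaml
pool_specification:
  id: hpcg-ib-pool            # hypothetical pool id
  vm_size: STANDARD_H16R      # example CPU-only RDMA-enabled size
  vm_count:
    dedicated: 2              # example node count
  vm_configuration:
    platform_image:
      # example IB/RDMA-capable platform image; consult the Batch
      # Shipyard platform image docs for supported choices
      publisher: OpenLogic
      offer: CentOS-HPC
      sku: '7.4'
  inter_node_communication_enabled: true
  max_tasks_per_node: 1
```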

Global Configuration

Docker-based

The global configuration should set the following properties:

  • docker_images array must have a reference to a valid HPCG image that can be run with Intel MPI and InfiniBand in a Docker container context on Azure VM instances. This can be alfpark/linpack:cpu-intel-mkl, which is published on Docker Hub. HPCG is included in the Linpack image.
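In the global configuration file, this amounts to a fragment along these lines (a minimal sketch; only the image reference from this recipe is shown):

```yaml
global_resources:
  docker_images:
    # HPCG is bundled in this Linpack image on Docker Hub
    - alfpark/linpack:cpu-intel-mkl
```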

Singularity-based

The global configuration should set the following properties:

  • singularity_images array must have a reference to a valid HPCG image that can be run with Intel MPI and InfiniBand. This can be shub://alfpark/linpack, which is published on Singularity Hub.
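The corresponding global configuration fragment would look roughly like this (a minimal sketch; only the image reference from this recipe is shown):

```yaml
global_resources:
  singularity_images:
    # HPCG is bundled in this Linpack image on Singularity Hub
    - shub://alfpark/linpack
```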

Jobs Configuration

Docker-based

The jobs configuration should set the following properties within the tasks array which should have a task definition containing:

  • docker_image should be the name of the Docker image for this container invocation. For this example, this can be alfpark/linpack:cpu-intel-mkl.
  • command should contain the mpirun command. If using the sample run_hpcg.sh script, the command can be: /sw/run_hpcg.sh -n <problem size> -t <run time>. -n <problem size> should be selected such that the problem is large enough to fit in available memory. The run_hpcg.sh script has many configuration parameters:
    • -2: Use the AVX2 optimized version of the benchmark. Specify this option for H-series VMs.
    • -n <problem size>: nx, ny and nz are set to this value
    • -t <run time>: limit execution time to specified seconds. Official runs must be at least 1800 seconds (30 min).
    • -x <nx>: set nx to this value
    • -y <ny>: set ny to this value
    • -z <nz>: set nz to this value
  • infiniband can be set to true; however, it is implicitly enabled by Batch Shipyard when executing on an RDMA-enabled compute pool.
  • multi_instance property must be defined
    • num_instances should be set to pool_specification_vm_count_dedicated, pool_vm_count_low_priority, pool_current_dedicated, or pool_current_low_priority
    • coordination_command should be unset or null. For pools with native container support, this command should be supplied if a non-standard sshd is required.
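Putting the task properties above together, a jobs configuration fragment might look like the following sketch. The job id, problem size, and run time are illustrative placeholders, not recommended values:

```yaml
job_specifications:
  - id: hpcg-docker-job       # hypothetical job id
    tasks:
      - docker_image: alfpark/linpack:cpu-intel-mkl
        # example invocation; size the problem (-n) to fit node memory,
        # and use -t 1800 or more for an official run
        command: /sw/run_hpcg.sh -n 256 -t 1800
        infiniband: true      # optional; implied on RDMA-enabled pools
        multi_instance:
          num_instances: pool_current_dedicated
          coordination_command: null
```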

Singularity-based

The jobs configuration should set the following properties within the tasks array which should have a task definition containing:

  • singularity_image should be the name of the Singularity image for this container invocation. For this example, this should be shub://alfpark/linpack.
  • command should contain the mpirun command. If using the sample run_hpcg.sh script, the command can be: /sw/run_hpcg.sh -n <problem size> -t <run time>. -n <problem size> should be selected such that the problem is large enough to fit in available memory. The run_hpcg.sh script has many configuration parameters:
    • -2: Use the AVX2 optimized version of the benchmark. Specify this option for H-series VMs.
    • -n <problem size>: nx, ny and nz are set to this value
    • -t <run time>: limit execution time to specified seconds. Official runs must be at least 1800 seconds (30 min).
    • -x <nx>: set nx to this value
    • -y <ny>: set ny to this value
    • -z <nz>: set nz to this value
  • infiniband can be set to true; however, it is implicitly enabled by Batch Shipyard when executing on an RDMA-enabled compute pool.
  • multi_instance property must be defined
    • num_instances should be set to pool_specification_vm_count_dedicated, pool_vm_count_low_priority, pool_current_dedicated, or pool_current_low_priority
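The Singularity variant differs mainly in the image property. A sketch, with the same caveat that the job id and run parameters are illustrative placeholders:

```yaml
job_specifications:
  - id: hpcg-singularity-job  # hypothetical job id
    tasks:
      - singularity_image: shub://alfpark/linpack
        # example invocation; size the problem (-n) to fit node memory
        command: /sw/run_hpcg.sh -n 256 -t 1800
        infiniband: true      # optional; implied on RDMA-enabled pools
        multi_instance:
          num_instances: pool_current_dedicated
```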

Supplementary files

The Dockerfile for the Docker image can be found here. The Singularity Hub build and resource files can be found here.

Please note that you must agree with the Intel Linpack License before using this Docker image.