Simplify HPC and Batch workloads on Azure

azure azure-batch azure-functions batch-processing containers docker glusterfs gpu hpc infiniband mpi nfs rdma serverless singularity slurm windows-containers

Fred Park 00f1c95b1d Update Dockerfiles to Alpine 3.10		2019-08-08 20:11:33 +00:00
.github	Fix nvidia-docker2 installation	2018-07-20 09:14:10 -07:00
.vsts	Update Alpine and Python	2019-07-15 03:32:24 +00:00
cargo	Update Dockerfiles to Alpine 3.10	2019-08-08 20:11:33 +00:00
cascade	Update Dockerfiles to Alpine 3.10	2019-08-08 20:11:33 +00:00
config_templates	Merge branch 'sriov-merge' into singularity3	2019-07-23 21:02:52 +00:00
contrib	Update docs for Shared Image Gallery support	2019-06-27 20:49:37 +00:00
convoy	Update to Singularity 3.3.0	2019-08-07 21:13:30 +00:00
docker	Update to Singularity 3.3.0	2019-08-07 21:13:30 +00:00
docs	Update to Singularity 3.3.0	2019-08-07 21:13:30 +00:00
federation	Update Dockerfiles to Alpine 3.10	2019-08-08 20:11:33 +00:00
heimdall	Update Dockerfiles to Alpine 3.10	2019-08-08 20:11:33 +00:00
recipes	Add Infiniband support with Open MPI and MPICH (#297)	2019-08-05 10:39:08 -04:00
schemas	Merge branch 'sriov-merge' into singularity3	2019-07-23 21:02:52 +00:00
scripts	Update to Singularity 3.3.0	2019-08-07 21:13:30 +00:00
site-extension	Tag for 3.5.0b2 release	2018-06-12 14:00:25 -07:00
slurm	Update Dockerfiles to Alpine 3.10	2019-08-08 20:11:33 +00:00
.gitattributes	Add AppVeyor build	2017-08-10 10:29:22 -07:00
.gitignore	Allow CentOS 7.3 on NC/NV	2017-07-06 11:12:05 -07:00
.travis.yml	Pin flake8 to 3.6.0	2019-02-26 13:05:49 -08:00
CHANGELOG.md	Merge branch 'master' into singularity3	2019-08-05 18:28:19 +00:00
CODE_OF_CONDUCT.md	Update docs	2017-08-29 08:04:30 -07:00
CONTRIBUTING.md	Update docs	2017-08-29 08:04:30 -07:00
LICENSE	Add dummy README	2016-07-18 08:15:56 -07:00
README.md	Update to Singularity 3.3.0	2019-08-07 21:13:30 +00:00
THIRD_PARTY_NOTICES.txt	Tag for 3.7.0 release	2019-02-28 14:03:35 -08:00
appveyor.yml	Pin flake8 to 3.6.0	2019-02-26 13:05:49 -08:00
install.cmd	Update dependencies	2018-11-05 11:24:08 -08:00
install.sh	Update build to Python 3.7.1	2018-10-30 14:24:31 -07:00
mkdocs.yml	Doc updates	2019-02-28 13:48:16 -08:00
req_nodeps.txt	Update dependencies	2018-11-05 11:24:08 -08:00
requirements.txt	Update to Batch 7.0.0 SDK	2019-06-27 20:08:49 +00:00
shipyard.py	Update to Batch 7.0.0 SDK	2019-06-27 20:08:49 +00:00

README.md

Batch Shipyard

Batch Shipyard is a tool to help provision, execute, and monitor container-based batch processing and HPC workloads on Azure Batch. Batch Shipyard supports both Docker and Singularity containers. No experience with the Azure Batch SDK is needed; run your containers with easy-to-understand configuration files. All Azure regions are supported, including non-public Azure regions.

Additionally, Batch Shipyard provides the ability to provision and manage entire standalone remote file systems (storage clusters) in Azure, independent of any integrated Azure Batch functionality.

Major Features

Container Runtime and Image Management

Support for multiple container runtimes including Docker, Singularity, and Kata Containers tuned for Azure Batch compute nodes
Automated deployment of container images required for tasks to compute nodes
Transparent support for GPU-accelerated container applications on both Docker and Singularity on Azure N-Series VM instances
Support for Docker registries, including Azure Container Registry and other Internet-accessible public and private registries, as well as the Singularity Hub Container Registry

Data Management and Shared File Systems

Comprehensive data movement support: move data easily between locally accessible storage systems, remote filesystems, Azure Blob or File Storage, and compute nodes
Standalone remote filesystem provisioning for NFS and GlusterFS distributed network file systems, with integration to automatically link these filesystems to compute nodes
Automatic shared data volume support for linking to Remote Filesystems, Azure File via SMB, Azure Blob via blobfuse, GlusterFS provisioned directly on compute nodes, and custom Linux mount support (fstab)
Support for automated on-demand, per-job distributed scratch space provisioning via BeeGFS BeeOND

Monitoring

Automated, integrated resource monitoring with Prometheus and Grafana for Batch pools and RemoteFS storage clusters
Support for Batch Insights

Open Source Scheduler Integration

Support for elastic cloud bursting on Slurm to Batch pools with automated RemoteFS shared file system linking

Azure Ecosystem Integration

Support for serverless execution binding with Azure Functions
Support for credential management through Azure Key Vault

Azure Batch Integration and Enhancements

Federation support: enables unified, constraint-based scheduling to collections of heterogeneous pools, including across multiple Batch accounts and Azure regions
Support for simple, scenario-based pool autoscale and autopool to dynamically scale and control computing resources on-demand
Support for Task Factories with the ability to generate tasks based on parametric (parameter) sweeps, randomized input, file enumeration, replication, and custom Python code-based generators
Support for multi-instance tasks to accommodate MPI and multi-node cluster applications packaged as Docker or Singularity containers on compute pools with automatic job completion and task termination
Transparent assistance for running Docker and Singularity containers that use InfiniBand/RDMA for MPI on low-latency HPC Azure VM instances, including A-Series, H-Series, and N-Series
Seamless integration with Azure Batch job, task and file concepts along with full pass-through of the Azure Batch API to containers executed on compute nodes
Support for Azure Batch task dependencies allowing complex processing pipelines and DAGs
Support for merge or final task specification that automatically depends on all other tasks within the job
Support for job schedules and recurrences for automatic execution of tasks at set intervals
Support for live job and job schedule migration between pools
Support for Low Priority Compute Nodes
Support for deploying Batch compute nodes into a specified Virtual Network
Automatic setup of SSH or RDP users on all nodes in the compute pool and optional creation of SSH tunneling scripts to Docker hosts on compute nodes
Support for custom host images
Support for Windows Containers on compliant Windows compute node pools with the ability to activate Azure Hybrid Use Benefit if applicable
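
To make the task factory, task dependency, and merge task features above more concrete, here is a minimal Python sketch of the underlying ideas. It is not Batch Shipyard's implementation or its configuration format. The helper names (`sweep_tasks`, `add_merge_task`), the task id scheme, and the example commands are all hypothetical. The sketch expands a command template over a parametric sweep, adds a merge task that depends on every generated task, and derives a valid execution order from the resulting dependency graph.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from itertools import product


def sweep_tasks(command_template, *ranges):
    """Expand a command template over the Cartesian product of the ranges.

    Hypothetical helper: each combination of parameter values becomes one
    task, keyed by a generated task id.
    """
    return {
        f"task-{i:05d}": command_template.format(*values)
        for i, values in enumerate(product(*ranges))
    }


def add_merge_task(tasks, merge_command, merge_id="merge-task"):
    """Add a final task that depends on all other tasks in the job.

    Returns the full task map and a dependency map of
    task id -> set of task ids that must finish first.
    """
    dependencies = {task_id: set() for task_id in tasks}
    dependencies[merge_id] = set(tasks)
    all_tasks = {**tasks, merge_id: merge_command}
    return all_tasks, dependencies


# Sweep two learning rates against two layer counts (illustrative commands).
tasks = sweep_tasks(
    "python train.py --lr {0} --layers {1}",
    [0.1, 0.01],
    range(2, 4),
)
all_tasks, deps = add_merge_task(tasks, "python summarize.py")

# Any topological order of the dependency graph is a valid execution order;
# the merge task always comes last because it depends on everything else.
order = list(TopologicalSorter(deps).static_order())

for task_id in order:
    print(task_id, "->", all_tasks[task_id])
```

In this sketch the sweep tasks have no dependencies on one another, so a scheduler would be free to run them in parallel. Only the merge task is constrained to wait.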

Installation

Local Installation

Please see the installation guide for more information regarding the various local installation options and requirements.

Azure Cloud Shell

Batch Shipyard is integrated directly into Azure Cloud Shell and you can execute any Batch Shipyard workload using your web browser or the Microsoft Azure Android and iOS app.

Simply request a Cloud Shell session and type shipyard to invoke the CLI; no installation is required.

Documentation and Recipes

Please refer to the Batch Shipyard Documentation on Read the Docs.

Visit the Batch Shipyard Recipes section for various sample container workloads using Azure Batch and Batch Shipyard.

Batch Shipyard Compute Node Host OS Support

Batch Shipyard is currently compatible with popular Marketplace Linux VMs supported by Azure Batch, compliant Linux custom images, and native Azure Batch Windows Server with Containers VMs. Please see the platform image support documentation for details on which compute node host operating systems Batch Shipyard supports.

Change Log

Please see the Change Log for project history.

Please see this project's Code of Conduct and Contributing guidelines.