Commit Graph

251 Commits

Author SHA1 Message Date
Fred Park 85985d6a3e
Update shared file systems installers
- Fix BeeGFS BeeOND provisioning on CentOS
- Disallow Ubuntu 18.04 autoscratch due to incompatibility
- Update GlusterFS to supported versions
2019-12-13 17:47:47 +00:00
Fred Park a54f872326
Add ignore GPU warnings option 2019-12-12 20:42:47 +00:00
Fred Park 79a1a45fdf
STANDARD_NCv3 SR-IOV IB/RDMA transition 2019-11-15 23:23:03 +00:00
Fred Park be8b9e8037
Improve Singularity compatibility checks
- Allow non-container cascade to run on custom image for testing
2019-11-15 19:09:39 +00:00
Fred Park 88935f0a67
Update Docker CE to 19.03.5 2019-11-15 19:09:38 +00:00
Fred Park 2b4bb852c3
Improve Azure blob/file mount logic
- Add retries to mount commands
- Simplify through script template
2019-11-15 19:09:37 +00:00
Fred Park fb01aa7d93
Fix Gen2 ephemeral device detection 2019-11-13 03:15:14 +00:00
Fred Park 15c35d0374
Add DKMS option to nvidia installer 2019-11-13 03:15:13 +00:00
Fred Park 134262158b
Support Singularity image encryption
- Modify singularity_images global resources to support encryption
options
- Automatically bind certificates to encrypted containers when a task
executes
2019-11-13 03:15:05 +00:00
Fred Park 4e90c2cdb6
Add cryptsetup dependency for Singularity 3.4.2 2019-11-04 18:38:52 +00:00
Fred Park 074ca55a1c
Extend retries on package manager contention 2019-11-01 15:00:58 +00:00
Fred Park 07a2105e48
Fix blobfuse mount on reboot
- Add ephemeral device detection logic
- Resolves #320
2019-10-28 21:36:52 +00:00
Fred Park 9fd1f9b08c
Get CentOS 7.6 kernel sources from vault
- Unify GPU query path and add diagnostics query
2019-10-18 16:27:04 +00:00
Fred Park 4fb1ef7238
Unify Docker root dir check 2019-10-16 02:01:28 +00:00
Fred Park b460f4510c
Fix RemoteFS provisioning issues with 1 disk 2019-10-10 18:03:59 +00:00
Fred Park 3521739088
Fix remotefs bootstrap
- Samba options were being invoked for non-samba enabled clusters
2019-10-09 15:48:40 +00:00
Fred Park 43fda94278
Update drivers and dependencies
- Docker CE 19.03.2
- blobxfer to 1.9.2
- NC/ND driver to 418.87.00
2019-09-11 17:51:13 +00:00
Fred Park 1706902959
Fix non-native data transfer sequence coupling
- Non-native input_data or output_data of azure_storage type with
sequences greater than 1 would have each individual action depend upon
the success of the prior action
- Resolves #310
2019-09-05 19:29:33 +00:00
Fred Park 0d5850c8c9
Fix task termination in non-native mode
- SSH side-channel docker kill signal was not being sent as Docker tasks
were not being detected properly
- Also fix issue with pool images update not executing if block on
images is false
- Resolves #308
2019-08-30 20:59:51 +00:00
Fred Park 8b7b17f465
Fix Task Runner regressions
- Input/output data phases not correctly triggered for multi-instance
and MPI jobs
- Output data was not triggered at all
- Pre-exec triggering on native
- Resolves #301
2019-08-16 17:07:44 +00:00
Fred Park 07e86a3928
Fix Network Direct RDMA VM provisioning
- Resolves #299
2019-08-14 21:56:45 +00:00
Fred Park e9130f83f4
MCR migration
- Migrate images to Microsoft Container Registry
- Fix Shellcheck issues
- Related to #278
2019-08-14 03:23:03 +00:00
Fred Park 290209381e
Update Dependencies
- Update NVIDIA compute driver to 418.67
- Update NVIDIA grid driver to 430.30
- Update Batch Insights to 1.3.0
- Update blobxfer to 1.9.0
- Update Python dependencies
- Drop Python 3.4 support
2019-08-12 20:42:32 +00:00
Fred Park 3052e98c8b
Add MVAPICH support
- More changes for #287
- Automatically source environment modules if it exists
- Fix some typos
2019-08-12 01:58:39 +00:00
Fred Park be52a9c3b0
Various updates
- Fail VM provisioning if expected IB card is not present
- Update platform image native support
2019-08-12 01:58:28 +00:00
Fred Park b6044b3489
Update GPU support
- Update to Docker CE 19.03.1
- Use "native" Docker/containerd GPU support
- Breaking change in jobs configuration to allow arbitrary configuration
- Update docs
- Resolves #293
2019-08-08 20:36:41 +00:00
Fred Park e6709409a2
Update to Singularity 3.3.0
- Check for expected ephemeral mount point
2019-08-07 21:13:30 +00:00
Fred Park 7ae3cb9e50
Merge branch 'master' into singularity3 2019-08-05 18:28:19 +00:00
Vincent Labonté b64c3cb324 Add Infiniband support with Open MPI and MPICH (#297)
* Add Infiniband support with Open MPI

* Add mpiBench-Infiniband-OpenMPI recipe

* Add setup script for OpenFOAM-Infiniband-OpenMPI recipe

* Update setup script for OpenFOAM-Infiniband-OpenMPI recipe

* Add OpenFOAM-Infiniband-OpenMPI recipe

* Add documentation for recipes

* Add Infiniband support with MPICH

* Add mpiBench-Infiniband-MPICH recipe
2019-08-05 10:39:08 -04:00
Fred Park 3c376224a3
Fix GPU node provisioning
- Start task failures due to docker-ce-cli info changing output
- Pin docker-ce-cli
- Make docker root dir parsing more robust
- Fix LIS and CentOS 7.6 GPU provisioning
- Resolves #291
2019-07-24 02:55:35 +00:00
Fred Park ce0caaa24d
Add promo VM size (NC/NV/H) support 2019-07-16 16:07:03 +00:00
Fred Park 25fec92273
Support Hc/Hb
- Support RDMA bifurcation
- Update platform docs for CentOS-HPC 7.6
2019-07-15 03:32:04 +00:00
Fred Park 559463cd12
Merge branch 'develop' into sriov-merge 2019-07-09 21:45:31 +00:00
Vincent Labonté e6e60048a7 Improve MPI Interface for Intel MPI and Open MPI with Singularity images (#288)
* Add MPI config support for IntelMPI

* Separate prologue command into user and system

* Add MpiSettings

* Add MPI config support for Open MPI

* Fix MPI config support for IntelMPI

* Workaround for Open MPI btl tcp

* Correct documentation

* Fix non mpi multi instance execution

* Resolve PR comments

* Resolve PR comments

* Partially address #287
2019-07-03 12:40:54 -07:00
Fred Park b93f60213d
Support conditional output data
- Resolves #230
2019-06-24 18:03:43 +00:00
Fred Park eb3c70bbf5
Fix autoscratch setup issue 2019-06-21 19:55:27 +00:00
Vincent Labonté 9f58ad0042 Fixes for Singularity 3 support (#285)
* Fix credentials when running task with Singularity docker:// images

* Fix Singularity cache directory's ownership

* Fix images update command

* Fix running cascade with use_shipyard_docker_image

* Remove envfile dump in task runner
2019-06-21 10:36:28 -07:00
Fred Park 824f6de415
Merge branch 'develop' into singularity3
- Move username/password run options to settings for singularity
2019-06-18 20:49:38 +00:00
Fred Park bc4be6dbc3
Proxy non-native task execution via script
- Resolves #235
2019-06-18 20:11:12 +00:00
Vincent Labonté 6a0f90d509 Singularity list images and run ORAS images (#284)
* Remove unused directories

* Augment pool images list to support Singularity images

* Fix specific image update with private ORAS registries

* Add support to run ORAS image from a private registry

* Only log in used registries

* Fix checks

* Update documentation

* Resolve PR comments

* Resolve PR comments
2019-06-17 08:29:25 -07:00
Vincent Labonté 8293a20be3 Support multiple Singularity registries (#283)
* Add support for multiple singularity registries (://docker and ://oras)

* Resolve PR comments

* Resolve PR comments
2019-06-13 10:22:18 -07:00
Vincent Labonté 5307f1779d Fix image update command (#281)
* Create one log file per container mode

* Make singularity 3 work

* Minor fixes

* Fix cascade with docker image and singularity image

* Add capability to pull from library://

* Add singularity signed images to config file

* Add singularity signed images to the global resource table

* Pull and verify signed singularity images

* Put the singularity sypgp directory in the mount directory

* Add ability to provide key file to verify a singularity image

* Resolve PR comments

* Fix Singularity registry credentials

* Extract cascade logic from nodeprep

* Re-run cascade if the image update command has no specified image

* Fix prefix errors when using shipyard docker image

* Make sure that the cascade log files are not overridden

* Fix wrong parameter name

* Clarify error message when trying to update images on Windows

* Update documentation

* Fix checks

* Resolve PR comments
2019-06-10 10:35:28 -07:00
Vincent Labonté 305d376cdc Support Singularity signed image verification (#280)
* Create one log file per container mode

* Make singularity 3 work

* Minor fixes

* Fix cascade with docker image and singularity image

* Add capability to pull from library://

* Add singularity signed images to config file

* Add singularity signed images to the global resource table

* Pull and verify signed singularity images

* Put the singularity sypgp directory in the mount directory

* Add ability to provide key file to verify a singularity image

* Resolve PR comments

* Fix Singularity registry credentials
2019-06-05 11:14:37 -07:00
Vincent Labonté f9d0ad9a7f Initial support for Singularity 3 and SIF (#279)
* Create one log file per container mode

* Make singularity 3 work

* Minor fixes

* Fix cascade with docker image and singularity image

* Add capability to pull from library://
2019-05-29 14:15:22 -07:00
Fred Park 509834b3fb
Cascade Docker/Singularity image split 2019-05-23 21:29:24 +00:00
Vincent Labonté a68579c095 Prepare for Singularity3 work (#276)
* Remove torrent functionality

* Remove torrent storage

* Fix singularity permissions

* Add container mode in cascade.py

* Fix errors

* Fix PR comments

* Fix flake8 errors
2019-05-22 14:46:44 -07:00
Fred Park ed5a21d416
Doc updates
- Add support for CentOS 7.6 native
2019-02-28 13:48:16 -08:00
Fred Park f5cde186bf
Update Docker CE to 18.09.2
- runc CVE-2019-5736
2019-02-28 12:11:18 -08:00
Fred Park 4fa60af37a
Fix accelerated networking provisioning
- Add pool exists command
- Add recreate option to pool add
2019-02-28 12:11:18 -08:00
Fred Park 314037f76f
Slurm on Batch feature
- Package and use Slurm 18.08 instead of default from distro repo
- Slurm "master" contains separate controller and login nodes
- Integrate RemoteFS shared file system into Slurm cluster
- Auto feature tagging on Slurm nodes
- Support CentOS 7, Ubuntu 16.04, Ubuntu 18.04 Batch pools as Slurm
  node targets
- Unify login and Batch pools on cluster user based on login user
- Auto provision passwordless SSH user on compute nodes with login user
  context
- Add slurm cluster commands, including orchestrate command
- Add separate SSH for controller, login, nodes
- Add Slurm configuration doc
- Add Slurm guide
- Add Slurm recipe
- Update usage doc
- Remove deprecated MSI VM extension from monitoring and federation
- Fix pool nodes count on non-existent pool
- Refactor SSH info to allow offsets
- Add fs cluster orchestrate command
2019-02-28 12:11:10 -08:00