Commit Graph

42 Commits

Author SHA1 Message Date
vgamayunov cece6c7fb6
bumped SLURM version to 21.08.2 (#544)
* changed default SLURM version to 21.08.2
* added necessary rpms and modified slurm.conf
* added config parameter for SLURM version
2021-10-22 15:12:12 +01:00
Xavier Pillons 2616d6dfed use common group variables 2020-10-19 13:07:39 +02:00
Davide Vanzo 9c8ba5db79
Disabled ulimits propagation to nodes for IB
By default, Slurm propagates to the job environment the ulimits in place on the node where the job was submitted.
If this is a VM with a non-HPC image, max locked memory is set to 64 KB, which would prevent IB from pinning enough memory for RDMA on the compute node.
2020-09-30 09:55:09 -05:00
Davide Vanzo c22b9f114a Added Slurm cluster usage in README 2020-09-29 17:08:33 -05:00
Davide Vanzo 8323703b65 Correctly handle single instance or invalid value 2020-09-17 18:11:18 -05:00
Davide Vanzo ed57edc0d1 Use actual memory per CPU as partition default mem 2020-09-17 17:51:49 -05:00
Paul Edwards 156c0378cd fixed path for skus.lst 2020-09-17 17:58:00 +01:00
Davide Vanzo 3eb0addda2 Corrected typo after file rename 2020-09-17 09:41:43 -05:00
Davide Vanzo 85034da018
Merge pull request #359 from vanzod/slurm_multisku
Added multiple SKU support and node oversubscription to Slurm Autoscale
2020-09-17 08:55:52 -05:00
Davide Vanzo 81ec00c276 Used relative path for skus.lst 2020-09-16 13:33:35 -05:00
Xavier Pillons 9f37acb62d added weekly build 2020-09-15 15:42:40 +02:00
Davide Vanzo 3bb2568403 Added hyphen to node names
This prevents node index confusion when the partition name ends with a number
2020-09-14 18:20:03 -05:00
Davide Vanzo 479b7113da Added explicit availability set 2020-09-14 18:06:12 -05:00
Davide Vanzo 905e82ae6c Merge branch 'master' into slurm_multisku 2020-09-11 17:37:51 -05:00
Xavier Pillons 480e9dfdfe add pyazhpc dependency 2020-09-11 12:51:41 +02:00
Davide Vanzo 8d1a7ae7a1 Updated readme 2020-09-10 10:56:26 -05:00
Davide Vanzo dc96b69b7e Added os_storage_sku for uniformity 2020-09-10 10:56:17 -05:00
Davide Vanzo 9bbe0b4b56 Pointed back to AzureHPC main repo 2020-09-10 10:29:36 -05:00
Davide Vanzo 157841381f Added partition oversubscription 2020-09-10 10:26:06 -05:00
Davide Vanzo 17c684c436 Corrected field index 2020-09-09 15:44:14 -05:00
Davide Vanzo aef7f2ebc1 Get number of sockets from lookup table 2020-09-09 15:40:22 -05:00
Davide Vanzo 2aa0bcf30a Updated SKUs lookup table 2020-09-09 15:36:51 -05:00
Davide Vanzo 9d41aa3f79 Allocate only physical cores 2020-09-09 15:35:18 -05:00
Davide Vanzo f5c560722f Generalized configuration 2020-09-08 17:28:40 -05:00
Davide Vanzo 681bcd3dbe Added cgroup resources control 2020-09-08 16:28:55 -05:00
Davide Vanzo eda4ac79ea Added required standard OS disk 2020-09-08 15:55:48 -05:00
Davide Vanzo fb0e9ff053 Use lookup table for memory 2020-09-08 10:23:20 -05:00
Davide Vanzo 2aba75e804 Temporarily pull from dev branch 2020-09-08 10:22:08 -05:00
Davide Vanzo 1a33cb7f6e Added explicit accelerated networking 2020-09-08 10:21:04 -05:00
Davide Vanzo ff1cd9ff8f Fixed typo 2020-09-04 10:11:54 -05:00
Davide Vanzo bf9f6cdee2 Improved error message 2020-09-03 16:58:46 -05:00
Davide Vanzo f24a779ce2 Use lookup table for memory instead of Azure CLI 2020-09-03 15:53:35 -05:00
Davide Vanzo bde19ca0a1 Added Azure CLI login 2020-09-03 15:52:45 -05:00
Davide Vanzo 823079c3da Removed temporary file paths 2020-09-03 07:58:29 -05:00
Davide Vanzo 9b07a7b852 Bash does not like Python syntax... 2020-09-02 16:07:54 -05:00
Davide Vanzo a0d0cf7b9a Added support for multiple sku autoscale 2020-09-02 14:50:04 -05:00
Xavier Pillons a2186607ea retrieve properties dynamically 2020-05-11 19:23:03 +02:00
Xavier Pillons 7a3478d6d2 fix wrong script_remote_dest value 2020-05-05 18:52:54 +02:00
Xavier Pillons 3bfd0e6386 added badge 2020-05-05 16:36:42 +02:00
Xavier Pillons 756038288f added new pipelines 2020-05-05 16:29:00 +02:00
Paul Edwards cce7ff534e updated with doc 2020-04-29 05:02:58 +01:00
Paul Edwards 087fb2ea74 Renamed example 2020-04-29 04:56:35 +01:00
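
Commit 3bb2568403 above adds a hyphen to node names so that the node index stays unambiguous when a partition name ends with a number. The following is a minimal Python sketch of that idea only; it is not the code from the repository. The function names `node_name` and `split_node_name` and the partition names `hc4` and `hc44` are hypothetical and chosen purely for illustration.

```python
from typing import Tuple


def node_name(partition: str, index: int, separator: str = "-") -> str:
    """Build a node name from a partition name and a node index.

    Hypothetical helper: the actual naming logic in the repository may differ.
    """
    return f"{partition}{separator}{index}"


def split_node_name(name: str, separator: str = "-") -> Tuple[str, int]:
    """Recover (partition, index) from a node name.

    Splits on the last occurrence of the separator, so the partition name
    itself may contain hyphens. The separator must be non-empty.
    """
    partition, _, index = name.rpartition(separator)
    return partition, int(index)


if __name__ == "__main__":
    # Without a separator, two different (partition, index) pairs
    # produce the same node name.
    print(node_name("hc4", 41, separator=""))   # hc441
    print(node_name("hc44", 1, separator=""))   # hc441

    # With a hyphen, the names differ and can be split back reliably.
    print(node_name("hc4", 41))                 # hc4-41
    print(node_name("hc44", 1))                 # hc44-1
    print(split_node_name("hc44-1"))            # ('hc44', 1)
```

Splitting on the last hyphen is a design choice of this sketch: it keeps the index recoverable even if a partition name contains hyphens of its own.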