vgamayunov
cece6c7fb6
bumped SLURM version to 21.08.2 (#544)
* changed default SLURM version to 21.08.2
* added necessary rpms and modified slurm.conf
* added config parameter for SLURM version
2021-10-22 15:12:12 +01:00
Xavier Pillons
2616d6dfed
use common group variables
2020-10-19 13:07:39 +02:00
Davide Vanzo
9c8ba5db79
Disabled ulimits propagation to nodes for IB
By default, Slurm propagates the ulimits in place on the node where the job was submitted to the job environment.
If the submission host is a VM with a non-HPC image, max locked memory is set to 64 KB. This would prevent IB from pinning enough memory for RDMA on the compute nodes.
2020-09-30 09:55:09 -05:00
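The commit above does not show which `slurm.conf` setting was changed. Slurm offers `PropagateResourceLimits` (for example `NONE`) and `PropagateResourceLimitsExcept` (for example `MEMLOCK`) for this purpose. The Python sketch below only inspects the max-locked-memory limit of the current process, which is the value that would be propagated from the submission host. It uses the standard `resource` module and assumes a Unix-like host.

```python
import resource

def memlock_limits_kib():
    """Return the (soft, hard) max-locked-memory limits of this process in KiB.

    None means the limit is unlimited (RLIM_INFINITY).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def to_kib(value):
        # RLIM_INFINITY marks an unlimited limit; otherwise convert bytes to KiB
        return None if value == resource.RLIM_INFINITY else value // 1024

    return to_kib(soft), to_kib(hard)

if __name__ == "__main__":
    soft, hard = memlock_limits_kib()
    # A small finite value (e.g. 64) on the submission host is what RDMA over IB
    # would inherit on the compute node if the limit were propagated.
    print("soft max locked memory (KiB):", "unlimited" if soft is None else soft)
    print("hard max locked memory (KiB):", "unlimited" if hard is None else hard)
```

Running this on a login VM and on a compute node is a quick way to compare the two limits.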
Davide Vanzo
c22b9f114a
Added Slurm cluster usage in README
2020-09-29 17:08:33 -05:00
Davide Vanzo
8323703b65
Correctly handle single instance or invalid value
2020-09-17 18:11:18 -05:00
Davide Vanzo
ed57edc0d1
Use actual memory per CPU as partition default mem
2020-09-17 17:51:49 -05:00
Paul Edwards
156c0378cd
fixed path for skus.lst
2020-09-17 17:58:00 +01:00
Davide Vanzo
3eb0addda2
Corrected typo after file rename
2020-09-17 09:41:43 -05:00
Davide Vanzo
85034da018
Merge pull request #359 from vanzod/slurm_multisku
Added multiple SKU support and node oversubscription to Slurm Autoscale
2020-09-17 08:55:52 -05:00
Davide Vanzo
81ec00c276
Used relative path for skus.lst
2020-09-16 13:33:35 -05:00
Xavier Pillons
9f37acb62d
added weekly build
2020-09-15 15:42:40 +02:00
Davide Vanzo
3bb2568403
Added hyphen to node names
This prevents node index confusion when the partition name ends with a number
2020-09-14 18:20:03 -05:00
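The ambiguity that the hyphen avoids can be shown with a small Python sketch. The partition names `hb12` and `hb1` and the helper functions are hypothetical and are not taken from the autoscale scripts. Without a separator, node 1 of `hb12` and node 21 of `hb1` produce the same name. With a hyphen, the names stay distinct and the index can be recovered unambiguously.

```python
import re

def node_name(partition: str, index: int, sep: str = "-") -> str:
    """Build a node name from a partition name and a node index (hypothetical scheme)."""
    return f"{partition}{sep}{index}"

def parse_index_no_sep(name: str) -> int:
    """Naive parse without a separator: treat all trailing digits as the node index."""
    match = re.search(r"(\d+)$", name)
    return int(match.group(1))

def parse_index_with_sep(name: str) -> int:
    """Parse the node index after the last hyphen."""
    return int(name.rsplit("-", 1)[1])

if __name__ == "__main__":
    # Without a separator, two different (partition, index) pairs collide
    print(node_name("hb12", 1, sep=""), node_name("hb1", 21, sep=""))  # hb121 hb121
    # Trailing digits of the partition name get swallowed into the index
    print(parse_index_no_sep(node_name("hb120", 1, sep="")))           # 1201
    # With a hyphen, names are distinct and the index parses correctly
    print(node_name("hb12", 1), node_name("hb1", 21))                  # hb12-1 hb1-21
    print(parse_index_with_sep(node_name("hb120", 1)))                 # 1
```
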
Davide Vanzo
479b7113da
Added explicit availability set
2020-09-14 18:06:12 -05:00
Davide Vanzo
905e82ae6c
Merge branch 'master' into slurm_multisku
2020-09-11 17:37:51 -05:00
Xavier Pillons
480e9dfdfe
add pyazhpc dependency
2020-09-11 12:51:41 +02:00
Davide Vanzo
8d1a7ae7a1
Updated readme
2020-09-10 10:56:26 -05:00
Davide Vanzo
dc96b69b7e
Added os_storage_sku for uniformity
2020-09-10 10:56:17 -05:00
Davide Vanzo
9bbe0b4b56
Pointed back to AzureHPC main repo
2020-09-10 10:29:36 -05:00
Davide Vanzo
157841381f
Added partition oversubscription
2020-09-10 10:26:06 -05:00
Davide Vanzo
17c684c436
Corrected field index
2020-09-09 15:44:14 -05:00
Davide Vanzo
aef7f2ebc1
Get number of sockets from lookup table
2020-09-09 15:40:22 -05:00
Davide Vanzo
2aa0bcf30a
Updated SKUs lookup table
2020-09-09 15:36:51 -05:00
Davide Vanzo
9d41aa3f79
Allocate only physical cores
2020-09-09 15:35:18 -05:00
Davide Vanzo
f5c560722f
Generalized configuration
2020-09-08 17:28:40 -05:00
Davide Vanzo
681bcd3dbe
Added cgroup resources control
2020-09-08 16:28:55 -05:00
Davide Vanzo
eda4ac79ea
Added required standard OS disk
2020-09-08 15:55:48 -05:00
Davide Vanzo
fb0e9ff053
Use lookup table for memory
2020-09-08 10:23:20 -05:00
Davide Vanzo
2aba75e804
Temporarily pull from dev branch
2020-09-08 10:22:08 -05:00
Davide Vanzo
1a33cb7f6e
Added explicit accelerated networking
2020-09-08 10:21:04 -05:00
Davide Vanzo
ff1cd9ff8f
Fixed typo
2020-09-04 10:11:54 -05:00
Davide Vanzo
bf9f6cdee2
Improved error message
2020-09-03 16:58:46 -05:00
Davide Vanzo
f24a779ce2
Use lookup table for memory instead of Azure CLI
2020-09-03 15:53:35 -05:00
Davide Vanzo
bde19ca0a1
Added Azure CLI login
2020-09-03 15:52:45 -05:00
Davide Vanzo
823079c3da
Removed temporary file paths
2020-09-03 07:58:29 -05:00
Davide Vanzo
9b07a7b852
Bash does not like Python syntax...
2020-09-02 16:07:54 -05:00
Davide Vanzo
a0d0cf7b9a
Added support for multiple sku autoscale
2020-09-02 14:50:04 -05:00
Xavier Pillons
a2186607ea
retrieve properties dynamically
2020-05-11 19:23:03 +02:00
Xavier Pillons
7a3478d6d2
fix wrong script_remote_dest value
2020-05-05 18:52:54 +02:00
Xavier Pillons
3bfd0e6386
added badge
2020-05-05 16:36:42 +02:00
Xavier Pillons
756038288f
added new pipelines
2020-05-05 16:29:00 +02:00
Paul Edwards
cce7ff534e
updated with doc
2020-04-29 05:02:58 +01:00
Paul Edwards
087fb2ea74
Renamed example
2020-04-29 04:56:35 +01:00