Commit Graph

495 Commits

Author SHA1 Message Date
Fred Park c39a919bf7
Add more Slurm partition settings
- Partition preemption settings
- Partition other options
2019-04-02 15:59:36 -07:00
Fred Park 97dac7d5aa
Update blobxfer to 1.7.1
- Update some docs
2019-03-05 08:05:02 -08:00
Fred Park 2301e20cfc
Tag for 3.7.0 release 2019-02-28 14:03:35 -08:00
Fred Park 163f1d0cb6
Suspend and restart support for Slurm clusters 2019-02-28 13:48:20 -08:00
Fred Park ed5a21d416
Doc updates
- Add support for CentOS 7.6 native
2019-02-28 13:48:16 -08:00
Fred Park 6e8d2a119f
Component updates
- Update blobxfer to 1.7.0
- Update Batch Insights to 1.2.0
- Update LIS
- Update NV driver to 410.92
- Update NC/ND driver to 410.104
2019-02-28 12:11:19 -08:00
Fred Park 4fa60af37a
Fix accelerated networking provisioning
- Add pool exists command
- Add recreate option to pool add
2019-02-28 12:11:18 -08:00
Fred Park 314037f76f
Slurm on Batch feature
- Package and use Slurm 18.08 instead of default from distro repo
- Slurm "master" contains separate controller and login nodes
- Integrate RemoteFS shared file system into Slurm cluster
- Auto feature tagging on Slurm nodes
- Support CentOS 7, Ubuntu 16.04, Ubuntu 18.04 Batch pools as Slurm
  node targets
- Unify login and Batch pools on cluster user based on login user
- Auto provision passwordless SSH user on compute nodes with login user
  context
- Add slurm cluster commands, including orchestrate command
- Add separate SSH for controller, login, nodes
- Add Slurm configuration doc
- Add Slurm guide
- Add Slurm recipe
- Update usage doc
- Remove deprecated MSI VM extension from monitoring and federation
- Fix pool nodes count on non-existent pool
- Refactor SSH info to allow offsets
- Add fs cluster orchestrate command
2019-02-28 12:11:10 -08:00
Fred Park a30cb674ca
Migrate to Azure Batch Python SDK 6.0.0
- Fix breaking changes
- Update dependencies
- Gate some debug messages behind the verbose flag
2019-01-16 13:03:30 -08:00
Fred Park 1253ee0062
Component updates
- Update blobxfer to 1.6.0
- Update Singularity to 2.6.1
- Update Docker CE to 18.09.1
- Move monitor setup to after GPU driver installation
2019-01-16 13:03:30 -08:00
Fred Park 458c69e6a9
Improve task factory generation/submission speed
- Amortize copy cost over single deep copy and pop
- Add more feedback during generation/submission for large task
  factories and task sets
2019-01-16 13:03:29 -08:00
Fred Park 775f602e5a
Add Batch Insights integration
- Resolves #259
2019-01-16 13:03:29 -08:00
Fred Park e210e6032f
Re-organize environment variables on node prep
- Resolve #252
2019-01-16 13:03:28 -08:00
jackpimbert 30ec7ad03b
Add custom environment variable options to pool configuration (#253)
* Add env vars option to pool configuration

- Add the option to include custom environment variables in the pool
  configuration
- This allows users to set up Batch environment variables for the start
  task
- This in turn could allow for the ability to use batch insights

* Add env var spec to pool schema for validation

- CHANGELOG updated with pool env var feature

* Add a keyvault option for pool env vars

- Added a keyvault option to the pool configuration
- Updated docs, schemas and templates accordingly
- Changed schema type from `str` to `text` for env vars

* Add keyvault client check

- Add a keyvault client check to ensure a valid client is used when
  using keyvault to add env vars to the start task.
2019-01-16 13:03:28 -08:00
Fred Park f53eee7bbd
Block job submission on non-active pools
- Resolves #251
2019-01-10 13:47:08 -08:00
Fred Park 17e26f091b
Fix minor nodeid None issues 2018-12-10 10:54:54 -08:00
Fred Park 99f15879cd
Tag for 3.6.1 release 2018-12-03 09:04:08 -08:00
Fred Park eea4286724
Update dependencies
- NC/ND driver to 410.79
- NV Grid driver to 410.71 with CUDA10 support
- LIS
2018-12-03 09:03:49 -08:00
Fred Park 2adf9456ad
Fix NV provisioning 2018-11-29 09:39:34 -08:00
Fred Park 70532fa4ae
Add Genomics recipes
- BLAST and RNASeq pipelines
- Fix adding tasks to an existing job with existing merge tasks
- Add support for force_enable_task_dependencies at the job level
- Fix doc typos
2018-11-29 08:58:01 -08:00
Fred Park 387fd14d54
Fix --tail console output
- Fix occurrences where the stream would occasionally repeat characters
- Allow incremental unicode decoding of the stream
2018-11-29 08:58:01 -08:00
Fred Park 5aae4832b8
Add Windows Server 2019 support 2018-11-19 12:48:33 -08:00
Fred Park 2519f3cedd
Update dependencies 2018-11-19 11:20:08 -08:00
Fred Park 5c5feaf244
Various fixes
- fs cluster status typo
- Add delay in MSI binding for resources
- Singularity envfile naming
2018-11-19 09:54:57 -08:00
Fred Park 2ad67da15d
Tag for 3.6.0 release 2018-11-06 14:21:50 -08:00
Fred Park 342b7fc2e2
Fix various issues
- Monitoring SSH login
- Grafana update regression with Batch Shipyard Dashboard
- Federation job submission
2018-11-06 14:21:03 -08:00
Fred Park 95dec309fe
Add support for standard and ultra SSDs
- Breaking change on premium property in managed disks
- Add availability zone support
2018-11-06 10:00:25 -08:00
Fred Park 7ad6a1df05
Remove Debian 8 support 2018-11-05 11:29:02 -08:00
Fred Park 54c78b7c49
Auto scratch support 2018-11-05 11:24:22 -08:00
Fred Park 02c6e110d7
Kata containers support
- Make Singularity runtime install optional
- Add `restrict_default_bind_mounts` option to jobs spec
- Provide a default container runtime option
2018-11-05 11:24:17 -08:00
Fred Park a53bb2a044
Fix Singularity issues in latest update 2018-11-05 11:24:17 -08:00
Fred Park 49b7e48857
Fix non-public cloud SP AAD auth 2018-11-05 11:24:16 -08:00
Fred Park 62e8ebcac1
Update dependencies
- Update blobxfer to 1.5.4
- Resolves #243
2018-11-05 11:24:08 -08:00
Fred Park ab9cc70828
Update build to Python 3.7.1
- Update Windows Docker images to Python 3.7.1
- Fix flake8 errors
- Fix shellcheck errors
- Various build updates and fixes
2018-10-30 14:24:31 -07:00
Fred Park 124bb429a0
Update download location for NV driver 2018-10-01 11:58:18 -07:00
Fred Park 784393a6ce
Tag for 3.6.0b1 release 2018-09-20 12:49:12 -07:00
Fred Park 32561ae264
Support CentOS 7.5 native and native conversion 2018-09-20 09:24:29 -07:00
Fred Park 584dada9f8
Update Singularity and Alpine
- Update to 3.8, rebuild 3.7 due to CVE
- Update Singularity to 2.6.0
2018-09-20 08:48:15 -07:00
Fred Park 06ab86c655
Update various components
- Update Nvidia Tesla driver to 396.44 for NC
- Update LIS to 4.2.6
- Update prometheus and grafana
2018-09-18 13:56:27 -07:00
Fred Park b06bb20d4f
Fix autoscale scaling beyond low pri limit
- Refactor formulas
- Resolves #239
2018-09-18 13:56:27 -07:00
Fred Park 96c220df34
Update to Azure Batch 5.1.0 SDK
- Accommodate breaking changes
- Add compute node agent info
2018-09-18 13:56:26 -07:00
Fred Park 1d666ae6aa
Update dependencies
- Update blobxfer to 1.5.0
2018-09-18 13:56:26 -07:00
Fred Park 1a4ad686ef
Fix federation task id generator
- Fix list issue with empty addition timestamps or uids
- Expedite generating task ids for federation bound tasks with autogenerated
  task ids
2018-08-23 13:37:03 -07:00
Fred Park c1bbd5131d
Add count commands
- jobs tasks count and pool nodes count commands with --raw support
- Update usage doc
- Resolves #228
2018-08-09 13:07:06 -07:00
Fred Park bc7d87c397
Enhance blocked action tracking
- Track blocked actions in jobs table
- Enhance fed jobs list to list both blocked and queued actions
- Update docs
2018-08-09 09:27:38 -07:00
Fred Park 4abfaf1675
Update blobxfer to 1.4.0 2018-08-08 15:48:58 -07:00
Fred Park 6e1409c16f
Fix jobs tasks term command without pool ssh info 2018-08-08 15:48:36 -07:00
Fred Park acdea94722
Tag for 3.6.0a1 release 2018-08-06 10:35:31 -07:00
Fred Park 52628d27cf
Federation support
- Federation proxy lifecycle management
- Federation lifecycle management
- Federation job submission and management
- Mount Azure File share for auto-rotated log persistence
- FIFO within job support
- Constraint matching
- Federations can be created in "unique job id" mode requiring all
  submitted jobs via fed jobs add be unique across the entire federation
- Supports nearly 15K actions per job (in non-unique job id mode)
- Task dependency rewrite engine for federated jobs
  - Verify dependencies only within task group
  - Uniquely identify task dependencies
- Allow tuning of scheduling behavior options
- Package federation logic on proxy into Docker container
- Full guide/walkthrough for federation feature
- Refactor common code between monitor/fed proxy into resource
- Other doc updates
2018-08-06 09:30:36 -07:00
Fred Park 977c2e920b
Tag for 3.5.3 release 2018-07-31 11:04:25 -07:00