Fred Park
1a17391105
Remove Ubuntu 16.04-LTS latest redirect
...
- Update to blobxfer 1.0.0rc2
2017-10-16 13:03:54 -07:00
Fred Park
607bfd252e
Migrate to storage split library
...
- Remove queue deletion code
- Resolves #133
2017-10-05 21:40:50 -07:00
Fred Park
7c87b043a0
Various updates
...
- Normalize gluster on compute mount path
- Fix gluster on compute data ingress
- Lock blobxfer version to 1.0.0rc1
2017-10-05 13:02:30 -07:00
Fred Park
298b00d946
Mount Azure file shares to host (#123)
...
- Allow multiple file shares per pool
- Move root mount point for all shared data volumes
2017-10-04 17:59:30 -07:00
Fred Park
fc459e7333
Doc fixes
...
- Raise exception when docker_registry is detected in the global config
2017-10-04 13:31:48 -07:00
Fred Park
01995e97b6
Tag for 3.0.0a1 release
2017-10-04 09:26:01 -07:00
Fred Park
49a374416f
Add unusable node recovery option
2017-10-04 09:25:57 -07:00
Fred Park
796a5e33b4
Combine rjm/tfm to cargo (#125)
2017-10-03 18:24:50 -07:00
Fred Park
6315be3a6b
Transition to blobxfer 1.x command structure
...
- Data ingress/egress changes
- Task factory file changes
- Resolves #47
2017-10-03 18:24:49 -07:00
Fred Park
e783744e00
Container registry logic overhaul
...
- Remove private registry back to Azure storage blob support (#44)
- Require fully qualified Docker image names (#106)
- Support multiple public/private registries on a single pool (#127)
2017-10-03 18:24:42 -07:00
Fred Park
dcddb04150
Fixes and log more info in start task
2017-10-03 10:05:17 -07:00
Fred Park
cbddcdfbff
Use docker_image in favor of image in tasks
2017-10-03 10:05:17 -07:00
Fred Park
60b4fc446f
Support ARM Images for custom images (#126)
2017-10-03 10:05:17 -07:00
Fred Park
238982db77
Add ARM VNet support in Batch service mode (#126)
...
- Support "global" aad property in credentials
- Add Virtual Network guide
2017-10-03 10:05:17 -07:00
Fred Park
a45f3b9a66
Multi-instance fixes for native
2017-10-03 10:04:03 -07:00
Fred Park
afdad167a8
Add native support to custom images
...
- Update TSG doc for native container support
2017-10-03 10:04:03 -07:00
Fred Park
6aac272e9e
Autoenable ib/gpu on tasks if settings allow
...
- Resolves #124
2017-10-03 10:04:03 -07:00
Fred Park
e2ddf3b750
Add YAML configuration support
...
- Resolves #122
2017-10-03 10:04:03 -07:00
Fred Park
9c700dfbd5
Fix jrtask and suppress CUDA vars for native
...
- Doc updates
- Suppress coordination/task commands that are empty
2017-10-03 10:03:20 -07:00
Fred Park
f1775c3ad7
Native container job schedule support
2017-10-03 10:03:20 -07:00
Fred Park
cccc5fa3b8
output_data to storage blob support for native
2017-10-03 10:03:20 -07:00
Fred Park
0a9689f5a8
Native container support
...
- Allow pool conversion with native flag
2017-10-03 10:03:20 -07:00
Fred Park
aed9f8145b
Add test cluster support
2017-10-03 10:03:20 -07:00
Fred Park
75157e91f6
Add Read the Docs build
...
- Tag for 2.9.6 release (mainly to generate a 2.9.x RTD version)
2017-10-03 09:53:32 -07:00
Fred Park
bb3b4d6a41
Fix RemoteFS disk model create option type
2017-09-25 12:45:23 -07:00
Fred Park
53477a720a
Tag for 2.9.5 release
...
- Add --all-starting for pool delnode
2017-09-24 19:50:14 -07:00
Fred Park
c13793dd57
Support version in platform image
...
- Override UbuntuServer 16.04-LTS latest to prior version due to
linux-azure kernel issues
2017-09-22 19:41:33 -07:00
Fred Park
093cfdbc83
Prevent invalid mix of HPC offer and non-RDMA VM
...
- Fix unusable nodes on allocation exception in pool stats
- Expand network tuning exemptions
2017-09-22 08:27:16 -07:00
Fred Park
260e1609ee
Fix various OS, Docker and nvidia issues
...
- Update Docker CE versions for Ubuntu and CentOS
- Update NC driver
- Add special nvidia install path for CentOS 7.3 during 7.4 rollout
2017-09-21 22:13:33 -07:00
Fred Park
5621ec5e3e
Fix regression in ssh private key check on Windows
2017-09-21 13:11:26 -07:00
Fred Park
7e1c4c7e75
NV-series driver updates
...
- Resolves #119
2017-09-13 09:19:06 -07:00
Fred Park
9602608871
Tag for 2.9.4 release
2017-09-12 08:56:03 -07:00
Fred Park
1b0e1c449d
Refactor SSH to common path
...
- Check for SSH private key filemode
- Resolves #116
2017-09-07 08:02:40 -07:00
hieuhc
28bfda3278
Replace clear() method when invoking pool udi with ssh (#118)
2017-09-07 09:18:46 +01:00
Fred Park
1d5cdcbbdf
Tag for 2.9.3 release
2017-08-29 08:05:34 -07:00
Fred Park
010b160e43
Provide warning and note on job migration
2017-08-17 13:08:31 -07:00
Fred Park
5f393beb1d
Disallow resize_timeout in AS-enabled pools
2017-08-17 07:06:03 -07:00
Fred Park
9d66d75b62
Tag for 2.9.2 release
...
- Attempt another fix at site extension upgrade (#113)
2017-08-16 08:59:53 -07:00
Fred Park
f91107e89d
Tag for 2.9.1 release
2017-08-16 08:33:19 -07:00
Fred Park
9e3308ff2b
Fix issues in RemoteFS
...
- Public ip not being assigned for resize command
- Need to wait for block device to show up for attached disks (expand
command)
- Add more logging
2017-08-16 08:14:08 -07:00
Fred Park
71706dec6e
Fail faster for storage issues in remotefs
2017-08-15 19:40:41 -07:00
Fred Park
3f244d0e4b
Fix ssh private key issue in RemoteFS
2017-08-15 16:39:52 -07:00
Fred Park
b16685348d
Tag for 2.9.0 release
2017-08-15 13:28:55 -07:00
Fred Park
466c7d4a3b
Perform client checks
2017-08-15 13:26:59 -07:00
Fred Park
e434b83cb3
Fix truncated P50 provisioning
...
- Support "s_v3" suffixed premium VM SKUs
2017-08-15 13:26:31 -07:00
Fred Park
7a815dff6f
Minor updates and fixes
2017-08-14 09:07:07 -07:00
Fred Park
1e4cd777be
Fix division by zero in pool stats
...
- Fix flake8 issues
2017-08-10 15:08:39 -07:00
Fred Park
745082029f
Misc doc updates
...
- Update requests
- Check task id length
- Drop Python 3.3 support due to cryptography
2017-08-10 08:40:07 -07:00
Fred Park
284c4d9c23
Tag for 2.9.0rc1 release
2017-08-09 08:57:33 -07:00
Fred Park
4573180293
Validate and prompt certain job schedule adds
2017-08-09 08:39:26 -07:00
Fred Park
44a1f14b31
Add monitor_task_completion for recurring jobs
2017-08-09 07:57:56 -07:00
Fred Park
5ae9001716
Add job schedule support to commands
...
- Resolves #19
2017-08-08 15:02:53 -07:00
Fred Park
9add2444ec
Change autogen task id property to complex
...
- Update job recurrence docs
2017-08-08 08:45:15 -07:00
Fred Park
be530e63c0
Job recurrence support
2017-08-07 19:42:09 -07:00
Fred Park
99e72c0c3f
Add custom task factory support (#93)
2017-08-07 10:38:08 -07:00
Fred Park
754b5ee5a6
Tag for 2.9.0b2 release
2017-08-04 14:51:27 -07:00
Fred Park
8a396f0e18
Add pool and jobs stats
...
- Resolves #110
2017-08-04 14:47:16 -07:00
Fred Park
c5fa85adcb
Add file task factory (#93)
...
- Split out task factory settings into separate file
- Change uniform to be a, b instead of min, max
- Update blobxfer script for single target ingress to place file
directly to destination
2017-08-04 11:02:33 -07:00
Fred Park
1650ce4a95
Add random task factory (#93)
2017-08-03 20:10:56 -07:00
Fred Park
e5ffd492ab
Update CNTK CPU infiniband recipe to 2.1
2017-08-03 16:28:23 -07:00
Fred Park
4d09a09a80
Add --all-unused to pool delnode
2017-08-03 10:41:38 -07:00
Fred Park
d539e2923a
Fix pool udi terminal mangling
2017-08-03 08:52:20 -07:00
Fred Park
9eb8fd4c55
Support CentOS-HPC 7.3
...
- Update misc tensorboard to latest
- Fix term tasks in disable jobs
- Update NVIDIA driver
- Doc updates
2017-08-02 15:12:45 -07:00
Fred Park
bab8628ed5
Tag for 2.9.0b1 release
2017-07-31 15:05:58 -07:00
Fred Park
ed8ca2d225
Add autogen task id setting
2017-07-31 13:40:18 -07:00
Fred Park
196a36336e
Add rebalance based on preempted node count
2017-07-31 13:40:15 -07:00
Fred Park
4105acc2f8
Add task factory (parameter sweep) support
...
- Resolves #93
2017-07-28 14:36:42 -07:00
Fred Park
23a753a110
RemoteFS fixes
2017-07-27 08:12:34 -07:00
Fred Park
7a9177b16b
Fix pool deletion with poolid arg
2017-07-26 10:38:29 -07:00
Fred Park
5fef683af4
Universally increase SAS expiry time
2017-07-21 13:28:56 -07:00
Fred Park
e32fc4d93e
Add Autopool support
...
- Resolves #33
- Add --poolid to storage clear and storage del
- jobs del and jobs term now cleanup storage data if autopool is
detected
2017-07-21 11:10:03 -07:00
Fred Park
30ea8c280f
Add autoscale guide
2017-07-21 11:10:03 -07:00
Fred Park
7ba85e7496
Add job migration support
...
- Add enable/disable job support too
- Resolves #108
2017-07-21 11:10:03 -07:00
Fred Park
3b65ba684f
Support job priorities
...
- Resolves #109
2017-07-21 11:10:03 -07:00
Fred Park
23e9584852
Add compute node fill type support
...
- Resolves #107
2017-07-21 11:10:03 -07:00
Fred Park
82a46a615a
Basic Autoscale functionality
...
- Allow pools to be added with zero target nodes
- Add pool autoscale commands
2017-07-21 11:10:03 -07:00
Fred Park
5291ff1130
Move to blob leasing for download ticketing
...
- Greatly increase resource file SAS expiry timedelta
- Make concurrent_source_downloads generic, remove non-p2p option
- Update Dockerfiles
- Update to latest azure-storage
2017-07-21 11:10:03 -07:00
Fred Park
d197c9be28
Minor fixups
2017-07-07 09:07:40 -07:00
Fred Park
03fe791171
Tag for 2.8.0 release
2017-07-06 11:12:24 -07:00
Fred Park
8eb2197d23
Allow CentOS 7.3 on NC/NV
2017-07-06 11:12:05 -07:00
Fred Park
de45b18a67
Add backoff to cascade docker image pull retries
2017-07-01 01:25:30 -07:00
Fred Park
2a48885da1
More improvements for scale out robustness
...
- Add --all-start-task-failed to delnode
- Reduce node output on pool allocation wait with number of nodes > 10
2017-06-30 23:50:21 -07:00
Fred Park
06188c1944
Tag for 2.8.0rc2 release
...
- Fix regression with private docker image pulls
- Resolves #103
- Resolves #105
2017-06-30 11:45:26 -07:00
Fred Park
ade6a27b60
Tag for 2.8.0rc1 release
2017-06-27 11:38:36 -07:00
Fred Park
54422ce2eb
Add retry handling for cascade docker pull
...
- Add cascade.log download for start up failures
2017-06-27 09:28:03 -07:00
Fred Park
cefa72e443
Add version metadata to pool and jobs
...
- Resolves #89
2017-06-26 13:20:49 -07:00
Fred Park
a61449ec9c
Fix tensorboard command with custom image changes
...
- Fix ref during exception handling for invalid platform image
- Remove max size note for remote fs managed disks
2017-06-26 10:51:53 -07:00
Fred Park
35c9779d68
Fix job auto_complete overwrite of job properties
...
- Resolves #97
2017-06-09 11:34:26 -07:00
Fred Park
887c597fab
Tag for 2.8.0b1 release
2017-06-07 08:30:04 -07:00
Fred Park
a41713c5ee
Add custom image guide
...
- Update recipes for vm_configuration
- Fix some issues with platform pools with new changes
2017-06-06 12:41:42 -07:00
Fred Park
8397b411c5
Initial custom image support
2017-06-06 08:43:33 -07:00
Fred Park
549f50aac5
Tag for 2.7.0 release
2017-05-31 07:43:14 -07:00
Fred Park
004413e36e
Fix pool udi with no logins/encryption over SSH
...
- Resolves #92
2017-05-28 15:18:32 -07:00
Fred Park
ec986323fb
Duplicate volume checks between job and task
2017-05-28 15:18:32 -07:00
Fred Park
3d5958195c
Remove print statements
2017-05-24 13:43:20 -07:00
Fred Park
3a5fb452d5
Various fixes
...
- Add poolid param for pool del
- Fix vm_count deprecation check on fs actions
- Improve robustness of package index updates
- Prompt for jobs cmi action
- Update to latest dependencies
2017-05-24 09:54:09 -07:00
Fred Park
06fd4f8e62
Tag for 2.7.0rc1 release
2017-05-24 07:43:19 -07:00
Fred Park
4514920b6a
Add pool listimages command
...
- Resolves #60
- Fix some resize/wait issues
2017-05-23 14:14:29 -07:00
Fred Park
d80d938063
More inheritable job to task properties
...
- Add max_wall_time property
- Resolves #69
2017-05-23 09:29:00 -07:00
Fred Park
5a82cc79f8
Add --tty option to ssh commands
2017-05-22 20:03:35 -07:00
Fred Park
a623e5b26f
Add list tasks poll option
...
- Resolves #77
- Add deprecation path for multi-instance pool spec vm count
- Fix outdated recipe doc for multi-instance pool spec
- Cache last task id to speed up task collection adds
2017-05-22 19:45:25 -07:00
Fred Park
199ac70e22
Tag for 2.7.0b2 release
2017-05-18 09:18:43 -07:00
Fred Park
d2b066bf6d
Add tasks via collection
...
- Resolves #86
2017-05-17 18:48:16 -07:00
Fred Park
644e86ddb6
Prevent glusterfs on compute and max tasks > 1
2017-05-17 15:03:30 -07:00
Fred Park
c3a72fa4e3
Allow workdir to be set
...
- Resolves #87
2017-05-17 09:28:19 -07:00
Fred Park
11c5dc700b
Improve pool resize logic with mixed nodes
2017-05-16 09:49:01 -07:00
Fred Park
d9304794bc
Add deprecation path for vm_count change
...
- Resolves #84
2017-05-16 09:43:58 -07:00
Fred Park
b09a37f22f
Add preempted state checking
2017-05-15 08:08:30 -07:00
Fred Park
fd6c45505e
Update recipes for vm_count change
2017-05-12 19:23:29 -07:00
Fred Park
7ed7429a24
Add Low Priority Batch VM support
...
- Resolves #82
- Resolves #83
2017-05-12 14:42:55 -07:00
Fred Park
a17d6b64c9
Update dependencies
...
- Fix breaking changes in keyvault library
- Fix inverted order for fs cluster ssh and optional command
2017-05-11 09:21:20 -07:00
Fred Park
70f7317c13
Add --clear-tables option to storage del
...
- Update limitations doc
- Resolves #80
2017-05-08 09:07:09 -07:00
Fred Park
1050c5da0e
Tag for 2.6.2 release
2017-05-05 08:42:53 -07:00
Fred Park
983a7eed45
Node prep script improvements
...
- Blacklist nouveau universally on GPU VMs
- Change URL retrieval to requests
- Update requirements to latest
2017-05-05 08:42:19 -07:00
Fred Park
2d53e411e4
Add develop branch Dockerfile for hub
...
- Allow NVIDIA license agreement to be auto-confirmed with -y
2017-05-04 09:03:09 -07:00
Fred Park
b3f959801c
Fix docker login for missing images
...
- Resolves #66
2017-05-02 20:11:24 -07:00
Fred Park
2e62f12729
Fix misc tensorboard default image issue
2017-05-02 13:16:57 -07:00
Fred Park
ad423c0e3d
Tag for 2.6.1 release
...
- Optimize some Batch calls
2017-05-01 22:17:43 -07:00
Fred Park
3d2c8cc191
Fix termtasks with disable
2017-05-01 18:47:41 -07:00
Fred Park
f9912b7a52
Pool-level resource file support
2017-05-01 10:17:09 -07:00
Fred Park
b559ba3fb5
Fix data ingress to single node on pool add
2017-05-01 08:38:45 -07:00
Fred Park
aee7c2018b
Batch exception handling fixes
...
- Add more tensorboard log switches for autodetection
2017-04-30 20:08:15 -07:00
Fred Park
c8f7521196
Add COMMAND arg for ssh commands
2017-04-30 14:22:13 -07:00
Fred Park
d638fe10a5
Tensorboard docker image auto-detect
...
- Fix jobs del --termtasks to disable job first
- Fix jobs listtasks and data listfiles to accept jobid not in jobs
config
- Perform double port remapping to avoid conflicts in tensorboard
2017-04-29 23:56:53 -07:00
Fred Park
fb978a4788
Update TensorFlow recipes to r1.1
...
- Remove custom build of TF for non-distributed mode
2017-04-28 23:13:04 -07:00
Fred Park
4c29dd22e6
Fix streaming off by one
2017-04-28 12:53:27 -07:00
Fred Park
0394cd9848
Add misc tensorboard command
2017-04-28 12:53:19 -07:00
Fred Park
6612150352
Catch ssh user add exception in pool add
...
- Update various docs
2017-04-26 11:09:56 -07:00
Fred Park
e1452fc7c9
Update Azure site extension docs
2017-04-21 19:42:49 -07:00
Fred Park
12216930fe
Tag for 2.6.0 release
2017-04-20 13:09:26 -07:00
Fred Park
e997441169
Fix regression in data ingress
2017-04-19 08:32:28 -07:00
Fred Park
b8d36a065a
Refactor pool ssh settings
2017-04-18 18:46:57 -07:00
Fred Park
0161169daa
Remove Windows checks for scp/ssh/openssl
...
- Update docs
- Remove unused vars in nodeprep script
2017-04-18 13:51:05 -07:00
Fred Park
77db9dbd82
Fix quotes in parameters
...
- Disallow newline character in smb password
2017-04-14 22:39:27 -07:00
Fred Park
7b99cf0b85
Modify glusterfs race fix with iptables
...
- Restrict smb account password from containing certain characters due
to echo reinterpret issues
- Fix some more ssh/pathlib issues
2017-04-14 14:56:35 -07:00
Fred Park
469e5cb56f
Tag for 2.6.0rc1 release
...
- Fix Docker setup issues
- Pin Docker release version
- Update NC nvidia driver
2017-04-14 12:54:43 -07:00
Fred Park
741a0bdd85
Add fault_domains property
...
- Add RemoteFS-GlusterFS+BatchPool recipe
- Various fixes
2017-04-14 08:14:13 -07:00
Fred Park
0d974fa0aa
Add additional SSH options
...
- Fix samba to auto-restart
2017-04-13 09:31:35 -07:00
Fred Park
9b30c60b10
Tag for 2.6.0b3 release
2017-04-03 14:20:24 -07:00
Fred Park
96395fa68a
Allow docker_images to be empty
2017-04-03 14:20:20 -07:00
Fred Park
f61f91423e
multi_instance_auto_complete -> auto_complete
...
- Resolves #61
2017-04-03 10:48:54 -07:00
Fred Park
b01b835fa2
pool udi with 0 nodes returns warning
...
- Resolves #64
2017-04-03 08:50:39 -07:00
Fred Park
3ded07634e
Fix some Python2 issues in remotefs
...
- Properly map the gluster volume mountpath and not the brick for SMB
2017-03-31 13:21:22 -07:00
Fred Park
5088c8f26e
Fix pathlib and future compatibility issues
2017-03-31 09:55:34 -07:00
Fred Park
d6e72ca22f
Fix glusterfs_on_compute issues
2017-03-31 08:07:23 -07:00
Fred Park
b426ce9c39
Add Samba NSG rules and stat
2017-03-30 19:48:17 -07:00
Fred Park
667d273c09
Move additional node prep commands as last set
...
- Resolves #63
2017-03-30 15:16:36 -07:00
Fred Park
130401af75
Add samba support on storage cluster nodes
2017-03-30 15:03:11 -07:00
Artem Sobolev
bb14d2224e
Fix ingress data call (#62)
2017-03-30 13:16:03 -07:00
Fred Park
db16e4cb7e
Allow public IP to be disabled
...
- Fix fs cluster status --detail
- Expand non-retry on async ops to include all 400-level status codes
2017-03-28 20:49:09 -07:00
Fred Park
f4b08a9f77
Fix for multi-instance auto complete
...
- Also only read credentials json if valid
2017-03-24 14:54:19 -07:00
Fred Park
5879e48586
Remove batch credential for fs ops
...
- Add RemoteFS recipes
- Replace Batch Explorer with Batch Labs
2017-03-23 15:06:42 -07:00
Fred Park
dd1b9f3de5
Tag for 2.6.0b2 release
...
- Resolves #59
2017-03-22 10:05:10 -07:00
Fred Park
f952605e58
Fix server options arg parsing
2017-03-17 19:32:44 -07:00
Fred Park
8871b8697d
Move storage container creation/deletion for fs
...
- Move storage container actions closer to create/delete for the cluster
to reduce chance of storage container/blob orphaning
2017-03-17 15:48:20 -07:00
Fred Park
0f742a3cf3
Do not run udi when there are no nodes
...
- Resolves #58
2017-03-17 10:35:04 -07:00
Fred Park
c15ea84840
Return helpful text about sc id not found
2017-03-16 19:20:35 -07:00
Fred Park
791c5726e0
Tag for 2.6.0b1 release
2017-03-16 17:53:15 -07:00
Fred Park
b269ea7f06
Add multi-volume/server support
2017-03-16 15:18:29 -07:00
Fred Park
38ac358d9d
Add pre-existing checks
...
- Switch to hostname peering in add brick (resize)
- Update docs regarding max_tasks_per_node
2017-03-16 08:58:53 -07:00
Fred Park
9dfccad392
Cap vm extension install to 1 attempt
...
- Fail async op at first chance to preserve traceback
2017-03-15 13:22:26 -07:00
Fred Park
a95bc22e52
Add automatic retries for async ops in fs commands
...
- Single-source resource name generation
- Add --generate-from-prefix option for fs cluster del command
2017-03-15 11:30:01 -07:00
Fred Park
5325395522
Add glusterfs local mount option and NSG rule
...
- Add --hosts option for fs cluster status to print required hosts
changes on the local machine to mount the remote fs
2017-03-14 22:07:51 -07:00
Fred Park
ca2f9d73ab
Add support for docker run uid/gid
...
- Resolves #54
2017-03-14 08:52:09 -07:00
Fred Park
071ec86831
Minor fixes and updates
2017-03-13 08:16:53 -07:00
Fred Park
89b722df54
Populate the fs config doc
...
- Update base README
- Rename disk_ids to disk_names in fs.json
2017-03-12 13:07:56 -07:00
Fred Park
d9966e645d
Experimental support for gluster cluster resize
...
- Add more robustness to gluster provisioning
- Stat script fixes
- Add --detail option for stat
- More disks del/list options
2017-03-12 11:18:41 -07:00
Fred Park
9dc4673530
Enable data ingress to remote storage clusters
...
- Refactor some constants to function access from the proper locations
2017-03-11 19:15:14 -08:00
Fred Park
cb7b42a231
Support glusterfs <-> pool autolinking
...
- Support glusterfs expand (additional disks)
- Provide `mount_options` for `file_server` which applies to local mount
on the file server of the disks
- Allow gluster volume name to be specified
- Provide stronger cross-checking between pool virtual network and
storage cluster virtual network
- Increase ud/fd in AS to maximums
- Install acl tools for nfsv4 and glusterfs
2017-03-11 15:23:55 -08:00
Fred Park
0ed28d96fc
Allow scp dm and credential encryption on Windows
...
- Rename old glusterfs scripts to be less confusing with remote
glusterfs support
2017-03-11 09:21:33 -08:00
Fred Park
675c6c37f8
Glusterfs support for add/suspend/start
...
- Simple logging by default
- Fix logging format
2017-03-10 22:54:16 -08:00
Fred Park
3f47fda0b9
Checkpoint multi-vm glusterfs support
...
- Allow resource_group overrides in managed_disks and storage_cluster
- Add server_options to file_server
- Add named resource group support to disk deletion
- Fix Batch and ARM client issues in non-AAD mode
2017-03-10 15:10:31 -08:00
Fred Park
33291504c2
Support missing image tasks, pool check
...
- Break out configs into separate pages
- Update all configs using 16.04.0-LTS to 16.04-LTS
- Remove Batch `account` from recipe credentials
2017-03-09 15:07:37 -08:00
Fred Park
e349a004cd
Support pool <-> storage cluster auto-linkage
...
- Update to latest batch management client library supporting
UserSubscription
- Begin breakout of config doc into multiple pages
2017-03-09 09:40:16 -08:00
Fred Park
5fcddad7ea
Pool <-> storage cluster linkage checkpoint
2017-03-08 23:43:16 -08:00
Fred Park
91403de98f
Add pool vnet spec
...
- Refactor vnet/subnet creation so pool creation can use it
- Allow read of fs.json for pool add
- Rename "glusterfs" volume_driver to "glusterfs_on_compute"
2017-03-08 20:23:05 -08:00
Fred Park
66d90dde90
Prep for add pool with vnet changes
...
- Centralize various client creation logic
2017-03-08 14:56:39 -08:00
Fred Park
8f7aee3a2f
Support AAD auth for Batch accounts
2017-03-08 11:13:09 -08:00
Fred Park
c118b7e2d9
Allow custom inbound network security rules
2017-03-08 09:52:21 -08:00
Fred Park
587ab7faa4
Fix suspend/start issues with software raid
...
- Disallow expand action with mdadm-based arrays on RAID-0
- Change "remotefs" to "fs" for commands
2017-03-08 09:52:21 -08:00
Fred Park
748cf64bfb
Refactor and unify AAD settings across commands
...
- Allow KeyVault AAD endpoints to be specified
2017-03-08 09:52:21 -08:00
Fred Park
a6a672a82e
Begin expand functionality
...
- Fix issues with ext4 + mdadm
2017-03-08 09:52:21 -08:00
Fred Park
f8e3fa52ed
Add stat script
...
- Better organize some remotefs json settings
- Reduce redundant lookups in ssh path
- Create --output-config option to separate from --verbose
2017-03-08 09:52:21 -08:00
Fred Park
0b172eccce
Add remotefs bootstrap script
2017-03-08 09:52:21 -08:00
Fred Park
82398360ff
Add cluster status and ssh commands
...
- Start integration with CustomScript extension
2017-03-08 09:52:21 -08:00
Fred Park
69335d7287
Add cluster suspend and start commands
...
- Begin work on status command
2017-03-08 09:52:21 -08:00
Fred Park
94bde5b076
Add first version of cluster add and del commands
...
- Modify remotefs json for more properties
2017-03-08 09:52:21 -08:00
Fred Park
acbce84fa1
Add disk del and list commands
2017-03-08 09:52:21 -08:00
Fred Park
cba7086511
Add disk add command
...
- Add first iteration of remotefs.json
- Modify set of TCP no tune VMs
2017-03-08 09:52:21 -08:00
Fred Park
cd2cb4352a
Scaffold base changes for remotefs
2017-03-08 09:52:21 -08:00
Fred Park
48834c1a49
Fix certificate vis for encrypted credentials
2017-03-08 09:36:43 -08:00
Fred Park
9cd39534cb
Update for azure-batch 2.0.0
...
- Dependency refresh
2017-03-08 09:36:43 -08:00
Fred Park
00302b75db
Tag for 2.5.4 release
...
- Update nvidia-docker for docker ce
- Update tesla nc driver
- Use SHA256 checksums instead of MD5 for downloads
2017-03-08 08:39:51 -08:00
Fred Park
f9782878f1
Tag for 2.5.3 release
2017-03-01 07:39:00 -08:00
Fred Park
b8a94c378d
Add rebootnode command
...
- Update recipes that download files to use resource files instead
2017-02-28 19:37:57 -08:00
Fred Park
6ce173ca05
Pin blobxfer version and add termtasks option
...
- Clarify docs for usage scope between tooling/APIs
2017-02-28 09:45:24 -08:00
Fred Park
ace0dde416
Tag for 2.5.2 release
2017-02-23 08:06:27 -08:00
Fred Park
7dd02b3c27
Automatic path sub for Gluster/AzStorage xfer
...
- Resolves #37
2017-02-23 07:51:49 -08:00
Fred Park
af1e03cfa8
Add troubleshooting guide
...
- Minor convoy.batch fixes
2017-02-22 20:34:20 -08:00
Christian
65576235a1
Wrap local command in an OS-specific shell (#39)
...
* Wrap local command in an OS-specific shell
* _ON_WIN const instead of os.name
2017-02-21 07:31:03 -08:00
Fred Park
4eea944bb3
Tag for 2.5.1 release
2017-02-01 11:06:26 -08:00
Fred Park
78fad1c3e3
Add support for task retention time
...
- Resolves #30
2017-01-31 09:40:16 -08:00
Fred Park
25dcc983ef
Fix unencrypted task file mover delimiter issue
...
- Resolves #29
2017-01-30 15:06:32 -08:00
Fred Park
0fe858c14e
Add FAQ and fix autogen task id rollover
...
- Resolves #27
2017-01-26 08:30:53 -08:00
Andrea Dotti
c6176d01ce
Fix issue with listtasks failing on active status tasks (#28)
...
* Fix issue with listtasks failing on active status tasks
Fix issue where tasks that are not in the completed or running state cause jobs listtasks to fail
and shipyard to crash.
* Fix formatting issue
Fix spaces and too long line
* Fix code notations
2017-01-26 08:22:02 -08:00
Fred Park
270ef0c7b1
Fix Docker tmpdir
...
- Fix typo with ev secret id ref to keyvault
- Add travis py36 env
2017-01-24 14:43:44 -08:00
Derrick Liu
5fabd07fef
Add max_task_retry_count to job and task definitions (#23)
...
* Add `max_task_retry_count` to json template as reference
* Add job-level and task-level max_task_retry_count properties
If set, we create an `azure.batch.models.JobConstraints` or `azure.batch.models.TaskConstraints` object and pass it into the call to `JobAddParameter` or `TaskAddParameter` as the constraints argument.
* Update configuration documentation to include `max_task_retry_count`
* Fixed various minor issues and linting
Squashed commit:
[d794908] No retry means retry_count is 0, not 1
[29de812] Forgot to define these earlier
[8336700] Don't check for empty since it's an int
[c59d52a] Fix flake8 linting line length (+2 squashed commit)
Squashed commit:
[8336700] Don't check for empty since it's an int
[c59d52a] Fix flake8 linting line length
* Rename `max_task_retry_count` to `max_task_retries` and fix other PR comments
2017-01-24 07:46:52 -08:00
Fred Park
fa4e1f847c
Add env var secret id support
...
- Tag for 2.5.0 release
- Resolves #12
- Partially resolves #15
2017-01-19 10:16:42 -08:00
Fred Park
040a068265
Various fixes
...
- This resolves #13 and resolves #16
2017-01-19 10:16:42 -08:00
Gonzalo
520db51019
nvidia-docker updated to v1.0.0
2017-01-19 09:48:13 -08:00
Derrick Liu
829258de2b
Remove extra azure.mgmt import
2017-01-18 20:54:43 -08:00
Fred Park
9b6dbef19f
Add task dependency id range support
2017-01-12 09:30:41 -08:00
Fred Park
c95520eaea
Tag for 2.4.0 release
...
- Update KeyVault docs with Azure CLI 2.0 commands
- Resolves #10
2017-01-11 09:22:45 -08:00
Fred Park
348ceebc65
Add AAD X.509 cert auth support (#10)
...
- AAD/Keyvault credential support in credentials.json
2017-01-10 11:48:39 -08:00
Fred Park
be04d89410
Update docs for KeyVault support (#10)
2017-01-06 08:03:26 -08:00
Fred Park
2048d8b289
Add KeyVault support (#10)
2017-01-05 10:20:13 -08:00
Fred Park
ae7e5df410
Tag for 2.3.1 release
...
- Update some docs
2017-01-03 08:51:19 -08:00
Fred Park
b69334de58
Fix multi-job jpcmd bug
2016-12-15 15:44:18 -08:00
Fred Park
d9bf6c92da
Add nvidia-docker support to ssh tunnel
2016-12-15 11:05:52 -08:00
Fred Park
57b47b353f
Add pool ssh command, resolves #9
...
- Make the ssh docker tunnel script much easier to use
- Add an ssh guide to docs
2016-12-15 07:39:04 -08:00
Fred Park
38ba61245d
Add /dev/shm option, resolves #8
2016-12-14 08:36:51 -08:00
Fred Park
7f37f81e93
Increment version to 2.2.0
2016-12-12 08:13:37 -08:00
Fred Park
f00f222877
Infiniband settings changes
...
- Add CNTK ib recipe
- Update READMEs to remove GPU preview notes
- Tag for 2.2.0 release
2016-12-09 11:34:21 -08:00
Fred Park
5baf61d8f4
Fix SAS key and KeyError masking in data movement
2016-11-30 14:41:39 -08:00
Fred Park
c603e8f6f5
Fix tfm docker image latest reference
2016-11-30 13:44:44 -08:00
jasper-schneider
6f7d474874
Fix ssh_public_key typo when using Windows (#6)
2016-11-30 13:40:29 -08:00
Fred Park
8f0fa2f446
Tag for 2.1.0 release
...
- Pass version to nodeprep and pull backend docker images by version
2016-11-30 08:27:46 -08:00
Fred Park
28732f2aea
Add listskus subcommand
...
- Update docs for envvars
2016-11-29 15:31:23 -08:00
Fred Park
8577232349
Tag for 2.0.0 release
2016-11-23 09:06:37 -08:00
Fred Park
fe0403de9a
Update MXNet GPU docker image
2016-11-22 14:27:33 -08:00
Fred Park
c047b522c3
Update CNTK docker images to 2.0beta4
...
- Fix termtasks for multi-instance tasks with named containers
2016-11-22 00:16:16 -08:00
Fred Park
44080c123a
Prepend job id to Docker container names
...
- Update TensorFlow to 0.11.0 and custom compile to add compute/sm 3.7
2016-11-20 14:57:45 -08:00
Fred Park
8cb2ba9583
Allow GPU property to be optional for NC VMs
...
- Update all GPU compute recipes to omit gpu driver
2016-11-20 09:00:19 -08:00
Fred Park
453ae98a65
Terminate cascade on thread failures
2016-11-19 10:39:58 -08:00
Fred Park
c7744f95bf
Support for internet accessible private registries
2016-11-19 09:00:01 -08:00
Fred Park
4399dbf4db
Tag for 2.0.0rc3 release
...
- Fix flake8 issues
2016-11-14 11:10:59 -08:00
Fred Park
62b532d233
Fix encoding issue for env file write
2016-11-13 23:26:48 -08:00
Fred Park
0ae05f2d84
Add CUDA_CACHE_ vars for GPU tasks
2016-11-13 11:42:56 -08:00
Fred Park
b8fcdede8f
Add --tail option for jobs add
...
- Simplify quickstart with --tail
2016-11-12 22:35:36 -08:00
Fred Park
4f41d95e32
Finish settings refactor
...
- Change recipes to use current_dedicated for multi-instance count
2016-11-12 22:13:55 -08:00
Fred Park
e6593281eb
Refactor direct config access out of data/storage
2016-11-12 12:35:56 -08:00
Fred Park
6e20e1b512
Refactor direct config accesses in crypto
...
- Refactor os path calls to pathlib
2016-11-12 09:13:09 -08:00
Fred Park
285f86ae9b
Fleet no longer directly accesses config json
2016-11-11 23:19:28 -08:00
Fred Park
cb4a077776
Fleet add pool no longer directly accesses config
2016-11-11 21:51:11 -08:00
Fred Park
03ced70c38
Continue settings refactor
...
- Credentials
- Some of global config
2016-11-11 21:08:58 -08:00
Fred Park
392af0bd55
Start pool settings refactor
2016-11-11 19:23:16 -08:00
Fred Park
bff72f4d04
Fix removal of shared path from glusterfs ingress
2016-11-11 09:45:36 -08:00
Fred Park
e700ee05b7
Add docker login prior to image update
...
- Move docker hub creds to credentials json
- Begin refactor of configuration settings retrieval
2016-11-11 09:30:14 -08:00
Fred Park
da573524de
Preliminary steps for ACR support
...
- Fix update docker images with private registry
- Automatically clean dangling image refs on update
- Remove private registry file/image id support
- Refactor fleet initialization steps to one entry point
- Simplify shipyard context init
2016-11-10 09:48:00 -08:00
Fred Park
092c2a22d1
Fix single node transfer with single file
2016-11-09 19:06:06 -08:00