
149 commits

Author SHA1 Message Date
Fred Park 172d752938 Add RDP support
- Improve command hierarchy
2017-11-03 16:20:50 -07:00
Fred Park 6da607c9b9 Multi-instance/IB support for Singularity tasks
- Make cascade work in Docker container
2017-10-22 13:59:35 -07:00
Fred Park 48172e115e Add initial Singularity task support
- Auto-GPU
- Fix ownership issues with Singularity image pre-load
2017-10-20 23:10:12 -07:00
Fred Park 4e5d5abf6b Add Singularity support into cascade
- Remove singularity support in native container support pools as it's
  impossible to execute a singularity container in this mode
2017-10-17 18:51:27 -07:00
Fred Park adc0c865ea docker_volumes is now volumes 2017-10-17 12:57:02 -07:00
Fred Park 298b00d946 Mount Azure file shares to host (#123)
- Allow multiple file shares per pool
- Move root mount point for all shared data volumes
2017-10-04 17:59:30 -07:00
Fred Park 49a374416f Add unusable node recovery option 2017-10-04 09:25:57 -07:00
Fred Park 6315be3a6b Transition to blobxfer 1.x command structure
- Data ingress/egress changes
- Task factory file changes
- Resolves #47
2017-10-03 18:24:49 -07:00
Fred Park e783744e00 Container registry logic overhaul
- Remove private registry back to Azure storage blob support (#44)
- Require fully qualified Docker image names (#106)
- Support multiple public/private registries on a single pool (#127)
2017-10-03 18:24:42 -07:00
Fred Park cbddcdfbff Use docker_image in favor of image in tasks 2017-10-03 10:05:17 -07:00
Fred Park 60b4fc446f Support ARM Images for custom images (#126) 2017-10-03 10:05:17 -07:00
Fred Park 238982db77 Add ARM VNet support in Batch service mode (#126)
- Support "global" aad property in credentials
- Add Virtual Network guide
2017-10-03 10:05:17 -07:00
Fred Park afdad167a8 Add native support to custom images
- Update TSG doc for native container support
2017-10-03 10:04:03 -07:00
Fred Park e2ddf3b750 Add YAML configuration support
- Resolves #122
2017-10-03 10:04:03 -07:00
Fred Park 9c700dfbd5 Fix jrtask and suppress CUDA vars for native
- Doc updates
- Suppress coordination/task commands that are empty
2017-10-03 10:03:20 -07:00
Fred Park c13793dd57 Support version in platform image
- Override UbuntuServer 16.04-LTS latest to prior version due to
  linux-azure kernel issues
2017-09-22 19:41:33 -07:00
Fred Park 745082029f Misc doc updates
- Update requests
- Check task id length
- Drop Python 3.3 support due to cryptography
2017-08-10 08:40:07 -07:00
Fred Park 44a1f14b31 Add monitor_task_completion for recurring jobs 2017-08-09 07:57:56 -07:00
Fred Park 9add2444ec Change autogen task id property to complex
- Update job recurrence docs
2017-08-08 08:45:15 -07:00
Fred Park be530e63c0 Job recurrence support 2017-08-07 19:42:09 -07:00
Fred Park 99e72c0c3f Add custom task factory support (#93) 2017-08-07 10:38:08 -07:00
Fred Park c5fa85adcb Add file task factory (#93)
- Split out task factory settings into separate file
- Change uniform to be a, b instead of min, max
- Update blobxfer script for single target ingress to place file
  directly to destination
2017-08-04 11:02:33 -07:00
Fred Park 1650ce4a95 Add random task factory (#93) 2017-08-03 20:10:56 -07:00
Fred Park ed8ca2d225 Add autogen task id setting 2017-07-31 13:40:18 -07:00
Fred Park 4105acc2f8 Add task factory (parameter sweep) support
- Resolves #93
2017-07-28 14:36:42 -07:00
Fred Park e32fc4d93e Add Autopool support
- Resolves #33
- Add --poolid to storage clear and storage del
- jobs del and jobs term now clean up storage data if autopool is
  detected
2017-07-21 11:10:03 -07:00
Fred Park 3b65ba684f Support job priorities
- Resolves #109
2017-07-21 11:10:03 -07:00
Fred Park 23e9584852 Add compute node fill type support
- Resolves #107
2017-07-21 11:10:03 -07:00
Fred Park 82a46a615a Basic Autoscale functionality
- Allow pools to be added with zero target nodes
- Add pool autoscale commands
2017-07-21 11:10:03 -07:00
Fred Park 5291ff1130 Move to blob leasing for download ticketing
- Greatly increase resource file SAS expiry timedelta
- Make concurrent_source_downloads generic, remove non-p2p option
- Update Dockerfiles
- Update to latest azure-storage
2017-07-21 11:10:03 -07:00
Fred Park 8397b411c5 Initial custom image support 2017-06-06 08:43:33 -07:00
Fred Park d80d938063 More inheritable job to task properties
- Add max_wall_time property
- Resolves #69
2017-05-23 09:29:00 -07:00
Fred Park 7ed7429a24 Add Low Priority Batch VM support
- Resolves #82
- Resolves #83
2017-05-12 14:42:55 -07:00
Fred Park f9912b7a52 Pool-level resource file support 2017-05-01 10:17:09 -07:00
Fred Park 741a0bdd85 Add fault_domains property
- Add RemoteFS-GlusterFS+BatchPool recipe
- Various fixes
2017-04-14 08:14:13 -07:00
Fred Park 0d974fa0aa Add additional SSH options
- Fix samba to auto-restart
2017-04-13 09:31:35 -07:00
Fred Park f61f91423e multi_instance_auto_complete -> auto_complete
- Resolves #61
2017-04-03 10:48:54 -07:00
Fred Park b426ce9c39 Add Samba NSG rules and stat 2017-03-30 19:48:17 -07:00
Fred Park 130401af75 Add samba support on storage cluster nodes 2017-03-30 15:03:11 -07:00
Fred Park db16e4cb7e Allow public IP to be disabled
- Fix fs cluster status --detail
- Expand non-retry on async ops to include all 400-level status codes
2017-03-28 20:49:09 -07:00
Fred Park b269ea7f06 Add multi-volume/server support 2017-03-16 15:18:29 -07:00
Fred Park 5325395522 Add glusterfs local mount option and NSG rule
- Add --hosts option for fs cluster status to print required hosts
  changes on the local machine to mount the remote fs
2017-03-14 22:07:51 -07:00
Fred Park ca2f9d73ab Add support for docker run uid/gid
- Resolves #54
2017-03-14 08:52:09 -07:00
Fred Park 89b722df54 Populate the fs config doc
- Update base README
- Rename disk_ids to disk_names in fs.json
2017-03-12 13:07:56 -07:00
Fred Park e0490cf0b4 Doc updates for global config
- Create empty doc holder for remote fs config settings
2017-03-11 16:32:27 -08:00
Fred Park cb7b42a231 Support glusterfs <-> pool autolinking
- Support glusterfs expand (additional disks)
- Provide `mount_options` for `file_server` which applies to local mount
  on the file server of the disks
- Allow gluster volume name to be specified
- Provide stronger cross-checking between pool virtual network and
  storage cluster virtual network
- Increase ud/fd in AS to maximums
- Install acl tools for nfsv4 and glusterfs
2017-03-11 15:23:55 -08:00
Fred Park 675c6c37f8 Glusterfs support for add/suspend/start
- Simple logging by default
- Fix logging format
2017-03-10 22:54:16 -08:00
Fred Park 3f47fda0b9 Checkpoint multi-vm glusterfs support
- Allow resource_group overrides in managed_disks and storage_cluster
- Add server_options to file_server
- Add named resource group support to disk deletion
- Fix Batch and ARM client issues in non-AAD mode
2017-03-10 15:10:31 -08:00
Fred Park 33291504c2 Support missing image tasks, pool check
- Break out configs into separate pages
- Update all configs using 16.04.0-LTS to 16.04-LTS
- Remove Batch `account` from recipe credentials
2017-03-09 15:07:37 -08:00
Fred Park e349a004cd Support pool <-> storage cluster auto-linkage
- Update to latest batch management client library supporting
  UserSubscription
- Begin breakout of config doc into multiple pages
2017-03-09 09:40:16 -08:00
Fred Park 5fcddad7ea Pool <-> storage cluster linkage checkpoint 2017-03-08 23:43:16 -08:00
Fred Park 91403de98f Add pool vnet spec
- Refactor vnet/subnet creation so pool creation can use it
- Allow read of fs.json for pool add
- Rename "glusterfs" volume_driver to "glusterfs_on_compute"
2017-03-08 20:23:05 -08:00
Fred Park 66d90dde90 Prep for add pool with vnet changes
- Centralize various client creation logic
2017-03-08 14:56:39 -08:00
Fred Park 8f7aee3a2f Support AAD auth for Batch accounts 2017-03-08 11:13:09 -08:00
Fred Park c118b7e2d9 Allow custom inbound network security rules 2017-03-08 09:52:21 -08:00
Fred Park 587ab7faa4 Fix suspend/start issues with software raid
- Disallow expand action with mdadm-based arrays on RAID-0
- Change "remotefs" to "fs" for commands
2017-03-08 09:52:21 -08:00
Fred Park 748cf64bfb Refactor and unify AAD settings across commands
- Allow KeyVault AAD endpoints to be specified
2017-03-08 09:52:21 -08:00
Fred Park f8e3fa52ed Add stat script
- Better organize some remotefs json settings
- Reduce redundant lookups in ssh path
- Create --output-config option to separate from --verbose
2017-03-08 09:52:21 -08:00
Fred Park 0b172eccce Add remotefs bootstrap script 2017-03-08 09:52:21 -08:00
Fred Park 94bde5b076 Add first version of cluster add and del commands
- Modify remotefs json for more properties
2017-03-08 09:52:21 -08:00
Fred Park cba7086511 Add disk add command
- Add first iteration of remotefs.json
- Modify set of TCP no tune VMs
2017-03-08 09:52:21 -08:00
Fred Park cd2cb4352a Scaffold base changes for remotefs 2017-03-08 09:52:21 -08:00
Fred Park 78fad1c3e3 Add support for task retention time
- Resolves #30
2017-01-31 09:40:16 -08:00
Fred Park 270ef0c7b1 Fix Docker tmpdir
- Fix typo with ev secret id ref to keyvault
- Add travis py36 env
2017-01-24 14:43:44 -08:00
Derrick Liu 5fabd07fef Add max_task_retry_count to job and task definitions (#23)
* Add `max_task_retry_count` to json template as reference

* Add job-level and task-level max_task_retry_count properties

If set, we create an `azure.batch.models.JobConstraints` or `azure.batch.models.TaskConstraints` object, and pass it into the call to `JobAddParameter` or `TaskAddParameter` as a constraints argument.

* Update configuration documentation to include `max_task_retry_count`

* Fixed various minor issues and linting

Squashed commit:

[d794908] No retry means retry_count is 0, not 1

[29de812] Forgot to define these earlier

[8336700] Don't check for empty since it's an int

[c59d52a] Fix flake8 linting line length (+2 squashed commit)

* Rename `max_task_retry_count` to `max_task_retries` and fix other PR comments
2017-01-24 07:46:52 -08:00
Fred Park fa4e1f847c Add env var secret id support
- Tag for 2.5.0 release
- Resolves #12
- Partially resolves #15
2017-01-19 10:16:42 -08:00
Fred Park 9b6dbef19f Add task dependency id range support 2017-01-12 09:30:41 -08:00
Fred Park 348ceebc65 Add AAD X.509 cert auth support (#10)
- AAD/Keyvault credential support in credentials.json
2017-01-10 11:48:39 -08:00
Fred Park be04d89410 Update docs for KeyVault support (#10) 2017-01-06 08:03:26 -08:00
Fred Park 57b47b353f Add pool ssh command, resolves #9
- Make the ssh docker tunnel script much easier to use
- Add an ssh guide to docs
2016-12-15 07:39:04 -08:00
Fred Park 38ba61245d Add /dev/shm option, resolves #8 2016-12-14 08:36:51 -08:00
Fred Park c7744f95bf Support for internet accessible private registries 2016-11-19 09:00:01 -08:00
Fred Park 4f41d95e32 Finish settings refactor
- Change recipes to use current_dedicated for multi-instance count
2016-11-12 22:13:55 -08:00
Fred Park e700ee05b7 Add docker login prior to image update
- Move docker hub creds to credentials json
- Begin refactor of configuration settings retrieval
2016-11-11 09:30:14 -08:00
Fred Park da573524de Preliminary steps for ACR support
- Fix update docker images with private registry
- Automatically clean dangling image refs on update
- Remove private registry file/image id support
- Refactor fleet initialization steps to one entry point
- Simplify shipyard context init
2016-11-10 09:48:00 -08:00
Fred Park f8ac2ccc40 Add support for single node direct ingress
- Add missing support for relative_destination_path in single node
  transfers
2016-11-09 15:38:19 -08:00
Fred Park 9db24df307 Add relative destination path
- Check for vm_count for glusterfs setup
2016-11-07 10:23:43 -08:00
Fred Park fa1024d191 Improve Python2/3 compatibility
- Add generated sas key expiry config option
2016-10-31 08:45:05 -07:00
Fred Park beeb118b19 Add generated_file_export_path option
- Add Dockerfile for cli
- Update docs for docker cli
- Update travis build to include tfm
2016-10-25 22:15:35 -07:00
Fred Park 705ae40065 Add support for pool resize up with GlusterFS
- Update azure-batch dependency to 1.1.0
2016-10-24 10:08:13 -07:00
Fred Park 92464b3b54 Add Azure Batch Task data ingress
- Rearrange Dockerfiles
- Update TensorFlow-Distributed recipe
- Rename CASCADE env vars to SHIPYARD
2016-10-20 21:18:31 -07:00
Fred Park 481d298e7c Add credential encryption guide 2016-10-20 10:48:25 -07:00
Fred Park bb515e3812 Add Data Movement guide 2016-10-17 13:33:12 -07:00
Fred Park bd7101df16 Rename generate_tunnel_script property
- Add Torch-CPU to quickstart
2016-10-15 17:54:16 -07:00
Fred Park dc1c8d46a3 Add include pattern support for gettaskallfiles 2016-10-15 14:33:12 -07:00
Fred Park 33300c551c Add pool/job/task-level data ingress support 2016-10-14 15:49:20 -07:00
Fred Park 8101a1407f Add arbitrary file split support 2016-10-12 15:23:27 -07:00
Fred Park 1bc8e60fe2 Add include/exclude filter support for source path 2016-10-12 09:30:06 -07:00
Fred Park 261984020e Add rsync transfer methods
- Refactor data transfer functions into single/multinode
- Expand docs for data ingress
2016-10-11 10:39:54 -07:00
Fred Park a4ec217f66 First stage in shipyard modularization
- Update configuration docs for new data ingress spec
2016-10-09 15:22:15 -07:00
Fred Park 487223e8fa Change pool config ssh_docker_tunnel to ssh 2016-10-06 11:03:10 -07:00
Fred Park 08204092be Add CentOS GlusterFS support
- Update recipes
2016-09-15 12:47:43 -07:00
Fred Park 646cff6631 Add TensorFlow-Distributed recipe
- Fix SSH user expiry within 1 day
- Fix some README/dockerfile typos
2016-09-13 11:43:28 -07:00
Fred Park e8d5e7a8a3 Automatically detect nvidia driver version
- Fix azure-storage dependencies for non-shipyard docker image setup
- Add no-install-recommends to apt-gets in node prep
2016-09-08 21:06:21 -07:00
Fred Park b4e6e90f1d Add FFmpeg GPU recipe
- Fix NV-series provisioning
- Fix up various READMEs
- Add maintained by tags in Dockerfiles
- Add missing config flag in jobs json
- Fix non-Docker shipyard azure-storage req
2016-09-08 11:52:21 -07:00
Fred Park f9dac5bd93 Add GPU documentation
- Fix node prep issues with GPU
- Correct node prep finished file location
- Add TensorFlow-GPU recipe
2016-09-02 09:39:35 -07:00
Fred Park a666d7d9f4 Add explicit inter node comm property for pool
- Allow inter node comm to work independently of p2p transfers
2016-08-31 22:20:18 -07:00
Fred Park 7f49641074 First part of the guide/docs
- Modify placement of some configuration settings
2016-08-31 15:35:33 -07:00
Fred Park dad22994bc Rename sample configs as config templates
- Add Changelog file
2016-08-31 09:38:33 -07:00