Commit Graph

251 Commits

Author SHA1 Message Date
Fred Park a3dfa8c35f Ensure nvidia driver is avail through upgrades
- Resolves #174
2018-03-28 10:14:12 -07:00
Fred Park f1c27c366e Update to Docker CE 18.03.0 for Ubuntu/CentOS 2018-03-26 09:26:38 -07:00
Fred Park 287c86ce0c Fix NFS exports multi-target parsing 2018-03-26 09:25:06 -07:00
Fred Park 330c193422 Improve prep scripts
- Add timestamps for logging
- Add more Docker and nvidia details
- Save prior startup logs
- Update dependencies
2018-03-22 10:45:28 -07:00
Fred Park 2b3ecac70b Update Singularity to 2.4.4 2018-03-19 14:07:12 -07:00
Fred Park 088f2d5e34 Add support for arbitrary exports config for NFS
- Add user agent for ARM clients
- Update README
2018-03-13 15:03:31 -07:00
Fred Park 78b3342666 Update Docker CE to 17.12.1 2018-03-05 13:50:11 -08:00
Fred Park f0c9656ca2 Fix nvidia-docker overwriting daemon.json
- Update packer scripts
2018-02-28 15:09:22 -08:00
Fred Park a195d7e242 Update dependencies 2018-02-28 15:08:33 -08:00
Fred Park 8664c6cfa6 Add additional TLS modes in powershell
- Resolves #171
2018-02-28 09:54:31 -08:00
Fred Park 370da96ed5 Do not automatically mount added fstab entries
- Disable Docker service install in non-native mode and always manually
  start
2018-02-28 09:54:30 -08:00
Fred Park cb04700f08 Improve node prep scripts
- Migrate to daemon.json files
- Fix missing blobfuse mount in native mode
- Ensure docker check happens every boot
2018-02-23 12:25:08 -08:00
Fred Park 53fcd2c313 Fix interaction between custom image and native
- Enable/start docker service on custom image in native mode if not
  found
2018-02-22 18:51:33 -08:00
Fred Park f3ab5ef489 Fix image update/list
- Fix issue with pure docker pools
- Fix updates/list over SSH for older distros requiring pseudo-tty
2018-02-16 09:18:41 -08:00
Fred Park b4e6e4320d Various updates
- Fix image update to work in multi-instance mode with registry logins
- Allow CentOS 7.3 provisioning to continue to work
- Allow CentOS-HPC 7.1 provisioning
- Add CentOS 7.4 support
- Add Debian 9 support
- Update dependencies
2018-02-16 09:18:34 -08:00
Fred Park ae20b27643 Add Custom Linux Mount support
- Add pre/post support for additional node prep commands
2018-02-12 11:20:34 -08:00
Fred Park b3b98162c6 Add support for native mode image update
- Fix custom image + native pool startup for Linux
- pool images update over SSH fix
2018-02-08 14:25:28 -08:00
Fred Park b40a812d3d Tag for 3.1.0 release 2018-01-30 13:06:10 -08:00
Fred Park b59d42022c Upgrade nvidia-docker to nvidia-docker2 2018-01-29 12:42:00 -08:00
Fred Park 4d5b704905 Add support for Azure blob container mounts
- Support via blobfuse
- Resolves #159
2018-01-23 14:29:38 -08:00
Fred Park 20a2324eb7 Update Docker CE and blobxfer 2018-01-22 16:39:31 -08:00
Fred Park 5853b4787f Update Docker images
- Update to alpine 3.7
- Update Windows images to Python 3.6.4
- Update libtorrent image
- Update to Singularity 2.4.2
2018-01-22 14:14:39 -08:00
Fred Park a731ecc5f5 Support more than 16 disks per fileserver 2017-11-17 09:12:42 -08:00
Fred Park 954275696c Ensure persistence daemon/mode 2017-11-13 09:25:25 -08:00
Fred Park 90283298e6 Update dependencies 2017-11-10 09:23:15 -08:00
Fred Park 2e8b43df55 Add Azure File mount support for Windows pools
- Fix coordination command of None issue
2017-11-05 10:38:22 -08:00
Fred Park 3e94831dc3 Build cargo Docker image for Windows 2017-11-03 16:20:50 -07:00
Fred Park 0411de5828 Windows task execution support
- Blobxfer on windows support
- Disable all native updateimages/udi
2017-11-03 16:20:27 -07:00
Fred Park c9443fc91a Initial Windows Server Container support
- pool updateimages command supporting singularity images
- fix aad mfa token cache on python2
2017-11-03 16:20:10 -07:00
Fred Park 5b2af24f00 Retry image configuration errors
- Add TensorFlow-GPU Singularity recipe
2017-10-29 20:29:34 -07:00
Fred Park edd602aed4 Add HPCG Singularity recipe 2017-10-29 09:38:46 -07:00
Fred Park 6da607c9b9 Multi-instance/IB support for Singularity tasks
- Make cascade work in Docker container
2017-10-22 13:59:35 -07:00
Fred Park 48172e115e Add initial Singularity task support
- Auto-GPU
- Fix ownership issues with Singularity image pre-load
2017-10-20 23:10:12 -07:00
Fred Park db64027ef4 Add ARM Image creation steps to custom image guide 2017-10-20 08:02:37 -07:00
Fred Park 4e5d5abf6b Add Singularity support into cascade
- Remove singularity support in native container support pools as it's
  impossible to execute a singularity container in this mode
2017-10-17 18:51:27 -07:00
Fred Park be2eb9dea7 Node prep for Singularity on Ubuntu/CentOS 2017-10-17 12:37:18 -07:00
Fred Park 607bfd252e Migrate to storage split library
- Remove queue deletion code
- Resolves #133
2017-10-05 21:40:50 -07:00
Fred Park 298b00d946 Mount Azure file shares to host (#123)
- Allow multiple file shares per pool
- Move root mount point for all shared data volumes
2017-10-04 17:59:30 -07:00
Fred Park 796a5e33b4 Combine rjm/tfm to cargo (#125) 2017-10-03 18:24:50 -07:00
Fred Park a7f874f6e8 Docker image naming changes (#130) 2017-10-03 18:24:50 -07:00
Fred Park 6315be3a6b Transition to blobxfer 1.x command structure
- Data ingress/egress changes
- Task factory file changes
- Resolves #47
2017-10-03 18:24:49 -07:00
Fred Park e783744e00 Container registry logic overhaul
- Remove private registry back to Azure storage blob support (#44)
- Require fully qualified Docker image names (#106)
- Support multiple public/private registries on a single pool (#127)
2017-10-03 18:24:42 -07:00
Fred Park dcddb04150 Fixes and log more info in start task 2017-10-03 10:05:17 -07:00
Fred Park 0a9689f5a8 Native container support
- Allow pool conversion with native flag
2017-10-03 10:03:20 -07:00
Fred Park 1784e06eb4 Add nvidia docker volume inspection check 2017-09-27 09:26:34 -07:00
Fred Park c13793dd57 Support version in platform image
- Override UbuntuServer 16.04-LTS latest to prior version due to
  linux-azure kernel issues
2017-09-22 19:41:33 -07:00
Fred Park 01c2f89ba5 Handle package db conflicts
- TensorFlow recipe typos
2017-09-22 14:59:59 -07:00
Fred Park 260e1609ee Fix various OS, Docker and nvidia issues
- Update Docker CE versions for Ubuntu and CentOS
- Update NC driver
- Add special nvidia install path for CentOS 7.3 during 7.4 rollout
2017-09-21 22:13:33 -07:00
Fred Park 7e1c4c7e75 NV-series driver updates
- Resolves #119
2017-09-13 09:19:06 -07:00
Fred Park 9602608871 Tag for 2.9.4 release 2017-09-12 08:56:03 -07:00
Fred Park 767f59992b Missing join_by function in blobxfer helper
- Resolves #115
2017-08-31 10:28:22 -07:00
Fred Park 9e3308ff2b Fix issues in RemoteFS
- Public ip not being assigned for resize command
- Need to wait for block device to show up for attached disks (expand
  command)
- Add more logging
2017-08-16 08:14:08 -07:00
Fred Park e434b83cb3 Fix truncated P50 provisioning
- Support "s_v3" suffixed premium VM SKUs
2017-08-15 13:26:31 -07:00
Fred Park be530e63c0 Job recurrence support 2017-08-07 19:42:09 -07:00
Fred Park c5fa85adcb Add file task factory (#93)
- Split out task factory settings into separate file
- Change uniform to be a, b instead of min, max
- Update blobxfer script for single target ingress to place file
  directly to destination
2017-08-04 11:02:33 -07:00
Fred Park 5291ff1130 Move to blob leasing for download ticketing
- Greatly increase resource file SAS expiry timedelta
- Make concurrent_source_downloads generic, remove non-p2p option
- Update Dockerfiles
- Update to latest azure-storage
2017-07-21 11:10:03 -07:00
Fred Park 8eb2197d23 Allow CentOS 7.3 on NC/NV 2017-07-06 11:12:05 -07:00
Fred Park 2a48885da1 More improvements for scale out robustness
- Add --all-start-task-failed to delnode
- Reduce node output on pool allocation wait with number of nodes > 10
2017-06-30 23:50:21 -07:00
Fred Park 06188c1944 Tag for 2.8.0rc2 release
- Fix regression with private docker image pulls
- Resolves #103
- Resolves #105
2017-06-30 11:45:26 -07:00
Fred Park dca5473504 Improve robustness of package downloads 2017-06-27 07:14:31 -07:00
Fred Park e53a5bb88d Fix pathing for detecting docker graph location 2017-06-09 11:33:41 -07:00
Fred Park a41713c5ee Add custom image guide
- Update recipes for vm_configuration
- Fix some issues with platform pools with new changes
2017-06-06 12:41:42 -07:00
Fred Park 8397b411c5 Initial custom image support 2017-06-06 08:43:33 -07:00
Fred Park 3a5fb452d5 Various fixes
- Add poolid param for pool del
- Fix vm_count deprecation check on fs actions
- Improve robustness of package index updates
- Prompt for jobs cmi action
- Update to latest dependencies
2017-05-24 09:54:09 -07:00
Fred Park 78fa235e94 Substitute include for remoteresource if possible
- Resolves #88
2017-05-17 12:56:56 -07:00
Fred Park a17d6b64c9 Update dependencies
- Fix breaking changes in keyvault library
- Fix inverted order for fs cluster ssh and optional command
2017-05-11 09:21:20 -07:00
Fred Park c4b740bb85 Remove xserver-xorg-dev from NC path 2017-05-08 12:29:37 -07:00
Fred Park 983a7eed45 Node prep script improvements
- Blacklist nouveau universally on GPU VMs
- Change URL retrieval to requests
- Update requirements to latest
2017-05-05 08:42:19 -07:00
Fred Park 77e5fecc84 Add some additional checks in nodeprep 2017-04-28 10:55:06 -07:00
Fred Park 0161169daa Remove Windows checks for scp/ssh/openssl
- Update docs
- Remove unused vars in nodeprep script
2017-04-18 13:51:05 -07:00
Fred Park 77db9dbd82 Fix quotes in parameters
- Disallow newline character in smb password
2017-04-14 22:39:27 -07:00
Fred Park 7b99cf0b85 Modify glusterfs race fix with iptables
- Restrict smb account password from containing certain characters due
  to echo reinterpret issues
- Fix some more ssh/pathlib issues
2017-04-14 14:56:35 -07:00
Fred Park 469e5cb56f Tag for 2.6.0rc1 release
- Fix Docker setup issues
- Pin Docker release version
- Update NC nvidia driver
2017-04-14 12:54:43 -07:00
Fred Park 0d974fa0aa Add additional SSH options
- Fix samba to auto-restart
2017-04-13 09:31:35 -07:00
Fred Park 6074985ded Fix possible race condition in gluster/disk setup 2017-04-05 11:30:00 -07:00
Fred Park 3ded07634e Fix some Python2 issues in remotefs
- Properly map the gluster volume mountpath and not the brick for SMB
2017-03-31 13:21:22 -07:00
Fred Park b426ce9c39 Add Samba NSG rules and stat 2017-03-30 19:48:17 -07:00
Fred Park 130401af75 Add samba support on storage cluster nodes 2017-03-30 15:03:11 -07:00
Fred Park f952605e58 Fix server options arg parsing 2017-03-17 19:32:44 -07:00
Fred Park 4c34b23761 Force enable ssd optimizations for btrfs+premium 2017-03-17 11:55:07 -07:00
Fred Park b269ea7f06 Add multi-volume/server support 2017-03-16 15:18:29 -07:00
Fred Park 38ac358d9d Add pre-existing checks
- Switch to hostname peering in add brick (resize)
- Update docs regarding max_tasks_per_node
2017-03-16 08:58:53 -07:00
Fred Park a06377a93e Don't mount at boot glusterfs volumes on the host
- Use systemd automount instead due to race between server up (self) and
  mount
2017-03-15 12:53:38 -07:00
Fred Park 5325395522 Add glusterfs local mount option and NSG rule
- Add --hosts option for fs cluster status to print required hosts
  changes on the local machine to mount the remote fs
2017-03-14 22:07:51 -07:00
Fred Park ca2f9d73ab Add support for docker run uid/gid
- Resolves #54
2017-03-14 08:52:09 -07:00
Fred Park d9966e645d Experimental support for gluster cluster resize
- Add more robustness to gluster provisioning
- Stat script fixes
- Add --detail option for stat
- More disks del/list options
2017-03-12 11:18:41 -07:00
Fred Park cb7b42a231 Support glusterfs <-> pool autolinking
- Support glusterfs expand (additional disks)
- Provide `mount_options` for `file_server` which applies to local mount
  on the file server of the disks
- Allow gluster volume name to be specified
- Provide stronger cross-checking between pool virtual network and
  storage cluster virtual network
- Increase ud/fd in AS to maximums
- Install acl tools for nfsv4 and glusterfs
2017-03-11 15:23:55 -08:00
Fred Park 0ed28d96fc Allow scp dm and credential encryption on Windows
- Rename old glusterfs scripts to be less confusing with remote
  glusterfs support
2017-03-11 09:21:33 -08:00
Fred Park 675c6c37f8 Glusterfs support for add/suspend/start
- Simple logging by default
- Fix logging format
2017-03-10 22:54:16 -08:00
Fred Park 3f47fda0b9 Checkpoint multi-vm glusterfs support
- Allow resource_group overrides in managed_disks and storage_cluster
- Add server_options to file_server
- Add named resource group support to disk deletion
- Fix Batch and ARM client issues in non-AAD mode
2017-03-10 15:10:31 -08:00
Fred Park 33291504c2 Support missing image tasks, pool check
- Break out configs into separate pages
- Update all configs using 16.04.0-LTS to 16.04-LTS
- Remove Batch `account` from recipe credentials
2017-03-09 15:07:37 -08:00
Fred Park e349a004cd Support pool <-> storage cluster auto-linkage
- Update to latest batch management client library supporting
  UserSubscription
- Begin breakout of config doc into multiple pages
2017-03-09 09:40:16 -08:00
Fred Park 5fcddad7ea Pool <-> storage cluster linkage checkpoint 2017-03-08 23:43:16 -08:00
Fred Park 587ab7faa4 Fix suspend/start issues with software raid
- Disallow expand action with mdadm-based arrays on RAID-0
- Change "remotefs" to "fs" for commands
2017-03-08 09:52:21 -08:00
Fred Park a6a672a82e Begin expand functionality
- Fix issues with ext4 + mdadm
2017-03-08 09:52:21 -08:00
Fred Park f8e3fa52ed Add stat script
- Better organize some remotefs json settings
- Reduce redundant lookups in ssh path
- Create --output-config option to separate from --verbose
2017-03-08 09:52:21 -08:00
Fred Park 0b172eccce Add remotefs bootstrap script 2017-03-08 09:52:21 -08:00
Fred Park 6ce173ca05 Pin blobxfer version and add termtasks option
- Clarify docs for usage scope between tooling/APIs
2017-02-28 09:45:24 -08:00
Fred Park 4ce2689aca Install all intel mpi rpms on SLES-HPC 2017-01-26 10:44:43 -08:00
Fred Park 270ef0c7b1 Fix Docker tmpdir
- Fix typo with ev secret id ref to keyvault
- Add travis py36 env
2017-01-24 14:43:44 -08:00
Fred Park 8f0fa2f446 Tag for 2.1.0 release
- Pass version to nodeprep and pull backend docker images by version
2016-11-30 08:27:46 -08:00
Fred Park 6548dc3508 Fix cascade run exit code not propagating 2016-11-28 14:15:53 -08:00
Fred Park 453ae98a65 Terminate cascade on thread failures 2016-11-19 10:39:58 -08:00
Fred Park c7744f95bf Support for internet accessible private registries 2016-11-19 09:00:01 -08:00
Fred Park da573524de Preliminary steps for ACR support
- Fix update docker images with private registry
- Automatically clean dangling image refs on update
- Remove private registry file/image id support
- Refactor fleet initialization steps to one entry point
- Simplify shipyard context init
2016-11-10 09:48:00 -08:00
Fred Park 3559458d46 Update data movement doc 2016-11-04 08:12:55 -07:00
Fred Park efb8c3105f Add wait option for pool resize
- Fix TMPDIR sed command
- Add generated shipyard script to gitignore
2016-10-30 01:44:57 -07:00
Fred Park eb2f108e86 Add TMPDIR redirect
- Fix Debian Jessie docker opts not loading
2016-10-29 23:27:23 -07:00
Fred Park 0a702d1f8b Prep for multi image Batch-Shipyard docker repo 2016-10-25 15:02:28 -07:00
Fred Park 180627a229 Default SLES docker install to module 2016-10-24 12:51:25 -07:00
Fred Park 705ae40065 Add support for pool resize up with GlusterFS
- Update azure-batch dependency to 1.1.0
2016-10-24 10:08:13 -07:00
Fred Park 9a13dbe83b Update CNTK to 1.7.2 and recipes
- Fix python2+Windows file encoding issue
- Add deljobswait action
2016-10-22 22:43:48 -07:00
Fred Park 92464b3b54 Add Azure Batch Task data ingress
- Rearrange Dockerfiles
- Update TensorFlow-Distributed recipe
- Rename CASCADE env vars to SHIPYARD
2016-10-20 21:18:31 -07:00
Fred Park 74d3eea339 Add Encrypted Credential support 2016-10-19 21:14:53 -07:00
Fred Park 1436cd4378 Add compute node to Azure storage egress support 2016-10-16 16:57:48 -07:00
Fred Park 33300c551c Add pool/job/task-level data ingress support 2016-10-14 15:49:20 -07:00
Fred Park 4ce2f1d6c2 Add HPN-SSH support for Ubuntu
- Fix some issues with azure file setup and Windows
- Add some validation with container naming
- Clean up storage with delpool action
- Update .gitignore
2016-10-13 10:55:49 -07:00
Fred Park c9648c5cfd Fix GlusterFS mount ownership/permissions 2016-10-06 09:29:53 -07:00
Fred Park b7a5335874 Add preliminary SUSE SLES-HPC support for IB 2016-09-30 22:00:16 -07:00
Fred Park 88862ad57d Fix GlusterFS setup on Ubuntu
- OpenFOAM default swap from v1606+ to 4.0
2016-09-29 19:15:38 -07:00
Fred Park be36e2face Add OpenFOAM-Infiniband-IntelMPI recipe
- Add real NAMD-Infiniband-IntelMPI image
- GlusterFS mountpoint now inside AZ_BATCH_NODE_SHARED_DIR
2016-09-28 21:03:36 -07:00
Fred Park 6aea6782ce Update Azure File DVD to 0.5.1
- Update quickstart to accommodate choice
- Change STANDARD_F1 to STANDARD_D1_V2 for some recipes
2016-09-21 09:23:00 -07:00
Fred Park 1188a1e885 Fix shipyard container detach/cleanup
- Add @FIRSTRUNNING task id for streamfile/gettaskfile
2016-09-18 17:01:42 -07:00
Fred Park 9d12dab8be Fix cascade start issue without private registry
- Add GlusterFS support for ubuntu and opensuse/sles
- Add --filespec and --verbose parameters
2016-09-16 11:46:59 -07:00
Fred Park 08204092be Add CentOS GlusterFS support
- Update recipes
2016-09-15 12:47:43 -07:00
Fred Park e8d5e7a8a3 Automatically detect nvidia driver version
- Fix azure-storage dependencies for non-shipyard docker image setup
- Add no-install-recommends to apt-gets in node prep
2016-09-08 21:06:21 -07:00
Fred Park b4e6e90f1d Add FFmpeg GPU recipe
- Fix NV-series provisioning
- Fix up various READMEs
- Add maintained by tags in Dockerfiles
- Add missing config flag in jobs json
- Fix non-Docker shipyard azure-storage req
2016-09-08 11:52:21 -07:00
Fred Park 028768a61a Add CNTK recipes
- Add TCP optimization
- Fix job autocompletion
- Update azure-storage requirement to 0.33.0
2016-09-07 21:40:57 -07:00
Fred Park e52e30cf0c Fix docker images issue with non-p2p transfer
- Fix node prep cascade timing issues
- Update various READMEs
2016-09-02 15:02:07 -07:00
Fred Park f9dac5bd93 Add GPU documentation
- Fix node prep issues with GPU
- Correct node prep finished file location
- Add TensorFlow-GPU recipe
2016-09-02 09:39:35 -07:00
Fred Park ada1feb00a Add GPU support 2016-09-02 01:42:54 -07:00
Fred Park f551e07da3 Finalize repo for batch shipyard docker image 2016-08-31 09:19:22 -07:00
Fred Park c5368207cd Add more features to pool/tasks
- Add infiniband support
- Add max tasks per node
- Properly handle multi-instance tasks with docker run <-> exec
- Add docker multi-instance cleanup helper
2016-08-31 02:21:50 -07:00
Fred Park 351800344d Add multi-instance support for tasks 2016-08-30 15:31:23 -07:00
Fred Park 0d48919afa Azure file dvd support for all supported hosts
- Add default init scripts to avoid surprise config changes
- Add block flag to pool for image ready
2016-08-29 12:12:50 -07:00
Fred Park 35fb3f588b Add support for more host OSes
- Ubuntu 14.04, Debian 8, CentOS 7.x, RHEL 7.x, OpenSUSE 13.2/42.1,
  SLES 12/12-sp1
- Improve graphing
- Prevent metadata clear on existing pool
2016-08-28 19:43:53 -07:00
Fred Park e9ce32e69f Fix shipyard container issues
- Downgrade libtorrent to 1.0.9 due to DHT issues
- Add task dependency support
- Add CONTRIBUTING.md, requirements.txt and .travis.yml
2016-08-26 23:09:10 -07:00
Fred Park aa4add34d5 Support shipyard as docker container
- Add Dockerfile
- Add command file for container
2016-08-26 15:34:30 -07:00
Fred Park bc30023330 Scale fixes
- Add timing recording toggle
2016-08-25 13:07:57 -07:00
Fred Park 359dfdad85 Use proper logging in shipyard and cascade
- Fix shipyard.py for Python2.7 compatibility
- Allow option to reboot nodes that go into start task failed state
- Correctly pin gr-done events
- Reduce chattiness of torrent info dumps
2016-08-24 15:36:23 -07:00
Fred Park ee242c21d6 Split configuration files further
- Allow docker public hub passthrough in private context
- Pass config through in more places in shipyard
2016-08-23 14:50:17 -07:00
Fred Park 6fb2d7e221 Add azurefile docker volume driver support
- Add auto packaging of registry:2 docker image if not present
2016-08-19 15:13:33 -07:00
Fred Park f1a666db41 Add redirect for torrents to blob
- Always install private registry to every node
- Fix uncompressed torrenting
2016-08-18 15:17:04 -07:00
Fred Park e4f58c6ff4 Allow non-p2p and no dpr modes
- Add jp block script
- Fix some data graph bugs
- Begin fixing non-reproducible .tar.gz docker save images
2016-08-17 08:30:53 -07:00
Fred Park d4c44f811c Add more configurable options for P2P mode
- Add WIP of graph.py
2016-08-12 15:31:05 -07:00
Fred Park b186ae3099 Disable DHT connection tracking 2016-08-12 09:30:34 -07:00
Fred Park d06d3649a0 Configuration changes
- Move more private registry settings from hardcode to config
- Split config file into two
2016-08-12 09:21:33 -07:00
Fred Park 0271e54b1a Upload perf timings to table 2016-08-09 15:23:15 -07:00
Fred Park 83a422c64c Updates for single session/DHT fixes 2016-08-08 09:38:12 -07:00
Fred Park 553380df8a Break out private registry setup from cascade
- Fix torrent start, queue msg get and torrentinfo table merge
2016-07-20 13:12:20 -07:00