Timothee Guerin
90aceaefb5
merge master
2018-04-13 10:05:31 -07:00
Jacob Freck
7ef721f0c1
Feature: getting started script ( #475 )
...
* initial changes for getting started scripts
* add temp error handling
* rename file - fix typo
* add debug strings
* add handling for existing user
* WIP: wait for subprocess to complete to get exit code
* WIP: handle existing user and refactor code
* WIP: add missing return statements
* WIP: fix typo
* start sdk refactor
* mostly working create
* working happy create path
* handle errors for vnet, aad application
* make account setup interactive
* add prompt
* add docs
* rename account_setup_refac to account_setup
* add some logging
* pip install msrest, azure-cli-core, import issues
* remove in script pip, add shell wrapper program
* ellipsis to period
* update branch name for account_setup.sh
* docstring
* retry resource group creation
* fix typo, update retry
* explicitly set output location
* wget overwrite flag, docs update
* add prompt for multi tenants
* fix bug with batch account creation
* add spinner, print statements, fix formatting bug
* fix param bug
2018-04-11 13:27:55 -07:00
Jacob Freck
44a07654aa
Feature: spark debug tool ( #455 )
...
* start implementation of cluster debug utility
* update debug program
* update debug
* fix output directory structure
* cleanup output, add error checking
* sort imports
* start untar
* extract tar
* add debug.py to pylintrc ignore, line too long
* crlf->lf
* add app logs
* call get_spark_app_logs, typos
* add docs
* remove debug.py from pylintrc ignore
* added debug.py back to pylint ignore
* change pylint ignore
* remove commented log
* update cluster_run
* refactor cluster_copy
* update debug, add spinner for run and copy
* make new sdk cluster_download endpoint
2018-04-09 15:02:43 -07:00
Jacob Freck
1eaa1b6e42
Feature: add internal flag to node commands ( #482 )
...
* add internal ssh flag
* add --internal flag to cluster get
* cluster run internal flag
* fix add command back
* cluster copy internal
* fix method params
* fix method params
* add debug statement
* fix params
* remove debug statement
* fixes
* add debug statement
* remove debug statement
* add hostname to /etc/hosts
* remove hostname from /etc/hosts
* add sdk docs for internal switch in cluster run and copy
2018-04-06 15:59:13 -07:00
Jacob Freck
4ef3dd09df
Bug: add spark.history.fs.logDirectory to required keys ( #456 )
...
* add spark.history.fs.logDirectory to required keys
* add spark_event_log_enabled_key to required_keys
* docs, add history server config to spark-defaults.conf
* fix bad logic
* crlf->lf
2018-04-05 14:11:35 -07:00
Timothee Guerin
7f1e77ec15
Added types
2018-04-05 09:38:31 -07:00
Timothee Guerin
0f06d69397
Define plugin docs
2018-04-05 09:28:22 -07:00
Jacob Freck
82ad0296af
Bug: add gitattributes file ( #470 )
...
Bug: line endings, add gitattributes file
2018-04-04 13:44:26 -07:00
Jacob Freck
8aa1843f23
Feature: managed storage for clusters and jobs ( #443 )
...
* add in storage management for clusters, jobs
* add warning logs on cli delete
* whitespace
* add keep-logs flag
* add docs on storage lifetime
2018-03-20 10:45:49 -07:00
Dmitry Stratiychuk
4be5ac2f44
Fix job configuration option for `aztk spark job submit` command ( #435 )
...
`--job-conf` option mentioned in the docs wasn't working.
CLI help was showing that option is named `--configuration-c`
which seems to be a result of a missing comma in option definition.
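The failure mode described above can be reproduced in a few lines. This is only a sketch, assuming the CLI is built on `argparse` and that the intended flags were `--configuration` and `-c`; the log does not show the real option definition, so `option_names` and the `job_conf` destination are hypothetical names used for illustration.

```python
import argparse


def option_names(*flags):
    """Register one option on a throwaway parser and return its flag names."""
    parser = argparse.ArgumentParser(prog="demo")
    # add_argument returns the Action it creates; option_strings lists its flags.
    action = parser.add_argument(*flags, dest="job_conf")
    return action.option_strings


# Missing comma: Python joins adjacent string literals into a single flag.
broken = option_names("--configuration" "-c")

# With the comma restored, the long and short flags are registered separately.
fixed = option_names("--configuration", "-c")

print(broken)
print(fixed)
```

Adjacent string literals are concatenated at compile time, so the missing comma raises no error and the mistake only shows up in the generated help text.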
2018-03-13 11:07:48 -07:00
Jacob Freck
17755e0ffa
Feature: Basic Cluster and Job Submission SDK Tests ( #344 )
...
* add initial cluster tests
* add cluster tests, add simple job submission test scenario
* sort imports
* fix job tests
* fix job tests
* remove pytest from travis build
* cluster per test, parallel pytest plugin
* delete cluster after tests, wait until deleted
* fix bugs
* catch right error, change cluster_id to base_cluster_id
* fix test name
* fixes
* move tests to integration_tests dir
* update travis to run non-integration tests
* directory structure, decoupled job tests
* fix job tests, issue with submit_job
* fix bug
* add test docs
* add cluster and job delete to finally clause
2018-02-22 14:06:16 -08:00
Jacob Freck
6b1c86195d
Feature: SDK support for file-like configuration objects ( #373 )
...
* add support for filelike objects for configuration files
* fix custom scripts
* remove os.pathlike
* merge error
2018-02-21 16:55:26 -08:00
Jacob Freck
85e444ce34
Feature: Cluster Run and Copy ( #304 )
...
* start implementation of cluster run
* fix cluster_run
* start debug sequential user add and delete
* parallelize user creation and deletion, start implementation of cluster scp
* continue cluster_scp implementation
* debug statements, disconnect error: permission denied
* untested paramiko implementation of cluster_run
* continue debugging user creation bug
* fix bug with pool user creation, start concurrent implementation
* start fix of paramiko cluster_run and cluster_copy
* working paramiko cluster_run implementation, start cluster_scp
* fix cluster_scp command
* update requirements, rename cluster_run function
* remove unused shell functions
* parallelize run and scp, add container_name, create logs wrapper
* change scp to copy, clean up
* sort imports
* remove asyncssh from node requirements
* remove old import
* remove bad error handling
* make cluster user management methods private
* remove comment
* remove accidental commit
* fix merge, move delete to finally clause
* add docs
* formatting
2018-02-07 16:19:33 -08:00
Jacob Freck
1a15245eb4
Feature: spark init docker repo customization ( #358 )
...
* customize docker_repo based on init args
* whitespace
* add some docs
* r-base to r
* case insensitive r flag, typo fix
2018-02-07 10:56:20 -08:00
Jacob Freck
748a1269fa
Feature: Spark mixed mode support ( #350 )
...
* add support for aad creds for storage on node
* add mixed mode support
* add docs
* switch error order
* add dedicated to get_cluster
* remove mixed mode in print_cluster_conf
2018-02-07 10:41:51 -08:00
Emlyn Corrin
63fb81b2b6
Allow submitting jobs into a VNET ( #365 )
...
* Add subnet_id to job submission cluster config
* add some docs
2018-02-05 15:43:13 -08:00
Jacob Freck
f490f9643f
Feature: AAD tutorial docs ( #353 )
...
* aad tutorial docs
* typo, specify application type
2018-01-25 13:58:59 -08:00
Jacob Freck
1aae3b5a8b
Feature: expose exit_code ( #320 )
...
* expose the batch task exit_code in ApplicationLog
* add exit_code to job list-apps output
* add execution info to applicationlog object
* whitespace
2018-01-19 16:53:45 -05:00
Jacob Freck
882418a928
Feature: Spark job submission ( #278 )
...
* refactor submit, initial job-submission commit
* fix merge conflicts
* comment out gpu_enabled
* update schedule, user job manager task, wait until container ready
* wait until docker container running
* add get_job_log
* add job id
* start multitask job submission
* start multitask implementation
* wait for node setup to complete before completing start task
* fix incorrect logic
* add environment variable, fix loading task definition, fix waiting for container
* fix early job completion, autokill pool, cleanup code
* include debug print error
* define job submission sdk stubs
* Job and JobConfiguration models
* remove timestamp, implement job function stubs , add Application model
* add delete_job, bug fixes
* add wait_until_job_finished, wait_until_all_jobs_finished
* fix storage output logs location, function names
* whitespace
* add cli file stubs
* start job cli implementation
* better output for job cli, added job.yaml configuration file
* error catch for missing blob, add sdk docs, models updates
* start Jobs tutorial doc
* template for rest of job tutorial doc
* rename app_id to application_name, add job sdk docs
* docs fix
* add support for low pri nodes
* better formatting for job.yaml
* remove unnecessary comments
* add defaults and comments for job.yaml, fix spark_configuration file loading
* fix spark_configuration file paths
* fix app_arg issue
* clean up code, address comments
* rename to cluster_submit_helper
* update get job print
* add list-apps, update help text for job id
* add application metadata to job, update job get print
* better error for get_application_log, add working example to job.yaml, add print_application
* add warning about master selection
* update list_applications return value
* whitespace
* Add link to docs in job.yaml, validation for job.yaml
* convert to commandbuilder, remove print
* fix submit exit code, fix no app_args bug, set autoscale interval
* wait until custom scripts are completed
* add missed import in get_app_logs, whitespace
* use correct python version for on node app submit
* update get output format
2018-01-19 13:37:58 -05:00
Timothee Guerin
e193ed30dd
Feature: VNet support ( #324 )
...
* VNet support
* Azure active directory authentication
2018-01-17 11:14:56 -08:00
Jacob Freck
7e9bf9c383
Feature: Spark retry job ( #318 )
2018-01-08 12:31:47 -08:00
JS
6091b1d390
Docs: update ( #263 )
...
* Update README.md
streamline and update main readme.md
* Update README.md
* Update README.md
* Update 13-configuration.md
* Update 12-docker-image.md
* Update 12-docker-image.md
* Update README.md
* Create README.md
* Update README.md
* Update 10-clusters.md
2017-12-11 17:02:51 -08:00
Jacob Freck
40bd2d62f3
Bug: fix wrong path for global secrets ( #265 )
...
* fix wrong path for global secrets
* load spark_conf files correctly
* docker-image docs fix
* docker-image docs fix
* move load_aztk_spark_config function to config.py
2017-12-11 15:06:08 -05:00
JS
c12ecebad2
Update 60-gpu.md ( #253 )
...
* Update 60-gpu.md
make sure is available in region
* Update 60-gpu.md
2017-12-07 14:11:53 -08:00
Jacob Freck
6c26943819
Feature: update docker image doc ( #251 )
...
* update docker-image readme with new images
* update docs
2017-12-06 14:04:07 -08:00
Jacob Freck
8a060a2f78
Feature: Spark GPU ( #206 )
...
* conditionally install and use nvidia-docker
* status statements, and -y flag for install
* add example, remove unnecessary ppa
* rename custom script, remove print statement, update example
* add Dockerfile
* fix path in Dockerfile
* update Docker images to use service account
* updated docs, changed default docker repo for gpu skus
* make timing statements more verbose
* remove unnecessary script
* added gpu docs
* fix up docs and numba example
2017-12-04 13:28:05 -08:00
Jacob Freck
d74ceee3f5
Feature: Rename SDK ( #231 )
...
* initial refactor
* rename cli_fe to cli
* add docs for sdk client
* typo
* remove conflict
* fix zip node scripts bug, add sdk_example program
* start models docs
* add ClusterConfiguration docs, fix merge bug
* Application docs update
* added Application and SparkConfiguration docs
* whitespace
* rename cli.py and spark/cli
* add docstring for load_spark_client
2017-12-01 13:42:55 -08:00
Pablo Selem
cabcc29b3c
Feature: Azure Files ( #241 )
...
* initial take on installing azure files
* fix cluster.yaml parsing of file shares
* remove test code
* add docs for Azure Files
2017-11-30 14:16:53 -08:00
JS
b983d12419
update 10-clusters.md - rm jupyter ref (for now) ( #222 )
2017-11-27 09:37:15 -08:00
Ian McDonald
7c59567bec
Fix a typo in link ( #235 )
2017-11-27 08:56:54 -08:00
Matt Scanlon
e50cf8c52c
Spellchecking ( #233 )
...
aztk was misspelled as aztb, amended to correct spelling.
2017-11-27 08:35:02 -08:00
Matt Scanlon
ef0acbabbe
Amended documentation to correctly obtain logs ( #234 )
...
Documentation amended: Use of aztk spark app logs results in an error. Correct usage is aztk spark cluster app-logs. Document amended to reflect this.
2017-11-27 08:31:17 -08:00
Jacob Freck
60cae3b8dd
Feature: HDFS plugin ( #215 )
...
* hdfs plugin initial
* fix passwordless ssh key, allow raw ip for datanodes
* remove debug statement
* description of forwarded ports
* add pycryptodome to requirements.txt
* fix file copy bug
* add namenode ui to ssh command, add docs
2017-11-22 14:51:11 -08:00
JS
65a56dd657
Feature/python container ( #210 )
...
* added python container, jupyter install script, vanilla container
* Update README.md
* Create README.md
* Create README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update jupyter.sh
* Update README.md
* Update README.md
* python
* readme update
* docker updates
* Update README.md
* Update 12-docker-image.md
* Update constants.py
* add image files for wiki
* update images
* .
* dockerfile typo
* dockerfiles
* Removed r
* readme
* update constants.py
* update readme
* readme updates
* readme updates
2017-11-09 00:43:32 -08:00
Pablo Selem
dbc1adddcd
Feature: Azure Data Lake Store support ( #170 )
...
* initial adls work
* initial native support for ADLS connector
* add core-site.xml and azure storage/adls jars to project
* enable custom jars and core-site.xml changes for custom storage connectors
* remove unused ADL credentials from environment
* documentation feedback updates
* documentation feedback
* PR feedback
2017-10-19 09:16:14 -07:00
Jacob Freck
9e84949980
Feature: global flag for init ( #149 )
...
* add-user accept ssh-key path and prompt for password
* added global flag for init
* global flag only initializes home environment
* remove unnecessary print statement
* add docs
* fix global path and change constant names
* add help message for global flag
* whitespace
2017-10-18 10:23:30 -07:00
Jacob Freck
8cfb7b2967
Bug: replace dash with underscore in docs ( #165 )
2017-10-16 15:42:02 -07:00
JS
f15d18fd5d
Update docs ( #148 )
...
* Update 11-custom-scripts.md
* Update 11-custom-scripts.md
* Update 12-docker-image.md
* Update 13-configuration.md
* Update 20-spark-submit.md
2017-10-05 10:13:26 -07:00
Jacob Freck
dd5a5c938a
Feature: ssh directly into container ( #142 )
...
* add-user accept ssh-key path and prompt for password
* change ssh command to connect directly to container
* add --host flag for ssh
* update docs
* cleaner checking of config options
2017-10-04 15:51:22 -07:00
Timothee Guerin
f51f613fe7
Enable docker authentication for private images ( #92 )
...
* Docker login working
* fix docker image ignored
* Fix: job needs to wait for pool to be created
* Fix
* fix
* Fix
* Fix
* Update docs
* Fix cr
* Fix merge issue
2017-10-03 13:30:12 -07:00
Jacob Freck
37c5d873a0
Feature: Support for multiple custom scripts and master-only scripts ( #93 )
...
* added IS_MASTER environment variable, moved custom script execution to python
* allow for custom_scripts to check if executing on spark master
* allow for multiple custom scripts
* updated docs, fixed overwrite bug, more precise typing
* added support for script ordering, consolidated file uploading
* rename custom script on upload to prevent conflicts
* removed unnecessary import
* added environment variable IS_MASTER back
* white space
* updated docs
* updated docs
* fix pylint errors
* undo previous commit error
* clearer docs
* removed cli custom script functionality, fixed no custom scripts bug
* fixed pylint errors
* changed location to runOn, removed unused parameters, add error checking
* removed unused parameter
* added information about storage account
* remove unused functions to upload scripts
* changed dtde to aztk
2017-10-02 13:20:11 -07:00
JS
f158ceafc5
Docs/update ( #128 )
...
* Update README.md
* Update 10-clusters.md
* Update README.md
* Update README.md
* Update README.md
2017-09-30 05:02:25 -04:00
JS
57cfa9dd1d
Docs/update ( #106 )
...
* Updates to README.md
* typos
* Update 00-getting-started.md
* Update README.md
* Update 30-cloud-storage.md
* faq in readme
* Update 13-configuration.md
* switch to use underscore for consistency
* Update 10-clusters.md
* Update README.md
* Update README.md
* fixes
* fix broken link
* Update 20-spark-submit.md
* update to aztk on README
* aztk fix
* Update 00-getting-started.md
* Update 10-clusters.md
* Update 12-docker-image.md
* Update 13-configuration.md
* Update 20-spark-submit.md
* Update 30-cloud-storage.md
* Update README.md
* Update 10-clusters.md
* vm size link
* dummy commit1
* dummy commit 2
* Update 00-getting-started.md
* Update 13-configuration.md
2017-09-29 22:23:14 -04:00
Jacob Freck
c2172b5f64
Bug: add spark conf defaults ( #115 )
...
* added default spark conf files to config/
* refactored loading and cleanup of spark conf files
* curated spark conf files and added docs
2017-09-29 13:03:11 -07:00
Jacob Freck
5f12ff66dc
Feature: prompt for password ( #116 )
...
* prompt for password on cluster create, add secrets validation, add custom ssh priv key
* whitespace
* fixed error
* remove ssh_priv_key from secrets template, add docs
* updated error message, fixed typo
* fixed inaccurate message
2017-09-29 11:16:54 -07:00
Jacob Freck
859c6fa6c8
Feature: ssh configuration file support ( #77 )
...
* added ssh.yaml configuration file
* added jupyter support, better format for ssh.yaml instructions
* refactored read_conf_file method
* rename master ui to web ui
* fixed typo in comments
* added documentation for ssh.yaml
* refactored merge and _merge_dict methods
* changed default ssh experience to require --id cli parameter
* renamed conflicting documentation
* improved docs, changed default port forwarding to standard ports
* fix typo
2017-09-21 17:45:57 -07:00
Jacob Freck
4ae20caa14
Feature: added azb spark init command ( #79 )
...
* added command azb spark init
* azb spark init no longer overwrites existing files
* added documentation for azb spark init
* init renames secrets template file automatically
* updated docs to reflect changed workflow
* fixed bug where file rename would fail if file exists
2017-09-20 15:36:01 -07:00
Jacob Freck
611cc584cf
Feature: config files ( #65 )
...
* moved secrets config template to config directory
* updated new default file location for secrets.cfg
* added ConfigObject class
* added template for cluster config file
* updated .gitignore to include cluster.yaml
* updated cluster.yaml.template to include more fields
* added read_config_file method to ConfigObject, modified cluster_create.py to use cluster.yaml configuration by default
* added pyyaml to requirements.txt
* added documentation for cluster.yaml config file
* added constant for default cluster.yaml location
* changed secrets.cfg to secrets.yaml, added SecretsConfig object
* removed secrets.cfg.template
* improved error messages for bad secrets.yaml config
* changed config/ directory to .thunderbolt/
* remove old unused code
* removed unnecessary ignored file
* renamed configuration class to ClusterConfig and added merge method with error checking
* collapsed cluster.yaml file
* updated cluster.yaml.template to new collapsed format
* better descriptions of cluster config settings
* added error checking to cluster config settings
* added back accidentally removed code
* removed template for cluster.yaml, added cluster.yaml
* renamed .thunderbolt/ to config/, added defaults for cluster.yaml
* fixed typo in secrets.yaml.template
* added .thunderbolt/ directory to .gitignore
* fixed bug when adding user with ssh_key
* code cleanup
* updated docs
* removed redundant docs, added instructions to secrets.yaml.template
* fixed typo
* refactored read_config_file method and changed default cluster id
* removed unnecessary ignored file
* changed format for cluster.yaml instructions
* added back docker-repo support
* refactored merge and merge_dict methods
2017-09-19 14:08:26 -07:00
Jacob Freck
43dd8e57f1
Bug: Move commands under app to cluster ( #82 )
...
* moved azb spark app commands to azb spark cluster
* fixed docs to reflect new cli structure
* renamed commands for clarity
2017-09-19 08:29:23 -07:00
JS
03baff6a26
Feature: custom docker images ( #68 )
...
* symlink /home/spark-version to /home/spark-current
* working checkpoint
* working checkpoint
* docker readme update
* removed deprecated dockerfile
* docker cli + docs for docker
* docs update
* docs update
* remove unneeded comments
* docker version and docs updates, misc fixes
2017-09-15 20:57:16 -07:00