Dmitry Stratiychuk
4be5ac2f44
Fix job configuration option for `aztk spark job submit` command ( #435 )
...
`--job-conf` option mentioned in the docs wasn't working.
CLI help was showing that option is named `--configuration-c`
which seems to be a result of a missing comma in option definition.
2018-03-13 11:07:48 -07:00
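A minimal sketch of how a missing comma can produce a flag like `--configuration-c`. It is not the actual aztk source: the parser name, the `dest`, and the exact string literals are assumptions for illustration. Python concatenates adjacent string literals, so two intended flags collapse into one when the comma between them is dropped.

```python
import argparse


def option_strings(*flags):
    """Register one option on a throwaway parser and return the flags argparse saw."""
    parser = argparse.ArgumentParser(prog="submit-sketch")  # hypothetical parser
    action = parser.add_argument(*flags, dest="job_conf")
    return action.option_strings


# Missing comma: "--configuration" "-c" is a single string literal,
# so argparse registers one long flag instead of a long and a short one.
buggy = option_strings("--configuration" "-c")

# With the comma restored (and the documented name), two flags are registered.
fixed = option_strings("--job-conf", "-c")

print(buggy)
print(fixed)
```

With the concatenated literal, the documented `--job-conf` flag is never registered, which matches the symptom described in the commit message.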
Jacob Freck
1c31335fb0
Bug: filter job submission clusters out of cluster list ( #409 )
2018-03-09 14:42:06 -08:00
Jacob Freck
f1e3f7a0f8
Bug: remove unnecessary example ( #417 )
2018-03-08 12:35:29 -08:00
Timothee Guerin
2bed496c39
Internal: Cluster data helpers and upload_node_script into cluster_data module ( #401 )
2018-03-08 10:34:19 -08:00
Timothee Guerin
c237501a9f
Feature: PyPI auto deployment ( #428 )
2018-03-05 17:18:47 -08:00
Jacob Freck
216f63dd64
Bug: add plugins to cluster_install_cmd call ( #423 )
2018-02-27 14:45:32 -08:00
Emlyn Corrin
6827181a34
Fix typo load_aztk_screts -> load_aztk_secrets ( #421 )
2018-02-27 04:34:46 -08:00
Timothee Guerin
c724d9403f
Feature: Plugins ( #387 )
2018-02-26 16:36:31 -08:00
Jacob Freck
b833561c7e
Release: v0.6.0 ( #416 )
...
* update changelog and version
* underscores to stars
2018-02-23 15:11:19 -08:00
Jacob Freck
146345da1c
Feature: task affinity to master node ( #413 )
2018-02-23 12:20:45 -08:00
Jacob Freck
e188170d9a
Feature: Spark add worker on master option ( #415 )
...
* Add worker_on_master to ClusterConfiguration
* add worker_on_master to JobConfiguration
2018-02-23 11:37:27 -08:00
Jacob Freck
17755e0ffa
Feature: Basic Cluster and Job Submission SDK Tests ( #344 )
...
* add initial cluster tests
* add cluster tests, add simple job submission test scenario
* sort imports
* fix job tests
* fix job tests
* remove pytest from travis build
* cluster per test, parallel pytest plugin
* delete cluster after tests, wait until deleted
* fix bugs
* catch right error, change cluster_id to base_cluster_id
* fix test name
* fixes
* move tests to integration_tests dir
* update travis to run non-integration tests
* directory structure, decoupled job tests
* fix job tests, issue with submit_job
* fix bug
* add test docs
* add cluster and job delete to finally clause
2018-02-22 14:06:16 -08:00
Jacob Freck
6b1c86195d
Feature: SDK support for file-like configuration objects ( #373 )
...
* add support for file-like objects for configuration files
* fix custom scripts
* remove os.pathlike
* merge error
2018-02-21 16:55:26 -08:00
Jacob Freck
3e42e25eba
Bug: move spark.local.dir to location usable by rstudioserver ( #407 )
2018-02-21 13:23:04 -08:00
Timothee Guerin
fdcdb265a2
Fix: Show a better error when trying to add a user before the master is ready ( #402 )
2018-02-14 15:51:38 -08:00
Jacob Freck
0fc5c76cee
Bug: fix spark-submit cores args ( #399 )
2018-02-13 12:10:19 -08:00
Jacob Freck
2b76905d79
Bug: Spark Job list apps exit code 0 ( #396 )
2018-02-13 12:04:48 -08:00
Jacob Freck
d662a2974b
Bug: spark submit upload error log type error ( #397 )
2018-02-12 17:12:43 -08:00
Jacob Freck
f727ae3929
Bug: always upload spark job logs errors ( #395 )
2018-02-12 14:37:28 -08:00
Jacob Freck
a680fbde4e
Bug: stop using mutable default parameters ( #392 )
2018-02-09 12:12:40 -08:00
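For context, a short sketch of the general Python pitfall this commit title refers to. The function names are hypothetical and not taken from the aztk code base. A default value such as `[]` is created once, when the function is defined, so every call that omits the argument shares the same list.

```python
def add_app_shared(app, apps=[]):
    # Pitfall: the default list is built once and reused across calls.
    apps.append(app)
    return apps


def add_app(app, apps=None):
    # Common fix: use None as a sentinel and build a fresh list per call.
    if apps is None:
        apps = []
    apps.append(app)
    return apps


first = add_app_shared("a")
second = add_app_shared("b")  # mutates and returns the same list as `first`

fresh_a = add_app("a")
fresh_b = add_app("b")  # independent of fresh_a
```

The `None` sentinel keeps the call signature unchanged for callers while removing the shared state.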
Jacob Freck
9bd3490f6a
Feature: enable dynamic allocation by default ( #386 )
2018-02-09 11:01:10 -08:00
Jacob Freck
d75ae44efc
Feature: spark shuffle service ( #374 )
...
* start shuffle service by default
* whitespace, delete misplaced file
* crlf->lf
* crlf->lf
* move spark scratch space off os drive
2018-02-09 10:52:47 -08:00
Timothee Guerin
d7d5faaf7a
Fix: Custom scripts not read from cluster.yaml ( #388 )
2018-02-09 09:56:53 -08:00
Jacob Freck
e98de9f8ac
Bug: spark SDK example fix ( #383 )
...
* start fix sdk
* fix sdk example
* crlf->lf
2018-02-08 17:33:24 -08:00
Jacob Freck
f0b98159b8
Bug: fix regex for is_gpu_enabled ( #380 )
...
* fix regex for is_gpu_enabled
* crlf->lf
2018-02-08 12:03:36 -08:00
Jacob Freck
4fa8017025
Bug: fix core-site.xml typo ( #378 )
...
* fix typo
* crlf->lf
2018-02-08 11:07:49 -08:00
Timothee Guerin
5b191b4357
Feature: Refactor cluster config to use ClusterConfiguration model ( #343 )
2018-02-07 18:01:53 -08:00
Jacob Freck
85e444ce34
Feature: Cluster Run and Copy ( #304 )
...
* start implementation of cluster run
* fix cluster_run
* start debug sequential user add and delete
* parallelize user creation and deletion, start implementation of cluster scp
* continue cluster_scp implementation
* debug statements, disconnect error: permission denied
* untested paramiko implementation of cluster_run
* continue debugging user creation bug
* fix bug with pool user creation, start concurrent implementation
* start fix of paramiko cluster_run and cluster_copy
* working paramiko cluster_run implementation, start cluster_scp
* fix cluster_scp command
* update requirements, rename cluster_run function
* remove unused shell functions
* parallelize run and scp, add container_name, create logs wrapper
* change scp to copy, clean up
* sort imports
* remove asyncssh from node requirements
* remove old import
* remove bad error handling
* make cluster user management methods private
* remove comment
* remove accidental commit
* fix merge, move delete to finally clause
* add docs
* formatting
2018-02-07 16:19:33 -08:00
Jacob Freck
af8d037dcb
Bug: Load default Jars for job submission CLI ( #367 )
...
* load jars in .aztk/ by default
* rewrite loading config files
2018-02-07 11:21:33 -08:00
Jacob Freck
1a15245eb4
Feature: spark init docker repo customization ( #358 )
...
* customize docker_repo based on init args
* whitespace
* add some docs
* r-base to r
* case insensitive r flag, typo fix
2018-02-07 10:56:20 -08:00
Jacob Freck
748a1269fa
Feature: Spark mixed mode support ( #350 )
...
* add support for aad creds for storage on node
* add mixed mode support
* add docs
* switch error order
* add dedicated to get_cluster
* remove mixed mode in print_cluster_conf
2018-02-07 10:41:51 -08:00
Emlyn Corrin
63fb81b2b6
Allow submitting jobs into a VNET ( #365 )
...
* Add subnet_id to job submission cluster config
* add some docs
2018-02-05 15:43:13 -08:00
Emlyn Corrin
3430fef976
Fix list-apps crash ( #364 )
2018-02-01 12:05:23 -08:00
Jacob Freck
85a472c591
Feature: on node user creation ( #303 )
...
* client side on node user creation
* start create user on node implementation
* fix on node user creation
* remove debug statements
* remove commented code
* line too long
* fix spinner password prompt ui bug
* set wait to false by default, formatting
* encrypt password on client, decrypt on node
* update docs, log warning if password used
2018-01-26 16:14:15 -08:00
Jacob Freck
97fcc50465
Feature: v0.5.1 version update ( #360 )
...
* update version number
* update changelog
* update changelog
2018-01-25 15:12:01 -08:00
Jacob Freck
f490f9643f
Feature: AAD tutorial docs ( #353 )
...
* aad tutorial docs
* typo, specify application type
2018-01-25 13:58:59 -08:00
Brian
5ceec67346
Bug: Fixed R Dockerfiles ( #357 )
...
* Fixed consistency in format
* Give user permissions to reinstall R packages
2018-01-25 08:25:39 -08:00
Jacob Freck
a86dc7ee26
Bug: fix get pool crash on delete ( #313 )
2018-01-24 10:39:42 -08:00
Jacob Freck
ce5abe4937
Bug: Support on node AAD Storage creds ( #349 )
2018-01-23 17:12:58 -08:00
Jacob Freck
0c606609cc
Bug: Fix Copyfile Directory bug ( #346 )
...
* ensure dirs exist before copying
* remove filename before making directory
* merge master
* remove pytest command from travis build
* undo travis build change
2018-01-23 08:50:18 -08:00
Brian
89948de905
R GPU Dockerfiles ( #341 )
...
* Added dockerfiles for aztk r gpu
* Removed tensorflow/cntk cpu versions
* Enabled configure option for shared lib for python
* Removed temp paths
* Replaced python version with aztk version
* Cleaning up args
* Added variable definition for tensorflow, shorten lines
2018-01-22 16:36:36 -08:00
Jacob Freck
717f3d04a3
Bug: Ensure Directories Exist Before Copying ( #342 )
2018-01-22 16:13:13 -05:00
Jacob Freck
1aae3b5a8b
Feature: expose exit_code ( #320 )
...
* expose the batch task exit_code in ApplicationLog
* add exit_code to job list-apps output
* add execution info to applicationlog object
* whitespace
2018-01-19 16:53:45 -05:00
Jacob Freck
882418a928
Feature: Spark job submission ( #278 )
...
* refactor submit, initial job-submission commit
* fix merge conflicts
* comment out gpu_enabled
* update schedule, user job manager task, wait until container ready
* wait until docker container running
* add get_job_log
* add job id
* start multitask job submission
* start multitask implementation
* wait for node setup to complete before completing start task
* fix incorrect logic
* add environment variable, fix loading task definition, fix waiting for container
* fix early job completion, autokill pool, cleanup code
* include debug print error
* define job submission sdk stubs
* Job and JobConfiguration models
* remove timestamp, implement job function stubs , add Application model
* add delete_job, bug fixes
* add wait_until_job_finished, wait_until_all_jobs_finished
* fix storage output logs location, function names
* whitespace
* add cli file stubs
* start job cli implementation
* better output for job cli, added job.yaml configuration file
* error catch for missing blob, add sdk docs, models updates
* start Jobs tutorial doc
* template for rest of job tutorial doc
* rename app_id to application_name, add job sdk docs
* docs fix
* add support for low pri nodes
* better formatting for job.yaml
* remove unnecessary comments
* add defaults and comments for job.yaml, fix spark_configuration file loading
* fix spark_configuration file paths
* fix app_arg issue
* clean up code, address comments
* rename to cluster_submit_helper
* update get job print
* add list-apps, update help text for job id
* add application metadata to job, update job get print
* better error for get_application_log, add working example to job.yaml, add print_application
* add warning about master selection
* update list_applications return value
* whitespace
* Add link to docs in job.yaml, validation for job.yaml
* convert to commandbuilder, remove print
* fix submit exit code, fix no app_args bug, set autoscale interval
* wait until custom scripts are completed
* add missed import in get_app_logs, whitespace
* use correct python version for on node app submit
* update get output format
2018-01-19 13:37:58 -05:00
Pablo Selem
b56c551189
always upgrade jupyter to the latest version on install ( #336 )
2018-01-18 19:46:37 -08:00
Jacob Freck
87f02b3b2d
Bug: fix secrets shared key backwards compatibility ( #334 )
2018-01-17 13:05:34 -08:00
Timothee Guerin
e193ed30dd
Feature: VNet support ( #324 )
...
* VNet support
* Azure active directory authentication
2018-01-17 11:14:56 -08:00
Jacob Freck
f7c1cb5172
add MIT license ( #323 )
2018-01-11 10:19:14 -08:00
Jacob Freck
1d6000d759
Feature: performance tune core-site.xml ( #321 )
2018-01-09 11:51:27 -08:00
Jacob Freck
7e9bf9c383
Feature: Spark retry job ( #318 )
2018-01-08 12:31:47 -08:00