Commit Graph

324 Commits

Author SHA1 Message Date
Dmitry Stratiychuk 4be5ac2f44 Fix job configuration option for `aztk spark job submit` command (#435)
`--job-conf` option mentioned in the docs wasn't working.

CLI help was showing that option is named `--configuration-c`
which seems to be a result of a missing comma in option definition.
2018-03-13 11:07:48 -07:00
Jacob Freck 1c31335fb0
Bug: filter job submission clusters out of cluster list (#409) 2018-03-09 14:42:06 -08:00
Jacob Freck f1e3f7a0f8
Bug: remove unnecessary example (#417) 2018-03-08 12:35:29 -08:00
Timothee Guerin 2bed496c39
Internal: Cluster data helpers and upload_node_script into cluster_data module (#401) 2018-03-08 10:34:19 -08:00
Timothee Guerin c237501a9f
Feature: PyPI auto deployment (#428) 2018-03-05 17:18:47 -08:00
Jacob Freck 216f63dd64
Bug: add plugins to cluster_install_cmd call (#423) 2018-02-27 14:45:32 -08:00
Emlyn Corrin 6827181a34 Fix typo load_aztk_screts -> load_aztk_secrets (#421) 2018-02-27 04:34:46 -08:00
Timothee Guerin c724d9403f
Feature: Plugins (#387) 2018-02-26 16:36:31 -08:00
Jacob Freck b833561c7e
Release: v0.6.0 (#416)
* update changelog and version

* underscores to stars
2018-02-23 15:11:19 -08:00
Jacob Freck 146345da1c
Feature: task affinity to master node (#413) 2018-02-23 12:20:45 -08:00
Jacob Freck e188170d9a
Feature: Spark add worker on master option (#415)
* Add worker_on_master to ClusterConfiguration

* add worker_on_master to JobConfiguration
2018-02-23 11:37:27 -08:00
Jacob Freck 17755e0ffa
Feature: Basic Cluster and Job Submission SDK Tests (#344)
* add initial cluster tests

* add cluster tests, add simple job submission test scenario

* sort imports

* fix job tests

* fix job tests

* remove pytest from travis build

* cluster per test, parallel pytest plugin

* delete cluster after tests, wait until deleted

* fix bugs

* catch right error, change cluster_id to base_cluster_id

* fix test name

* fixes

* move tests to intregration_tests dir

* update travis to run non-integration tests

* directory structure, decoupled job tests

* fix job tests, issue with submit_job

* fix bug

* add test docs

* add cluster and job delete to finally clause
2018-02-22 14:06:16 -08:00
Jacob Freck 6b1c86195d
Feature: SDK support for file-like configuration objects (#373)
* add support for file-like objects for configuration files

* fix custom scripts

* remove os.pathlike

* merge error
2018-02-21 16:55:26 -08:00
Jacob Freck 3e42e25eba
Bug: move spark.local.dir to location usable by rstudioserver (#407) 2018-02-21 13:23:04 -08:00
Timothee Guerin fdcdb265a2
Fix: Show a better error when trying to add a user before master is ready (#402) 2018-02-14 15:51:38 -08:00
Jacob Freck 0fc5c76cee
Bug: fix spark-submit cores args (#399) 2018-02-13 12:10:19 -08:00
Jacob Freck 2b76905d79
Bug: Spark Job list apps exit code 0 (#396) 2018-02-13 12:04:48 -08:00
Jacob Freck d662a2974b
Bug: spark submit upload error log type error (#397) 2018-02-12 17:12:43 -08:00
Jacob Freck f727ae3929
Bug: always upload spark job logs errors (#395) 2018-02-12 14:37:28 -08:00
Jacob Freck a680fbde4e
Bug: stop using mutable default parameters (#392) 2018-02-09 12:12:40 -08:00
Jacob Freck 9bd3490f6a
Feature: enable dynamic allocation by default (#386) 2018-02-09 11:01:10 -08:00
Jacob Freck d75ae44efc
Feature: spark shuffle service (#374)
* start shuffle service by default

* whitespace, delete misplaced file

* crlf->lf

* crlf->lf

* move spark scratch space off os drive
2018-02-09 10:52:47 -08:00
Timothee Guerin d7d5faaf7a
Fix: Custom scripts not read from cluster.yaml (#388) 2018-02-09 09:56:53 -08:00
Jacob Freck e98de9f8ac
Bug: spark SDK example fix (#383)
* start fix sdk

* fix sdk example

* crlf->lf
2018-02-08 17:33:24 -08:00
Jacob Freck f0b98159b8
Bug: fix regex for is_gpu_enabled (#380)
* fix regex for is_gpu_enabled

* crlf->lf
2018-02-08 12:03:36 -08:00
Jacob Freck 4fa8017025
Bug: fix core-site.xml typo (#378)
* fix typo

* crlf->lf
2018-02-08 11:07:49 -08:00
Timothee Guerin 5b191b4357
Feature: Refactor cluster config to use ClusterConfiguration model (#343) 2018-02-07 18:01:53 -08:00
Jacob Freck 85e444ce34
Feature: Cluster Run and Copy (#304)
* start implementation of cluster run

* fix cluster_run

* start debug sequential user add and delete

* parallelize user creation and deletion, start implementation of cluster scp

* continue cluster_scp implementation

* debug statements, disconnect error: permission denied

* untested paramiko implementation of clus_run

* continue debugging user creation bug

* fix bug with pool user creation, start concurrent implementation

* start fix of paramiko cluster_run and cluster_copy

* working paramiko cluster_run implementation, start cluster_scp

* fix cluster_scp command

* update requirements, rename cluster_run function

* remove unused shell functions

* parallelize run and scp, add container_name, create logs wrapper

* change scp to copy, clean up

* sort imports

* remove asyncssh from node requirements

* remove old import

* remove bad error handling

* make cluster user management methods private

* remove comment

* remove accidental commit

* fix merge, move delete to finally clause

* add docs

* formatting
2018-02-07 16:19:33 -08:00
Jacob Freck af8d037dcb
Bug: Load default Jars for job submission CLI (#367)
* load jars in .aztk/ by default

* rewrite loading config files
2018-02-07 11:21:33 -08:00
Jacob Freck 1a15245eb4
Feature: spark init docker repo customization (#358)
* customize docker_repo based on init args

* whitespace

* add some docs

* r-base to r

* case insensitive r flag, typo fix
2018-02-07 10:56:20 -08:00
Jacob Freck 748a1269fa
Feature: Spark mixed mode support (#350)
* add support for aad creds for storage on node

* add mixed mode support

* add docs

* switch error order

* add dedicated to get_cluster

* remove mixed mode in print_cluster_conf
2018-02-07 10:41:51 -08:00
Emlyn Corrin 63fb81b2b6 Allow submitting jobs into a VNET (#365)
* Add subnet_id to job submission cluster config

* add some docs
2018-02-05 15:43:13 -08:00
Emlyn Corrin 3430fef976 Fix list-apps crash (#364) 2018-02-01 12:05:23 -08:00
Jacob Freck 85a472c591
Feature: on node user creation (#303)
* client side on node user creation

* start create user on node implementation

* fix on node user creation

* remove debug statements

* remove commented code

* line too long

* fix spinner password prompt ui bug

* set wait to false by default, formatting

* encrypt password on client, decrypt on node

* update docs, log warning if password used
2018-01-26 16:14:15 -08:00
Jacob Freck 97fcc50465
Feature: v0.5.1 version update (#360)
* update version number

* update changelog

* update changelog
2018-01-25 15:12:01 -08:00
Jacob Freck f490f9643f
Feature: AAD tutorial docs (#353)
* aad tutorial docs

* typo, specify application type
2018-01-25 13:58:59 -08:00
Brian 5ceec67346
Bug: Fixed R Dockerfiles (#357)
* Fixed consistency in format

* Give user permissions to reinstall R packages
2018-01-25 08:25:39 -08:00
Jacob Freck a86dc7ee26
Bug: fix get pool crash on delete (#313) 2018-01-24 10:39:42 -08:00
Jacob Freck ce5abe4937
Bug: Support on node AAD Storage creds (#349) 2018-01-23 17:12:58 -08:00
Jacob Freck 0c606609cc
Bug: Fix Copyfile Directory bug (#346)
* ensure dirs exist before copying

* remove filename before making directory

* merge master

* remove pytest command from travis build

* undo travis build change
2018-01-23 08:50:18 -08:00
Brian 89948de905
R GPU Dockerfiles (#341)
* Added dockerfiles for aztk r gpu

* Removed tensorflow/cntk cpu versions

* Enabled configure option for shared lib for python

* Removed temp paths

* Replaced python version with aztk version

* Cleaning up args

* Added variable definition for tensorflow, shorten lines
2018-01-22 16:36:36 -08:00
Jacob Freck 717f3d04a3
Bug: Ensure Directories Exist Before Copying (#342) 2018-01-22 16:13:13 -05:00
Jacob Freck 1aae3b5a8b
Feature: expose exit_code (#320)
* expose the batch task exit_code in ApplicationLog

* add exit_code to job list-apps output

* add execution info to applicationlog object

* whitespace
2018-01-19 16:53:45 -05:00
Jacob Freck 882418a928
Feature: Spark job submission (#278)
* refactor submit, initial job-submission commit

* fix merge conflicts

* comment out gpu_enabled

* update schedule, user job manager task, wait until container ready

* wait until docker container running

* add get_job_log

* add job id

* start multitask job submission

* start multitask implementation

* wait for node setup to complete before completing start task

* fix incorrect logic

* add environment variable, fix loading task definition, fix waiting for container

* fix early job completion, autokill pool, cleanup code

* include debug print error

* define job submission sdk stubs

* Job and JobConfiguration models

* remove timestamp, implement job function stubs, add Application model

* add delete_job, bug fixes

* add wait_until_job_finished, wait_until_all_jobs_finished

* fix storage output logs location, function names

* whitespace

* add cli file stubs

* start job cli implementation

* better output for job cli, added job.yaml configuration file

* error catch for missing blob, add sdk docs, models updates

* start Jobs tutorial doc

* template for rest of job tutorial doc

* rename app_id to application_name, add job sdk docs

* docs fix

* add support for low pri nodes

* better formatting for job.yaml

* remove unnecessary comments

* add defaults and comments for job.yaml, fix spark_configuration file loading

* fix spark_configuration file paths

* fix app_arg issue

* clean up code, address comments

* rename to cluster_submit_helper

* update get job print

* add list-apps, update help text for job id

* add application metadata to job, update job get print

* better error for get_application_log, add working example to job.yaml, add print_application

* add warning about master selection

* update list_applications return value

* whitespace

* Add link to docs in job.yaml, validation for job.yaml

* convert to commandbuilder, remove print

* fix submit exit code, fix no app_args bug, set autoscale interval

* wait until custom scripts are completed

* add missed import in get_app_logs, whitespace

* use correct python version for on node app submit

* update get output format
2018-01-19 13:37:58 -05:00
Pablo Selem b56c551189
always upgrade jupyter to the latest version on install (#336) 2018-01-18 19:46:37 -08:00
Jacob Freck 87f02b3b2d
Bug: fix secrets shared key backwards compatibility (#334) 2018-01-17 13:05:34 -08:00
Timothee Guerin e193ed30dd
Feature: VNet support (#324)
* VNet support
* Azure active directory authentication
2018-01-17 11:14:56 -08:00
Jacob Freck f7c1cb5172 add MIT license (#323) 2018-01-11 10:19:14 -08:00
Jacob Freck 1d6000d759
Feature: performance tune core-site.xml (#321) 2018-01-09 11:51:27 -08:00
Jacob Freck 7e9bf9c383
Feature: Spark retry job (#318) 2018-01-08 12:31:47 -08:00