Commit Graph

82 Commits

Author SHA1 Message Date
Jacob Freck 6e04372d19
Release v0.10.3 (#714)
* Build(deps): Bump requests from 2.19.1 to 2.20.0 in /aztk/node_scripts

Bumps [requests](https://github.com/requests/requests) from 2.19.1 to 2.20.0.
- [Release notes](https://github.com/requests/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/master/HISTORY.md)
- [Commits](https://github.com/requests/requests/compare/v2.19.1...v2.20.0)

Signed-off-by: dependabot[bot] <support@github.com>

* Build(deps): Bump pyyaml from 3.13 to 5.1 in /aztk/node_scripts

Bumps [pyyaml](https://github.com/yaml/pyyaml) from 3.13 to 5.1.
- [Release notes](https://github.com/yaml/pyyaml/releases)
- [Changelog](https://github.com/yaml/pyyaml/blob/master/CHANGES)
- [Commits](https://github.com/yaml/pyyaml/compare/3.13...5.1)

Signed-off-by: dependabot[bot] <support@github.com>

* update changelog for 0.10.3 release

* update requirements and pipfile

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2020-02-14 15:27:19 -08:00
Jacob Freck 1f6fc8b8e1
Release: v0.10.2 (#691)
* update changelog

* update version

* update version in docs links
2018-12-07 12:58:32 -08:00
Jacob Freck 5379b00b73
Release: 0.10.1 (#682)
* update version

* update changelog for 0.10.1 release

* update relevant links
2018-11-02 11:45:53 -07:00
Jacob Freck 4408c4fc41
Feature: Spark scheduling target (#661)
* initial

* update pipfile and pipfile.lock

* uncomment scheduling target, start ssh_submit impl

* get rid of debug code

* finish ssh_submit implementation

* serialize object instead of properties

* fix upload log bug, temp workaround for get logs

* remove unused function

* clean up node_scripts submit, remove debug code

* ensure warns on deprecated test

* remove commented timeout

* start scheduling_target for job_submission

* continue job scheduling target implementation

* update pipfile.lock

* update Pipfile deps, pin pynacl to fix build failure

* fix syntax

* fix pipfile with latest azure-nspkg

* update path for scheduling scripts

* update config.py import

* add nohup dependency

* use nohup and exit immediately

* remove bad dep

* remove nohup

* remove commented code

* add block to ssh, get retcode from node_exec

* fix typo

* fix some imports, add test stubs

* fixes

* start implementation of task table service

* add scheduling_target support for get_application_log

* todos

* remove useless statement

* move get_application_status to core, add scheduling_target support

* update deps in requirements.txt

* fix false positive pylint import error

* remove bad import

* bad local variable

* add batch task abstraction, add datetime field

* mediate table insertion with task abstraction

* fix issues with task abstraction usage

* fix pylint import error

* fix update task on run

* update job submission test

* make test package, update pylint

* update job submission with scheduling_target

* add job support for scheduling_target

* fix taskstate serialization to storage

* fix job submission job manager task, catch table storage errors

* fix import

* fix imports for batch sdk 5.0+

* fix test model module

* fix node election exception catch

* start fix job tests

* move get_task_status to base

* fix job tests

* fix get_application, add abstraction to batch task gets

* fix some bugs, remove some debug statements

* fix test

* use jobstate and application state

* add start_task retries

* make jobstate an enum

* fix import

* fixes

* fixes

* revert settings.json

* fixes for application state in cli

* conditionally create storage table

* remove commented code

* conditionally create storage table

* remove commented code

* fix test

* respond to comments

* fix debug statement, fix starttask issue

* remove debug test print

* formatting

* update doc string with correct return value

* revert settings.json

* more robust starget test, fix get_application for starget

* whitespace
2018-10-23 15:47:54 -07:00
Jacob Freck b7da355618
Release: 0.9.1 (#669)
* update changelog and version

* update docs links to new version number
2018-10-05 13:07:15 -07:00
Jacob Freck 93615d9a43
Fix: spark roll back scheduling disable (#653)
* disable offlining on node

* disable scheduling_target in config, cli, and sdk

* remove scheduling target function

* formatting

* remove always-None return value
2018-08-29 15:40:09 -07:00
Jacob Freck 442228a30f
Deprecate: remove custom scripts (#650) 2018-08-17 20:36:11 -04:00
Jacob Freck 9098533969
Feature: first run docs update (#644)
* update getting started page, remove custom scripts doc

* recommend venv
2018-08-17 15:58:22 -04:00
Jacob Freck b7bdd8c268
Feature: add brief flag to debug tool (#634)
* add brief flag

* add some docs

* fix requirements
2018-08-16 15:18:43 -04:00
mmduyzend 9d554c3255 Feature: Add ability to specify docker run options in toolkit config (#613)
* Feature: Add ability to specify docker run options in cluster config

* update function calls to match new sdk refactor

* fix empty docker_run_options failure

* formatting

* fix formatting (#3)
2018-08-13 09:03:58 -07:00
Jacob Freck b18eb695a1
Feature: SDK refactor (#622)
* start refactor

* continue refactor for cluster and job functions

* fix imports

* fixes

* fixes

* refactor integration test secrets management

* fix cluster create, add new test

* add tests for new sdk api and fix bugs

* fix naming and bugs

* update job operations naming, bug fixes

* fix cluster tests

* fix joboperations and tests

* update cli and fix some bugs

* start fixes

* fix pylint errors, bugs

* add deprecated warning checks, rename tests

* add docstrings for baseoperations

* add docstrings

* docstrings, add back compat for coreclient, fix init for spark client

* whitespace

* docstrings, whitespace

* docstrings, fixes

* docstrings, fixes

* fix the sdk documentation, bugs

* fix method call

* pool_id->id

* rename ids

* cluster_id->id

* cluster_id->id

* add todo

* fixes

* add some todos

* rename pool to cluster, add todo for nodes params

* add todos for nodes param removal

* update functions names

* remove deprecated function calls

* update docs and docstrings

* update docstrings

* get rid of TODOs, fix docstrings

* remove unused setting

* inheritance -> composition

* fix models bugs

* fix create_user bug

* update sdk_example.py

* fix create user argument issue

* update sdk_example.py

* update doc

* use Software model instead of string

* add job wait flag, add cluster application wait functions

* add docs for wait, update tests

* fix bug

* add clientrequesterror catch to fix tests
2018-08-03 15:20:05 -07:00
Jacob Freck a8f8e92629
Fix: docs links version (#614)
* update changelog

* update versions
2018-06-20 14:55:13 -07:00
Jacob Freck 4b2acc8491
Fix: add toolkit to sdk docs and example (#602) 2018-06-11 15:34:29 -07:00
mmduyzend 7d7a814c50 Fix: fix typos (#595) 2018-06-07 09:57:43 -07:00
Jacob Freck 88d04195ec
Feature: add cluster list quiet flag, ability to compose with delete (#581)
* add quiet flag, ability to compose with delete

* log.print instead of print

* add some docs
2018-06-06 16:03:34 -07:00
Jacob Freck f16aac091e
Feature: pure python ssh (#577)
* forward multiple ports

* plumb through cli

* continue cli implementation

* fixes

* pylint ignore

* spacing

* remove debug stuff, fix bug

* add --internal support

* add to init

* add comment

* remove nesting

* add logging

* add some docs
2018-06-04 17:16:51 -07:00
Jacob Freck af449dc194
Feature: add node run command (#572)
* add node run command

* whitespace

* add node-run doc

* add host flag

* refactor, print->log

* generated username

* more secure random

* better handling of find node, type conversion

* add generate_user_on_node

* docs update

* fix docs

* remove duplicate import, sort
2018-06-04 13:58:33 -07:00
Jacob Freck 49a890a5df
Fix: switch create user to pool wide (#574)
* switch create user to pool wide

* fix password bug

* update doc
2018-05-30 13:36:59 -07:00
Timothee Guerin 8fea9ce092
Feature: Disable scheduling on group of nodes (#540) 2018-05-30 13:02:48 -07:00
Timothee Guerin 02f336b0a0
Feature: New Models design with auto validation, default and merging (#543) 2018-05-30 09:07:09 -07:00
lachiemurray f6735cc6dd Feature: Support passing of remote executables via aztk spark cluster submit (#549) 2018-05-24 10:29:23 -07:00
Jacob Freck 1527929e30
Feature: TensorflowOnSpark python plugin (#525)
* initial commit

* update

* update

* add gpu support

* remove comment

* change class to function

* fix merge issue

* add some docs
2018-05-21 13:22:55 -07:00
Jacob Freck 4d4916e349
Release: v0.7.0 (#535)
* changelog and migration guide

* update version, links, and doc

* update links

* add docs version

* update changelog
2018-05-01 18:44:26 -07:00
Timothee Guerin 7a7e63c54f
Feature: New Toolkit configuration (#507) 2018-05-01 16:36:44 -07:00
Timothee Guerin 9bc76396bc
Docs: Added worker on master docs (#531) 2018-05-01 14:40:31 -07:00
Jacob Freck 779bffb2da
Feature: refactor docker images (#510)
* add spark2.3.0 hadoop2.8.3 dockerfile

* start update to docker image

* add SPARK_DIST_CLASSPATH to bashrc, source .bashrc in docker run

* add maven install for jars

* docker image update and code fix

* add libthrift (still broken)

* start image refactor, build from source

* add refactor to r base image

* finish refactor r image

* add storage jars and deps

* exclude netty to get rid of dependency conflict

* add miniconda image

* update 2.2.0 base, anaconda image

* remove unused cuda-8.0 image

* start pipenv implementation

* miniconda version arg

* update anaconda and miniconda image

* style

* pivot to virtualenv

* remove virtualenv from path when submitting apps

* flatten layers

* explicit calls to aztk python instead of activating virtualenv

* update base, miniconda, anaconda

* add compatibility version for base aztk images

* typo fix

* update pom

* update environment variable name

* update environment variables

* add anaconda images base & gpu

* update gpu and miniconda base images

* create venv in cluster create

* update base docker files, remove virtualenv

* fix path

* add exclusion to base images

* update r images

* delete python images (in favor of anaconda and miniconda)

* add miniconda gpu images

* update comment

* update aztk_version_compatibility to docker image version

* add a build script

* virtualenv->pipenv, add pipfile & pipfile.lock, remove secretstorage

* aztk/staging->aztk/spark

* remove jars, add .null to keep directory

* update pipfile, update jupyter and jupyterlab

* update default images

* update base images to fix hdfs

* update build script with correct path

* add spark1.6.3 anaconda, miniconda, r base and gpu images

* update build script to include spark1.6.3

* mkdir out

* exclude commons lang and slf4j dependencies

* mkdir out

* no fail if dir exists

* update node_scripts

* update env var name

* update env var name

* fix the docker_repo docs

* master->0.7.0
2018-04-30 17:19:01 -07:00
Timothee Guerin c98df7d1df
Feature: Added custom scripts functionality for plugins with the cli (Deprecate custom scripts) (#517) 2018-04-27 10:31:24 -07:00
Timothee Guerin e361c3b0b3
Feature: Readthedocs support (#497) 2018-04-26 14:03:45 -07:00
Timothee Guerin b8a3fccaf0
Fix: AZTK_IS_MASTER not set on worker and failing (#506)
* Fix: AZTK_IS_MASTER_NOT_SET

* Update jupyter lab too

* update jupyterlab target role

* True false doc
2018-04-24 12:14:47 -07:00
Timothee Guerin de7898334c
Feature: Plugin V2: Running plugin on host (#461) 2018-04-23 17:20:43 -07:00
Jacob Freck 2e995b4899
Feature: spark ui proxy plugin (#467)
* initial commit

* add args

* add docs

* change default plugins

* update ssh cli ui, remove plugin name

* change conditional

* update docs to include jupyterlab

* remove spark_ui_proxy as default plugin
2018-04-23 12:12:31 -07:00
Jacob Freck 7ef721f0c1
Feature: getting started script (#475)
* initial changes for getting started scripts

* add temp error handling

* rename file - fix typo

* add debug strings

* add handling for existing user

* WIP: wait for subprocess to complete to get exit code

* WIP: handle existing user and refactor code

* WIP: add missing return statements

* WIP: fix typo

* start sdk refactor

* mostly working create

* working happy create path

* handle errors for vnet, aad application

* make account setup interactive

* add prompt

* add docs

* rename account_setup_refac to account_setup

* add some logging

* pip install msrest, azure-cli-core, import issues

* remove in script pip, add shell wrapper program

* ellipsis to period

* update branch name for account_setup.sh

* docstring

* retry resource group creation

* fix typo, update retry

* explicitly set output location

* wget overwrite flag, docs update

* add prompt for multi tenants

* fix bug with batch account creation

* add spinner, print statements, fix formatting bug

* fix param bug
2018-04-11 13:27:55 -07:00
Jacob Freck 44a07654aa
Feature: spark debug tool (#455)
* start implementation of cluster debug utility

* update debug program

* update debug

* fix output directory structure

* cleanup output, add error checking

* sort imports

* start untar

* extract tar

* add debug.py to pylintrc ignore, line too long

* crlf->lf

* add app logs

* call get_spark_app_logs, typos

* add docs

* remove debug.py from pylintrc ignore

* added debug.py back to pylint ignore

* change pylint ignore

* remove commented log

* update cluster_run

* refactor cluster_copy

* update debug, add spinner for run and copy

* make new sdk cluster_download endpoint
2018-04-09 15:02:43 -07:00
Jacob Freck 1eaa1b6e42
Feature: add internal flag to node commands (#482)
* add internal ssh flag

* add --internal flag to cluster get

* cluster run internal flag

* fix add command back

* cluster copy internal

* fix method params

* fix method params

* add debug statement

* fix params

* remove debug statement

* fixes

* add debug statement

* remove debug statement

* add hostname to /etc/hosts

* remove hostname from /etc/hosts

* add sdk docs for internal switch in cluster run and copy
2018-04-06 15:59:13 -07:00
Jacob Freck 4ef3dd09df
Bug: add spark.history.fs.logDirectory to required keys (#456)
* add spark.history.fs.logDirectory to required keys

* add spark_event_log_enabled_key to required_keys

* docs, add history server config to spark-defaults.conf

* fix bad logic

* crlf->lf
2018-04-05 14:11:35 -07:00
Jacob Freck 82ad0296af
Bug: add gitattributes file (#470)
Bug: line endings, add gitattributes file
2018-04-04 13:44:26 -07:00
Jacob Freck 8aa1843f23
Feature: managed storage for clusters and jobs (#443)
* add in storage management for clusters, jobs

* add warning logs on cli delete

* whitespace

* add keep-logs flag

* add docs on storage lifetime
2018-03-20 10:45:49 -07:00
Dmitry Stratiychuk 4be5ac2f44 Fix job configuration option for `aztk spark job submit` command (#435)
`--job-conf` option mentioned in the docs wasn't working.

CLI help was showing that option is named `--configuration-c`
which seems to be a result of a missing comma in option definition.
2018-03-13 11:07:48 -07:00
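The commit above attributes the broken option name to a missing comma in the option definition. A minimal sketch of how that kind of bug can arise with Python's `argparse` follows. It is a hypothetical reconstruction, not the actual aztk source: the parser names, the `dest` value, and the exact flag spellings passed to `add_argument` are assumptions chosen to reproduce the `--configuration-c` symptom described in the message.

```python
import argparse

# Hypothetical reconstruction, not the aztk code.
# Python concatenates adjacent string literals, so leaving out the comma
# turns two intended flags into one long flag.
buggy = argparse.ArgumentParser(prog="buggy")
buggy_action = buggy.add_argument(
    "--configuration" "-c",  # missing comma: becomes "--configuration-c"
    dest="job_conf",
)

# With the comma present, argparse registers a long and a short flag.
fixed = argparse.ArgumentParser(prog="fixed")
fixed_action = fixed.add_argument(
    "--job-conf", "-c",
    dest="job_conf",
)

buggy_flags = buggy_action.option_strings
fixed_flags = fixed_action.option_strings
parsed = fixed.parse_args(["--job-conf", "job.yaml"])

print(buggy_flags)
print(fixed_flags)
print(parsed.job_conf)
```

Because the concatenated literal is still a valid option string, nothing fails at definition time; the problem only shows up in the help output or when a user passes the documented flag.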
Jacob Freck 17755e0ffa
Feature: Basic Cluster and Job Submission SDK Tests (#344)
* add initial cluster tests

* add cluster tests, add simple job submission test scenario

* sort imports

* fix job tests

* fix job tests

* remove pytest from travis build

* cluster per test, parallel pytest plugin

* delete cluster after tests, wait until deleted

* fix bugs

* catch right error, change cluster_id to base_cluster_id

* fix test name

* fixes

* move tests to integration_tests dir

* update travis to run non-integration tests

* directory structure, decoupled job tests

* fix job tests, issue with submit_job

* fix bug

* add test docs

* add cluster and job delete to finally clause
2018-02-22 14:06:16 -08:00
Jacob Freck 6b1c86195d
Feature: SDK support for file-like configuration objects (#373)
* add support for filelike objects for configuration files

* fix custom scripts

* remove os.pathlike

* merge error
2018-02-21 16:55:26 -08:00
Jacob Freck 85e444ce34
Feature: Cluster Run and Copy (#304)
* start implementation of cluster run

* fix cluster_run

* start debug sequential user add and delete

* parallelize user creation and deletion, start implementation of cluster scp

* continue cluster_scp implementation

* debug statements, disconnect error: permission denied

* untested paramiko implementation of cluster_run

* continue debugging user creation bug

* fix bug with pool user creation, start concurrent implementation

* start fix of paramiko cluster_run and cluster_copy

* working paramiko cluster_run implementation, start cluster_scp

* fix cluster_scp command

* update requirements, rename cluster_run function

* remove unused shell functions

* parallelize run and scp, add container_name, create logs wrapper

* change scp to copy, clean up

* sort imports

* remove asyncssh from node requirements

* remove old import

* remove bad error handling

* make cluster user management methods private

* remove comment

* remove accidental commit

* fix merge, move delete to finally clause

* add docs

* formatting
2018-02-07 16:19:33 -08:00
Jacob Freck 1a15245eb4
Feature: spark init docker repo customization (#358)
* customize docker_repo based on init args

* whitespace

* add some docs

* r-base to r

* case insensitive r flag, typo fix
2018-02-07 10:56:20 -08:00
Jacob Freck 748a1269fa
Feature: Spark mixed mode support (#350)
* add support for aad creds for storage on node

* add mixed mode support

* add docs

* switch error order

* add dedicated to get_cluster

* remove mixed mode in print_cluster_conf
2018-02-07 10:41:51 -08:00
Emlyn Corrin 63fb81b2b6 Allow submitting jobs into a VNET (#365)
* Add subnet_id to job submission cluster config

* add some docs
2018-02-05 15:43:13 -08:00
Jacob Freck f490f9643f
Feature: AAD tutorial docs (#353)
* aad tutorial docs

* typo, specify application type
2018-01-25 13:58:59 -08:00
Jacob Freck 1aae3b5a8b
Feature: expose exit_code (#320)
* expose the batch task exit_code in ApplicationLog

* add exit_code to job list-apps output

* add execution info to applicationlog object

* whitespace
2018-01-19 16:53:45 -05:00
Jacob Freck 882418a928
Feature: Spark job submission (#278)
* refactor submit, initial job-submission commit

* fix merge conflicts

* comment out gpu_enabled

* update schedule, user job manager task, wait until container ready

* wait until docker container running

* add get_job_log

* add job id

* start multitask job submission

* start multitask implementation

* wait for node setup to complete before completing start task

* fix incorrect logic

* add environment variable, fix loading task definition, fix waiting for container

* fix early job completion, autokill pool, cleanup code

* include debug print error

* define job submission sdk stubs

* Job and JobConfiguration models

* remove timestamp, implement job function stubs, add Application model

* add delete_job, bug fixes

* add wait_until_job_finished, wait_until_all_jobs_finished

* fix storage output logs location, function names

* whitespace

* add cli file stubs

* start job cli implementation

* better output for job cli, added job.yaml configuration file

* error catch for missing blob, add sdk docs, models updates

* start Jobs tutorial doc

* template for rest of job tutorial doc

* rename app_id to application_name, add job sdk docs

* docs fix

* add support for low pri nodes

* better formatting for job.yaml

* remove unnecessary comments

* add defaults and comments for job.yaml, fix spark_configuration file loading

* fix spark_configuration file paths

* fix app_arg issue

* clean up code, address comments

* rename to cluster_submit_helper

* update get job print

* add list-apps, update help text for job id

* add application metadata to job, update job get print

* better error for get_application_log, add working example to job.yaml, add print_application

* add warning about master selection

* update list_applications return value

* whitespace

* Add link to docs in job.yaml, validation for job.yaml

* convert to commandbuilder, remove print

* fix submit exit code, fix no app_args bug, set autoscale interval

* wait until custom scripts are completed

* add missed import in get_app_logs, whitespace

* use correct python version for on node app submit

* update get output format
2018-01-19 13:37:58 -05:00
Timothee Guerin e193ed30dd
Feature: VNet support (#324)
* VNet support
* Azure active directory authentication
2018-01-17 11:14:56 -08:00
Jacob Freck 7e9bf9c383
Feature: Spark retry job (#318) 2018-01-08 12:31:47 -08:00
JS 6091b1d390
Docs: update (#263)
* Update README.md

streamline and update main readme.md

* Update README.md

* Update README.md

* Update 13-configuration.md

* Update 12-docker-image.md

* Update 12-docker-image.md

* Update README.md

* Create README.md

* Update README.md

* Update 10-clusters.md
2017-12-11 17:02:51 -08:00