Jacob Freck
8c2bf0c1a6
Feature: spark submit scheduling internal ( #674 )
...
* add internal support for scheduling_target cluster submit
* add internal support for scheduling target job submission
* add cli flag
2018-10-26 16:58:38 -07:00
Jacob Freck
fc5053654c
Feature: 0.10.0 remove deprecated code ( #671 )
...
* remove deprecated code
* remove unused imports
* fix field name
* remove unused imports
* clean up comments, linting warnings
2018-10-24 10:43:47 -07:00
Jacob Freck
4408c4fc41
Feature: Spark scheduling target ( #661 )
...
* initial
* update pipfile and pipfile.lock
* uncomment scheduling target, start ssh_submit impl
* get rid of debug code
* finish ssh_submit implementation
* serialize object instead of properties
* fix upload log bug, temp workaround for get logs
* remove unused function
* clean up node_scripts submit, remove debug code
* ensure warns on deprecated test
* remove commented timeout
* start scheduling_target for job_submission
* continue job scheduling target implementation
* update pipfile.lock
* update Pipfile deps, pin pynacl to fix build failure
* fix syntax
* fix pipfile with latest azure-nspkg
* update path for scheduling scripts
* update config.py import
* add nohup dependency
* use nohup and exit immediately
* remove bad dep
* remove nohup
* remove commented code
* add block to ssh, get retcode from node_exec
* fix typo
* fix some imports, add test stubs
* fixes
* start implementation of task table service
* add scheduling_target support for get_application_log
* todos
* remove useless statement
* move get_application_status to core, add scheduling_target support
* update deps in requirements.txt
* fix false positive pylint import error
* remove bad import
* bad local variable
* add batch task abstraction, add datetime field
* mediate table insertion with task abstraction
* fix issues with task abstraction usage
* fix pylint import error
* fix update task on run
* update job submission test
* make test package, update pylint
* update job submission with scheduling_target
* add job support for scheduling_target
* fix taskstate serialization to storage
* fix job submission job manager task, catch table storage errors
* fix import
* fix imports for batch sdk 5.0+
* fix test model module
* fix node election exception catch
* start fix job tests
* move get_task_status to base
* fix job tests
* fix get_application, add abstraction to batch task gets
* fix some bugs, remove some debug statements
* fix test
* use jobstate and application state
* add start_task retries
* make jobstate an enum
* fix import
* fixes
* fixes
* revert settings.json
* fixes for application state in cli
* conditionally create storage table
* remove commented code
* conditionally create storage table
* remove commented code
* fix test
* respond to comments
* fix debug statement, fix starttask issue
* remove debug test print
* formatting
* update doc string with correct return value
* revert settings.json
* more robust starget test, fix get_application for starget
* whitespace
2018-10-23 15:47:54 -07:00
Jacob Freck
93615d9a43
Fix: spark roll back scheduling disable ( #653 )
...
* disable offlining on node
* disable scheduling_target in config, cli, and sdk
* remove scheduling target function
* formatting
* remove always none return value
2018-08-29 15:40:09 -07:00
Jacob Freck
828162ef10
Internal: fix pylint warnings ( #651 )
...
* initial, remove unused imports
* run yapf
* remove unused imports and variables, fix declaration outside init
* fix some pylint warnings, add ssh_into_master
* remove unused imports
* unused variables
* string and function normalization
* stop using list comprehension for side effects, make method function
* stop using protected member
* various pylint fixes
* formatting
* formatting
* add retry decorator with tests
* start adding retry decorator, retry docker compose download
* update pip and tests
* logic fix
* change no delete if
* factor out reused functions
* fix wait_for_all_nodes
* fix download return type bug
* test vsts ci update
* temporarily disable integration tests
* syntax fix
* update vsts build
* add back integration tests, remove debug branch
* remove parallel unit tests
* more verbose clis
* update pylint
* typo
* fix imports
* function returns nothing, don't return
* make iterator list
* change debug value
2018-08-24 17:21:22 -07:00
Jacob Freck
442228a30f
Deprecate: remove custom scripts ( #650 )
2018-08-17 20:36:11 -04:00
Jacob Freck
7c14648005
Fix: expose get cluster configuration API ( #648 )
...
* fix get and ssh cli calls, add get_configuration api
* update builds to lint aztk_cli in parallel
* remove unnecessary get_configuration calls
2018-08-17 16:12:08 -04:00
Jacob Freck
b7bdd8c268
Feature: add brief flag to debug tool ( #634 )
...
* add brief flag
* add some docs
* fix requirements
2018-08-16 15:18:43 -04:00
Jacob Freck
eef36dc062
Feature: 0.9.0 deprecated code removal ( #645 )
...
* remove deprecated code
* remove deprecated tests, yapf test directory
* add import
* remove unused test
* remove deprecated field name in tests
* update test parameter to non deprecated name
2018-08-16 15:11:59 -04:00
mmduyzend
9d554c3255
Feature: Add ability to specify docker run options in toolkit config ( #613 )
...
* Feature: Add ability to specify docker run options in cluster config
* update function calls to match new sdk refactor
* fix empty docker_run_options failure
* formatting
* fix formatting (#3 )
2018-08-13 09:03:58 -07:00
Jacob Freck
7730c46ee4
Internal: verify code formatting in build ( #633 )
...
* format all files, enforce formatting in travis build
* add yapf to vsts build
* update vsts build
* fix
* fix
* fix
* change queue to ubuntu
* revert
* temporarily enable builds on pushes to this branch
* change to non preview
* revert
* update yapf version, rerun
* update pytest parallelism
* add retry to arm call to avoid failures
* remove non-master trigger
* update builds, formatting style
2018-08-06 15:29:06 -07:00
Jacob Freck
b18eb695a1
Feature: SDK refactor ( #622 )
...
* start refactor
* continue refactor for cluster and job functions
* fix imports
* fixes
* fixes
* refactor integration test secrets management
* fix cluster create, add new test
* add tests for new sdk api and fix bugs
* fix naming and bugs
* update job operations naming, bug fixes
* fix cluster tests
* fix joboperations and tests
* update cli and fix some bugs
* start fixes
* fix pylint errors, bugs
* add deprecated warning checks, rename tests
* add docstrings for baseoperations
* add docstrings
* docstrings, add back compat for coreclient, fix init for spark client
* whitespace
* docstrings, whitespace
* docstrings, fixes
* docstrings, fixes
* fix the sdk documentation, bugs
* fix method call
* pool_id->id
* rename ids
* cluster_id->id
* cluster_id->id
* add todo
* fixes
* add some todos
* rename pool to cluster, add todo for nodes params
* add todos for nodes param removal
* update functions names
* remove deprecated function calls
* update docs and docstrings
* update docstrings
* get rid of TODOs, fix docstrings
* remove unused setting
* inheritance -> composition
* fix models bugs
* fix create_user bug
* update sdk_example.py
* fix create user argument issue
* update sdk_example.py
* update doc
* use Software model instead of string
* add job wait flag, add cluster application wait functions
* add docs for wait, update tests
* fix bug
* add clientrequesterror catch to fix tests
2018-08-03 15:20:05 -07:00
Jacob Freck
a8f8e92629
Fix: docs links version ( #614 )
...
* update changelog
* update versions
2018-06-20 14:55:13 -07:00
Jacob Freck
4e0b1ecd0f
Fix: spark debug tool filter out .venv, make debug tool testable ( #612 )
...
* filter out .venv
* add NodeOutput model
* add debug tool integration test
* add test for debug tool
* split condition
* revert style change
* remove debug print
* whitespace
* remove other model implementation
* fix cluster copy
* fix cluster run and cluster copy
2018-06-20 14:26:35 -07:00
Jacob Freck
34b25855d5
Fix: release v0.8.0 ( #600 )
...
* update changelog and version
* update changelog
* add deprecation version and tests
* fix tests
* update changelog with deprecations
* update changelog
2018-06-11 15:48:08 -07:00
mmduyzend
1cc71c7a59
Fix: allow cluster config to be printed when no username has been set ( #597 )
2018-06-11 10:42:24 -07:00
mmduyzend
98c601ceb8
Fix: Deprecation messages cause TypeError in non-verbose mode ( #596 )
...
* Stop deprecate() from throwing when not in verbose mode
* Improve deprecation warning messages
2018-06-08 10:59:13 -07:00
mmduyzend
7d7a814c50
Fix: fix typos ( #595 )
2018-06-07 09:57:43 -07:00
Jacob Freck
88d04195ec
Feature: add cluster list quiet flag, ability to compose with delete ( #581 )
...
* add quiet flag, ability to compose with delete
* log.print instead of print
* add some docs
2018-06-06 16:03:34 -07:00
Brian
fbf1bab704
Conda, Apt-Get and Pip Install Plugins ( #594 )
...
* Added install plugins
* Moved packages to directory
* Removed channel from conda install
* changed default to none
* Added line
* fixed template
* Fixed naming of apt get
2018-06-06 15:16:27 -07:00
Timothee Guerin
fa3ac0eb3b
Fix: --size-low-pri being ignored ( #593 )
2018-06-05 10:54:02 -07:00
Jacob Freck
3f0c8f9bfc
Fix: set logger to stdout ( #588 )
...
* set logger to stdout
* typo
* add log.print level
2018-06-04 17:39:24 -07:00
Jacob Freck
f16aac091e
Feature: pure python ssh ( #577 )
...
* forward multiple ports
* plumb through cli
* continue cli implementation
* fixes
* pylint ignore
* spacing
* remove debug stuff, fix bug
* add --internal support
* add to init
* add comment
* remove nesting
* add logging
* add some docs
2018-06-04 17:16:51 -07:00
Jacob Freck
af449dc194
Feature: add node run command ( #572 )
...
* add node run command
* whitespace
* add node-run doc
* add host flag
* refactor, print->log
* generated username
* more secure random
* better handling of find node, type conversion
* add generate_user_on_node
* docs update
* fix docs
* remove duplicate import, sort
2018-06-04 13:58:33 -07:00
Timothee Guerin
b9a863b2f5
Warnings show stacktrace on verbose ( #587 )
2018-06-04 08:10:00 -07:00
Jacob Freck
8b8cd6260f
Fix: Remove old spark-defaults.conf jars ( #567 )
2018-05-30 13:05:55 -07:00
Timothee Guerin
8fea9ce092
Feature: Disable scheduling on group of nodes ( #540 )
2018-05-30 13:02:48 -07:00
Timothee Guerin
02f336b0a0
Feature: New Models design with auto validation, default and merging ( #543 )
2018-05-30 09:07:09 -07:00
lachiemurray
f6735cc6dd
Feature: Support passing of remote executables via aztk spark cluster submit ( #549 )
2018-05-24 10:29:23 -07:00
Jacob Freck
1527929e30
Feature: TensorflowOnSpark python plugin ( #525 )
...
* initial commit
* update
* update
* add gpu support
* remove comment
* change class to function
* fix merge issue
* add some docs
2018-05-21 13:22:55 -07:00
Jacob Freck
603a413d12
Feature: nvBLAS and OpenBLAS plugin ( #539 )
...
* add openblas plugin, update gpu docker images with netlib-lgpl
* update images and plugins
* add nvblas plugin
* revert gpu docker image change, add -Pnetlib-lgpl to base images
* change configurations to functions, add plugins to cluster.yaml
2018-05-15 17:47:41 -07:00
Timothee Guerin
a99bbe19e6
Fix: pass docker repo command back to the cluster config ( #538 )
2018-05-03 08:48:38 -07:00
Timothee Guerin
7a7e63c54f
Feature: New Toolkit configuration ( #507 )
2018-05-01 16:36:44 -07:00
Timothee Guerin
9bc76396bc
Docs: Added worker on master docs ( #531 )
2018-05-01 14:40:31 -07:00
Jacob Freck
779bffb2da
Feature: refactor docker images ( #510 )
...
* add spark2.3.0 hadoop2.8.3 dockerfile
* start update to docker image
* add SPARK_DIST_CLASSPATH to bashrc, source .bashrc in docker run
* add maven install for jars
* docker image update and code fix
* add libthrift (still broken)
* start image refactor, build from source
* add refactor to r base image
* finish refactor r image
* add storage jars and deps
* exclude netty to get rid of dependency conflict
* add miniconda image
* update 2.2.0 base, anaconda image
* remove unused cuda-8.0 image
* start pipenv implementation
* miniconda version arg
* update anaconda and miniconda image
* style
* pivot to virtualenv
* remove virtualenv from path when submitting apps
* flatten layers
* explicit calls to aztk python instead of activating virtualenv
* update base, miniconda, anaconda
* add compatibility version for base aztk images
* typo fix
* update pom
* update environment variable name
* update environment variables
* add anaconda images base & gpu
* update gpu and miniconda base images
* create venv in cluster create
* update base docker files, remove virtualenv
* fix path
* add exclusion to base images
* update r images
* delete python images (in favor of anaconda and miniconda)
* add miniconda gpu images
* update comment
* update aztk_version_compatibility to docker image version
* add a build script
* virtualenv->pipenv, add pipfile & pipfile.lock, remove secretstorage
* aztk/staging->aztk/spark
* remove jars, add .null to keep directory
* update pipfile, update jupyter and jupyterlab
* update default images
* update base images to fix hdfs
* update build script with correct path
* add spark1.6.3 anaconda, miniconda, r base and gpu images
* update build script to include spark1.6.3
* mkdir out
* exclude commons lang and slf4j dependencies
* mkdir out
* no fail if dir exists
* update node_scripts
* update env var name
* update env var name
* fix the docker_repo docs
* master->0.7.0
2018-04-30 17:19:01 -07:00
Jacob Freck
47000a5c7d
Bug: add timeout handling to cluster_run and copy ( #524 )
...
* update cluster_run and copy to handle timeouts
* fix
* move timeout default to connect function
2018-04-30 16:49:58 -07:00
Timothee Guerin
c98df7d1df
Feature: Added custom scripts functionality for plugins with the cli (Deprecate custom scripts) ( #517 )
2018-04-27 10:31:24 -07:00
Jacob Freck
2e995b4899
Feature: spark ui proxy plugin ( #467 )
...
* initial commit
* add args
* add docs
* change default plugins
* update ssh cli ui, remove plugin name
* change conditional
* update docs to include jupyterlab
* remove spark_ui_proxy as default plugin
2018-04-23 12:12:31 -07:00
Jacob Freck
44a07654aa
Feature: spark debug tool ( #455 )
...
* start implementation of cluster debug utility
* update debug program
* update debug
* fix output directory structure
* cleanup output, add error checking
* sort imports
* start untar
* extract tar
* add debug.py to pylintrc ignore, line too long
* crlf->lf
* add app logs
* call get_spark_app_logs, typos
* add docs
* remove debug.py from pylintrc ignore
* added debug.py back to pylint ignore
* change pylint ignore
* remove commented log
* update cluster_run
* refactor cluster_copy
* update debug, add spinner for run and copy
* make new sdk cluster_download endpoint
2018-04-09 15:02:43 -07:00
Jacob Freck
1eaa1b6e42
Feature: add internal flag to node commands ( #482 )
...
* add internal ssh flag
* add --internal flag to cluster get
* cluster run internal flag
* fix add command back
* cluster copy internal
* fix method params
* fix method params
* add debug statement
* fix params
* remove debug statement
* fixes
* add debug statement
* remove debug statement
* add hostname to /etc/hosts
* remove hostname from /etc/hosts
* add sdk docs for internal switch in cluster run and copy
2018-04-06 15:59:13 -07:00
Jacob Freck
be8cd2a490
Bug: Remove unused ssh plugin flags ( #488 )
2018-04-06 14:55:47 -07:00
Jacob Freck
a33bdbc5a9
Bug: fix broken spark init command ( #486 )
2018-04-06 14:10:40 -07:00
Jacob Freck
4ef3dd09df
Bug: add spark.history.fs.logDirectory to required keys ( #456 )
...
* add spark.history.fs.logDirectory to required keys
* add spark_event_log_enabled_key to required_keys
* docs, add history server config to spark-defaults.conf
* fix bad logic
* crlf->lf
2018-04-05 14:11:35 -07:00
Jacob Freck
32de752d53
Feature: Spark add output logs flag ( #468 )
...
* add output flag to cluster submit
* add output flag to cluster app-logs
* add output flag to job get-app-logs
* sort imports
* make spinner context
2018-04-05 12:21:56 -07:00
Jacob Freck
8889059aad
Feature: match cluster submit exit code in cli ( #478 )
2018-04-05 11:54:25 -07:00
Jacob Freck
ee1e61bb9d
Bug: fix spark job submit path ( #474 )
...
* fix job submit path, fix raise error, remove print
* source bashrc before executing
2018-04-03 11:19:35 -07:00
Jacob Freck
2dd7891499
Bug: add support for jars, pyfiles, files in Jobs ( #408 )
...
* add support for jars, pyfiles, files, refactor JobConfig
* set encoding explicitly
* fix TypeError bug in mixed_mode()
2018-03-26 11:38:05 -07:00
Jacob Freck
5761a3663a
Bug: set explicit file open encoding ( #448 )
...
* explicit file encoding
* crlf->lf
2018-03-23 13:42:30 -07:00
Jacob Freck
8aa1843f23
Feature: managed storage for clusters and jobs ( #443 )
...
* add in storage management for clusters, jobs
* add warning logs on cli delete
* whitespace
* add keep-logs flag
* add docs on storage lifetime
2018-03-20 10:45:49 -07:00
Timothee Guerin
9253aac0ea
Fix: VNet required error now showing if using mixed mode without it ( #440 )
2018-03-14 10:27:48 -07:00