Commit Graph

324 Commits

Author SHA1 Message Date
Timothee Guerin 002fef31e7 Wip 2018-05-03 08:26:36 -07:00
Timothee Guerin 4c530f61c0 Wip 2018-05-02 10:47:18 -07:00
Timothee Guerin 1be7620c30 move stuff around 2018-05-02 09:55:19 -07:00
Timothee Guerin 7bd55b49f5 Added cluster state class doc 2018-05-01 18:11:20 -07:00
Timothee Guerin 069716bd4c Added preempted state 2018-05-01 18:08:42 -07:00
Timothee Guerin 7a9092b30a Cluster wip 2018-05-01 17:34:51 -07:00
Timothee Guerin 7a7e63c54f
Feature: New Toolkit configuration (#507) 2018-05-01 16:36:44 -07:00
Timothee Guerin 9bc76396bc
Docs: Added worker on master docs (#531) 2018-05-01 14:40:31 -07:00
Pablo Selem 23c97dede2
Feature: monitor tick (#508)
* Wip

* Fix issues

* more tweaks

* Fix more

* More env renaming

* Start

* Docker runs now

* Wip docker run on node

* fix issues

* More fix

* Starts

* Starting spark again

* Works

* Fix

* More fixes

* Running plugins on the host works

* tweak

* Fix: tests

* Define plugin docs

* Added types

* Fix jupyterlab

* initial commit with grafana and influxdb

* changes to find .env files

* start refactor into single plugin

* remove unused plugins and add required files to resource_mon

* make it work with multiple nodes

* remove sudo calls and update default dashboard

* fix merge issue

* updates to make plugins work in container again

* remove bad characters from previous checking

* Added test for invalid target and target role

* Fix pylint

* Rename

* Added docs for debug plugins

* add docs for resource_monitor plugin

* surface passwords in metrics plugin config

* surface passwords in metrics plugin config p2

* updated comments

* initial work for TICK stack

* try getting telegraf working

* use tvm name as hostname

* update run_on to target_role

* update sources to only use tick stack

* remove unused external port

* update start script

* docs

* PR feedback

* update readme with a warning that data is only local

* change chronograf port to use 8890

* remove jars

* remove unused port

* update docs with new port info
2018-05-01 14:27:30 -07:00
Jacob Freck 779bffb2da
Feature: refactor docker images (#510)
* add spark2.3.0 hadoop2.8.3 dockerfile

* start update to docker image

* add SPARK_DIST_CLASSPATH to bashrc, source .bashrc in docker run

* add maven install for jars

* docker image update and code fix

* add libthrift (still broken)

* start image refactor, build from source,

* add refactor to r base image

* finish refactor r image

* add storage jars and deps

* exclude netty to get rid of dependency conflict

* add miniconda image

* update 2.2.0 base, anaconda image

* remove unused cuda-8.0 image

* start pipenv implementation

* miniconda version arg

* update anaconda and miniconda image

* style

* pivot to virtualenv

* remove virtualenv from path when submitting apps

* flatten layers

* explicit calls to aztk python instead of activating virtualenv

* update base, miniconda, anaconda

* add compatibility version for base aztk images

* typo fix

* update pom

* update environment variable name

* update environment variables

* add anaconda images base & gpu

* update gpu and miniconda base images

* create venv in cluster create

* update base docker files, remove virtualenv

* fix path

* add exclusion to base images

* update r images

* delete python images (in favor of anaconda and miniconda)

* add miniconda gpu images

* update comment

* update aztk_version_compatibility to docker image version

* add a build script

* virtualenv->pipenv, add pipfile & pipfile.lock remove secretstorage

* aztk/staging->aztk/spark

* remove jars, add .null to keep directory

* update pipfile, update jupyter and jupyterlab

* update default images

* update base images to fix hdfs

* update build script with correct path

* add spark1.6.3 anaconda, miniconda, r base and gpu images

* update build script to include spark1.6.3

* mkdir out

* exclude commons lang and slf4j dependencies

* mkdir out

* no fail if dir exists

* update node_scripts

* update env var name

* update env var name

* fix the docker_repo docs

* master->0.7.0
2018-04-30 17:19:01 -07:00
Jacob Freck 47000a5c7d
Bug: add timeout handling to cluster_run and copy (#524)
* update cluster_run and copy to handle timeouts

* fix

* move timeout default to connect function
2018-04-30 16:49:58 -07:00
Jacob Freck 9ccc1c6b83
Bug: fix job submission cluster data issues (#533) 2018-04-30 16:39:04 -07:00
Jacob Freck 0015e22d01
Bug: make node scripts upload in memory (#519) 2018-04-27 11:59:14 -07:00
Timothee Guerin c98df7d1df
Feature: Added custom scripts functionality for plugins with the cli(Deprecate custom scripts) (#517) 2018-04-27 10:31:24 -07:00
Jacob Freck 07ac9b7596
Bug: azure file share not being shared with container (#521)
* share all of /mnt

* fix todo message
2018-04-26 17:49:33 -07:00
Jacob Freck db7a2ef994
Bug: pypi long description (#450)
* update version and change long description content type

* update travis to build on version tags

* update version

* update twine version and aztk version

* add twine to travis

* Update version.py

* bump version

* add plugins

* bump version

* bump version

* bump version

* update dest

* remove debug from travis build

* update travis, fix setup.py includes, bump version

* update azure batch version to 4.1.3

* add reqs back to travis

* bump version

* remove commented dependencies
2018-04-26 15:24:53 -07:00
Timothee Guerin e361c3b0b3
Feature: Readthedocs support (#497) 2018-04-26 14:03:45 -07:00
Timothee Guerin a00dbb7d6c
fix(hdfs): using wrong conditions (#515) 2018-04-26 10:31:53 -07:00
Timothee Guerin 5579d95b41
Fix: Worker on master flag ignored and standardize boolean environment (#514) 2018-04-26 09:27:37 -07:00
Jacob Freck 3cc43c3277
Feature: disable msrestazure keyring log (#509) 2018-04-25 12:06:49 -07:00
Timothee Guerin b8a3fccaf0
Fix: AZTK_IS_MASTER not set on worker and failing (#506)
* Fix: AZTK_IS_MASTER_NOT_SET

* Update jupyter lab too

* update jupyterlab target role

* True false doc
2018-04-24 12:14:47 -07:00
Timothee Guerin de7898334c
Feature: Plugin V2: Running plugin on host (#461) 2018-04-23 17:20:43 -07:00
Timothee Guerin 12450fb672
Fix keyring (#505) 2018-04-23 17:04:48 -07:00
Timothee Guerin 5e79a2ced4
Bug: Dependency issue with keyring not having good dependencies (#504) 2018-04-23 15:17:54 -07:00
Jacob Freck 2e995b4899
Feature: spark ui proxy plugin (#467)
* initial commit

* add args

* add docs

* change default plugins

* update ssh cli ui, remove plugin name

* change conditional

* update docs to include jupyterlab

* remove spark_ui_proxy as default plugin
2018-04-23 12:12:31 -07:00
Pablo Selem 4ba3c9d7c6
Update file to point at master branch (#501)
The file is pointing at the development branch instead of master.
2018-04-20 09:19:32 -07:00
Jacob Freck 7ef721f0c1
Feature: getting started script (#475)
* initial changes for getting started scripts

* add temp error handling

* rename file - fix typo

* add debug strings

* add handling for existing user

* WIP: wait for subprocess to complete to get exit code

* WIP: handle existing user and refactor code

* WIP: add missing return statements

* WIP: fix typo

* start sdk refactor

* mostly working create

* working happy create path

* handle errors for vnet, aad application

* make account setup interactive

* add prompt

* add docs

* rename account_setup_refac to account_setup

* add some logging

* pip install msrest, azure-cli-core, import issues

* remove in script pip, add shell wrapper program

* ellipsis to period

* update branch name for account_setup.sh

* docstring

* retry resource group creation

* fix typo, update retry

* explicitly set output location

* wget overwrite flag, docs update

* add prompt for multi tenants

* fix bug with batch account creation

* add spinner, print statements, fix formatting bug

* fix param bug
2018-04-11 13:27:55 -07:00
Jacob Freck 44a07654aa
Feature: spark debug tool (#455)
* start implementation of cluster debug utility

* update debug program

* update debug

* fix output directory structure

* cleanup output, add error checking

* sort imports

* start untar

* extract tar

* add debug.py to pylintrc ignore, line too long

* crlf->lf

* add app logs

* call get_spark_app_logs, typos

* add docs

* remove debug.py from pylintrc ignore

* added debug.py back to pylint ignore

* change pylint ignore

* remove commented log

* update cluster_run

* refactor cluster_copy

* update debug, add spinner for run and copy

* make new sdk cluster_download endpoint
2018-04-09 15:02:43 -07:00
Jacob Freck 61e7c591cd
Feature: Spark vnet custom dns hostname fix (#490)
* add hostname to /etc/hosts

* conditionally set hostname in /etc/hosts
2018-04-09 10:22:32 -07:00
Jacob Freck 013f6e402f
Bug: Spark shuffle service worker registration fail (#492)
* stop calling start-shuffle-service.sh script

* whitespace

* remove unused method
2018-04-09 10:10:48 -07:00
Jacob Freck 1eaa1b6e42
Feature: add internal flag to node commands (#482)
* add internal ssh flag

* add --internal flag to cluster get

* cluster run internal flag

* fix add command back

* cluster copy internal

* fix method params

* fix method params

* add debug statement

* fix params

* remove debug statement

* fixes

* add debug statement

* remove debug statement

* add hostname to /etc/hosts

* remove hostname from /etc/hosts

* add sdk docs for internal switch in cluster run and copy
2018-04-06 15:59:13 -07:00
Jacob Freck be8cd2a490
Bug: Remove unused ssh plugin flags (#488) 2018-04-06 14:55:47 -07:00
Jacob Freck a33bdbc5a9
Bug: fix broken spark init command (#486) 2018-04-06 14:10:40 -07:00
Jacob Freck 4ef3dd09df
Bug: add spark.history.fs.logDirectory to required keys (#456)
* add spark.history.fs.logDirectory to required keys

* add spark_event_log_enabled_key to required_keys

* docs, add history server config to spark-defaults.conf

* fix bad logic

* crlf->lf
2018-04-05 14:11:35 -07:00
Jacob Freck 32de752d53
Feature: Spark add output logs flag (#468)
* add output flag to cluster submit

* add output flag to cluster app-logs

* add output flag to job get-app-logs

* sort imports

* make spinner context
2018-04-05 12:21:56 -07:00
Jacob Freck 8889059aad
Feature: match cluster submit exit code in cli (#478) 2018-04-05 11:54:25 -07:00
Jacob Freck a59fe8b959
Bug: throw error if submitting before master elected (#479) 2018-04-05 11:51:57 -07:00
Jacob Freck 82ad0296af
Bug: add gitattributes file (#470)
Bug: line endings, add gitattributes file
2018-04-04 13:44:26 -07:00
Jacob Freck ee1e61bb9d
Bug: fix spark job submit path (#474)
* fix job submit path, fix raise error, remove print

* source bashrc before executing
2018-04-03 11:19:35 -07:00
Pablo Selem da61337bfe
Feature: JupyterLab plugin (#459)
* initial commit

* enable jupyter lab as a default plugin

* remove hack text and add more logging

* remove docker compose code. it is not used yet

* remove unused code and comment
2018-03-29 09:10:11 -07:00
Jacob Freck c1f43c73c1
Bug: fix aztk cluster submit paths, imports (#464)
* fix cluster submit

* add export pythonpath to docker_main
2018-03-27 16:05:54 -07:00
Jacob Freck 2dd7891499
Bug: add support for jars, pyfiles, files in Jobs (#408)
* add support for jars, pyfiles, files, refactor JobConfig

* set encoding explicitly

* fix typerror bug in mixed_mode()
2018-03-26 11:38:05 -07:00
Jacob Freck 5761a3663a
Bug: set explicit file open encoding (#448)
* explicit file encoding

* crlf->lf
2018-03-23 13:42:30 -07:00
Timothee Guerin dfbfead4aa
Internal: Move node scripts under aztk and upload all aztk to cluster (#433) 2018-03-22 15:39:06 -07:00
Timothee Guerin f2eb1a4e92
Update storage sdk from 0.33.0 to 1.1.0 (#439) 2018-03-22 10:16:57 -07:00
Jacob Freck 8aa1843f23
Feature: managed storage for clusters and jobs (#443)
* add in storage management for clusters, jobs

* add warning logs on cli delete

* whitespace

* add keep-logs flag

* add docs on storage lifetime
2018-03-20 10:45:49 -07:00
lachiemurray 27822f42e9 Fix typo in command_builder 'expecity' -> 'explicitly' (#447) 2018-03-20 08:24:52 -07:00
Jacob Freck 8d00a2c444
Feature: enable mixed mode for jobs (#442)
* enable mixed mode for jobs

* simplify

* add job configuration validation

* whitespace
2018-03-16 11:25:56 -07:00
Timothee Guerin 9253aac0ea
Fix: VNet required error now showing if using mixed mode without it (#440) 2018-03-14 10:27:48 -07:00
stevekuo4 bcefca3d2f Fix the endpoint (#437) 2018-03-13 13:45:45 -07:00