Timothee Guerin
002fef31e7
Wip
2018-05-03 08:26:36 -07:00
Timothee Guerin
4c530f61c0
Wip
2018-05-02 10:47:18 -07:00
Timothee Guerin
1be7620c30
move stuff around
2018-05-02 09:55:19 -07:00
Timothee Guerin
7bd55b49f5
Added cluster state class doc
2018-05-01 18:11:20 -07:00
Timothee Guerin
069716bd4c
Added preempted state
2018-05-01 18:08:42 -07:00
Timothee Guerin
7a9092b30a
Cluster wip
2018-05-01 17:34:51 -07:00
Timothee Guerin
7a7e63c54f
Feature: New Toolkit configuration ( #507 )
2018-05-01 16:36:44 -07:00
Timothee Guerin
9bc76396bc
Docs: Added worker on master docs ( #531 )
2018-05-01 14:40:31 -07:00
Pablo Selem
23c97dede2
Feature: monitor tick ( #508 )
* Wip
* Fix issues
* more tweaks
* Fix more
* More env renaming
* Start
* Docker runs now
* Wip docker run on node
* fix issues
* More fix
* Starts
* Starting spark again
* Works
* Fix
* More fixes
* Running plugins on the host works
* tweak
* Fix: tests
* Define plugin docs
* Added types
* Fix jupyterlab
* initial commit with grafana and influxdb
* changes to find .env files
* start refactor into single plugin
* remove unused plugins and add required files to resource_mon
* make it work with multiple nodes
* remove sudo calls and update default dashboard
* fix merge issue
* updates to make plugins work in container again
* remove bad characters from previous checking
* Added test for invalid target and target role
* Fix pylint
* Rename
* Added docs for debug plugins
* add docs for resource_monitor plugin
* surface passwords in metrics plugin config
* surface passwords in metrics plugin config p2
* updated comments
* initial work for TICK stack
* try getting telegraf working
* use tvm name as hostname
* update run_on to target_role
* update sources to only use tick stack
* remove unused external port
* update start script
* docs
* PR feedback
* update readme with a warning that data is only local
* change chronograf port to use 8890
* remove jars
* remove unused port
* update docs with new port info
2018-05-01 14:27:30 -07:00
Jacob Freck
779bffb2da
Feature: refactor docker images ( #510 )
* add spark2.3.0 hadoop2.8.3 dockerfile
* start update to docker image
* add SPARK_DIST_CLASSPATH to bashrc, source .bashrc in docker run
* add maven install for jars
* docker image update and code fix
* add libthrift (still broken)
* start image refactor, build from source,
* add refactor to r base image
* finish refactor r image
* add storage jars and deps
* exclude netty to get rid of dependency conflict
* add miniconda image
* update 2.2.0 base, anaconda image
* remove unused cuda-8.0 image
* start pipenv implementation
* miniconda version arg
* update anaconda and miniconda image
* style
* pivot to virtualenv
* remove virtualenv from path when submitting apps
* flatten layers
* explicit calls to aztk python instead of activating virtualenv
* update base, miniconda, anaconda
* add compatibility version for base aztk images
* typo fix
* update pom
* update environment variable name
* update environment variables
* add anaconda images base & gpu
* update gpu and miniconda base images
* create venv in cluster create
* update base docker files, remove virtualenv
* fix path
* add exclusion to base images
* update r images
* delete python images (in favor of anaconda and miniconda)
* add miniconda gpu images
* update comment
* update aztk_version_compatibility to docker image version
* add a build script
* virtualenv->pipenv, add pipfile & pipfile.lock, remove secretstorage
* aztk/staging->aztk/spark
* remove jars, add .null to keep directory
* update pipfile, update jupyter and jupyterlab
* update default images
* update base images to fix hdfs
* update build script with correct path
* add spark1.6.3 anaconda, miniconda, r base and gpu images
* update build script to include spark1.6.3
* mkdir out
* exclude commons lang and slf4j dependencies
* mkdir out
* no fail if dir exists
* update node_scripts
* update env var name
* update env var name
* fix the docker_repo docs
* master->0.7.0
2018-04-30 17:19:01 -07:00
Jacob Freck
47000a5c7d
Bug: add timeout handling to cluster_run and copy ( #524 )
* update cluster_run and copy to handle timeouts
* fix
* move timeout default to connect function
2018-04-30 16:49:58 -07:00
Jacob Freck
9ccc1c6b83
Bug: fix job submission cluster data issues ( #533 )
2018-04-30 16:39:04 -07:00
Jacob Freck
0015e22d01
Bug: make node scripts upload in memory ( #519 )
2018-04-27 11:59:14 -07:00
Timothee Guerin
c98df7d1df
Feature: Added custom scripts functionality for plugins with the cli (Deprecate custom scripts) ( #517 )
2018-04-27 10:31:24 -07:00
Jacob Freck
07ac9b7596
Bug: azure file share not being shared with container ( #521 )
* share all of /mnt
* fix todo message
2018-04-26 17:49:33 -07:00
Jacob Freck
db7a2ef994
Bug: pypi long description ( #450 )
* update version and change long description content type
* update travis to build on version tags
* update version
* update twine version and aztk version
* add twine to travis
* Update version.py
* bump version
* add plugins
* bump version
* bump version
* bump version
* update dest
* remove debug from travis build
* update travis, fix setup.py includes, bump version
* update azure batch version to 4.1.3
* add reqs back to travis
* bump version
* remove commented dependencies
2018-04-26 15:24:53 -07:00
Timothee Guerin
e361c3b0b3
Feature: Readthedocs support ( #497 )
2018-04-26 14:03:45 -07:00
Timothee Guerin
a00dbb7d6c
fix(hdfs): using wrong conditions ( #515 )
2018-04-26 10:31:53 -07:00
Timothee Guerin
5579d95b41
Fix: Worker on master flag ignored and standardize boolean environment ( #514 )
2018-04-26 09:27:37 -07:00
Jacob Freck
3cc43c3277
Feature: disable msrestazure keyring log ( #509 )
2018-04-25 12:06:49 -07:00
Timothee Guerin
b8a3fccaf0
Fix: AZTK_IS_MASTER not set on worker and failing ( #506 )
* Fix: AZTK_IS_MASTER_NOT_SET
* Update jupyter lab too
* update jupyterlab target role
* True false doc
2018-04-24 12:14:47 -07:00
Timothee Guerin
de7898334c
Feature: Plugin V2: Running plugin on host ( #461 )
2018-04-23 17:20:43 -07:00
Timothee Guerin
12450fb672
Fix keyring ( #505 )
2018-04-23 17:04:48 -07:00
Timothee Guerin
5e79a2ced4
Bug: Dependency issue with keyring not having good dependencies ( #504 )
2018-04-23 15:17:54 -07:00
Jacob Freck
2e995b4899
Feature: spark ui proxy plugin ( #467 )
* initial commit
* add args
* add docs
* change default plugins
* update ssh cli ui, remove plugin name
* change conditional
* update docs to include jupyterlab
* remove spark_ui_proxy as default plugin
2018-04-23 12:12:31 -07:00
Pablo Selem
4ba3c9d7c6
Update file to point at master branch ( #501 )
The file is pointing at the development branch instead of master.
2018-04-20 09:19:32 -07:00
Jacob Freck
7ef721f0c1
Feature: getting started script ( #475 )
* initial changes for getting started scripts
* add temp error handling
* rename file - fix typo
* add debug strings
* add handling for existing user
* WIP: wait for subprocess to complete to get exit code
* WIP: handle existing user and refactor code
* WIP: add missing return statements
* WIP: fix typo
* start sdk refactor
* mostly working create
* working happy create path
* handle errors for vnet, aad application
* make account setup interactive
* add prompt
* add docs
* rename account_setup_refac to account_setup
* add some logging
* pip install msrest, azure-cli-core, import issues
* remove in script pip, add shell wrapper program
* ellipsis to period
* update branch name for account_setup.sh
* docstring
* retry resource group creation
* fix typo, update retry
* explicitly set output location
* wget overwrite flag, docs update
* add prompt for multi tenants
* fix bug with batch account creation
* add spinner, print statements, fix formatting bug
* fix param bug
2018-04-11 13:27:55 -07:00
Jacob Freck
44a07654aa
Feature: spark debug tool ( #455 )
* start implementation of cluster debug utility
* update debug program
* update debug
* fix output directory structure
* cleanup output, add error checking
* sort imports
* start untar
* extract tar
* add debug.py to pylintrc ignore, line too long
* crlf->lf
* add app logs
* call get_spark_app_logs, typos
* add docs
* remove debug.py from pylintrc ignore
* added debug.py back to pylint ignore
* change pylint ignore
* remove commented log
* update cluster_run
* refactor cluster_copy
* update debug, add spinner for run and copy
* make new sdk cluster_download endpoint
2018-04-09 15:02:43 -07:00
Jacob Freck
61e7c591cd
Feature: Spark vnet custom dns hostname fix ( #490 )
* add hostname to /etc/hosts
* conditionally set hostname in /etc/hosts
2018-04-09 10:22:32 -07:00
Jacob Freck
013f6e402f
Bug: Spark shuffle service worker registration fail ( #492 )
* stop calling start-shuffle-service.sh script
* whitespace
* remove unused method
2018-04-09 10:10:48 -07:00
Jacob Freck
1eaa1b6e42
Feature: add internal flag to node commands ( #482 )
* add internal ssh flag
* add --internal flag to cluster get
* cluster run internal flag
* fix add command back
* cluster copy internal
* fix method params
* fix method params
* add debug statement
* fix params
* remove debug statement
* fixes
* add debug statement
* remove debug statement
* add hostname to /etc/hosts
* remove hostname from /etc/hosts
* add sdk docs for internal switch in cluster run and copy
2018-04-06 15:59:13 -07:00
Jacob Freck
be8cd2a490
Bug: Remove unused ssh plugin flags ( #488 )
2018-04-06 14:55:47 -07:00
Jacob Freck
a33bdbc5a9
Bug: fix broken spark init command ( #486 )
2018-04-06 14:10:40 -07:00
Jacob Freck
4ef3dd09df
Bug: add spark.history.fs.logDirectory to required keys ( #456 )
* add spark.history.fs.logDirectory to required keys
* add spark_event_log_enabled_key to required_keys
* docs, add history server config to spark-defaults.conf
* fix bad logic
* crlf->lf
2018-04-05 14:11:35 -07:00
Jacob Freck
32de752d53
Feature: Spark add output logs flag ( #468 )
* add output flag to cluster submit
* add output flag to cluster app-logs
* add output flag to job get-app-logs
* sort imports
* make spinner context
2018-04-05 12:21:56 -07:00
Jacob Freck
8889059aad
Feature: match cluster submit exit code in cli ( #478 )
2018-04-05 11:54:25 -07:00
Jacob Freck
a59fe8b959
Bug: throw error if submitting before master elected ( #479 )
2018-04-05 11:51:57 -07:00
Jacob Freck
82ad0296af
Bug: add gitattributes file ( #470 )
Bug: line endings, add gitattributes file
2018-04-04 13:44:26 -07:00
Jacob Freck
ee1e61bb9d
Bug: fix spark job submit path ( #474 )
* fix job submit path, fix raise error, remove print
* source bashrc before executing
2018-04-03 11:19:35 -07:00
Pablo Selem
da61337bfe
Feature: JupyterLab plugin ( #459 )
* initial commit
* enable jupyter lab as a default plugin
* remove hack text and add more logging
* remove docker compose code. it is not used yet
* remove unused code and comment
2018-03-29 09:10:11 -07:00
Jacob Freck
c1f43c73c1
Bug: fix aztk cluster submit paths, imports ( #464 )
* fix cluster submit
* add export pythonpath to docker_main
2018-03-27 16:05:54 -07:00
Jacob Freck
2dd7891499
Bug: add support for jars, pyfiles, files in Jobs ( #408 )
* add support for jars, pyfiles, files, refactor JobConfig
* set encoding explicitly
* fix TypeError bug in mixed_mode()
2018-03-26 11:38:05 -07:00
Jacob Freck
5761a3663a
Bug: set explicit file open encoding ( #448 )
* explicit file encoding
* crlf->lf
2018-03-23 13:42:30 -07:00
Timothee Guerin
dfbfead4aa
Internal: Move node scripts under aztk and upload all aztk to cluster ( #433 )
2018-03-22 15:39:06 -07:00
Timothee Guerin
f2eb1a4e92
Update storage sdk from 0.33.0 to 1.1.0 ( #439 )
2018-03-22 10:16:57 -07:00
Jacob Freck
8aa1843f23
Feature: managed storage for clusters and jobs ( #443 )
* add in storage management for clusters, jobs
* add warning logs on cli delete
* whitespace
* add keep-logs flag
* add docs on storage lifetime
2018-03-20 10:45:49 -07:00
lachiemurray
27822f42e9
Fix typo in command_builder 'expecity' -> 'explicitly' ( #447 )
2018-03-20 08:24:52 -07:00
Jacob Freck
8d00a2c444
Feature: enable mixed mode for jobs ( #442 )
* enable mixed mode for jobs
* simplify
* add job configuration validation
* whitespace
2018-03-16 11:25:56 -07:00
Timothee Guerin
9253aac0ea
Fix: VNet required error now showing if using mixed mode without it ( #440 )
2018-03-14 10:27:48 -07:00
stevekuo4
bcefca3d2f
Fix the endpoint ( #437 )
2018-03-13 13:45:45 -07:00