Commit Graph

334 Commits

Author SHA1 Message Date
Brian 89948de905
R GPU Dockerfiles (#341)
* Added dockerfiles for aztk r gpu

* Removed tensorflow/cntk cpu versions

* Enabled configure option for shared lib for python

* Removed temp paths

* Replaced python version with aztk version

* Cleaning up args

* Added variable definition for tensorflow, shorten lines
2018-01-22 16:36:36 -08:00
Jacob Freck 717f3d04a3
Bug: Ensure Directories Exist Before Copying (#342) 2018-01-22 16:13:13 -05:00
Jacob Freck 1aae3b5a8b
Feature: expose exit_code (#320)
* expose the batch task exit_code in ApplicationLog

* add exit_code to job list-apps output

* add execution info to applicationlog object

* whitespace
2018-01-19 16:53:45 -05:00
Jacob Freck 882418a928
Feature: Spark job submission (#278)
* refactor submit, initial job-submission commit

* fix merge conflicts

* comment out gpu_enabled

* update schedule, user job manager task, wait until container ready

* wait until docker container running

* add get_job_log

* add job id

* start multitask job submission

* start multitask implementation

* wait for node setup to complete before completing start task

* fix incorrect logic

* add environment variable, fix loading task definition, fix waiting for container

* fix early job completion, autokill pool, cleanup code

* include debug print error

* define job submission sdk stubs

* Job and JobConfiguration models

* remove timestamp, implement job function stubs, add Application model

* add delete_job, bug fixes

* add wait_until_job_finished, wait_until_all_jobs_finished

* fix storage output logs location, function names

* whitespace

* add cli file stubs

* start job cli implementation

* better output for job cli, added job.yaml configuration file

* error catch for missing blob, add sdk docs, models updates

* start Jobs tutorial doc

* template for rest of job tutorial doc

* rename app_id to application_name, add job sdk docs

* docs fix

* add support for low pri nodes

* better formatting for job.yaml

* remove unnecessary comments

* add defaults and comments for job.yaml, fix spark_configuration file loading

* fix spark_configuration file paths

* fix app_arg issue

* clean up code, address comments

* rename to cluster_submit_helper

* update get job print

* add list-apps, update help text for job id

* add application metadata to job, update job get print

* better error for get_application_log, add working example to job.yaml, add print_application

* add warning about master selection

* update list_applications return value

* whitespace

* Add link to docs in job.yaml, validation for job.yaml

* convert to commandbuilder, remove print

* fix submit exit code, fix no app_args bug, set autoscale interval

* wait until custom scripts are completed

* add missed import in get_app_logs, whitespace

* use correct python version for on node app submit

* update get output format
2018-01-19 13:37:58 -05:00
Pablo Selem b56c551189
always upgrade jupyter to the latest version on install (#336) 2018-01-18 19:46:37 -08:00
Jacob Freck 87f02b3b2d
Bug: fix secrets shared key backwards compatibility (#334) 2018-01-17 13:05:34 -08:00
Timothee Guerin e193ed30dd
Feature: VNet support (#324)
* VNet support
* Azure active directory authentication
2018-01-17 11:14:56 -08:00
Jacob Freck f7c1cb5172 add MIT license (#323) 2018-01-11 10:19:14 -08:00
Jacob Freck 1d6000d759
Feature: performance tune core-site.xml (#321) 2018-01-09 11:51:27 -08:00
Jacob Freck 7e9bf9c383
Feature: Spark retry job (#318) 2018-01-08 12:31:47 -08:00
Jacob Freck 510e2ec635
Bug: fix alignment in get print cluster (#312) 2018-01-04 11:17:20 -08:00
Jacob Freck 54936e555b
Bug: suppress warning on add-user (#302) 2017-12-22 14:11:32 -05:00
Jacob Freck 2fa4e69d67
Bug: fix logic for worker custom scripts (#295) 2017-12-22 13:35:52 -05:00
JS 89f0e54182
jupyter azfiles bug + gpu sample (#291)
* gpu sample + jupyter mnt point

* rename jupyter gpu sample
2017-12-19 11:16:12 -08:00
Jacob Freck 7d6706f5e9
Bug: History server parse file not exist (#288) 2017-12-18 12:56:45 -08:00
Jacob Freck d3f6fa1e8d
Feature: update to v0.5.0 (#283) 2017-12-15 12:57:20 -08:00
Jacob Freck 1d9d9cabb9
Bug: fix loading local spark config (#282) 2017-12-14 21:57:48 -08:00
Jacob Freck 46fd44414a
Feature: add feedback for cluster create wait (#273)
* add feedback for cluster create wait

* whitespace

* alphasort imports
2017-12-12 19:32:54 -08:00
JS 6091b1d390
Docs: update (#263)
* Update README.md

streamline and update main readme.md

* Update README.md

* Update README.md

* Update 13-configuration.md

* Update 12-docker-image.md

* Update 12-docker-image.md

* Update README.md

* Create README.md

* Update README.md

* Update 10-clusters.md
2017-12-11 17:02:51 -08:00
Jacob Freck 1a73e0d840
Feature: Default Spark filesystem master HA (#271)
* add default filesystem master ha

* move settings to spark-defaults.conf

* whitespace
2017-12-11 18:49:02 -05:00
Jacob Freck 40bd2d62f3
Bug: fix wrong path for global secrets (#265)
* fix wrong path for global secrets

* load spark_conf files correctly

* docker-image docs fix

* docker-image docs fix

* move load_aztk_spark_config function to config.py
2017-12-11 15:06:08 -05:00
Emlyn Corrin 86515038e9 Retry asking for password when it doesn't match or is empty (#252)
* Retry asking for password when it doesn't match or is empty

* Limit to 3 retries and let user know of add-user command on failure

* Throw error on failure
2017-12-10 07:13:34 -08:00
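The retry behaviour described in the entry above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `prompt_for_password`, `PasswordPromptError`, `MAX_ATTEMPTS`, and the message wording are all hypothetical names chosen here, and reading "3 retries" as three attempts in total is an assumption.

```python
import getpass
from typing import Callable

MAX_ATTEMPTS = 3  # assumption: "limit to 3 retries" read as three attempts in total


class PasswordPromptError(Exception):
    """Hypothetical error type used only in this sketch."""


def prompt_for_password(ask: Callable[[str], str] = getpass.getpass) -> str:
    """Ask for a password twice; retry when it is empty or the entries differ."""
    for _ in range(MAX_ATTEMPTS):
        password = ask("Password: ")
        confirmation = ask("Confirm password: ")
        if not password:
            print("Password must not be empty.")
        elif password != confirmation:
            print("Passwords do not match.")
        else:
            return password
    # All attempts used up: fail loudly and point at the add-user command.
    raise PasswordPromptError(
        "Failed to set a password; a user can be added later with the add-user command."
    )
```

Passing the prompt function as a parameter keeps the loop testable without a terminal; by default it falls back to `getpass.getpass`.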
Brian 0efadefb98
Feature: Sparklyr (#243)
* Added rstudio server script

* Added rstudio server port to aztk sdk

* Added R dockerfiles

* Added new line on dockerfiles

* Pointing dockerfiles to new aztk-base

* allow any user or application in the server to write to the history server log directory
2017-12-08 11:17:27 -08:00
JS c12ecebad2
Update 60-gpu.md (#253)
* Update 60-gpu.md

make sure it is available in region

* Update 60-gpu.md
2017-12-07 14:11:53 -08:00
Jacob Freck 6c26943819
Feature: update docker image doc (#251)
* update docker-image readme with new images

* update docs
2017-12-06 14:04:07 -08:00
Jacob Freck 8a060a2f78
Feature: Spark GPU (#206)
* conditionally install and use nvidia-docker

* status statements, and -y flag for install

* add example, remove unnecessary ppa

* rename custom script, remove print statement, update example

* add Dockerfile

* fix path in Dockerfile

* update Docker images to use service account

* updated docs, changed default docker repo for gpu skus

* make timing statements more verbose

* remove unnecessary script

* added gpu docs

* fix up docs and numba example
2017-12-04 13:28:05 -08:00
Jacob Freck 195602b852
Bug: fix bad reference to FileShare (#245) 2017-12-01 15:05:34 -08:00
Jacob Freck d74ceee3f5
Feature: Rename SDK (#231)
* initial refactor

* rename cli_fe to cli

* add docs for sdk client

* typo

* remove conflict

* fix zip node scripts bug, add sdk_example program

* start models docs

* add ClusterConfiguration docs, fix merge bug

* Application docs update

* added Application and SparkConfiguration docs

* whitespace

* rename cli.py and spark/cli

* add docstring for load_spark_client
2017-12-01 13:42:55 -08:00
Pablo Selem cabcc29b3c
Feature: Azure Files (#241)
* initial take on installing azure files

* fix cluster.yaml parsing of files shares

* remove test code

* add docs for Azure Files
2017-11-30 14:16:53 -08:00
Pablo Selem 62f3995c2c
remove redundant setting in non-master code section and use non-os drive to mount HDFS (#242) 2017-11-30 12:17:50 -08:00
JS b983d12419 update 10-clusters.md - rm jupyter ref (for now) (#222) 2017-11-27 09:37:15 -08:00
Dario c2a30c4207 Use merged config instead of args (#227) 2017-11-27 09:11:42 -08:00
Ian McDonald 7c59567bec Fix a typo in link (#235) 2017-11-27 08:56:54 -08:00
Matt Scanlon e50cf8c52c Spellchecking (#233)
aztk was misspelled as aztb, amended to correct spelling.
2017-11-27 08:35:02 -08:00
Matt Scanlon ef0acbabbe Amended documentation to correctly obtain logs (#234)
Documentation amended: Use of aztk spark app logs results in an error. Correct usage is aztk spark cluster app-logs. Document amended to reflect this.
2017-11-27 08:31:17 -08:00
Jacob Freck 60cae3b8dd
Feature: HDFS plugin (#215)
* hdfs plugin initial

* fix passwordless ssh key, allow raw ip for datanodes

* remove debug statement

* description of forwarded ports

* add pycryptodome to requirements.txt

* fix file copy bug

* add namenode ui to ssh command, add docs
2017-11-22 14:51:11 -08:00
Jacob Freck cd37fd5586
Feature: time docker pull statements (#224) 2017-11-21 16:54:29 -08:00
Daniel Ciborowski ea7e17aec4 Update README.md (#225)
If you set both --size and --size-low-pri you receive an error...

dciborow@6433739-0524:/mnt/c/GIT/ABN/spark-recommender$ aztk spark cluster create --id testCluster1 --size 0 --size-low-pri 2 --vm-size standard_d2_v2
usage: aztk spark cluster create [-h] [--id CLUSTER_ID]
                                 [--size SIZE | --size-low-pri SIZE_LOW_PRI]
                                 [--vm-size VM_SIZE] [--username USERNAME]
                                 [--password PASSWORD] [--ssh-key SSH_KEY]
                                 [--docker-repo DOCKER_REPO] [--no-wait]
                                 [--wait]
aztk spark cluster create: error: argument --size-low-pri: not allowed with argument --size
2017-11-21 10:09:15 -08:00
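The usage text in the entry above shows `--size` and `--size-low-pri` inside one `[... | ...]` group, which is how Python's `argparse` renders a mutually exclusive group. The sketch below reproduces that behaviour under assumptions: `build_parser` and `try_parse` are hypothetical helpers written for this illustration, only a few of the flags are included, and the integer types are guessed. It is not the aztk parser itself.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the flag names come from the usage text above.
    parser = argparse.ArgumentParser(prog="aztk spark cluster create")
    parser.add_argument("--id", dest="cluster_id")
    # Options in this group cannot be combined on one command line.
    sizes = parser.add_mutually_exclusive_group()
    sizes.add_argument("--size", type=int)
    sizes.add_argument("--size-low-pri", type=int)
    parser.add_argument("--vm-size")
    return parser


def try_parse(argv):
    """Return (namespace, None) on success or (None, exit_code) on a parse error."""
    try:
        return build_parser().parse_args(argv), None
    except SystemExit as exc:
        # argparse prints the usage and error message to stderr, then exits.
        return None, exc.code


if __name__ == "__main__":
    # Passing --size 0 together with --size-low-pri still counts as a conflict.
    print(try_parse(["--id", "testCluster1", "--size", "0", "--size-low-pri", "2"]))
    # Passing only one of the two options parses cleanly.
    print(try_parse(["--id", "testCluster1", "--size-low-pri", "2"]))
```

Note that an explicit `--size 0` is treated as "given", so the conflict is reported even though the value is zero; leaving `--size` off entirely avoids the error.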
Jacob Freck 27c10300c1
Bug: fix file reference for cwd only (#204)
Partial fix for #204
2017-11-13 10:42:29 -08:00
Pablo Selem dfd73960c3
add a default port for the history server (#213)
* add a default port for the history server

* use default ssh ports in constructor and fix ssh output strings
2017-11-13 09:55:53 -08:00
JS 65a56dd657
Feature/python container (#210)
* added python container, jupyter install script, vanilla container

* Update README.md

* Create README.md

* Create README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update jupyter.sh

* Update README.md

* Update README.md

* python

* readme update

* docker updates

* Update README.md

* Update 12-docker-image.md

* Update constants.py

* add image files for wiki

* update imageS

* .

* dockerfile typo

* dockerfiles

* Removed r

* readme

* update constants.py

* update readme

* readme updates

* readme updates
2017-11-09 00:43:32 -08:00
Pablo Selem d44f0ebf08
Enable history server on batch nodes (#209)
* enable the history server on the cluster by default

* detect spark event log settings and set up the node accordingly

* fix spacing between methods
2017-11-08 19:55:13 -08:00
Jacob Freck e0a42b8abf
Bug: fix issue with quoted arguments that have special characters (#208) 2017-11-08 15:56:39 -08:00
Jacob Freck 6ce3f048ba
Bug: fix pyenv init (#201)
* fix spark-submit command file paths for extra files

* fix files flag typo

* remove too long lines

* fix command
2017-11-02 15:55:40 -07:00
Jacob Freck 34d587b4b4
Bug: fix spark-submit command file paths for extra files (#194)
* fix spark-submit command file paths for extra files

* fix files flag typo

* remove too long lines
2017-11-02 15:03:55 -07:00
Jacob Freck 1e0dc78108
Bug: fix docker auth for private repos (#196) 2017-11-02 12:53:54 -07:00
Jacob Freck 62c7ef2c38
Bug: only show clusters created with aztk in list (#200)
* only show clusters created with aztk in list

* check the full metadata key value pair and cleanup code
2017-11-02 12:48:29 -07:00
Jacob Freck 608a3c8408
Bug: remove explicit set of PYSPARK_PYTHON (#193)
* remove explicit set of PYSPARK_PYTHON

* undo accidental change
2017-11-02 12:42:48 -07:00
Pablo Selem f31e2b1225
fix issue with creating cluster erroring out with an invalid file (#198) 2017-11-02 11:15:34 -07:00
Jacob Freck e31d5e73ae
Feature: SDK (#180)
* initial sdk commit

* added submit, wait_until_cluster_ready, wait_until_jobs_done, async options

* remove incorrect public method

* initial error checking

* factored helper commands out of spark client file

* remove unnecessary print statement

* add get_cluster and list_cluster, fix imports

* add create_user

* remove appmodel from base class, create app_logs_model

* fix imports and models call bug

* change asynchronous to wait, add get_logs(), add wait_until_app_done()

* add get_application_status(), add_create_cluster_in_parallel(), add submit_all_applications(), add wait_until_all_clusters_are_ready, create_user() accepts cluster_id, rename app to application

* add try catches for all public methods, raise AztkErrors

* add Custom Script model

* added custom script support

* added ssh conf model

* added ssh conf subclass, fixed typing issue

* add support for spark configuration files, move upload_node_scripts to spark

* changed submit to require cluster_id

* whitespace

* initial integration commit

* create_user takes ssh key or path to key

* fix get_user_public_key

* add name for parameter

* integrate cluster_create and cluster_add_user with sdk

* expose pool in Cluster model

* add bool return value to delete_cluster

* integrate cluster_delete

* integrate cluster_get and cluster_list with sdk

* integrate cluster_submit and cluster_app_logs with sdk

* integrate ssh with sdk

* change master_ui to web_ui and web_ui to job_ui

* fix cluster_create, cluster_get, and cluster_ssh, aztklib

* add home_directory_path constant

* remove unnecessary files in cli

* remove unnecessary files

* fix setup.py constants

* redo #167

* fix constants and setup.py

* remove old tests, fix constants

* fix get_log typo

* refactor cluster_create for readability

* decouple cli from sdk, and batch functions from software functions

* update version, fix in setup.py

* whitespace

* fix init source path

* change import

* move error.py to root sdk directory

* fix cluster_ssh error call

* fix bug if no app_args are present

* remove default value for docker_repo in constructor
2017-10-31 12:34:23 -07:00