Brian
89948de905
R GPU Dockerfiles ( #341 )
...
* Added dockerfiles for aztk r gpu
* Removed tensorflow/cntk cpu versions
* Enabled configure option for shared lib for python
* Removed temp paths
* Replaced python version with aztk version
* Cleaning up args
* Added variable definition for tensorflow, shorten lines
2018-01-22 16:36:36 -08:00
Jacob Freck
717f3d04a3
Bug: Ensure Directories Exist Before Copying ( #342 )
2018-01-22 16:13:13 -05:00
Jacob Freck
1aae3b5a8b
Feature: expose exit_code ( #320 )
...
* expose the batch task exit_code in ApplicationLog
* add exit_code to job list-apps output
* add execution info to applicationlog object
* whitespace
2018-01-19 16:53:45 -05:00
Jacob Freck
882418a928
Feature: Spark job submission ( #278 )
...
* refactor submit, initial job-submission commit
* fix merge conflicts
* comment out gpu_enabled
* update schedule, user job manager task, wait until container ready
* wait until docker container running
* add get_job_log
* add job id
* start multitask job submission
* start multitask implementation
* wait for node setup to complete before completing start task
* fix incorrect logic
* add environment variable, fix loading task definition, fix waiting for container
* fix early job completion, autokill pool, cleanup code
* include debug print error
* define job submission sdk stubs
* Job and JobConfiguration models
* remove timestamp, implement job function stubs, add Application model
* add delete_job, bug fixes
* add wait_until_job_finished, wait_until_all_jobs_finished
* fix storage output logs location, function names
* whitespace
* add cli file stubs
* start job cli implementation
* better output for job cli, added job.yaml configuration file
* error catch for missing blob, add sdk docs, models updates
* start Jobs tutorial doc
* template for rest of job tutorial doc
* rename app_id to application_name, add job sdk docs
* docs fix
* add support for low pri nodes
* better formatting for job.yaml
* remove unnecessary comments
* add defaults and comments for job.yaml, fix spark_configuration file loading
* fix spark_configuration file paths
* fix app_arg issue
* clean up code, address comments
* rename to cluster_submit_helper
* update get job print
* add list-apps, update help text for job id
* add application metadata to job, update job get print
* better error for get_application_log, add working example to job.yaml, add print_application
* add warning about master selection
* update list_applications return value
* whitespace
* Add link to docs in job.yaml, validation for job.yaml
* convert to commandbuilder, remove print
* fix submit exit code, fix no app_args bug, set autoscale interval
* wait until custom scripts are completed
* add missed import in get_app_logs, whitespace
* use correct python version for on node app submit
* update get output format
2018-01-19 13:37:58 -05:00
Pablo Selem
b56c551189
always upgrade jupyter to the latest version on install ( #336 )
2018-01-18 19:46:37 -08:00
Jacob Freck
87f02b3b2d
Bug: fix secrets shared key backwards compatibility ( #334 )
2018-01-17 13:05:34 -08:00
Timothee Guerin
e193ed30dd
Feature: VNet support ( #324 )
...
* VNet support
* Azure active directory authentication
2018-01-17 11:14:56 -08:00
Jacob Freck
f7c1cb5172
add MIT license ( #323 )
2018-01-11 10:19:14 -08:00
Jacob Freck
1d6000d759
Feature: performance tune core-site.xml ( #321 )
2018-01-09 11:51:27 -08:00
Jacob Freck
7e9bf9c383
Feature: Spark retry job ( #318 )
2018-01-08 12:31:47 -08:00
Jacob Freck
510e2ec635
Bug: fix alignment in get print cluster ( #312 )
2018-01-04 11:17:20 -08:00
Jacob Freck
54936e555b
Bug: suppress warning on add-user ( #302 )
2017-12-22 14:11:32 -05:00
Jacob Freck
2fa4e69d67
Bug: fix logic for worker custom scripts ( #295 )
2017-12-22 13:35:52 -05:00
JS
89f0e54182
jupyter azfiles bug + gpu sample ( #291 )
...
* gpu sample + jupyter mnt point
* rename jupyter gpu sample
2017-12-19 11:16:12 -08:00
Jacob Freck
7d6706f5e9
Bug: History server parse file not exist ( #288 )
2017-12-18 12:56:45 -08:00
Jacob Freck
d3f6fa1e8d
Feature: update to v0.5.0 ( #283 )
2017-12-15 12:57:20 -08:00
Jacob Freck
1d9d9cabb9
Bug: fix loading local spark config ( #282 )
2017-12-14 21:57:48 -08:00
Jacob Freck
46fd44414a
Feature: add feedback for cluster create wait ( #273 )
...
* add feedback for cluster create wait
* whitespace
* alphasort imports
2017-12-12 19:32:54 -08:00
JS
6091b1d390
Docs: update ( #263 )
...
* Update README.md
streamline and update main readme.md
* Update README.md
* Update README.md
* Update 13-configuration.md
* Update 12-docker-image.md
* Update 12-docker-image.md
* Update README.md
* Create README.md
* Update README.md
* Update 10-clusters.md
2017-12-11 17:02:51 -08:00
Jacob Freck
1a73e0d840
Feature: Default Spark filesystem master HA ( #271 )
...
* add default filesystem master ha
* move settings to spark-defaults.conf
* whitespace
2017-12-11 18:49:02 -05:00
Jacob Freck
40bd2d62f3
Bug: fix wrong path for global secrets ( #265 )
...
* fix wrong path for global secrets
* load spark_conf files correctly
* docker-image docs fix
* docker-image docs fix
* move load_aztk_spark_config function to config.py
2017-12-11 15:06:08 -05:00
Emlyn Corrin
86515038e9
Retry asking for password when it doesn't match or is empty ( #252 )
...
* Retry asking for password when it doesn't match or is empty
* Limit to 3 retries and let user know of add-user command on failure
* Throw error on failure
2017-12-10 07:13:34 -08:00
Brian
0efadefb98
Feature: Sparklyr ( #243 )
...
* Added rstudio server script
* Added rstudio server port to aztk sdk
* Added R dockerfiles
* Added new line on dockerfiles
* Pointing dockerfiles to new aztk-base
* allow any user or application in the server to write to the history server log directory
2017-12-08 11:17:27 -08:00
JS
c12ecebad2
Update 60-gpu.md ( #253 )
...
* Update 60-gpu.md
make sure is available in region
* Update 60-gpu.md
2017-12-07 14:11:53 -08:00
Jacob Freck
6c26943819
Feature: update docker image doc ( #251 )
...
* update docker-image readme with new images
* update docs
2017-12-06 14:04:07 -08:00
Jacob Freck
8a060a2f78
Feature: Spark GPU ( #206 )
...
* conditionally install and use nvidia-docker
* status statements, and -y flag for install
* add example, remove unnecessary ppa
* rename custom script, remove print statement, update example
* add Dockerfile
* fix path in Dockerfile
* update Docker images to use service account
* updated docs, changed default docker repo for gpu skus
* make timing statements more verbose
* remove unnecessary script
* added gpu docs
* fix up docs and numba example
2017-12-04 13:28:05 -08:00
Jacob Freck
195602b852
Bug: fix bad reference to FileShare ( #245 )
2017-12-01 15:05:34 -08:00
Jacob Freck
d74ceee3f5
Feature: Rename SDK ( #231 )
...
* initial refactor
* rename cli_fe to cli
* add docs for sdk client
* typo
* remove conflict
* fix zip node scripts bug, add sdk_example program
* start models docs
* add ClusterConfiguration docs, fix merge bug
* Application docs update
* added Application and SparkConfiguration docs
* whitespace
* rename cli.py and spark/cli
* add docstring for load_spark_client
2017-12-01 13:42:55 -08:00
Pablo Selem
cabcc29b3c
Feature: Azure Files ( #241 )
...
* initial take on installing azure files
* fix cluster.yaml parsing of files shares
* remove test code
* add docs for Azure Files
2017-11-30 14:16:53 -08:00
Pablo Selem
62f3995c2c
remove redundant setting in non-master code section and use non-os drive to mount HDFS ( #242 )
2017-11-30 12:17:50 -08:00
JS
b983d12419
update 10-clusters.md - rm jupyter ref (for now) ( #222 )
2017-11-27 09:37:15 -08:00
Dario
c2a30c4207
Use merged config instead of args ( #227 )
2017-11-27 09:11:42 -08:00
Ian McDonald
7c59567bec
Fix a typo in link ( #235 )
2017-11-27 08:56:54 -08:00
Matt Scanlon
e50cf8c52c
Spellchecking ( #233 )
...
aztk was misspelled as aztb, amended to correct spelling.
2017-11-27 08:35:02 -08:00
Matt Scanlon
ef0acbabbe
Amended documentation to correctly obtain logs ( #234 )
...
Documentation amended: Use of aztk spark app logs results in an error. Correct usage is aztk spark cluster app-logs. Document amended to reflect this.
2017-11-27 08:31:17 -08:00
Jacob Freck
60cae3b8dd
Feature: HDFS plugin ( #215 )
...
* hdfs plugin initial
* fix passwordless ssh key, allow raw ip for datanodes
* remove debug statement
* description of forwarded ports
* add pycryptodome to requirements.txt
* fix file copy bug
* add namenode ui to ssh command, add docs
2017-11-22 14:51:11 -08:00
Jacob Freck
cd37fd5586
Feature: time docker pull statements ( #224 )
2017-11-21 16:54:29 -08:00
Daniel Ciborowski
ea7e17aec4
Update README.md ( #225 )
...
If you set both --size and --size-low-pri you receive an error...
dciborow@6433739-0524:/mnt/c/GIT/ABN/spark-recommender$ aztk spark cluster create --id testCluster1 --size 0 --size-low-pri 2 --vm-size standard_d2_v2
usage: aztk spark cluster create [-h] [--id CLUSTER_ID]
[--size SIZE | --size-low-pri SIZE_LOW_PRI]
[--vm-size VM_SIZE] [--username USERNAME]
[--password PASSWORD] [--ssh-key SSH_KEY]
[--docker-repo DOCKER_REPO] [--no-wait]
[--wait]
aztk spark cluster create: error: argument --size-low-pri: not allowed with argument --size
2017-11-21 10:09:15 -08:00
Jacob Freck
27c10300c1
Bug: fix file reference for cwd only ( #204 )
...
Partial fix for #204
2017-11-13 10:42:29 -08:00
Pablo Selem
dfd73960c3
add a default port for the history server ( #213 )
...
* add a default port for the history server
* use default ssh ports in constructor and fix ssh output strings
2017-11-13 09:55:53 -08:00
JS
65a56dd657
Feature/python container ( #210 )
...
* added python container, jupyter install script, vanilla container
* Update README.md
* Create README.md
* Create README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update jupyter.sh
* Update README.md
* Update README.md
* python
* readme update
* docker updates
* Update README.md
* Update 12-docker-image.md
* Update constants.py
* add image files for wiki
* update imageS
* .
* dockerfile typo
* dockerfiles
* Removed r
* readme
* update constants.py
* update readme
* readme updates
* readme updates
2017-11-09 00:43:32 -08:00
Pablo Selem
d44f0ebf08
Enable history server on batch nodes ( #209 )
...
* enable the history server on the cluster by default
* detect spark event log settings and set up the node accordingly
* fix spacing between methods
2017-11-08 19:55:13 -08:00
Jacob Freck
e0a42b8abf
Bug: fix issue with quoted arguments that have special characters ( #208 )
2017-11-08 15:56:39 -08:00
Jacob Freck
6ce3f048ba
Bug: fix pyenv init ( #201 )
...
* fix spark-submit command file paths for extra files
* fix files flag typo
* remove too long lines
* fix command
2017-11-02 15:55:40 -07:00
Jacob Freck
34d587b4b4
Bug: fix spark-submit command file paths for extra files ( #194 )
...
* fix spark-submit command file paths for extra files
* fix files flag typo
* remove too long lines
2017-11-02 15:03:55 -07:00
Jacob Freck
1e0dc78108
Bug: fix docker auth for private repos ( #196 )
2017-11-02 12:53:54 -07:00
Jacob Freck
62c7ef2c38
Bug: only show clusters created with aztk in list ( #200 )
...
* only show clusters created with aztk in list
* check the full metadata key value pair and cleanup code
2017-11-02 12:48:29 -07:00
Jacob Freck
608a3c8408
Bug: remove explicit set of PYSPARK_PYTHON ( #193 )
...
* remove explicit set of PYSPARK_PYTHON
* undo accidental change
2017-11-02 12:42:48 -07:00
Pablo Selem
f31e2b1225
fix issue with creating cluster erroring out with an invalid file ( #198 )
2017-11-02 11:15:34 -07:00
Jacob Freck
e31d5e73ae
Feature: SDK ( #180 )
...
* initial sdk commit
* added submit, wait_until_cluster_ready, wait_until_jobs_done, async options
* remove incorrect public method
* initial error checking
* factored helper commands out of spark client file
* remove unnecessary print statement
* add get_cluster and list_cluster, fix imports
* add create_user
* remove appmodel from base class, create app_logs_model
* fix imports and models call bug
* change asynchronous to wait, add get_logs(), add wait_until_app_done()
* add get_application_status(), add_create_cluster_in_parallel(), add submit_all_applications(), add wait_until_all_clusters_are_ready, create_user() accepts cluster_id, rename app to application
* add try catches for all public methods, raise AztkErrors
* add Custom Script model
* added custom script support
* added ssh conf model
* added ssh conf subclass, fixed typing issue
* add support for spark configuration files, move upload_node_scripts to spark
* changed submit to require cluster_id
* whitespace
* initial integration commit
* create_user takes ssh key or path to key
* fix get_user_public_key
* add name for parameter
* integrate cluster_create and cluster_add_user with sdk
* expose pool in Cluster model
* add bool return value to delete_cluster
* integrate cluster_delete
* integrate cluster_get and cluster_list with sdk
* integrate cluster_submit and cluster_app_logs with sdk
* integrate ssh with sdk
* change master_ui to web_ui and web_ui to job_ui
* fix cluster_create, cluster_get, and cluster_ssh, aztklib
* add home_directory_path constant
* remove unnecessary files in cli
* remove unnecessary files
* fix setup.py constants
* redo #167
* fix constants and setup.py
* remove old tests, fix constants
* fix get_log typo
* refactor cluster_create for readability
* decouple cli from sdk, and batch functions from software functions
* update version, fix in setup.py
* whitespace
* fix init source path
* change import
* move error.py to root sdk directory
* fix cluster_ssh error call
* fix bug if no app_args are present
* remove default value for docker_repo in constructor
2017-10-31 12:34:23 -07:00