# R
This Dockerfile is used to build the aztk-r Docker image used by this toolkit. The image includes CRAN R 3.4.1, RStudio-Server v1.1.383, and SparklyR, and comes packaged with the Tidyverse.
You can modify these Dockerfiles to build your own image. However, in most cases, building on top of the aztk-base image is recommended.
NOTE: If you plan to use RStudio-Server, hosted on the Spark cluster's master node, we recommend using this image.
## How to build this image
This Dockerfile accepts a build-time variable that allows you to specify your desired R version: R_VERSION.
By default, we set R_VERSION=3.4.1.
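For reference, here is a minimal sketch of how a Dockerfile consumes a build-time variable like R_VERSION. This is illustrative only, assuming an apt-based base image; the base image tag and the package pinning are hypothetical, and the actual aztk-r Dockerfile differs:

```Dockerfile
# Illustrative sketch only -- not the actual aztk-r Dockerfile.
# The base image tag below is hypothetical.
FROM aztk/base:spark2.1.0

# Build-time variable with a default; override with:
#   docker build --build-arg R_VERSION=<version> ...
ARG R_VERSION=3.4.1

# Install the requested R version (apt version pinning shown for illustration).
RUN apt-get update \
    && apt-get install -y --no-install-recommends r-base=${R_VERSION}* \
    && rm -rf /var/lib/apt/lists/*
```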
For example, if I wanted to use R v3.4.0 with Spark v2.1.0, I would select the appropriate Dockerfile and build the image as follows:
```sh
# spark2.1.0/Dockerfile
docker build \
    --build-arg R_VERSION=3.4.0 \
    -t <my_image_tag> .
```
R_VERSION is used to set the version of R for your cluster.
NOTE: Most versions of R will work. However, when selecting your R version, please make sure that it is compatible with your selected version of Spark.
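Once the image is built, push it to a Docker registry and point AZTK at it through the docker_repo setting referenced elsewhere in these docs. The registry, image name, and tag below are placeholders:

```sh
# Push the custom image to a registry you control (names are placeholders).
docker tag <my_image_tag> myregistry/aztk-r:3.4.0-spark2.1.0
docker push myregistry/aztk-r:3.4.0-spark2.1.0
```

Then set docker_repo to that image (typically in your .aztk/cluster.yaml) before creating the cluster, so the cluster's nodes pull your custom image instead of the default one.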