Available Dockerfiles: spark1.6.3/, spark2.1.0/, spark2.2.0/, spark2.3.0/


R

These Dockerfiles are used to build the aztk-r Docker image used by this toolkit. The image uses CRAN R 3.4.1 and RStudio-Server v1.1.383, and comes packaged with sparklyr and the Tidyverse.

You can modify these Dockerfiles to build your own image. However, in most cases, building on top of the aztk-base image is recommended.

NOTE: If you plan to use RStudio-Server, hosted on the Spark cluster's master node, with your Spark cluster, we recommend using this image.

How to build this image

This Dockerfile takes a build-time variable, R_VERSION, that allows you to specify your desired R version.

By default, we set R_VERSION=3.4.1.

For example, to use R v3.4.0 with Spark v2.1.0, select the appropriate Dockerfile and build the image as follows:

```sh
# spark2.1.0/Dockerfile
docker build \
    --build-arg R_VERSION=3.4.0 \
    -t <my_image_tag> .
```

R_VERSION is used to set the version of R for your cluster.
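As a quick sanity check, you can confirm that the build argument took effect by printing the R version inside the built image. This is a sketch, assuming Docker is installed locally; the tag `aztk-r-example` is an illustrative name, not one used by the toolkit:

```sh
# Build from the chosen Dockerfile's directory with the default R version
# (3.4.1 unless overridden via --build-arg R_VERSION=...).
# The tag "aztk-r-example" is illustrative.
docker build -t aztk-r-example .

# Run R inside the image and print its version to verify the build argument
docker run --rm aztk-r-example R --version
```

The first line of the output should report the R version you selected at build time.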

NOTE: Most versions of R will work. However, when selecting your R version, please make sure that it is compatible with your selected version of Spark.