This commit is contained in:
jiata 2017-11-07 16:44:32 -08:00
Parent 9a6ba14915
Commit 648a05aabc
13 changed files with 29 additions and 516 deletions

View File

@@ -5,7 +5,7 @@ import os
"""
CLI_EXE = 'aztk'
DEFAULT_DOCKER_REPO = "jiata/aztk-vanilla:0.1.0-spark2.2.0"
DEFAULT_DOCKER_REPO = "jiata/aztk-base:0.1.0-spark2.2.0"
DOCKER_SPARK_CONTAINER_NAME = "spark"
# DOCKER

View File

@@ -15,7 +15,7 @@ size: 2
username: spark
# docker_repo: <name of docker image repo (for more information, see https://github.com/Azure/aztk/blob/master/docs/12-docker-image.md)>
docker_repo: jiata/aztk:0.1.0-spark2.2.0-python3.5.4
docker_repo: jiata/aztk-base:0.1.0-spark2.2.0
# # optional custom scripts to run on the Spark master, Spark worker or all nodes in the cluster
# custom_scripts:

View File

@@ -3,9 +3,9 @@
# This custom script only works on images where jupyter is pre-installed on the Docker image
#
# This custom script has been tested to work on the following docker images:
# - jiata/aztk-python:0.1.0-spark2.2.0-anaconda3-5.0.0 (python3.6.2)
# - jiata/aztk-python:0.1.0-spark2.1.0-anaconda3-5.0.0 (python3.6.2)
# - jiata/aztk-python:0.1.0-spark1.6.3-anaconda3-5.0.0 (python3.6.2)
# - jiata/aztk-python:0.1.0-spark2.2.0-python3.6.2
# - jiata/aztk-python:0.1.0-spark2.1.0-python3.6.2
# - jiata/aztk-python:0.1.0-spark1.6.3-python3.6.2
if [ "$IS_MASTER" = "1" ]; then

View File

@@ -10,7 +10,7 @@ On top of that, we also provide two flavors of Spark images, one geared towards
Docker Image | Image Type | User Language(s) | What's Included?
:-- | :-- | :-- | :--
[aztk-vanilla](https://hub.docker.com/r/jiata/aztk-vanilla/) | Vanilla | Java, Scala | `Spark`
[aztk-base](https://hub.docker.com/r/jiata/aztk-base/) | Base | Java, Scala | `Spark`
[aztk-python](https://hub.docker.com/r/jiata/aztk-python/) | Pyspark | Python | `Anaconda`</br>`Jupyter Notebooks` </br> `PySpark`
[aztk-r](https://hub.docker.com/r/jiata/aztk-r/) | SparklyR | R | `CRAN`</br>`RStudio Server`</br>`SparklyR and SparkR`
@@ -22,15 +22,15 @@ Today, all the AZTK images are hosted on Docker Hub under [jiata](https://hub.do
Docker Repo (hosted on Docker Hub) | Spark Version | Python Version | R Version
:-- | :-- | :-- | :--
jiata/aztk-vanilla:0.1.0-spark2.2.0 __(default)__ | v2.2.0 | -- | --
jiata/aztk-vanilla:0.1.0-spark2.1.0 | v2.1.0 | -- | --
jiata/aztk-vanilla:0.1.0-spark1.6.3 | v1.6.3 | -- | --
jiata/aztk-base:0.1.0-spark2.2.0 __(default)__ | v2.2.0 | -- | --
jiata/aztk-base:0.1.0-spark2.1.0 | v2.1.0 | -- | --
jiata/aztk-base:0.1.0-spark1.6.3 | v1.6.3 | -- | --
jiata/aztk-python:0.1.0-spark2.2.0-anaconda3-5.0.0 | v2.2.0 | v3.6.2 | --
jiata/aztk-python:0.1.0-spark2.1.0-anaconda3-5.0.0 | v2.1.0 | v3.6.2 | --
jiata/aztk-python:0.1.0-spark1.6.3-anaconda3-5.0.0 | v1.6.3 | v3.6.2 | --
jiata/aztk-r:0.1.0-spark2.2.0-r3.4.1 | v2.2.0 | -- | v3.4.1
jiata/aztk-r:0.1.0-spark2.1.0-r3.4.1 | v2.1.0 | -- | v3.4.1
jiata/aztk-r:0.1.0-spark1.6.3-r3.4.1 | v1.6.3 | -- | v3.4.1
[coming soon] jiata/aztk-r:0.1.0-spark2.2.0-r3.4.1 | v2.2.0 | -- | v3.4.1
[coming soon] jiata/aztk-r:0.1.0-spark2.1.0-r3.4.1 | v2.1.0 | -- | v3.4.1
[coming soon] jiata/aztk-r:0.1.0-spark1.6.3-r3.4.1 | v1.6.3 | -- | v3.4.1
If you have requests to add to the list of supported images, please file a GitHub issue.

View File

@@ -1,7 +1,7 @@
# Python
This Dockerfile is used to build the __aztk-python__ Docker image used by this toolkit. This image uses Anaconda, providing access to a wide range of popular python packages.
You can modify these Dockerfiles to build your own image. However, in most cases, building on top of the __aztk-vanilla__ image is recommended.
You can modify these Dockerfiles to build your own image. However, in most cases, building on top of the __aztk-base__ image is recommended.
NOTE: If you plan to use Jupyter Notebooks with your Spark cluster, we recommend using this image as Jupyter Notebook comes pre-installed with Anaconda.

View File

@@ -6,15 +6,7 @@ ARG ANACONDA_VERSION=anaconda3-5.0.0
# install user specified version of anaconda
RUN pyenv install -f $ANACONDA_VERSION \
&& pyenv global $ANACONDA_VERSION \
&& apt-get install unzip \
# Fetch h2o_pysparkling
&& pip install http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/7/Python/h2o-3.14.0.7-py2.py3-none-any.whl \
&& pip install h2o_pysparkling_2.2 \
# Install Sparkling water 2.2.2
&& cd /home \
&& wget http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.2/2/sparkling-water-2.2.2.zip \
&& unzip sparkling-water-2.2.2.zip
&& pyenv global $ANACONDA_VERSION
# set env vars
ENV USER_PYTHON_VERSION $ANACONDA_VERSION

View File

@@ -6,18 +6,9 @@ ARG ANACONDA_VERSION=anaconda3-5.0.0
# install user specified version of anaconda
RUN pyenv install -f $ANACONDA_VERSION \
&& pyenv global $ANACONDA_VERSION \
&& apt-get install unzip \
# Fetch h2o_pysparkling
&& pip install http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/7/Python/h2o-3.14.0.7-py2.py3-none-any.whl \
&& pip install h2o_pysparkling_2.2 \
# Install Sparkling water 2.2.2
&& cd /home \
&& wget http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.2/2/sparkling-water-2.2.2.zip \
&& unzip sparkling-water-2.2.2.zip
&& pyenv global $ANACONDA_VERSION
# set env vars
ENV SPARKLING_WATER /home/sparkling-water-2.2.2/assembly/build/libs/sparkling-water-assembly_2.11-2.2.2-all.jar
ENV USER_PYTHON_VERSION $ANACONDA_VERSION
CMD ["/bin/bash"]

View File

@@ -1,21 +0,0 @@
# R
This Dockerfile is used to build the __aztk-r__ Docker image used by this toolkit. This image uses R and RStudio Server, providing access to a wide range of popular R packages.
You can modify these Dockerfiles to build your own image. However, in most cases, building on top of the __aztk-vanilla__ image is recommended.
## How to build this image
This Dockerfile takes in two variables at build time that allow you to specify your desired Rstudio server versions and R versions: **RSTUDIO_SERVER_VERSION** and **R_VERSION**
By default, we set **R_VERSION=3.4.2** and **RSTUDIO_SERVER_VERSION=1.1.383**.
For example, if I wanted to use Rstudio Server v1.1.383 and R 3.2.1 with Spark v2.1.0, I would select the appropriate Dockerfile and build the image as follows:
```sh
# spark2.1.0/Dockerfile
docker build \
--build-arg RSTUDIO_SERVER_VERSION=1.1.383 \
--build-arg R_VERSION=3.2.1 \
-t <my_image_tag> .
```
**R_VERSION** is used to set the version of R for your cluster.
**RSTUDIO_SERVER_VERSION** is used to set the version of RStudio Server for your cluster.

View File

@@ -1,150 +0,0 @@
FROM jiata/aztk-base:0.1.0-spark1.6.3
ARG R_VERSION=3.4.1
ARG RSTUDIO_SERVER_VERSION=1.1.383
ARG BUILD_DATE
ENV BUILD_DATE ${BUILD_DATE:-}
ENV R_VERSION=${R_VERSION:-3.4.2} \
LC_ALL=en_US.UTF-8 \
LANG=en_US.UTF-8 \
TERM=xterm
ADD bootstrap.sh /bootstrap.sh
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
bash-completion \
ca-certificates \
file \
fonts-texgyre \
g++ \
gfortran \
gsfonts \
libbz2-1.0 \
libcurl3 \
libopenblas-dev \
libpangocairo-1.0-0 \
libpcre3 \
libpng16-16 \
libtiff5 \
liblzma5 \
locales \
make \
unzip \
zip \
zlib1g \
libcurl4-openssl-dev \
libxml2-dev \
libapparmor1 \
gdebi-core \
lsb-release \
psmisc \
git \
libssl-dev \
sudo \
wget \
&& echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
&& locale-gen en_US.utf8 \
&& /usr/sbin/update-locale LANG=en_US.UTF-8 \
&& BUILDDEPS="curl \
default-jdk \
libbz2-dev \
libcairo2-dev \
libcurl4-openssl-dev \
libpango1.0-dev \
libjpeg-dev \
libicu-dev \
libpcre3-dev \
libpng-dev \
libreadline-dev \
libtiff5-dev \
liblzma-dev \
libx11-dev \
libxt-dev \
perl \
tcl8.6-dev \
tk8.6-dev \
texinfo \
texlive-extra-utils \
texlive-fonts-recommended \
texlive-fonts-extra \
texlive-latex-recommended \
x11proto-core-dev \
xauth \
xfonts-base \
xvfb \
zlib1g-dev" \
&& apt-get install -y --no-install-recommends $BUILDDEPS \
&& cd tmp/ \
## Download source code
&& /bootstrap.sh ${R_VERSION} \
## Extract source code
&& tar -xf R-${R_VERSION}.tar.gz \
&& cd R-${R_VERSION} \
## Set compiler flags
&& R_PAPERSIZE=letter \
R_BATCHSAVE="--no-save --no-restore" \
R_BROWSER=xdg-open \
PAGER=/usr/bin/pager \
PERL=/usr/bin/perl \
R_UNZIPCMD=/usr/bin/unzip \
R_ZIPCMD=/usr/bin/zip \
R_PRINTCMD=/usr/bin/lpr \
LIBnn=lib \
AWK=/usr/bin/awk \
CFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g" \
CXXFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g" \
## Configure options
./configure --enable-R-shlib \
--enable-memory-profiling \
--with-readline \
--with-blas="-lopenblas" \
--disable-nls \
--without-recommended-packages \
## Build and install
&& make \
&& make install \
## Add a default CRAN mirror
&& echo "options(repos = c(CRAN = 'https://cran.rstudio.com/'), download.file.method = 'libcurl')" >> /usr/local/lib/R/etc/Rprofile.site \
## Add a library directory (for user-installed packages)
&& mkdir -p /usr/local/lib/R/site-library \
&& chown root:staff /usr/local/lib/R/site-library \
&& chmod g+wx /usr/local/lib/R/site-library \
## Fix library path
&& echo "R_LIBS_USER='/usr/local/lib/R/site-library'" >> /usr/local/lib/R/etc/Renviron \
&& echo "R_LIBS=\${R_LIBS-'/usr/local/lib/R/site-library:/usr/local/lib/R/library:/usr/lib/R/library'}" >> /usr/local/lib/R/etc/Renviron \
## install packages from date-locked MRAN snapshot of CRAN
&& [ -z "$BUILD_DATE" ] && BUILD_DATE=$(TZ="America/Los_Angeles" date -I) || true \
&& MRAN=https://mran.microsoft.com/snapshot/${BUILD_DATE} \
&& echo MRAN=$MRAN >> /etc/environment \
&& export MRAN=$MRAN \
&& echo "options(repos = c(CRAN='$MRAN'), download.file.method = 'libcurl')" >> /usr/local/lib/R/etc/Rprofile.site \
## Use littler installation scripts
&& Rscript -e "install.packages(c('littler', 'docopt'), repo = '$MRAN')" \
&& ln -s /usr/local/lib/R/site-library/littler/examples/install2.r /usr/local/bin/install2.r \
&& ln -s /usr/local/lib/R/site-library/littler/examples/installGithub.r /usr/local/bin/installGithub.r \
&& ln -s /usr/local/lib/R/site-library/littler/bin/r /usr/local/bin/r \
## TEMPORARY WORKAROUND to get more robust error handling for install2.r prior to littler update
&& curl -O /usr/local/bin/install2.r https://github.com/eddelbuettel/littler/raw/master/inst/examples/install2.r \
&& chmod +x /usr/local/bin/install2.r \
## Clean up from R source install
&& cd / \
&& rm -rf /tmp/* \
&& apt-get remove --purge -y $BUILDDEPS \
&& apt-get autoremove -y \
&& apt-get autoclean -y \
&& rm -rf /var/lib/apt/lists/* \
## Downloading and Installing RStudio Server
RUN Rscript -e "install.packages(c('tidyverse', 'sparklyr'))" \
&& wget https://download2.rstudio.org/rstudio-server-$RSTUDIO_SERVER_VERSION-amd64.deb \
&& gdebi rstudio-server-1.1.383-amd64.deb --non-interactive \
&& echo "server-app-armor-enabled=0" | tee -a /etc/rstudio/rserver.conf \
&& echo "Sys.setenv(SPARK_HOME ='"$SPARK_HOME"')" >> ${R_HOME}/etc/Rprofile.site \
## Preparing default user for Rstudio Server
&& set -e \
&& useradd -m -d /home/rstudio rstudio \
&& echo rstudio:rstudio | chpasswd
CMD ["R"]
EXPOSE 8787

View File

@@ -1,3 +0,0 @@
#!/bin/bash
IFS='.' read -r -a baseVersion << $1
curl -O https://cran.r-project.org/src/base/R-${baseVersion[0]}/R-$1.tar.gz \
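
The script above appears intended to take the major component of the R version passed as `$1` and use it to build the CRAN source tarball URL. A minimal Python sketch of that derivation follows; the helper name `cran_source_url` is hypothetical and is not part of aztk.

```python
def cran_source_url(r_version: str) -> str:
    """Build the CRAN source tarball URL for an R version such as '3.4.1'."""
    # The directory on CRAN is keyed by the major version only (e.g. R-3).
    major = r_version.split(".")[0]
    return f"https://cran.r-project.org/src/base/R-{major}/R-{r_version}.tar.gz"


if __name__ == "__main__":
    print(cran_source_url("3.4.1"))
```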

View File

@@ -1,148 +0,0 @@
FROM jiata/aztk-base:0.1.0-spark2.1.0
ARG R_VERSION=3.4.1
ARG RSTUDIO_SERVER_VERSION=1.1.383
ARG BUILD_DATE
ENV BUILD_DATE ${BUILD_DATE:-}
ENV R_VERSION=${R_VERSION:-3.4.2} \
LC_ALL=en_US.UTF-8 \
LANG=en_US.UTF-8 \
TERM=xterm
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
bash-completion \
ca-certificates \
file \
fonts-texgyre \
g++ \
gfortran \
gsfonts \
libbz2-1.0 \
libcurl3 \
libopenblas-dev \
libpangocairo-1.0-0 \
libpcre3 \
libpng16-16 \
libtiff5 \
liblzma5 \
locales \
make \
unzip \
zip \
zlib1g \
libcurl4-openssl-dev \
libxml2-dev \
libapparmor1 \
gdebi-core \
lsb-release \
psmisc \
git \
libssl-dev \
sudo \
wget \
&& echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
&& locale-gen en_US.utf8 \
&& /usr/sbin/update-locale LANG=en_US.UTF-8 \
&& BUILDDEPS="curl \
default-jdk \
libbz2-dev \
libcairo2-dev \
libcurl4-openssl-dev \
libpango1.0-dev \
libjpeg-dev \
libicu-dev \
libpcre3-dev \
libpng-dev \
libreadline-dev \
libtiff5-dev \
liblzma-dev \
libx11-dev \
libxt-dev \
perl \
tcl8.6-dev \
tk8.6-dev \
texinfo \
texlive-extra-utils \
texlive-fonts-recommended \
texlive-fonts-extra \
texlive-latex-recommended \
x11proto-core-dev \
xauth \
xfonts-base \
xvfb \
zlib1g-dev" \
&& apt-get install -y --no-install-recommends $BUILDDEPS \
&& cd tmp/ \
## Download source code
&& IFS='.' read -r -a baseVersion << ${R_VERSION} \
&& curl -O https://cran.r-project.org/src/base/R-${baseVersion[0]}/R-${R_VERSION}.tar.gz \
## Extract source code
&& tar -xf R-${R_VERSION}.tar.gz \
&& cd R-${R_VERSION} \
## Set compiler flags
&& R_PAPERSIZE=letter \
R_BATCHSAVE="--no-save --no-restore" \
R_BROWSER=xdg-open \
PAGER=/usr/bin/pager \
PERL=/usr/bin/perl \
R_UNZIPCMD=/usr/bin/unzip \
R_ZIPCMD=/usr/bin/zip \
R_PRINTCMD=/usr/bin/lpr \
LIBnn=lib \
AWK=/usr/bin/awk \
CFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g" \
CXXFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g" \
## Configure options
./configure --enable-R-shlib \
--enable-memory-profiling \
--with-readline \
--with-blas="-lopenblas" \
--disable-nls \
--without-recommended-packages \
## Build and install
&& make \
&& make install \
## Add a default CRAN mirror
&& echo "options(repos = c(CRAN = 'https://cran.rstudio.com/'), download.file.method = 'libcurl')" >> /usr/local/lib/R/etc/Rprofile.site \
## Add a library directory (for user-installed packages)
&& mkdir -p /usr/local/lib/R/site-library \
&& chown root:staff /usr/local/lib/R/site-library \
&& chmod g+wx /usr/local/lib/R/site-library \
## Fix library path
&& echo "R_LIBS_USER='/usr/local/lib/R/site-library'" >> /usr/local/lib/R/etc/Renviron \
&& echo "R_LIBS=\${R_LIBS-'/usr/local/lib/R/site-library:/usr/local/lib/R/library:/usr/lib/R/library'}" >> /usr/local/lib/R/etc/Renviron \
## install packages from date-locked MRAN snapshot of CRAN
&& [ -z "$BUILD_DATE" ] && BUILD_DATE=$(TZ="America/Los_Angeles" date -I) || true \
&& MRAN=https://mran.microsoft.com/snapshot/${BUILD_DATE} \
&& echo MRAN=$MRAN >> /etc/environment \
&& export MRAN=$MRAN \
&& echo "options(repos = c(CRAN='$MRAN'), download.file.method = 'libcurl')" >> /usr/local/lib/R/etc/Rprofile.site \
## Use littler installation scripts
&& Rscript -e "install.packages(c('littler', 'docopt'), repo = '$MRAN')" \
&& ln -s /usr/local/lib/R/site-library/littler/examples/install2.r /usr/local/bin/install2.r \
&& ln -s /usr/local/lib/R/site-library/littler/examples/installGithub.r /usr/local/bin/installGithub.r \
&& ln -s /usr/local/lib/R/site-library/littler/bin/r /usr/local/bin/r \
## TEMPORARY WORKAROUND to get more robust error handling for install2.r prior to littler update
&& curl -O /usr/local/bin/install2.r https://github.com/eddelbuettel/littler/raw/master/inst/examples/install2.r \
&& chmod +x /usr/local/bin/install2.r \
## Clean up from R source install
&& cd / \
&& rm -rf /tmp/* \
&& apt-get remove --purge -y $BUILDDEPS \
&& apt-get autoremove -y \
&& apt-get autoclean -y \
&& rm -rf /var/lib/apt/lists/* \
## Downloading and Installing RStudio Server
&& Rscript -e "install.packages(c('tidyverse', 'sparklyr'))" \
&& wget https://download2.rstudio.org/rstudio-server-$RSTUDIO_SERVER_VERSION-amd64.deb \
&& gdebi rstudio-server-1.1.383-amd64.deb --non-interactive \
&& echo "server-app-armor-enabled=0" | tee -a /etc/rstudio/rserver.conf \
&& echo "Sys.setenv(SPARK_HOME ='"$SPARK_HOME"')" >> ${R_HOME}/etc/Rprofile.site \
## Preparing default user for Rstudio Server
&& set -e \
&& useradd -m -d /home/rstudio rstudio \
&& echo rstudio:rstudio | chpasswd
CMD ["R"]
EXPOSE 8787

View File

@@ -1,148 +0,0 @@
FROM jiata/aztk-base:0.1.0-spark2.2.0
ARG R_VERSION=3.4.1
ARG RSTUDIO_SERVER_VERSION=1.1.383
ARG BUILD_DATE
ENV BUILD_DATE ${BUILD_DATE:-}
ENV R_VERSION=${R_VERSION:-3.4.2} \
LC_ALL=en_US.UTF-8 \
LANG=en_US.UTF-8 \
TERM=xterm
RUN apt-get update \
&& apt-get install -y --no-install-recommends \
bash-completion \
ca-certificates \
file \
fonts-texgyre \
g++ \
gfortran \
gsfonts \
libbz2-1.0 \
libcurl3 \
libopenblas-dev \
libpangocairo-1.0-0 \
libpcre3 \
libpng16-16 \
libtiff5 \
liblzma5 \
locales \
make \
unzip \
zip \
zlib1g \
libcurl4-openssl-dev \
libxml2-dev \
libapparmor1 \
gdebi-core \
lsb-release \
psmisc \
git \
libssl-dev \
sudo \
wget \
&& echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen \
&& locale-gen en_US.utf8 \
&& /usr/sbin/update-locale LANG=en_US.UTF-8 \
&& BUILDDEPS="curl \
default-jdk \
libbz2-dev \
libcairo2-dev \
libcurl4-openssl-dev \
libpango1.0-dev \
libjpeg-dev \
libicu-dev \
libpcre3-dev \
libpng-dev \
libreadline-dev \
libtiff5-dev \
liblzma-dev \
libx11-dev \
libxt-dev \
perl \
tcl8.6-dev \
tk8.6-dev \
texinfo \
texlive-extra-utils \
texlive-fonts-recommended \
texlive-fonts-extra \
texlive-latex-recommended \
x11proto-core-dev \
xauth \
xfonts-base \
xvfb \
zlib1g-dev" \
&& apt-get install -y --no-install-recommends $BUILDDEPS \
&& cd tmp/ \
## Download source code
&& IFS='.' read -r -a baseVersion << ${R_VERSION} \
&& curl -O https://cran.r-project.org/src/base/R-${baseVersion[0]}/R-${R_VERSION}.tar.gz \
## Extract source code
&& tar -xf R-${R_VERSION}.tar.gz \
&& cd R-${R_VERSION} \
## Set compiler flags
&& R_PAPERSIZE=letter \
R_BATCHSAVE="--no-save --no-restore" \
R_BROWSER=xdg-open \
PAGER=/usr/bin/pager \
PERL=/usr/bin/perl \
R_UNZIPCMD=/usr/bin/unzip \
R_ZIPCMD=/usr/bin/zip \
R_PRINTCMD=/usr/bin/lpr \
LIBnn=lib \
AWK=/usr/bin/awk \
CFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g" \
CXXFLAGS="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g" \
## Configure options
./configure --enable-R-shlib \
--enable-memory-profiling \
--with-readline \
--with-blas="-lopenblas" \
--disable-nls \
--without-recommended-packages \
## Build and install
&& make \
&& make install \
## Add a default CRAN mirror
&& echo "options(repos = c(CRAN = 'https://cran.rstudio.com/'), download.file.method = 'libcurl')" >> /usr/local/lib/R/etc/Rprofile.site \
## Add a library directory (for user-installed packages)
&& mkdir -p /usr/local/lib/R/site-library \
&& chown root:staff /usr/local/lib/R/site-library \
&& chmod g+wx /usr/local/lib/R/site-library \
## Fix library path
&& echo "R_LIBS_USER='/usr/local/lib/R/site-library'" >> /usr/local/lib/R/etc/Renviron \
&& echo "R_LIBS=\${R_LIBS-'/usr/local/lib/R/site-library:/usr/local/lib/R/library:/usr/lib/R/library'}" >> /usr/local/lib/R/etc/Renviron \
## install packages from date-locked MRAN snapshot of CRAN
&& [ -z "$BUILD_DATE" ] && BUILD_DATE=$(TZ="America/Los_Angeles" date -I) || true \
&& MRAN=https://mran.microsoft.com/snapshot/${BUILD_DATE} \
&& echo MRAN=$MRAN >> /etc/environment \
&& export MRAN=$MRAN \
&& echo "options(repos = c(CRAN='$MRAN'), download.file.method = 'libcurl')" >> /usr/local/lib/R/etc/Rprofile.site \
## Use littler installation scripts
&& Rscript -e "install.packages(c('littler', 'docopt'), repo = '$MRAN')" \
&& ln -s /usr/local/lib/R/site-library/littler/examples/install2.r /usr/local/bin/install2.r \
&& ln -s /usr/local/lib/R/site-library/littler/examples/installGithub.r /usr/local/bin/installGithub.r \
&& ln -s /usr/local/lib/R/site-library/littler/bin/r /usr/local/bin/r \
## TEMPORARY WORKAROUND to get more robust error handling for install2.r prior to littler update
&& curl -O /usr/local/bin/install2.r https://github.com/eddelbuettel/littler/raw/master/inst/examples/install2.r \
&& chmod +x /usr/local/bin/install2.r \
## Clean up from R source install
&& cd / \
&& rm -rf /tmp/* \
&& apt-get remove --purge -y $BUILDDEPS \
&& apt-get autoremove -y \
&& apt-get autoclean -y \
&& rm -rf /var/lib/apt/lists/* \
## Downloading and Installing RStudio Server
&& Rscript -e "install.packages(c('tidyverse', 'sparklyr'))" \
&& wget https://download2.rstudio.org/rstudio-server-$RSTUDIO_SERVER_VERSION-amd64.deb \
&& gdebi rstudio-server-1.1.383-amd64.deb --non-interactive \
&& echo "server-app-armor-enabled=0" | tee -a /etc/rstudio/rserver.conf \
&& echo "Sys.setenv(SPARK_HOME ='"$SPARK_HOME"')" >> ${R_HOME}/etc/Rprofile.site \
## Preparing default user for Rstudio Server
&& set -e \
&& useradd -m -d /home/rstudio rstudio \
&& echo rstudio:rstudio | chpasswd
CMD ["R"]
EXPOSE 8787

View File

@@ -1,25 +1,25 @@
# Docker
Azure Distributed Data Engineering Toolkit runs Spark on Docker.
Supported Azure Distributed Data Engineering Toolkit images are hosted publicly on [Docker Hub](https://hub.docker.com/r/jiata/aztk/tags).
Supported Azure Distributed Data Engineering Toolkit images are hosted publicly on [Docker Hub](https://hub.docker.com/r/jiata/aztk-base/tags).
## Versioning with Docker
The default image that this package uses is the __aztk-vanilla__ Docker image that comes with **Spark v2.2.0**.
The default image that this package uses is the __aztk-base__ Docker image that comes with **Spark v2.2.0**.
You can use several versions of the __aztk-vanilla__ image:
- Spark 2.2.0 - jiata/aztk-vanilla:0.1.0-spark2.2.0 (default)
- Spark 2.1.0 - jiata/aztk-vanilla:0.1.0-spark2.1.0
- Spark 1.6.3 - jiata/aztk-vanilla:0.1.0-spark1.6.3
You can use several versions of the __aztk-base__ image:
- Spark 2.2.0 - jiata/aztk-base:0.1.0-spark2.2.0 (default)
- Spark 2.1.0 - jiata/aztk-base:0.1.0-spark2.1.0
- Spark 1.6.3 - jiata/aztk-base:0.1.0-spark1.6.3
We also provide two other image types tailored for Python and R users: __aztk-r__ and __aztk-python__. You can choose between the following:
- Anaconda3-5.0.0 (Python 3.6.2) / Spark 2.2.0 - jiata/aztk-python:0.1.0-spark2.2.0-anaconda3-5.0.0
- Anaconda3-5.0.0 (Python 3.6.2) / Spark 2.1.0 - jiata/aztk-python:0.1.0-spark2.1.0-anaconda3-5.0.0
- Anaconda3-5.0.0 (Python 3.6.2) / Spark 1.6.3 - jiata/aztk-python:0.1.0-spark1.6.3-anaconda3-5.0.0
- R 3.4.0 / Spark v2.2.0 - jiata/aztk-r:0.1.0-spark2.2.0-r3.4.0
- R 3.4.0 / Spark v2.1.0 - jiata/aztk-r:0.1.0-spark2.1.0-r3.4.0
- R 3.4.0 / Spark v1.6.3 - jiata/aztk-r:0.1.0-spark1.6.3-r3.4.0
- Anaconda3-5.0.0 (Python 3.6.2) / Spark 2.2.0 - jiata/aztk-python:0.1.0-spark2.2.0-python3.6.2
- Anaconda3-5.0.0 (Python 3.6.2) / Spark 2.1.0 - jiata/aztk-python:0.1.0-spark2.1.0-python3.6.2
- Anaconda3-5.0.0 (Python 3.6.2) / Spark 1.6.3 - jiata/aztk-python:0.1.0-spark1.6.3-python3.6.2
- [coming soon] R 3.4.0 / Spark v2.2.0 - jiata/aztk-r:0.1.0-spark2.2.0-r3.4.1
- [coming soon] R 3.4.0 / Spark v2.1.0 - jiata/aztk-r:0.1.0-spark2.1.0-r3.4.1
- [coming soon] R 3.4.0 / Spark v1.6.3 - jiata/aztk-r:0.1.0-spark1.6.3-r3.4.1
*Today, these supported images are hosted on Docker Hub under the repo ["jiata/aztk-vanilla/r/python:<tag>"](https://hub.docker.com/r/jiata).*
*Today, these supported images are hosted on Docker Hub under the repo ["jiata/aztk-base/r/python:<tag>"](https://hub.docker.com/r/jiata).*
To select an image other than the default, you can set your Docker image at cluster creation time with the optional **--docker-repo** parameter:
@@ -29,7 +29,7 @@ aztk spark cluster create ... --docker-repo <name_of_docker_image_repo>
For example, if I am using the image version 0.1.0, and wanted to use Spark v1.6.3, I could run the following cluster create command:
```sh
aztk spark cluster create ... --docker-repo jiata/aztk:0.1.0-spark1.6.3
aztk spark cluster create ... --docker-repo jiata/aztk-base:0.1.0-spark1.6.3
```
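
The image names listed in this document appear to follow a `<repo>:<toolkit version>-spark<Spark version>[-<extra>]` pattern. As a minimal sketch, assuming that pattern holds for every supported image, a name can be split into its parts before it is passed to `--docker-repo`. The helper `parse_image_tag` and the `ImageTag` type are hypothetical and are not part of aztk.

```python
import re
from typing import NamedTuple, Optional


class ImageTag(NamedTuple):
    repo: str                # e.g. "jiata/aztk-base"
    toolkit_version: str     # e.g. "0.1.0"
    spark_version: str       # e.g. "1.6.3"
    extra: Optional[str]     # e.g. "python3.6.2", or None when absent


# Assumed pattern: <repo>:<toolkit version>-spark<spark version>[-<extra>]
_TAG_RE = re.compile(
    r"^(?P<repo>[^:]+):(?P<toolkit>\d+\.\d+\.\d+)"
    r"-spark(?P<spark>\d+\.\d+\.\d+)(?:-(?P<extra>.+))?$"
)


def parse_image_tag(image: str) -> ImageTag:
    """Split an image name into repo, toolkit version, Spark version and suffix."""
    match = _TAG_RE.match(image)
    if match is None:
        raise ValueError(f"unrecognized image name: {image!r}")
    return ImageTag(match["repo"], match["toolkit"], match["spark"], match["extra"])


if __name__ == "__main__":
    print(parse_image_tag("jiata/aztk-base:0.1.0-spark1.6.3"))
```

A name that does not match the assumed pattern raises `ValueError` rather than returning a partial result, which makes a mistyped image name fail early.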
## Using a custom Docker Image